My approach to Devops

Many of you already know that I do 100% of Gogobot’s Devops.

Being in charge of a consumer-facing, multi-platform product is definitely challenging, and it has its ups and downs, but here I want to focus on my approach to Devops and how I handle my daily tasks.

Engineers

The first thing I worry about is engineer happiness.

I realize “happiness” is hard to quantify, but my ultimate goal is that for engineers things “just work”. They don’t need to think about which server the code is being deployed to or what kind of load balancer is responsible for the traffic.

All they do is say gbot deploy production in the chat room and the rest is done.

The reason for this is that they have enough to worry about, like making the feature work, testing it and making sure the users like it. The infrastructure should be 100% transparent. No hassle.
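To make the chat-room deploy idea concrete, here is a minimal sketch in Python of how a bot might recognize that command. This is not Gogobot's actual bot: the function name, the regular expression, and the list of environments are assumptions for illustration, and a real bot would go on to trigger the deployment itself.

```python
import re
from typing import Optional

# Assumed set of environments, for illustration only.
KNOWN_ENVIRONMENTS = {"production", "staging", "dev"}

# Matches messages of the form "gbot deploy <environment>".
COMMAND_PATTERN = re.compile(r"^gbot\s+deploy\s+(\w+)$")


def parse_deploy_command(message: str) -> Optional[str]:
    """Return the target environment if the message is a deploy command, else None."""
    match = COMMAND_PATTERN.match(message.strip().lower())
    if match is None:
        return None
    environment = match.group(1)
    # Unknown environments are rejected rather than guessed at.
    return environment if environment in KNOWN_ENVIRONMENTS else None
```

The point of the sketch is that the engineer only names the environment. Everything about servers and load balancers stays behind the command.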

Product

The most common error I see with Devops is focusing on the infrastructure and not on the product. I think this is fatal for a lot of companies.

Don’t tell me what infrastructure you need, tell me what product you need to deliver and I will make it happen for you.

For example:

“I need a Docker container that runs Java” should become “I have a Java-based micro-service that accepts user reviews and sets language. What’s the best way to run this in production/dev/staging?”

Beyond that, though, knowing what your product is and what’s working (or not) can lead to better infrastructure decisions. For example, say a feature spec has been to update a record within 30 seconds of a user registering, and the infrastructure behind it is X. If this feature is no longer working for users, you can remove that piece and replace it with a better one for the product. (See the MongoDB example below.)

Monitoring and logging

All of my decisions rely on hard data. I don’t let guesses (even educated ones) take control of them.

What is “not working”? What is “too slow”? Without data to measure this, it is close to impossible to make the right decision.

So every piece of infrastructure deployed to production comes with a monitoring and logging strategy, even if it’s a one-off service.
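As a rough illustration of turning “too slow” into something measurable, the sketch below checks a latency percentile against a threshold. The function names, the choice of the 95th percentile, and the sample numbers are all assumptions for the example, not values from our production setup.

```python
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of the samples, for an integer pct in 1..100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def is_too_slow(latencies_ms: Sequence[float], threshold_ms: float, pct: int = 95) -> bool:
    """True when the chosen latency percentile exceeds the agreed threshold."""
    return percentile(latencies_ms, pct) > threshold_ms


# Hypothetical request latencies in milliseconds, with one slow outlier.
samples = [120, 130, 110, 140, 150, 125, 135, 145, 115, 900]
```

With an agreed threshold in place, “too slow” stops being an opinion: for these made-up samples, is_too_slow(samples, 500) flags the outlier, while the median stays well under the limit.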

I wrote about this in the past in “Measure, Monitor, Observe and supervise”. And if you are interested in setting up a logging cluster, you can read on in “Running ELK stack on docker - full solution”.

Expenses

After salaries, infrastructure is often the largest expense for a company. In this day and age, with the cloud, you can create a 50K monthly bill with the click of a button.

Focusing on what is the most efficient way to achieve something is important to me and I often revisit this.

For example, one of our most expensive pieces of infrastructure was a MongoDB cluster. It had been running perfectly in production for a while, and the feature it supported was running smoothly as a result.

However, looking at new developments in that field, we were able to remove the dependency on MongoDB and completely replace it with a combination of Lambda and S3. This move saved us $250,000 a year in infrastructure costs.

Focusing on efficiency and squeezing infrastructure to the limit is very important to me (not at the expense of performance, of course).

Constantly evaluating

Devops tools and solutions are moving at a very fast pace. However, running a stable and current production deployment means you can’t “jump the gun” on everything “cool” that catches your eye. Constantly evaluating what’s good and stable is very important.

For example, only recently (1-2 months ago) did we start running Docker in production, and it’s still not running any of the critical services (it likely will soon).

With every tool you introduce there’s a learning cliff, so evaluating smartly helps you overcome it and makes sure you have the answers for everything.

Off the top of my head

These are just some thoughts off the top of my head that just popped out after an office conversation. What’s your approach? What do you think? Let me know in the comments below.