Musing about configuration management

TL;DR: this is basically a brain dump of my thought process while working on some solutions for Gogobot.

Lately, I’ve been musing about a true configuration management solution.

I am not talking about Chef/Puppet as a broader server configuration solution, but only about actual configuration files and how you distribute them to servers.

Most of our configuration files these days are managed by Chef.

For example, here’s how we manage Nginx configuration through Chef:

template "/etc/nginx/sites-available/gogobot.conf" do
  source 'gogobot-nginx.conf.erb'
  owner 'root'
  group 'root'
  mode '755'
  variables({
    socket_location: node['rails']['deploy']['socket_location'],
    path: node['rails']['deploy']['path'],
    web_dynamic: node['rails']['deploy']['web_dynamic'],
    web_static:  node['rails']['deploy']['web_static'],
    web_health:  node['rails']['deploy']['web_health'],
    web_cdn:     node['rails']['deploy']['web_cdn']
  })
  notifies :restart, 'service[nginx]'
end

You can clearly see that the source of this template is gogobot-nginx.conf.erb. This file is managed with Chef, through version control (git), and a more fine-tuned version-bumping mechanism called Spork.
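The variables passed to the template surface as instance variables inside the ERB. Purely as a sketch (the real template is of course much larger, and the upstream name here is made up):

upstream gogobot_app {
  # @socket_location is the Unicorn socket path passed in from the recipe
  server unix:<%= @socket_location %> fail_timeout=0;
}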

However, if you are familiar with any of this, you know that in order to change the configuration on a server you need to run chef-client on it.

If you are disciplined, you know that you need to run chef-client on the servers periodically to catch decay in cookbooks, package versions, and more.
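For example, a common way to kick that off across a whole fleet (assuming knife can SSH into your nodes) is:

knife ssh 'name:*' 'sudo chef-client'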

But Chef tends to be a huge hammer when all you really need is to add or change some configuration and distribute it to all the servers.

Example Use Case

To keep this concrete, here are a couple of examples I encountered over the last few months.

Changing YAML configuration

We work with Ruby on Rails and Ruby applications for the most part. Those applications are conventionally configured with YAML files.

We store almost everything in YAML configuration files, but we don’t distribute the configuration through git, since that would expose security vulnerabilities in secret management and elsewhere. Instead, we put the files on the servers with Chef and manage the secrets securely.

One of those configurations is the way we connect to our search cluster.

Recently, I made a big change to the search cluster: load balancing it through Nginx instead of calling the cluster directly from the code.

I will not go into too much detail about WHY I did this, but all I needed to do was change a YAML file on ALL the servers, then restart Unicorn and Nginx to pick up the change.
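Conceptually, the change looked something like this (the file name and keys here are made up for illustration, not our actual config):

# config/solr.yml (hypothetical)
production:
  # before: talk to the cluster directly
  # solr_url: http://solr-01.internal:8983/solr
  # after: talk to the local Nginx load balancer
  solr_url: http://localhost:8080/solr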

Changing Nginx configuration

All of our web servers run Nginx; no matter if the underlying language is Ruby or Go, there’s always Nginx in front.

In order to propagate that load balancing change across all the servers, I needed to add a section to the Nginx site configuration (something like the sketch below) and restart Nginx.
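A minimal sketch of the kind of section I mean, with hypothetical host names and ports:

# load balance SOLR through Nginx instead of hitting the nodes directly
upstream solr_cluster {
  server solr-01.internal:8983;
  server solr-02.internal:8983;
}

server {
  listen 8080;

  location / {
    proxy_pass http://solr_cluster;
  }
}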

Immutable infrastructure vs tweaking changes

The “trend” right now is to move towards immutable infrastructure. This means that when you have a change like this that you want to distribute to all of the servers, you converge new “instances” with the new configuration and hot-swap the servers.

Our solution fully supports this, and we change underlying servers all the time without users noticing. However, I felt that this was overkill for the situation.

Distributing configuration

All of this led me to muse about configuration management and what I would like to see happening.

Distribute a file to all servers

Why not source-control configuration with secrets distributed via ENV?

Now, this is a VALID point. You can distribute the file through git with all the sensitive stuff hidden away in ENV variables.

For example:

database:
  production:
    username: <%= ENV['PRODUCTION_DB_USERNAME'] %>

This has a few advantages and disadvantages.

It’s simple and clean: when you need to change a configuration file, you simply change it and deploy.

But what about system configuration like Nginx/Unicorn/God?

And if you add a new variable that doesn’t yet exist in the environment, you will again need to run a converge on the system.

Finding the balance

Basically, I am trying to find the balance between converging a big system onto new instances and simply pushing out a configuration change that can be summed up in a YAML file.

I am trying to think of an elegant solution that will detect a new SOLR server in the system and change the configuration of the SOLR load balancer to add it to the ORIGIN list.
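One direction I keep coming back to stays inside Chef: use search to discover the SOLR nodes and re-render the load balancer config. A sketch, with hypothetical role, file, and template names:

# Hypothetical recipe: discover SOLR nodes via Chef search and
# rebuild the upstream list for the load balancer.
solr_ips = search(:node, 'role:solr').map { |n| n['ipaddress'] }

template '/etc/nginx/conf.d/solr-upstream.conf' do
  source 'solr-upstream.conf.erb'
  variables(servers: solr_ips)
  # assumes a service[nginx] resource is declared elsewhere
  notifies :reload, 'service[nginx]'
end

The catch, of course, is that this still only converges when chef-client runs, which brings me right back to the hammer problem above.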

What are you using?

I have a couple of ideas on how to solve this. I am not sure all of them are valid.

So… What are you using to manage the configuration of your applications?

Hit me up in the comments. (if you mention Docker you get a point penalty)