Decentralize your DevOps with Master-less Puppet and Supply_Drop

Here at Braintree, we are big fans of Puppet for system administration. In our ever-changing infrastructure, Puppet allows us to quickly provision and re-provision servers in multiple environments. We are also big fans of keeping our infrastructure lean and simple. Each piece of infrastructure that we must maintain comes with a cost. We have taken advantage of an under-appreciated feature of Puppet that allows us to manage our servers in a completely decentralized manner.

Benefits of going master-less

  • Fine-grained control: We pride ourselves on our ability to keep our site up. By using Puppet without a master, we have tight control over how and when configuration is applied to each server.
  • Parallelization: With a centralized Puppet master, there is a single canonical version of the Puppet configuration, which causes contention when multiple people are trying to make changes at the same time. Without a master, the source control repository holds the canonical version, so people can easily work in parallel as long as they are not trying to Puppet the same server.
  • No single point of failure: A Puppet master is yet another service to maintain and make highly available. Having one less service to apply our typical HA rigor to is a big win.

The nuts and bolts

In order to facilitate our master-less setup, we wrote a small gem called supply_drop. It's a set of Capistrano tasks that let you provision servers with Puppet. It tries to be small, simple, and stay out of your way. Two tasks do the bulk of the work: cap puppet:noop and cap puppet:apply. The noop task shows you the set of changes that are about to be applied; as you can guess, the apply task makes those changes on the box. supply_drop uses rsync to push the current Puppet configuration from your machine out to the server, which makes it very fast.
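For reference, here is a minimal sketch of a Capfile wired up with supply_drop; the server name and role are illustrative and will differ for your environment:

require 'rubygems'
require 'supply_drop'       # pulls in the puppet:noop and puppet:apply tasks

role :server, 'app1.sand'   # the tasks run against whatever servers are in scope

# cap puppet:noop    rsyncs the local Puppet config and shows what would change
# cap puppet:apply   rsyncs and applies those changes on the box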

We use a setup similar to cap multistage to manage the scope of which boxes we apply changes to. We dynamically create tasks for each server and environment that we have. Here is an example of what that looks like:

def tasks_for_datacenter(datacenter, servers)
  task datacenter do
    role :server, *servers
  end

  servers.each do |server|
    task server do
      role :server, server
    end
  end
end

tasks_for_datacenter :sandbox, %w(app1.sand db.sand)
tasks_for_datacenter :production, %w(app1.prod app2.prod db.prod)

These tasks allow us to apply changes to a single server or the entire datacenter. We can also use shell expansions to easily make changes to a subset of the servers in a given environment. Some examples:

  • cap app1.prod puppet:noop shows the changes on app1
  • cap sandbox puppet:apply applies the current changes to all of Sandbox
  • cap app{1,2}.prod puppet:apply applies the changes to app1.prod and app2.prod

The workflow

We are always looking for ways to improve our workflow, but are generally happy with the one we have now. Our goals are:

  • Easily allow multiple people to make changes to an environment
  • Have explicit promotion from QA to Sandbox to Production
  • Store every change in source control without making the logs overly noisy

We use a separate git branch for each environment (QA -> master, Sandbox -> sandbox, Production -> production). This allows us to promote changes by merging branches. Furthermore, if we need to do a quick fix in Production, we can use that branch and not affect Sandbox or QA until we merge. I'll walk through our workflow by adding a new Nagios alert for Apache as an example.

Write the Nagios script and push it out to a web server in QA. Repeat until you are happy with it.

cap web01.qa puppet:noop
cap web01.qa puppet:apply

Push the script out to all the web servers in the datacenter.

cap web0{1..n}.qa puppet:noop
cap web0{1..n}.qa puppet:apply

Change the central Nagios server's configuration to know about the new alert.

cap monitor.qa puppet:noop
# Some scary changes

Oh snap. There is a diff we don't know about: someone else is adding a Nagios alert. No worries; they have already checked in. We grab their changes and try again.

git stash
git pull origin master
git stash pop
cap monitor.qa puppet:noop
cap monitor.qa puppet:apply

The alert works. Now we noop the entire environment and commit our change. If the noop shows that we would be removing someone else's changes in addition to applying our own, we talk to those devs, let them know what we did, and remind them that they will need to pull from git.
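Noop-ing the whole environment uses the same datacenter-level tasks shown earlier (this assumes a qa datacenter task defined the same way as sandbox and production):

cap qa puppet:noop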

git commit -am "new nagios check"

Now we want to push the change to Sandbox. Since we are using git, we can either merge all of the changes from master or, if there is other work that is not ready, cherry-pick our single commit. The merge looks like this:

git checkout sandbox
git merge master
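
Alternatively, to promote only our single commit, the cherry-pick route would look like this (the sha is a placeholder for our commit):

git checkout sandbox
git cherry-pick <sha>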

Then apply the changes to Sandbox. We can do this on a server-by-server basis or in one fell swoop.

cap sandbox puppet:noop
cap sandbox puppet:apply

Repeat for Production. Declare victory.

In conclusion

supply_drop and Puppet give us fine-grained control over how we apply changes to any server in any one of our datacenters. Pairing them with git and a decent workflow gives you auditable and repeatable configuration management.

***
Tony Pitluga is a Principal Developer at Braintree. He enjoys lisps and scaling things.
