Safe Operations For High Volume PostgreSQL

Update (February 6, 2019): Since this post was published, PostgreSQL has gone through a few major upgrade cycles -- several of which have added improved support for concurrent DDL. We’ve also further refined our processes. Given how much has changed, we figured it was time for an update. Read the new post on our Product & Technology blog.

We use PostgreSQL extensively at Braintree, and it backs many of our highly available services (including our main payments API).

We are constantly building and refining our products, and this often means evolving our database schema. In general, PostgreSQL is great at this, and we can make many different types of schema changes without downtime. There are some gotchas, however, that this post will cover.

Background

We almost never take scheduled downtime for our payments API, which means we run our database schema migrations while the gateway is up and serving requests. We have to be very careful about which database operations we run: a single bad command can lock out updates to a table for a long time.

For example, if we create a new index on our customers table, we cannot create new customers while that index is building. Any request that tries to create a customer will block, and possibly time out, causing a partial outage.
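To make that concrete, here is a minimal sketch (the table, column, and index names are made up for illustration). A plain CREATE INDEX blocks writes to the table for the duration of the build, while the CONCURRENTLY variant does not:

    -- Blocks INSERT/UPDATE/DELETE on customers until the build finishes:
    CREATE INDEX customers_email_idx ON customers (email);

    -- Builds without blocking writes (slower, and cannot run inside
    -- a transaction block):
    CREATE INDEX CONCURRENTLY customers_email_idx ON customers (email);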

In general, we are ok with database operations taking a long time. However, any operation that locks a table against updates for more than a few seconds means downtime for us.
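One guard that pairs well with that rule, sketched here as one option rather than a prescription (lock_timeout requires PostgreSQL 9.3 or newer), is to cap how long a migration may wait for a lock, so it fails fast instead of queueing up and stalling every write behind it:

    -- Abort the DDL if we cannot get the lock within 2 seconds,
    -- rather than queueing and blocking all writes behind us:
    SET lock_timeout = '2s';
    ALTER TABLE customers ADD COLUMN notes text;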

You can learn more about our high availability approaches in this talk: Ruby Conf Australia: High Availability at Braintree

We derived the lists below through extensive testing and trial and error.

The good

Here's what we can safely do in a migration without downtime (a sketch of each appears after the list):

  - Add a new column
  - Drop a column
  - Add an index concurrently
  - Drop a constraint (for example, NOT NULL)
  - Add a default value to an existing column
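As a sketch (the table and column names are made up, and the behavior described is that of the PostgreSQL versions this post was written against), each of these looks like:

    -- Add a new column (metadata-only; no table rewrite):
    ALTER TABLE customers ADD COLUMN notes text;

    -- Drop a column (fast; the space is reclaimed later -- see [3]):
    ALTER TABLE customers DROP COLUMN legacy_flag;

    -- Add an index concurrently (does not block writes):
    CREATE INDEX CONCURRENTLY customers_name_idx ON customers (name);

    -- Drop a constraint, for example NOT NULL:
    ALTER TABLE customers ALTER COLUMN phone DROP NOT NULL;

    -- Add a default to an existing column (applies only to new rows):
    ALTER TABLE customers ALTER COLUMN status SET DEFAULT 'active';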

The bad

Here's the stuff we cannot do on big tables, and our current workarounds (a sketch of the backfill pattern follows the list):

  - Add an index: add the index using the CONCURRENTLY keyword.
  - Change the type of a column: add a new column, change the code to write to both columns, and backfill the new column.
  - Add a column with a default: add the column, add the default as a separate command, and backfill the column with the default value.
  - Add a column that is non-nullable: create a new table with the addition of the non-nullable column, write to both tables, backfill, and then switch to the new table. [1]
  - Add a column with a unique constraint: add the column, add a unique index concurrently, and then add the constraint onto the table using that index. [2]
  - VACUUM FULL: we use pg_repack instead. [3]
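As one sketch of the backfill pattern for the add-a-column-with-a-default case (the table, column, default value, and batch bounds are all hypothetical; the column-type change above follows the same batched-backfill shape):

    -- Step 1: add the column with no default (fast, metadata-only):
    ALTER TABLE large_table ADD COLUMN status text;

    -- Step 2: add the default as a separate command (applies only to
    -- new rows, so there is no table rewrite):
    ALTER TABLE large_table ALTER COLUMN status SET DEFAULT 'active';

    -- Step 3: backfill existing rows in small batches so no single
    -- statement holds row locks for long:
    UPDATE large_table SET status = 'active'
      WHERE id >= 1 AND id < 10001 AND status IS NULL;
    -- ...repeat for each id range until the table is covered...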

If you have other workarounds, please share them in the comments.

  1. This workaround is incredibly onerous, and we very rarely do it. We try to be really careful when creating new tables and figuring out the initial set of non-nullable columns.
  2. For example:

       CREATE UNIQUE INDEX CONCURRENTLY token_is_unique ON large_table (token);
       ALTER TABLE large_table ADD CONSTRAINT token_is_unique UNIQUE USING INDEX token_is_unique;
  3. Dropping a column is very quick, but PostgreSQL won’t reclaim the disk space until you run a “VACUUM FULL” or use another tool like pg_repack.
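For completeness, the command itself, which is on the list of operations we avoid because it holds an ACCESS EXCLUSIVE lock (blocking both reads and writes) while it rewrites the table:

    -- Rewrites the table to reclaim dead space, but blocks all reads
    -- and writes for the duration:
    VACUUM FULL large_table;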
***
Paul Gross is a Lead Developer at Braintree. He previously worked at ThoughtWorks, a global IT consultancy, building custom software in diverse languages, including Java, .NET, Python, and Ruby.