Handling Database Schema Changes

Most of the applications I work with involve storing, transforming and querying data in relational databases. Over the years I’ve developed a pattern for managing changes to those databases’ schemas that has proven helpful and requires only a minimum of universally available tools. This post introduces the pattern in the hope that it proves useful to others.

The schema will always change

My example database schema template is the basic layout that I use for applications that have an accompanying database schema. Regardless of the programming language, there’s always a db directory where all the DDL, database tests and scripts to apply changes to the database schema – schema deltas – are located.
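Reconstructed from the files that appear later in this post, the layout looks roughly like this (the Makefile inside schema-deltas is implied by the deploy output further down):

db/
├── Makefile
├── extensions.ddl
├── domains.ddl
├── functions.ddl
├── schema.ddl
├── schema-deltas.ddl
├── views.ddl
├── triggers.ddl
├── roles.ddl
├── destroy.ddl
├── seed_data.sql
├── psqlrc-test
├── psqlrc-delta
└── schema-deltas/
    ├── Makefile
    └── 000001-example-delta.sql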

This is more or less my response to years of trying the automated schema versioning and maintenance tools available in various languages and frameworks. The gist of it is that I like to write my own DDL, and my own scripts to change that DDL as the applications inevitably evolve.

This option provides an optimum level of control and is definitely my recommendation for those who are not afraid to use their SQL-fu.

Typical workflow

My typical workflow involves working on a change to the database schema. Some changes are relatively simple, others more involved. In all cases, I start by writing a schema delta. This is a fancy name for a script that has two tasks:

  1. Perform a number of sequential, orchestrated changes to the database schema, the data or both.
  2. Record the fact that it has been run against the database, through an errata entry.

Together, these ensure that deltas are applied in a consistent order, and no more than once.
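The post never shows the errata table itself, but the mechanism requires a uniqueness constraint on the delta name, so that inserting the same delta twice fails. A minimal sketch, assuming PostgreSQL (the applied_at column is my own addition, not something the post references):

CREATE TABLE schema_errata (
  -- Name of the delta script, e.g. '000001-example-delta'. The primary
  -- key is what makes a repeated INSERT fail and roll the delta back.
  delta       TEXT        NOT NULL PRIMARY KEY,
  -- Assumed convenience column; not used anywhere in the post.
  applied_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);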

Usually I iterate on the delta, the database tests and the code that uses the database, in my local and QA environments, fixing any issues and ensuring that no regressions are introduced. This is also a good time to look at the performance consequences of any changes.
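The post doesn’t show what the database tests look like. One minimal way to write them with nothing but psql is a DO block that raises on failure; the file name, and the assertion against the example table introduced in the next section, are hypothetical:

-- db/tests/example-has-value.sql (hypothetical)
DO $$
BEGIN
  -- Assert that the example table carries the expected column.
  PERFORM 1
     FROM information_schema.columns
    WHERE table_name = 'example'
      AND column_name = 'value';
  IF NOT FOUND THEN
    RAISE EXCEPTION 'example.value column is missing';
  END IF;
END
$$;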

Once the delta and accompanying changes are complete, I take each of the changes (e.g., adding a column to a table) and fold it into the regular provisioning DDL scripts (e.g., adding the new column to the corresponding CREATE TABLE). Finally, the delta’s entry in the schema_errata table is added to the seed_data.sql script, to record that this particular schema delta does not need to be applied to any database instance deployed from this point on.
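For instance, once the 000001-example-delta created in the next section has been merged into the base DDL, seed_data.sql would carry its errata row so that fresh deployments skip the delta file entirely:

-- In seed_data.sql, after merging the delta into the base DDL:
INSERT INTO schema_errata ( delta )
VALUES ( '000001-example-delta' );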

Anatomy of a schema delta

Let’s say your application needs a new example table. You can then go ahead and create 000001-example-delta.sql with the relevant statements. Those statements run within a transaction, as ensured by lines 1 and 11. Prior to any changes, the INSERT in lines 3-4 guarantees an error if this script is run more than once, causing the transaction to roll back. This simple mechanism ensures that whatever changes are in your script, they are performed exactly once, when the schema delta is deployed.

 1  BEGIN;
 2
 3  INSERT INTO schema_errata ( delta )
 4  VALUES ( '000001-example-delta' );
 5
 6  CREATE TABLE example (
 7    key SERIAL NOT NULL PRIMARY KEY,
 8    value TEXT
 9  );
10
11  COMMIT;

You’ll find the above delta already under db/schema-deltas. Simply issuing make deploy at the base directory yields the output below. Notice how the block at lines 16-24 takes care of applying the new delta automatically, and line 25 confirms the application of the new delta script.

 1  $ make deploy
 2  ( cd db; PSQLRC=psqlrc-test make all deltas )
 3  psql -f extensions.ddl
 4  psql -f domains.ddl
 5  psql -f functions.ddl
 6  psql -f schema.ddl
 7  psql -f schema-deltas.ddl
 8  psql -f views.ddl
 9  psql -f triggers.ddl
10  psql -f roles.ddl
11  psql -f seed_data.sql
12  rm -f schema_errata.csv
13  PSQLRC=psqlrc-delta psql -c \
14  	"\COPY (SELECT delta FROM schema_errata) TO 'schema_errata.csv' WITH CSV;"
15  echo @ >> schema_errata.csv
16  find ./schema-deltas -type f -name '*.sql' |\
17  	grep --invert-match --fixed-strings --file schema_errata.csv |\
18  	while read delta; do \
19  		echo Applying $delta... ;\
20  		PSQLRC=psqlrc-delta psql -f $delta || \
21  			( cd schema-deltas; \
22  			  make --quiet `basename $delta .sql` ) || \
23  				exit 255; \
24  	done
25  Applying ./schema-deltas/000001-example-delta.sql...

Attempting to re-apply the deltas will show that the new delta is already deployed and gets skipped. To test this, just run cd db; make deltas.

 1  $ cd db; make deltas
 2  rm -f schema_errata.csv
 3  PSQLRC=psqlrc-delta psql -c \
 4  	"\COPY (SELECT delta FROM schema_errata) TO 'schema_errata.csv' WITH CSV;"
 5  echo @ >> schema_errata.csv
 6  find ./schema-deltas -type f -name '*.sql' |\
 7  	grep --invert-match --fixed-strings --file schema_errata.csv |\
 8  	while read delta; do \
 9  		echo Applying $delta... ;\
10  		PSQLRC=psqlrc-delta psql -f $delta || \
11  			( cd schema-deltas; \
12  			  make --quiet `basename $delta .sql` ) || \
13  				exit 255; \
14  	done

More complex deltas

When dealing with your production environment, the model of deploying all changes inside a single transaction will sometimes be troublesome. Changes that require long data transformations may need a more nuanced approach. To deal with these scenarios, you can arrange for the .sql delta script to fail. This triggers the separate make command on line 12, which gives you the opportunity to provide a more complex script to perform the required changes.
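The post doesn’t include such a script, but since the deploy loop falls back to make --quiet `basename $delta .sql` inside schema-deltas, a sketch of the corresponding Makefile target could look like this (all file names here are hypothetical, and one assumed way to arrange the initial failure is for the .sql delta to immediately RAISE EXCEPTION):

# Hypothetical target in db/schema-deltas/Makefile. Its name must match
# the basename of the failing .sql delta, because the deploy loop falls
# back to running make with that name inside this directory.
000002-backfill-example:
	psql -f 000002-backfill-example-pre.sql   # quick schema changes
	./000002-backfill-example-batches.sh      # long, batched data backfill
	psql -f 000002-backfill-example-post.sql  # constraints and errata INSERT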

Merging your deltas

From now on, every time you deploy a new database instance the base DDL will be executed, followed by all schema deltas. This is fine for deployment, but is not exactly ideal from a development point of view:

  • make destroy won’t always work, because the base scripts won’t be able to remove the objects you created via your deltas. You could work around this with a very smart destroy.ddl that simply removes all existing objects, but that can be dangerous; and
  • database object definitions are now scattered across multiple schema delta files.

Because of this, my recommended procedure is to take the contents of the newly created delta files and merge them into the base DDLs. This way, an ALTER TABLE in your delta script becomes a change in the corresponding CREATE TABLE of the base script, and so forth. Then, add the required steps to your destroy.ddl script to make sure all objects are destroyed as appropriate when make destroy is issued.
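As a sketch, suppose a later delta added a hypothetical notes column to the example table; after merging, the base DDL and destroy.ddl would read:

-- What the delta did (e.g. in 000002-example-notes.sql):
ALTER TABLE example ADD COLUMN notes TEXT;

-- After merging, schema.ddl simply defines the full table:
CREATE TABLE example (
  key   SERIAL NOT NULL PRIMARY KEY,
  value TEXT,
  notes TEXT
);

-- And destroy.ddl carries the matching teardown for make destroy:
DROP TABLE IF EXISTS example;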

This ensures that a make deploy takes the fewest possible steps, and that object definitions (and their documentation, in the form of COMMENT ON statements) live in an easy-to-find place. make destroy also works, allowing a quicker cycle when generating and debugging upcoming schema deltas.