Most of the applications I work with involve storing, transforming and querying data from relational databases. Over the years I’ve developed a pattern to manage those changes that has proven to be helpful, involving a minimum of universally available tools. This post introduces this pattern in the hopes that it becomes useful for others.
The schema will always change
My example database schema template is the basic layout that I use for applications that have an accompanying database schema. Regardless of the programming language, there’s always a
db directory where all the DDL, database tests and scripts to apply changes to the database schema – schema deltas – are located.
This is more or less my response to years of trying different tools for automated schema versioning and maintenance, available in various languages and frameworks. The gist of this is that I like the ability to write my own DDL and the scripts to change said DDL as the applications inevitably evolve.
This option provides an optimum level of control and is definitely my recommendation for those that are not afraid to use their SQL-fu.
My typical workflow involves working in a change to the database schema. Some changes are relatively simple, some changes are more involved. In all cases, I start by writing a schema delta. This is a fancy name for a script that has two tasks:
- Perform a number of sequential, orchestrated changes to the database schema, the data or both.
- Record the fact that is has been run over the database though an errata.
These ensure that deltas are applied in a consistent order, no more than once.
Usually I iterate through the delta, tests in my local and QA environments and code using the database, fixing any issues and ensuring that no regressions are introduced. This is also a good time to look at the performance consequences of any changes.
Once the delta and accompanying changes are complete, I take each of the changes (i.e., adding a column to a table) and add this to the regular provisioning DDL scripts (i.e., add the new column to the corresponding
CREATE TABLE). Finally, the entry in the
schema_errata table for the delta is added to the
seed_data.sql script, to note the fact that this particular schema delta does not need to be applied to any database instances deployed from this point on.
Anatomy of a schema delta
Let’s say your application needs a new
example table. You can then go ahead and create
000001-example-delta.sql with the relevant statements. Those statements will exist within a transaction, as ensured by lines 1 and 11. Prior to any changes, the
INSERT in lines 3-4 ensures that there will be an error if this script is run more than once, causing the database transaction to rollback. This simple mechanism ensures that whatever changes are in your script, those are only performed while deploying the schema delta.
You’ll find the above delta already under
db/schema-deltas. Simply issuing
make deploy at the base directory yields the below output. Notice how the block at lines 16-24 takes care of applying the new delta automatically. Line 25 confirms application of the new delta script.
Attempts to re-apply the delta manually will notice that the new delta is already deployed, To test this, just
cd db; make deltas.
More complex deltas
When dealing with your production environment, sometimes the model of deploying all changes inside a transaction will be troublesome. Changes that require long data transformations may require a more nuanced approach. To deal with these scenarios, you can arrange for the
.sql delta script to fail. This will trigger a separate
make command on line 12 which gives you the opportunity to provide a complex script to perform any required changes.
Merging your deltas
From now on, every time you deploy a new database instance the base DDL will be executed, followed by all schema deltas. This is fine for deployment, but is not exactly ideal from a development point of view:
make destroywon’t always work, because the base scripts won’t be able to remove the objects you created via your deltas. You could go around this with a very smart
destroy.ddlthat simply removed all existing objects, but this can be dangerous; and
- Database object definitions are now scattered through multiple schema delta files.
Because of this, my recommended procedure in this case is to take the contents of the newly created delta files and add them to the base DDLs. This way,
ALTER TABLE in your delta script becomes a change in the corresponding
CREATE TABLE of the base script and so forth. Then, add the required steps on your
destroy.ddl script to make sure all objects are destroyed as appropriate when
make destroy is issued.
This ensures that a
make deploy takes the shortest number of steps, and that object definitions (and their documentation, in the form of
CREATE COMMENT) are in an easy to find place.
make destroy also works, allowing a quicker process for generating and debugging upcoming schema deltas.