Notes on Database Design: Example Schema

reading time ( words)

This is the second post in my series Notes on Database Design, where I introduce a simple database schema to support selling event tickets for my fantasy company, TicketDoctor.

Let’s run with a very simple database schema we’ll use through the rest of this series to implement examples, because what’s a series of database related posts without a mock schema to work with? Anyway, here’s very simple audit table

1
2
3
4
CREATE TABLE audit (
  created_by   TEXT NOT NULL DEFAULT current_user,
  created_ts   TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW()
);

This is used to record who and when records are added. The use of the DEFAULT clauses provide the data we need. By issuing separate database users for each interface or even, each user accessing your database through your client applications, you can keep a detailed log of who is doing what.

Having this information inescapably captured by the database has helped me more than once with identifying a misbehaving piece of software adding bad data.

Depending on how much you trust your client applications, you could go one step further and have a TRIGGER forcibly setting these fields. This negates the possibility of the application lying about who’s adding a record, but you would have to write and apply the trigger to the required tables. I might write about ways to automate this at some point.

The INHERITS part of CREATE TABLE in PostgreSQL adds the audit table to the rest of the important tables.

 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
CREATE TABLE venues (
  id           SERIAL NOT NULL PRIMARY KEY,
  name         TEXT NOT NULL UNIQUE
) INHERITS ( audit );

CREATE TABLE shows (
  id           SERIAL NOT NULL PRIMARY KEY,
  show_ts      TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  venue_id     INTEGER NOT NULL,
  name         TEXT NOT NULL
) INHERITS ( audit );

CREATE TABLE tickets (
  id           SERIAL NOT NULL PRIMARY KEY,
  show_id      INTEGER NOT NULL REFERENCES shows(id),
  sold_ts      TIMESTAMP WITHOUT TIME ZONE DEFAULT NULL,
  seat_number  TEXT
) INHERITS ( audit );

This goes beyond simply copying column definitions at CREATE TABLE time. When you modify parent tables, the inherited tables, well, inherit these new columns. This is a very powerful feature to keep in mind because it helps with table structure consistency as your application changes.

Indexes as integrity constraints

You’ll run into your fair share of database developers that are actually surprised by the notion of using database indexes to enforce integrity constraints. Some of them will vehemently defend writing these constraints in user code, sometimes in multiple places, because of reasons. And nobody ever makes mistakes loading data, so everything is always ok.

Actually loading data is a good example to show integrity constraints in action. With this schema, opening a new concert to the public would involve loading a venue, the show and the tickets for sale. Tickets are naturally defined as a seat (seat_number) in a specific show (show_id). For now, because this is something we’ll want to change to provide for faster searches.

Now, think about what would happen in the real world if this statement were to be run, probably as part of loading the thousands of tickets for a specific concert:

1
2
3
4
INSERT INTO tickets ( show_id, seat_number )
VALUES
( 1, 'Section 140, Row 1, Seat 21' ),
( 1, 'Section 140, Row 1, Seat 21' );

Selling two tickets for the same seat would likely not be a good user experience, so we would want to prevent duplicate tickets. This can be trivially achieved using a UNIQUE INDEX:

25
26
CREATE UNIQUE INDEX same_ticket_idx
ON tickets ( show_id, seat_number );

Now, attempting to add conflicting seats with the INSERT above results in an exception:

ERROR:  duplicate key value violates unique constraint "same_ticket_idx"
DETAIL:  Key (show_id, seat_number)=(1, Section 140, Row 1, Seat 21) already exists.

Offending data did not make it to the database!

The beauty of this is that client code does not need to add specific checks to detect conflicting seats. All that’s required is trapping the exception. In fact, by looking at the error itself (”duplicate key value...”), the application can provide a friendlier error to its caller.

In terms of performance, this is faster than looking for a duplicate seat via SELECT, with the client application becoming simpler due to not needing so much validation code. The database can quickly look at the index and determine whether the current record is a duplicate — violating the UNIQUE constraint imposed by the index — and abort the transaction. This happens without the client having to pull a single row through the network, hence the gain in speed.

The overall system also becomes more robust, as the whole operation is transactional in nature. Groups of seats can be managed within a single BEGIN TRANSACTION; ⋯ COMMIT; block, so either all of them can be acted upon or the seats are left as they were.

In the long run, this approach provides a simpler environment: All checks are executing automatically, close to the data. Errors in application code cannot insert invalid data. If proper integrity constraints are in place, it becomes impossible for the applications to violate the business rules deployed as database constraints. This results in higher degrees of data quality.