PostgreSQL provides TYPE
and DOMAIN
abstraction which impose structural and semantic constraint over your data. Wisely using these mechanisms provides solid guarantees on data quality, enforceable at the database itself. Let’s continue the Notes on Database Design series with the use of DOMAIN
to validate your data.
As an example, I’ll provide an implementation of an email address type. For simplicity, let’s define an email address as the tuple of a local part and a fully-qualified domain name concatenated with the @
symbol. I’ll also refer to those parts as the right hand side and left hand side, or RHS and LHS for short.
Email addresses can be way more complex than you think. If you don’t believe me, take a look at RFC-5321 § 4.1.2 for the gory details. Also keep in mind that a syntactically correct email address can still be invalid. The only way to know for sure, is to try and send an email to it.
I’ll use a hierarchy of DOMAIN
definitions to implement a fully validating type to efficiently store email addresses.
The FQDN representation
The right hand side of an email address is generally a fully-qualified domain name. Let’s build our first abstraction, to encode and verify valid domain names. To start, we need a validation function that returns TRUE
when a given string can be considered a valid fqdn
.
|
|
In this case, I’m implementing the essence of RFC-1035 § 2.3.4: Labels of 1 to 63 acceptable characters separated by a single dot, with a maximum length of 255 characters. This is easily accomplished by the regular expression and the length check highlighted.
Notice how this is just a syntax validation. Attempts to discern unresolvable domains or enforcing additional rules are destined to fail, so don’t do it.
With valid_fqdn
defined, we can go ahead and declare our DOMAIN
.
|
|
Now you can easily check whether strings are valid domain names or not, automatically.
Every time you refer to something of type fqdn
, PostgreSQL will make sure it’s indeed a valid domain name, sparing you to add check code. If you declare a column of type fqdn
, the built-in validation provided by the DOMAIN
will make sure that only fully-qualified domain names that satisfy the restriction are added.
This is just a very simple
DOMAIN
definition. PostgreSQL provides support for more powerful definitions. Take a look at the CREATE DOMAIN and CREATE TYPE sections in the PostgreSQL documentation to learn more.
Dealing with the left hand side
To illustrate composition, I’ll define two additional DOMAIN
instances:
|
|
The regular expression on lines 12 and 13 actually comes from RFC-2321, while the length constraint at line 14 comes from RFC-5321 § 4.5.3.1.1. And yes, all those characters can actually be part of an email address, so our schema better be prepared to handle them all.
lem=# SELECT '>'::local_part;
ERROR: value for domain local_part violates check constraint "local_part_check"
lem=# SELECT 'really+valid/always.fun!'::local_part;
local_part | really+valid/always.fun!
As you can see in the examples above, the validation works as expected, rejecting any string that can’t really be the local part or left hand side of an email address.
Putting it all together
I’ll now define a DOMAIN
that will combine our local_part
and fqdn
into a mailbox
, a fully syntactically valid email address. Notice the explicit casts which in turn, cause the validation of that part of the email address.
|
|
And as usual, here are some simple examples of the automated validation in action.
lem=# SELECT 'fun@postgres.help'::mailbox;
mailbox | fun@postgres.help
lem=# SELECT '<fly@quad.click'::mailbox;
ERROR: value for domain local_part violates check constraint "local_part_check"
CONTEXT: SQL function "valid_mailbox" statement 1
lem=# SELECT 'nosuch@domain'::mailbox;
ERROR: value for domain fqdn violates check constraint "fqdn_check"
CONTEXT: SQL function "valid_mailbox" statement 1
Notice the errors signaling which part of the email address is broken. These errors will automatically abort transactions where invalid data is being fed to your database, which is a very powerful integrity control.
When using database-driven integrity constraints such as in these examples, your application code no longer needs code to provide these constraints. They are enforced automatically at the database level. Better yet, if you have three or four different client applications, you still have a single point of validation that works seamlessly — and consistently! — across all applications.
Using our new fqdn domain
To illustrate the benefits of our new fqdn
DOMAIN
, let’s define a new customers
table.
|
|
Now look what happens when we try to add a customer with an invalid email address.
lem=# INSERT INTO customers ( name, email )
VALUES
( 'Jane Doe', 'badmail>@some.domain');
ERROR: value for domain local_part violates check constraint "local_part_check"
CONTEXT: SQL function "valid_mailbox" statement 1
And since this is implemented at the database level, your data integrity is not dependent on the client code having the right check in place: Bad data simply cannot enter the database.
In these simple examples, our domain definitions are basically TEXT
with validation functions automatically attached to them. This means that it’s still possible to compare say, a mailbox
and a fqdn
. Perhaps this will be available at some point in the future of the DOMAIN
construct.