SQL & Optional Data
null, which is used to indicate no value. The problem is that it doesn’t explain why there is no value. Here we look at adding more meaning to
null to allow for optional data.
Here are some possibile interpretions of
- The value is unknown
- The value is not important
- There is no value yet
To test for
null, you cannot use
something=null. Instead, you must use
something is null. This reflects an interpretation of
null as being unknown.
For example, suppose you have a table of authors, and you choose to record the date of birth and the date of death.
CREATE TABLE authors ( id INT PRIMARY KEY, name VARCHAR(40), born varchar, died varchar );
died values are
varchar here to simplify the example; in reality, they should be
date types or similar).
What does it mean for the date of death to be null? What about the date of birth?
For the date of birth, it is easy to interpret a
null as unknown.
For the date of death, however, there are two possible interpretations:
- The author hasn’t died (yet)
- The author has died, but the actual data is unknown (or irrelevant)
SQL does not, of itself, accommodate alternative interpretations of
null, so the reason remains a mystery.
(This is my own term; many call it a one-to-zero-or-one, which is not quite as cool).
Normally, tables are related using a Foreign Key to Primary Key relationship, from a child to parent table:
CREATE TABLE child ( id INT PRIMARY KEY, parentid INT REFERENCES parent(id) -- etc );
Since many child rows can reference the same parent row, we say that there is a one to many relationship between the parent and child.
A one to one relationship is that of a single row in one table to a single row to another. Generally, it is between their primary keys.
Very few true one to one relationships exist, as that would involve creating both rows at the same time. In reality, One table is the main table, while the other is a secondary table, providing additional, possibly optional, columns.
You can create this relationship without changing your main table by creating a secondary table like this:
CREATE TABLE secondary ( id INT PRIMARY KEY REFERENCES main(id) -- etc );
Note that the
id is both a primary key and a foreign key. This is the key, so to speak, of this type of relationship.
If you are tempted to auto-increment your primary key, don’t. The whole point is that it matches one of rows of the main table. The primary key nature ensures that it doesn’t happen twice.
The Example Again
Here is a possible
authors table, without a date of death:
CREATE TABLE authors ( id INT PRIMARY KEY, name VARCHAR(40), born varchar ); INSERT INTO authors(id,name,born) VALUES (1,'Fred','now'), (2,'Wilma','then'), (3,'Betty','whenever'), (4,'Barney','sometime');
Here is a possible secondary table which will record only deaths, if any:
CREATE TABLE deaths ( id INT PRIMARY KEY REFERENCES authors(id), died varchar ); INSERT INTO deaths(id,died) VALUES (1,null), (3,'yesterday'), (4,'anotherday');
1has died, but the date is unknown
2is missing, and can be presumed living
Fetching the data
To fetch the full data, we will need a
JOIN, and since there may be missing rows in the secondary table, we will need an
OUTER JOIN; in this case it will be a
SELECT * FROM authors a LEFT JOIN deaths d ON a.id=d.id;
You will get something like this:
Missing vs Empty
The problem is with the above result set is the
d.died column. For id
d.died column is
null because the date is unknown. For id
2, it is
null because there is no entry in the
How do we distinguish between them? The solution is in the
d.id column. Because the row is missing, it is
null for id
Knowing this, we can implement the following logic:
null, then the row is missing, and the author is living.
- else the author has died:
not null, we can use the date
- else the date is unknown
In SQL, this can be expressed with the
CASE WHEN d.id IS NULL THEN 'living' ELSE WHEN d.died is NOT NULL THEN d.died ELSE 'unknown' END
If a column may or may not be
null, you can use the
coalesce function to provide an alternative:
This means we will use the
d.died value, if any, or
unknown if it is null. This is a neater version of the two
ELSE clauses above
CASE WHEN d.id IS NULL THEN 'living' ELSE coalesce(d.died,'unknown') END
The Finished Statement
Combining the above, we can write:
SELECT a.id,a.name,a.born, CASE WHEN d.id IS NULL THEN 'living' ELSE coalesce(d.died,'unknown') END AS died FROM authors a LEFT JOIN deaths d ON a.id=d.id;