You are here
Using NULL as default values
Abstract:
It is common practice in MySQL table design that fields are declared as NOT NULL
but some non-sense DEFAULT
values are specified for unknown field contents. In this article we show why this behavior is non optimal an why you should better declare a field to allow NULL
values and use NULL
values instead of some dummy values.
What we can see often out in the field
Recently we had a discussion with a customer if it makes more sense to store a default value or NULL
in InnoDB tables when we do not know the value of the field yet. About this 8 byte DATETIME
field we were discussing:
CREATE TABLE test ( ... , dt DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00' ...
The customer mentioned, that he is mostly storing nothing in this field which in fact results in '0000-00-00 00:00:00'
occupying 8 bytes.
This is a situation which happens very often in applications but is ignored or not properly designed. But the MySQL documentation clearly states what are the facts [1].
Showing the situation by example
To make the situation a bit more clear we created this little example:
CREATE TABLE t1 ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , dt DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00' ) ENGINE = InnoDB;
Then we filled about 1 mio rows into this table t1
:
INSERT INTO t1 VALUES (NULL, NOW()); INSERT INTO t1 SELECT NULL, NOW() FROM t1; ... INSERT INTO t1 SELECT NULL, NOW() FROM t1; Query OK, 524288 rows affected (9.25 sec) Records: 524288 Duplicates: 0 Warnings: 0
Then we create a second table but this time with a "correct" declaration of the default value:
CREATE TABLE t2 ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , dt DATETIME NULL DEFAULT NULL ) ENGINE = InnoDB;
Then we filled some rows into this table as well:
INSERT INTO t2 SELECT NULL, NULL FROM t1;
Verifying what comes out
Then we run:
SHOW TABLE STATUS; +------+--------+---------+----------------+-------------+--------------+ | Name | Engine | Rows | Avg_row_length | Data_length | Index_length | +------+--------+---------+----------------+-------------+--------------+ | t1 | InnoDB | 1048576 | 33 | 35192832 | 0 | | t2 | InnoDB | 1048576 | 25 | 26787840 | 0 | +------+--------+---------+----------------+-------------+--------------+
The tables on disk occupy:
-rw-rw---- 1 mysql mysql 46137344 2011-07-19 20:27 t1.ibd -rw-rw---- 1 mysql mysql 37748736 2011-07-19 20:28 t2.ibd
Now we can clearly see that we safe 8 bytes per row or 8 Mbyte in total just by using NULL
in the correct way instead of some dummy default value.
What says the MySQL documentation about it
The MySQL documentation clearly states in Chapter Physical Row Structure:
The record header of InnoDB tables contains a bit vector for indicating NULL
columns. Columns that are NULL
do not occupy space other than the bit in this vector. SQL NULL
value reserves one or two bytes in the record directory. Besides that, an SQL NULL
value reserves zero bytes in the data part of the record if stored in a variable length column. [1]
Conclusion:
When you are defining tables you should clearly declare when a column can contain undefined values with NULL
and you should use NULL
instead of some dummy default values. This occupies less storage for the InnoDB table and thus makes the whole thing faster.
- Shinguz's blog
- Log in or register to post comments
Comments
Hi, Using date values like
Using NULL values is
CREATE TABLE t ( id int(11) NOT NULL DEFAULT 0, city varchar(10) DEFAULT NULL );
The following query is expected to return all the rows in the table:SELECT * FROM t WHERE city=city;
That's true if the city is not NULL but not if the city is NULL:mysql> INSERT INTO t (id,city) VALUES (1,'XXX'); Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM t WHERE city=city; +----+------+ | id | city | +----+------+ | 1 | XXX | +----+------+
mysql> UPDATE t SET city=NULL WHERE id=1;
mysql> SELECT * FROM t WHERE city=city; Empty set (0.00 sec)
What's the problem? MySQL returns in both cases a result that is correct in mathematical logic but the introduction of NULLs changes the kind of logic that applies (3-valued logic instead of 2-valued logic). Unfortunately in the real world, you always think with the 2-valued logic. More information on this topic in C.J. Date's SQL and Relational Theory. So I try to avoid NULL values at all costs, even if it means wasting some space.Different opinions
IS NULL
employs indexes if col_name is indexed. [1]Some experiments on a table with a few (= 4)
NULL
values:With 50%
--> Here the Optimizer decides wrong. A full table scan would be faster. --> Here the Optimizer decided correctly!NULL
valuesWith 3.6% non
--> Optimizer is still wrong and should use a Full Table Scan. I admit, that indexed columns or even columns you use for joining are tricky and should be handled carefully. But all the other 80% of columns can easily be used withNULL
values (= 96.4%NULL
values)DEFAULT NULL
if it fits into you business logic.Shinguz, thanks a lot for
generally agree
I, too, prefer NULL, but...