You are here
MariaDB indexing of NULL values
In the recent MariaDB DBA advanced training class the question came up if MariaDB can make use of an index when searching for NULL
values... And to be honest I was not sure any more. So instead of reading boring documentation I did some little tests:
Search for NULL
First I started with a little test data set. Some of you might already know it:
CREATE TABLE null_test ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , data VARCHAR(32) DEFAULT NULL , ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ); INSERT INTO null_test VALUES (NULL, 'Some data to show if null works', NULL); INSERT INTO null_test SELECT NULL, 'Some data to show if null works', NULL FROM null_test; ... up to 1 Mio rows
Then I modified the data according to my needs to see if the MariaDB Optimizer can make use of the index:
-- Set 0.1% of the rows to NULL UPDATE null_test SET data = NULL WHERE ID % 1000 = 0; ALTER TABLE null_test ADD INDEX (data); ANALYZE TABLE null_test;
and finally I run the test (MariaDB 10.3.11):
EXPLAIN EXTENDED SELECT * FROM null_test WHERE data IS NULL; +------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+ | 1 | SIMPLE | null_test | ref | data | data | 35 | const | 1047 | 100.00 | Using index condition | +------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+
We can clearly see that the MariaDB Optimizer considers and uses the index and its estimation of about 1047 rows is quite appropriate.
Unfortunately the optimizer chooses the completely wrong strategy (3 times slower) for the opposite query:
EXPLAIN EXTENDED SELECT * FROM null_test WHERE data = 'Some data to show if null works'; +------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+ | 1 | SIMPLE | null_test | ref | data | data | 35 | const | 522351 | 100.00 | Using index condition | +------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+
Search for NOT NULL
Now let us try to test the opposite problem:
CREATE TABLE anti_null_test ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , data VARCHAR(32) DEFAULT NULL , ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ); INSERT INTO anti_null_test VALUES (NULL, 'Some data to show if null works', NULL); INSERT INTO anti_null_test SELECT NULL, 'Some data to show if null works', NULL FROM anti_null_test; ... up to 1 Mio rows
Then I modified the data as well but this time in the opposite direction:
-- Set 99.9% of the rows to NULL UPDATE anti_null_test SET data = NULL WHERE ID % 1000 != 0; ALTER TABLE anti_null_test ADD INDEX (data); ANALYZE TABLE anti_null_test;
and then we have to test again the query:
EXPLAIN EXTENDED SELECT * FROM anti_null_test WHERE data IS NOT NULL; +------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+ | 1 | SIMPLE | anti_null_test | range | data | data | 35 | NULL | 1047 | 100.00 | Using index condition | +------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+
Also in this case the MariaDB Optimizer considers and uses the index and produces a quite fast Query Execution Plan.
Also in this case the optimizer behaves wrong for the opposite query:
EXPLAIN EXTENDED SELECT * FROM anti_null_test WHERE data IS NULL; +------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+ | 1 | SIMPLE | anti_null_test | ref | data | data | 35 | const | 523506 | 100.00 | Using index condition | +------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+
- Shinguz's blog
- Log in or register to post comments
Comments
MariaDB indexing of NULL values
Hello,
Just wanted to say that the post misses the point. The problem is not
NULL
values as such. the problem is:Here is an example demonstrating the same effect with a non-NULL value: https://gist.github.com/spetrunia/2c6ea05ed6b0c1ffa14fa8eac6b72029 . We still get 400K rows as an estimate (instead if 990K) . The estimate is provided by InnoDB.
Let's try it with MyRocks storage engine: https://gist.github.com/spetrunia/5445ef362aa26207ccbf83c093d7135b . The optimizer here picks a full table scan for the query with "data = 'Some data to show if null works'", which is faster.
BR,
-- Sergei Petrunia
Re: MariaDB indexing of NULL values
Yes, I am aware that this is not a
NULL
issue. I hope my blog did not tell that this is be cause ofNULL
.> Well, its title is "Indexing of NULL values", so if one takes that at face value, I'm afraid they will think it's about special treatment of
NULL
values :-).The problem is, that MariaDB (and MySQL) chooses the wrong query execution plan for columns with low cardinality and chooses a 3 times slower plan (Index Lookup instead of FTS).
> Yes, agree.
(For me it is 2.2x slower. I would say that in general, if the optimizer is within 2x difference of the best query plan, things are ok. There are many factors affecting query execution speed (e.g. OS/engine cache population, correlation of conditions, etc) - it is not realistic to expect the optimizer to always pick the best plan. One should aim for query plans that are not more than N times slower than the best one, where the value of N is about 2 for simple queries and not more than 10 for complex queries).
If this is InnoDB related or not is not interesting for the customer. You are the Optimizer cracks and you deliver the Storage Engine and you have some InnoDB cracks on board! So fix it, please...
Or do you want to tell me, that InnoDB becomes obsolete in MariaDB and you will switch to MyRocks soon (devil)? .oO(ouh, ouh, what about all the read workload patterns???)
> No. It's just a proof that the issue was in the storage engine, not in the optimizer. We might need to talk to our InnoDB guys again about this. I will file an MDEV.
I still have to do other testing. According to theory even 400k rows should be cheaper with FTS instead of Index-Lookup but this I want to test before ranting around...