You are here

Disadvantages of explicitly NOT using InnoDB Primary Keys?

We recently had the case with one of our customers where we got externally generated random hash values (up to 70 bytes) and they were used as Primary Keys in InnoDB.

As we know, this is not a very good idea because the size of all secondary indexes becomes large and because a random hash value as a Primary Key gives us a bad locality of our rows in the table 1.

If we do not specify a Primary Key and have no Unique Key InnoDB generates a hidden Clustered Index based on the Row ID which is a 6 byte field.

This Row ID is increasing monotonically 1. That would fit our needs.

So we were thinking about getting completely rid of any Primary Key of that InnoDB table.

This has the following advantages:

  • It is sorted by time (more or less).
  • The Primary Key is shorter (6 bytes instead of 30-70 bytes + something).
  • We do not run into the AUTO_INCREMENT bottleneck of Bug #16979 2 in MySQL 5.0 2, 3. Yes we are still running on 5.0.
  • We exclude possible other AUTO_INCREMENT contention situations.

To get more opinions about this idea I was writing to some of my colleagues from the vibrant MySQL Community and asked for their opinion.

And I got a lot of feedback. Please find below the summary:

  • An implicit Primary Key (= no PK) causes a Full Table Scan on the Slave for every row that is being deleted or updated when Row Based Replication (RBR) is in use 4. The reason for this behaviour is that the implicit Primary Key is not sent to the Slave. And thus the Slave does not know what has to be changed.
  • MySQL Bug #16979 2: This bug still exists in 5.1 if you use multi row insert. The 5.1 fix for single row inserts is very good. The problem completely goes away for all inserts with Row-based-replication.
  • You lose the ability to use tools like on-line schema change 5.
  • Checking for table drift is harder 6.
  • You have no way to write a WHERE clause that actually identifies a row.

Conclusion

InnoDB tables WITHOUT a Primary Key on a Master/Slave replication and RBR (5.1) seems to be definitely a nogo. For all other use cases one has to try out and benchmark. And one should only use this mean when there are big benefits.

So: Do not do InnoDB tables without a Primary Key except you know exactly what you are doing...

Thanks to all who replied personally to my question!

With the following query you can find the tables not having a Primary Key:

SELECT t.TABLE_SCHEMA, t.TABLE_NAME, t.ENGINE
  FROM information_schema.tables AS t
  LEFT JOIN information_schema.TABLE_CONSTRAINTS AS tc
         ON tc.TABLE_SCHEMA = t.TABLE_SCHEMA
        AND tc.TABLE_NAME = t.TABLE_NAME
        AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
 WHERE t.TABLE_TYPE = 'BASE TABLE'
   AND tc.CONSTRAINT_TYPE IS NULL
   AND t.TABLE_SCHEMA != 'mysql'
;
+--------------+---------------------+--------+
| TABLE_SCHEMA | TABLE_NAME          | ENGINE |
+--------------+---------------------+--------+
| cms          | antispam_spam_marks | MyISAM |
| cms          | search_dataset      | MyISAM |
| cms          | search_index        | MyISAM |
| test         | test                | InnoDB |
| zabbix       | history             | InnoDB |
| zabbix       | history_str         | InnoDB |
| zabbix       | history_uint        | InnoDB |
| zabbix       | node_cksum          | InnoDB |
+--------------+---------------------+--------+

By the way. The execution plan of the above query does not look too good! :-(

Sources


Taxonomy upgrade extras: 

Comments

SELECT t.table_schema, t.table_name, t.engine
  FROM information_schema.tables AS t
  LEFT JOIN information_schema.key_column_usage AS i ON t.table_schema = i.table_schema AND t.table_name = i.table_name AND i.constraint_name = 'PRIMARY'
 WHERE i.constraint_name IS NULL
   AND t.table_schema NOT IN ('information_schema', 'mysql')
;
olicomment