How the Lack of a Primary Key May Effectively Stop the Slave
Most (relational) DBAs and DB application developers know the concept of a primary key ("PK") and what it is good for. However, much too often one still encounters table definitions without a PK. True, the relational theory based on sets does not need a PK, and all operations (insert, select, update, delete) can also be done on tables for which no PK was defined. If performance doesn't matter (or the data volume is small, a typical situation in tests), the lack of a PK does not immediately cause negative consequences.
But recently, we had several customers who (independently of each other) had big tables without a PK in a replication setup and wanted to delete their contents, and they all suffered severely from that: Their replication did not progress, the slave lag grew larger and larger, and all this without any error being reported. When I say "larger and larger", I mean it: I'm not talking about minutes or even hours, I'm talking about days!
Observations
In all cases, the situation could be summarized as:
- They were running a traditional, asynchronous replication using the "row" format.
- They had a table without a PK with many entries, say a million rows.
- This table wasn't needed any more, and someone issued a "delete from T" statement on the master.
- From this moment, the slave did not show any sign of progress: The output of "show slave status \G"
- listed both the IO and the SQL slave thread as running,
- reported the same log positions without any change,
- did not mention any slave error,
- but the slave lag grew in sync with the passing time.
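Put differently, a slave in this state shows roughly the following picture; the field values here illustrate the pattern, they are not the customers' actual output:
SHOW SLAVE STATUS \G
-- Slave_IO_Running:      Yes
-- Slave_SQL_Running:     Yes
-- Exec_Master_Log_Pos:   (the same value on every call)
-- Last_SQL_Error:        (empty)
-- Seconds_Behind_Master: (growing in sync with the wall-clock time)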
Looking at the slave's machine load, say using "vmstat", you could see a certain amount of CPU load in user mode, possibly some "waiting for IO", and some read IO (group "io", column "bi" = "block input"). Checking with "top", you could see that the MySQL server process was the only significant CPU load, and closer inspection would reveal that it used roughly one CPU core.
(A side remark: The numbers of "vmstat" and "top" are not directly comparable: While "vmstat" reports the overall CPU usage as a percentage of the total available CPU power, "top" reports the CPU usage of the individual processes as the percentage of a single CPU core. So if you had a 4-core machine with only one active process running an infinite loop, "top" would show this process as using 100%, while "vmstat" would report the CPU as running 25% in user mode and being 75% idle. If you don't know it, look at "/proc/cpuinfo" to find the number of CPU cores.)
What was the Issue?
From "top", we could tell that the MySQL server was using one CPU core at full speed. From "vmstat", we could tell that it might access some data from disk, but didn't write any significant amount. This sure looked like an infinite loop or a close relative of it, executed by only one thread.Some of you may have heard that (under some conditions) replication may do a full table scan: This is it!
When the slave has to apply a change (for this discussion: a delete or an update) which is provided in row format, and the table has no PK (or other suitable index), the SQL thread does a full table scan searching for the matching row. Note that it doesn't stop on the first match, but rather continues the scan throughout the rest of the table.
To be honest: I don't understand why it doesn't stop after the first match. I also ran a modified test where some master rows had two matches on the slave (I had duplicated them by "insert select"): The "delete" was still replicated as before, and these additional slave rows were still present after the delete.
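A minimal sketch of that modified test, using the table from the experiment described below; the WHERE condition and the affected row count are just an example:
-- On the slave only: duplicate some rows, so that a master row has two matches here.
INSERT INTO for_delete SELECT * FROM for_delete WHERE inc_num <= 100;
-- On the master:
DELETE FROM for_delete;
-- After the slave has processed the delete, the master table is empty,
-- but the duplicated rows are still present on the slave:
SELECT COUNT(*) FROM for_delete;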
Until now, you may consider this "full table scan" a claim without proof, and I agree with all demands for a more thorough check.
The Experiment
For tests, I have a setup of two VMs, each running a MySQL server process, and they are configured for a traditional master-slave replication. In fact, I have this setup for the versions MySQL 5.5.44, 5.6.24, and 5.7.17. (Yes, I might add newer versions.) So I designed an experiment:
- Start both master and slave, make sure replication is running.
- On the master, create a table without any key / index, neither primary nor secondary:
CREATE TABLE for_delete (
  inc_num int,       # ascending numbers 1 .. 10,000 repeated for larger tables
  clock   timestamp, # now() counted: 129 different values with 200 k rows
  bubble  char(250)  # repeat('Abcd', 60)
) DEFAULT CHARACTER SET latin1 ;
- Insert into that table the desired number of rows (I had runs with 10 k, 20 k, 100 k, and 200 k).
- On both master and slave, run "SELECT ... FROM information_schema.global_status WHERE ..." to get status counters (a sketch of such statements follows this list).
- Delete all rows of that table.
- Sleep for a time depending on the row count (you will later see why).
- Get new status counters.
- Compute the status counter differences.
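To make the steps concrete, here is a minimal sketch of the statements involved. The row-generator procedure and the exact list of counters in the WHERE clause are my own choices for illustration; the original runs may have used a different mechanism:
-- Fill the table on the master (hypothetical helper; any generator of the described values will do):
DELIMITER //
CREATE PROCEDURE fill_for_delete(p_rows INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < p_rows DO
    INSERT INTO for_delete VALUES ((i MOD 10000) + 1, NOW(), REPEAT('Abcd', 60));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;
CALL fill_for_delete(200000);

-- Before and after the delete, on both master and slave
-- (in 5.7 the same counters are also available in performance_schema.global_status):
SELECT variable_name, variable_value
FROM information_schema.global_status
WHERE variable_name IN ('HANDLER_DELETE', 'INNODB_ROWS_DELETED', 'INNODB_ROWS_READ',
                        'INNODB_BUFFER_POOL_READ_REQUESTS');

-- On the master:
DELETE FROM for_delete;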
This experiment reliably reproduced the symptoms which the customers had reported: When the "delete" had been done on the master, the slave became busy for a long time, depending on the row count, and in this time it did not show any progress indicators. This holds for all three versions I checked.
So what did the slave do? It worked as assumed: For each row, it did a full table scan, continuing even after it had found a matching row. This can be easily seen in the status counters, which you will find in the result tables below.
In this article, I will show the results of tests with 200 k rows only, as they are the most impressive. As described above, the table was created anew for each run. Then, 10,000 rows were inserted, and this was repeated 20 times. Each such group had the value of "inc_num" go up from 1 to 10,000.
In all runs, the status counters "HANDLER_DELETE", "INNODB_ROWS_DELETED", and "INNODB_ROWS_READ" showed the value of 200,000, the table size. So the handler calls did not show any scan; it must happen below that interface (= within the handler).
If the server does a table scan, it must access all data pages of the table, so there must be read requests for them to guarantee they are in the buffer pool.
And yes, the status counters showed these read requests:
Table 1: No Key or Index
200,000 rows, status counters on     | Master    | Slave
Version 5.5.44
  INNODB_BUFFER_POOL_READ_REQUESTS   | 2,174,121 | 380,132,972
  Ratio "read_requests" Slave/Master |           | 174.8
  "read_requests" per "rows_deleted" | 10.9      | 1,900.7
Version 5.6.24
  INNODB_BUFFER_POOL_READ_REQUESTS   | 2,318,685 | 379,762,583
  Ratio "read_requests" Slave/Master |           | 163.8
  "read_requests" per "rows_deleted" | 11.6      | 1,898.8
Version 5.7.17
  INNODB_BUFFER_POOL_READ_REQUESTS   | 1,598,745 | 379,538,695
  Ratio "read_requests" Slave/Master |           | 237.4
  "read_requests" per "rows_deleted" | 8.0       | 1,897.7
While master and slave (from a logical point of view) do the same work (delete all rows from a 200 k rows table), the slave accesses many more pages than the master - depending on the version, the factor ranges from 164 to 237. (Note that in version 5.7 the slave didn't do worse than in earlier versions, rather the master did better. This shows clearly in the ratio of "innodb buffer pool read requests" to "rows deleted" on the master: This is about 11 in 5.5 and 5.6, but only 8 in 5.7.)
I won't bother you with the detailed results for 100 k rows. It is enough to say that for half as many rows, the master had half as many read requests, while the slave had a quarter. This means that the effort on the master grows proportionally to the table size, but on the slave it grows with the square of the table size. Accepting the fact that the slave always scans the full table, this is easy to understand: If the table has twice as many rows, there are twice as many pages to scan, and the number of scans also doubles.
Quadratic growth means that an increase of the table size by a factor of ten lets the effort grow by a factor of a hundred! Now imagine I had tested with a million rows ...
All this isn't new: In the public MySQL bug database, this is listed as bug #53375. The bug was reported on May 3, 2010, for MySQL 5.1. It has been closed since December 15, 2011. This does not mean the full table scans were stopped - obviously, they are considered unavoidable, so the bug fix is to write an explanatory message into the error log. In my tests, I encountered it only once; here it is:
[Note] The slave is applying a ROW event on behalf of a DELETE statement on table for_delete and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning an index while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
Avoiding the Issue
Of course, this problem is the consequence of sloppy table design: If there were a primary key, it would not occur. But this may not help those who currently have tables without a PK and dare not add one, because it would require prior testing and a maintenance window to alter the table on the master.
Luckily, there are remedies, and even more than the log message (quoted above) mentions: In my tests, I found that the slave will use any index on that table if one is available. (It even uses an index with low selectivity, which may increase the page request count.) I did 5 tests:
- Create an index on "clock" (129 different values).
- Create an index on "inc_num" (10,000 different values).
- Create an index on both "clock" and "inc_num" (all combinations are distinct).
- As before, and declare it as "unique".
- Define the combination of "clock" and "inc_num" to be the primary key.
The "alter table" statement for this was always issued on the slave only, after the inserts, but before the delete.
In all three versions, the slave used any index available.
The index on "clock" of course had many different rows per value, so it caused many even more page requests than the no-index case: between 2,800 and 3,165 per row deleted.
Obviously, thios makes the problem even worse.
But all other indexes were huge improvements, the primary key of course was the best choice.
Here are the numbers:
Table 2: With Added Key or Index
"ALTER TABLE ..." run on the slave before the "DELETE":
  Slave (1): index on "clock" (129 values)
  Slave (2): index on "inc_num" (10,000 values)
  Slave (3): index on both "clock" and "inc_num"
  Slave (4): unique index on both
  Slave (5): primary key on both

200,000 rows, status counters on     | Master    | Slave (1)   | Slave (2)  | Slave (3) | Slave (4) | Slave (5)
Version 5.5.44
  INNODB_BUFFER_POOL_READ_REQUESTS   | 2,174,121 | 633,132,032 | 10,197,554 | 4,429,243 | 4,421,357 | 1,810,734
  Ratio "read_requests" Slave/Master |           | 291.2       | 4.7        | 2.0       | 2.0       | 0.8
  "read_requests" per "rows_deleted" | 10.9      | 3,165.7     | 51.0       | 22.1      | 22.1      | 9.1
Version 5.6.24
  INNODB_BUFFER_POOL_READ_REQUESTS   | 2,318,685 | 610,482,315 | 10,327,132 | 4,375,106 | 4,073,543 | 1,794,524
  Ratio "read_requests" Slave/Master |           | 342.1       | 5.0        | 2.1       | 2.0       | 0.9
  "read_requests" per "rows_deleted" | 11.6      | 3,052.4     | 51.6       | 21.9      | 20.4      | 9.0
Version 5.7.17
  INNODB_BUFFER_POOL_READ_REQUESTS   | 1,598,745 | 559,398,633 | 9,531,444  | 3,859,343 | 3,836,390 | 1,720,731
  Ratio "read_requests" Slave/Master |           | 354.8       | 6.0        | 2.3       | 2.5       | 1.1
  "read_requests" per "rows_deleted" | 8.0       | 2,797.0     | 47.7       | 19.3      | 19.2      | 8.6
But one very important fact should be mentioned in addition to the numbers: The fact that it worked! By adding an index, I made the slave's schema differ from the master's. The primary key even totally changed the B-tree in which the rows are stored, and it made the slave drop the internal row ID which InnoDB had added on the master. Still, replication using the row format could handle these differences without problems.
The Way Out
So the good news is:
- Even if you have a table without indexes or primary key, you can add these on the slave without breaking the replication.
- If you suffer from slow replication on such a table, adding a good index or (even better) the PK will solve this problem.
- In a replication setup, you can improve the schema on the slave and then do a failover, effectively improving it for all accesses without any maintenance window - just the short failover time.
This might help many DBAs who otherwise don't see a chance to improve a bad schema once it is used in production.
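For those who want to try this, here is a minimal sketch of such an on-slave improvement, assuming (as in my test table) that a suitable unique column combination exists; keeping the change out of the slave's own binary log via "sql_log_bin" is optional and depends on your topology:
-- On the slave only:
SET SESSION sql_log_bin = 0;   -- optional: do not write this ALTER to the slave's binlog
ALTER TABLE for_delete ADD PRIMARY KEY (clock, inc_num);
SET SESSION sql_log_bin = 1;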
Take care!