FromDual TechFeed (en)

Galera Load Balancer the underestimated wallflower

Shinguz - Thu, 2017-09-21 15:25

There are some pretty sophisticated Load Balancers for Galera Clusters setups out in the market (ProxySQL, MaxScale, HAproxy, ...). They have many different exotic features. You can nearly do everything with them. But this comes at the cost of complexity. Non of them is simple any more.

A widely underestimated Load Balancer solution for Galera Cluster setups is the Galera Load Balancer from Codership. It is an simple Load Balancer solution which serves all of our daily needs when it comes to Galera Cluster. Unfortunately this product is not much promoted by the software vendor himself.

Installation of Galera Load Balancer

This starts with the installation. There are no packages ready to install. You have to compile Galera Load Balancer yourself. FromDual provides some compiled packages or can help you building and installing it.

You can get the Galera Load Balancer sources from Github. The binaries are built straight forward:

shell> git clone https://github.com/codership/glb shell> cd glb/ shell> ./bootstrap.sh shell> ./configure shell> make shell> make install

If you prefer a binary tar ball as I do, you can run the following commands instead of make install:

shell> TARGET=glb-1.0.1-linux-$(uname -m) shell> mkdir -p ${TARGET}/sbin ${TARGET}/lib ${TARGET}/share/glb shell> cp src/glbd ${TARGET}/sbin/ shell> cp src/.libs/libglb.a src/.libs/libglb.so* ${TARGET}/lib/ shell> cp files/* ${TARGET}/share/glb/ shell> cp README NEWS COPYING CONTRIBUTORS.txt CONTRIBUTOR_AGREEMENT.txt ChangeLog BUGS AUTHORS shell> tar czf ${TARGET}.tar.gz ${TARGET} shell> rm -rf ${TARGET}
Configuration of Galera Load Balancer

The Galera Load Balancer is configured in a file called glbd which must be located under /etc/sysconfig/gldb (Red Hat and its derivatives) or /etc/default/glbd (Debian and its derivatives). I did not find any option to tell Galera Load Balancer where to search for a configuration file.

The Galera Load Balancer parameters are documented here.

Starting and Stopping Galera Load Balancer

This means for me I have to specify all my parameters on the command line:

product/glb/sbin/glbd --threads 8 --max_conn 500 \ --round --fifo /home/mysql/run/glbd.fifo --control 127.0.0.1:3333 \ 127.0.0.1:3306 \ 192.168.1.1:3306:1 192.168.1.2:3306:2 192.168.1.3:3306:1

An equivalent configuration file would look as follows:

# # /etc/sysconfig/glbd.cfg # LISTEN_ADDR="127.0.0.1:3306" CONTROL_ADDR="127.0.0.1:3333" CONTROL_FIFO="/home/mysql/run/glbd.fifo" THREADS="8" MAX_CONN="500" DEFAULT_TARGETS="192.168.1.1:3306:1 192.168.1.2:3306:2 192.168.1.3:3306:1" OTHER_OPTIONS="--round"
Stopping Galera Load Balancer is simple: killall glbd
Galera Load Balancer operations

Beside starting and stopping Galera Load Balancer you also want to look into it. This can be done with the following 2 commands:

echo getinfo | nc -q 1 127.0.0.1 3333 echo getstats | nc -q 1 127.0.0.1 3333

Or if you want to have it in a more top/vmstat like style:

watch -n 1 "echo getstats | nc -q 1 127.0.0.1 3333" watch -n 1 -d "echo getinfo | nc -q 1 127.0.0.1 3333"

More interesting are operations like draining and undraining a Galera Cluster node from the Galera Load Balancer. To drain a Galera Cluster node for example for maintenance (kernel upgrade?) you can run the following command:

echo "192.168.1.2:3306:0" | nc 127.0.0.1 3333

To undrain the node again it works like this:

echo "192.168.1.2:3306:2" | nc 127.0.0.1 3333

Unfortunately Galera Load Balancer does not memorize the weight (:2).

If you want to remove or add a node from/to the Galera Load Balancer this works as follows:

echo "192.168.1.2:3306:-1" | nc 127.0.0.1 3333 echo "192.168.1.2:3306:1" | nc 127.0.0.1 3333

Further Galera Load Balancer operation tasks you can find in the documentation.

Taxonomy upgrade extras: Galera Cluster glb load balancer Operations

Find evil developer habits with log_queries_not_using_indexes

Shinguz - Wed, 2017-09-20 16:00

Recently I switched on the MariaDB slow query logging flag log_queries_not_using_indexes just for curiosity on one of our customers systems:

A tail -f on the MariaDB Slow Query Log caused a huge flickering on my screen.
I got to see about 5 times per second the following statement sequence in the Slow Query Log:

# User@Host: app_admin[app_admin] @ [192.168.1.42] Id: 580195 # Query_time: 0.091731 Lock_time: 0.000028 Rows_sent: 273185 Rows_examined: 273185 SELECT LAST_INSERT_ID() FROM `placeholder`; # Query_time: 0.002858 Lock_time: 0.000043 Rows_sent: 6856 Rows_examined: 6856 SELECT LAST_INSERT_ID() FROM `data`;

So at least 5 times 95 ms (5 x (92 + 3) = 475 ms) per 1000 ms (48%) where spent in these 2 statements which are running quite fast but do not use an index (long_query_time was set to 2 seconds).

So I estimate, that this load job can be speed up at least by factor 2 when using the LAST_INSERT_ID() function correctly not considering the possible reduction of network traffic (throughput and response time).

To show the problem I made a little test case:

mariadb> INSERT INTO test VALUES (NULL, 'Some data', NULL); mariadb> SELECT LAST_INSERT_ID() from test; +------------------+ | LAST_INSERT_ID() | +------------------+ | 1376221 | ... | 1376221 | +------------------+ 1048577 rows in set (0.27 sec)

The response time of this query will linearly grow with the amount of data as long as they fit into memory and the response time will explode as soon as the table does not fit into memory any more. In addition the network traffic would be reduced by about 8 Mbyte (1 Mio rows x BIGINT UNSIGNED (64-bit) + some header per row?) per second (6-8% of the network bandwidth of a 1 Gbit network link).

shell> ifconfig lo | grep bytes RX bytes:2001930826 (2.0 GB) TX bytes:2001930826 (2.0 GB) shell> ifconfig lo | grep bytes RX bytes:2027289745 (2.0 GB) TX bytes:2027289745 (2.0 GB)

The correct way of doing the query would be:

mariadb> SELECT LAST_INSERT_ID(); +------------------+ | last_insert_id() | +------------------+ | 1376221 | +------------------+ 1 row in set (0.00 sec)

The response time is below 10 ms.

So why is the first query taking so long an consuming so many resources? To get an answer to this question the MariaDB Optimizer can tell us more with the Query Execution Plan (QEP):

mariadb> EXPLAIN SELECT LAST_INSERT_ID() FROM test; +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | 1 | SIMPLE | test | index | NULL | PRIMARY | 4 | NULL | 1048577 | Using index | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID() FROM test; { "query_block": { "select_id": 1, "table": { "table_name": "test", "access_type": "index", "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "rows": 1048577, "filtered": 100, "using_index": true } } }

The database does a Full Index Scan (FIS, other call it a Index Fast Full Scan (IFFS)) on the Primary Key (column id).

The Query Execution Plan of the second query looks as follows:

mariadb> EXPLAIN SELECT LAST_INSERT_ID(); +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID(); { "query_block": { "select_id": 1, "table": { "message": "No tables used" } } }
Taxonomy upgrade extras: query tuning Optimizer index index scan last_insert_id explain slow log

Kategorien: FromDual all (en), FromDual TechFeed (en), MySQL Tech-Feed (en)

Find evil developer habits with log_queries_not_using_indexes

Shinguz - Wed, 2017-09-20 16:00

Recently I switched on the MariaDB slow query logging flag log_queries_not_using_indexes just for curiosity on one of our customers systems:

A tail -f on the MariaDB Slow Query Log caused a huge flickering on my screen.
I got to see about 5 times per second the following statement sequence in the Slow Query Log:

# User@Host: app_admin[app_admin] @ [192.168.1.42] Id: 580195 # Query_time: 0.091731 Lock_time: 0.000028 Rows_sent: 273185 Rows_examined: 273185 SELECT LAST_INSERT_ID() FROM `placeholder`; # Query_time: 0.002858 Lock_time: 0.000043 Rows_sent: 6856 Rows_examined: 6856 SELECT LAST_INSERT_ID() FROM `data`;

So at least 5 times 95 ms (5 x (92 + 3) = 475 ms) per 1000 ms (48%) where spent in these 2 statements which are running quite fast but do not use an index (long_query_time was set to 2 seconds).

So I estimate, that this load job can be speed up at least by factor 2 when using the LAST_INSERT_ID() function correctly not considering the possible reduction of network traffic (throughput and response time).

To show the problem I made a little test case:

mariadb> INSERT INTO test VALUES (NULL, 'Some data', NULL); mariadb> SELECT LAST_INSERT_ID() from test; +------------------+ | LAST_INSERT_ID() | +------------------+ | 1376221 | ... | 1376221 | +------------------+ 1048577 rows in set (0.27 sec)

The response time of this query will linearly grow with the amount of data as long as they fit into memory and the response time will explode as soon as the table does not fit into memory any more. In addition the network traffic would be reduced by about 8 Mbyte (1 Mio rows x BIGINT UNSIGNED (64-bit) + some header per row?) per second (6-8% of the network bandwidth of a 1 Gbit network link).

shell> ifconfig lo | grep bytes RX bytes:2001930826 (2.0 GB) TX bytes:2001930826 (2.0 GB) shell> ifconfig lo | grep bytes RX bytes:2027289745 (2.0 GB) TX bytes:2027289745 (2.0 GB)

The correct way of doing the query would be:

mariadb> SELECT LAST_INSERT_ID(); +------------------+ | last_insert_id() | +------------------+ | 1376221 | +------------------+ 1 row in set (0.00 sec)

The response time is below 10 ms.

So why is the first query taking so long an consuming so many resources? To get an answer to this question the MariaDB Optimizer can tell us more with the Query Execution Plan (QEP):

mariadb> EXPLAIN SELECT LAST_INSERT_ID() FROM test; +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | 1 | SIMPLE | test | index | NULL | PRIMARY | 4 | NULL | 1048577 | Using index | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID() FROM test; { "query_block": { "select_id": 1, "table": { "table_name": "test", "access_type": "index", "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "rows": 1048577, "filtered": 100, "using_index": true } } }

The database does a Full Index Scan (FIS, other call it a Index Fast Full Scan (IFFS)) on the Primary Key (column id).

The Query Execution Plan of the second query looks as follows:

mariadb> EXPLAIN SELECT LAST_INSERT_ID(); +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID(); { "query_block": { "select_id": 1, "table": { "message": "No tables used" } } }
Taxonomy upgrade extras: query tuning Optimizer index index scan last_insert_id explain slow log

Kategorien: FromDual all (en), FromDual TechFeed (en), MySQL Tech-Feed (en)

Storing BLOBs in the database

Shinguz - Fri, 2017-06-30 14:18

We have sometimes discussions with our customers whether to store LOBs (Large Objects) in the database or not. To not rephrase the arguments again and again I have summarized them in the following lines.

The following items are more or less valid for all large data types (BLOB, TEXT and theoretically also for JSON and GIS columns) stored in a MySQL or MariaDB (or any other relational) database.

The idea of a relational table based data-store is to store structured data (numbers, data and short character strings) to have a quick write and read access to them.

And yes, you can also store other things like videos, huge texts (PDF, emails) or similar in a RDBMS but they are principally not designed for such a job and thus non optimal for the task. Software vendors implement such features not mainly because it makes sense but because users want it and the vendors want to attract users (or their managers) with such features (USP, Unique Selling Proposition). Here also one of my Mantras: Use the right tool for the right task:

The main topics to discuss related to LOBs are: Operations, performance, economical reasons and technical limitations.

Disadvantages of storing LOBs in the database

The database will grow fast. Operations will become more costly and complicated.
Backup and restore will become more costly and complicated for the admin because of the increased size caused by LOBs.
Backup and restore will take longer because of the same reason.
Database and table management functions (OPTIMIZE, ALTER, etc.) will take longer on big LOB tables.
Smaller databases need less RAM/disk space and are thus cheaper.
Smaller databases fit better into your RAM and are thus potentially faster (RAM vs disk access).
RDBMS are a relatively slow technology (compared to others). Reading LOBs from the database is significantly slower than reading LOBs from a filer for example.
LOBs stored in the database will spoil your database cache (InnoDB Buffer Pool) and thus possibly slow down other queries (does not necessarily happen with more sophisticated RBDMS).
LOB size limitation of 1 Gbyte in reality (max_allowed_packet, theoretically limit is at 4 Gbyte) for MySQL/MariaDB.
Expensive, fast database store (RAID-10, SSD) is wasted for something which can be stored better on a cheap slow file store (RAID-5, HDD).
It is programmatically often more complicated to get LOBs from a database than from a filer (depends on your libraries).

Advantages of storing LOBs in the database

Atomicity between data and LOB is guaranteed by transactions (is it really in MySQL/MariaDB?).
There are no dangling links (reference from data to LOB) between data and LOB.
Data and LOB are from the same point in time and can be included in the same backup.
Data and LOB can be transferred simultaneously to other machines, by database replication or dump/restore.
Applications can use the same mechanism to get the data and the LOB. Remote access needs no separate file transfer for the LOB.

Conclusion

So basically you have to balance the advantages vs. the disadvantages of storing LOBs in the database and decided what arguments are more important in your case.

If you have some more good arguments pro or contra storing LOBs in the database please let me know.

Literature

Check also various articles on Google.

Nim: Repeat after me: MySQL is not a filesystem

Taxonomy upgrade extras: blob text lob design

Kategorien: FromDual all (en), FromDual TechFeed (en), MySQL Tech-Feed (en)

Storing BLOBs in the database

Shinguz - Fri, 2017-06-30 14:18

We have sometimes discussions with our customers whether to store LOBs (Large Objects) in the database or not. To not rephrase the arguments again and again I have summarized them in the following lines.

The following items are more or less valid for all large data types (BLOB, TEXT and theoretically also for JSON and GIS columns) stored in a MySQL or MariaDB (or any other relational) database.

The idea of a relational table based data-store is to store structured data (numbers, data and short character strings) to have a quick write and read access to them.

And yes, you can also store other things like videos, huge texts (PDF, emails) or similar in a RDBMS but they are principally not designed for such a job and thus non optimal for the task. Software vendors implement such features not mainly because it makes sense but because users want it and the vendors want to attract users (or their managers) with such features (USP, Unique Selling Proposition). Here also one of my Mantras: Use the right tool for the right task:

The main topics to discuss related to LOBs are: Operations, performance, economical reasons and technical limitations.

Disadvantages of storing LOBs in the database

The database will grow fast. Operations will become more costly and complicated.
Backup and restore will become more costly and complicated for the admin because of the increased size caused by LOBs.
Backup and restore will take longer because of the same reason.
Database and table management functions (OPTIMIZE, ALTER, etc.) will take longer on big LOB tables.
Smaller databases need less RAM/disk space and are thus cheaper.
Smaller databases fit better into your RAM and are thus potentially faster (RAM vs disk access).
RDBMS are a relatively slow technology (compared to others). Reading LOBs from the database is significantly slower than reading LOBs from a filer for example.
LOBs stored in the database will spoil your database cache (InnoDB Buffer Pool) and thus possibly slow down other queries (does not necessarily happen with more sophisticated RBDMS).
LOB size limitation of 1 Gbyte in reality (max_allowed_packet, theoretically limit is at 4 Gbyte) for MySQL/MariaDB.
Expensive, fast database store (RAID-10, SSD) is wasted for something which can be stored better on a cheap slow file store (RAID-5, HDD).
It is programmatically often more complicated to get LOBs from a database than from a filer (depends on your libraries).

Advantages of storing LOBs in the database

Atomicity between data and LOB is guaranteed by transactions (is it really in MySQL/MariaDB?).
There are no dangling links (reference from data to LOB) between data and LOB.
Data and LOB are from the same point in time and can be included in the same backup.

Conclusion

So basically you have to balance the advantages vs. the disadvantages of storing LOBs in the database and decided what arguments are more important in your case.

If you have some more good arguments pro or contra storing LOBs in the database please let me know.

Literature

Check also various articles on Google.

Nim: Repeat after me: MySQL is not a filesystem

Taxonomy upgrade extras: blob text lob design

Kategorien: FromDual all (en), FromDual TechFeed (en), MySQL Tech-Feed (en)

MySQL Enterprise Backup Incremental Cumulative and Differential Backup

Shinguz - Thu, 2017-05-11 17:20

Preparing the MySQL Enterprise Administrator Training I found that the MySQL Enterprise Backup Incremental Backup is not described very well. Thus I tried it out and wrote down this how-to:

Incremental Differential Backup

Full Backup mysqlbackup --user=root --backup-dir=/tape/full backup-and-apply-log grep end_lsn /tape/full/meta/backup_variables.txt end_lsn=2583666
Incremental Backups mysqlbackup --user=root --incremental-backup-dir=/tape/inc1 --start-lsn=2583666 --incremental backup grep end_lsn /tape/inc1/meta/backup_variables.txt end_lsn=2586138 mysqlbackup --user=root --incremental-backup-dir=/tape/inc2 --start-lsn=2586138 --incremental backup grep end_lsn /tape/inc2/meta/backup_variables.txt end_lsn=2589328 mysqlbackup --user=root --incremental-backup-dir=/tape/inc3 --start-lsn=2589328 --incremental backup grep end_lsn /tape/inc3/meta/backup_variables.txt end_lsn=2592519
Binary Log Backups cp /var/lib/binlog/binlog.* /tape/binlog/
Restore

This step will modify the original full backup!

mysqlbackup --incremental-backup-dir=/tape/inc1 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --incremental-backup-dir=/tape/inc2 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --incremental-backup-dir=/tape/inc3 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --user=root --datadir=/var/lib/mysql --backup-dir=/tape/full copy-back
Point-in-Time-Recovery grep binlog_position /tape/inc3/meta/backup_variables.txt /tape/inc3/meta/backup_variables.txt:binlog_position=binlog.000001:7731 cd /tape/binlog mysqlbinlog --disable-log-bin --start-position=7731 binlog.000001 | mysql -uroot
Incremental Cumulative Backup

Full Backup mysqlbackup --user=root --backup-dir=/tape/full backup-and-apply-log grep end_lsn /tape/full/meta/backup_variables.txt end_lsn=2602954 Incremental Backups mysqlbackup --user=root --incremental-backup-dir=/tape/inc1 --start-lsn=2602954 --incremental backup mysqlbackup --user=root --incremental-backup-dir=/tape/inc2 --start-lsn=2602954 --incremental backup mysqlbackup --user=root --incremental-backup-dir=/tape/inc3 --start-lsn=2602954 --incremental backup
Binary Log Backups cp /home/mysql/database/mysql-5.7/binlog/binlog.* /tape/binlog/
Restore

This step will modify the original full backup!

mysqlbackup --incremental-backup-dir=/tape/inc3 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --user=root --datadir=/var/lib/mysql --backup-dir=/tape/full copy-back
Point-in-Time-Recovery grep binlog_position /tape/*/meta/backup_variables.txt /tape/inc3/meta/backup_variables.txt:binlog_position=binlog.000001:7731 cd /tape/binlog mysqlbinlog --disable-log-bin --start-position=7731 binlog.000001 | mysql -uroot

I very much dislike that during my restore the backup is modified. So if I do a mistake during restore my backup is gone and I am doomed.

Taxonomy upgrade extras: Backup Restore MySQL Enterprise Backup enterprise incremental cumulative differential

Kategorien: FromDual all (en), FromDual TechFeed (en), MySQL Tech-Feed (en)

MySQL Enterprise Backup Incremental Cumulative and Differential Backup

Shinguz - Thu, 2017-05-11 17:20

Preparing the MySQL Enterprise Administrator Training I found that the MySQL Enterprise Backup Incremental Backup is not described very well. Thus I tried it out and wrote down this how-to:

Incremental Differential Backup

Full Backup mysqlbackup --user=root --backup-dir=/tape/full backup-and-apply-log grep end_lsn /tape/full/meta/backup_variables.txt end_lsn=2583666
Incremental Backups mysqlbackup --user=root --incremental-backup-dir=/tape/inc1 --start-lsn=2583666 --incremental backup grep end_lsn /tape/inc1/meta/backup_variables.txt end_lsn=2586138 mysqlbackup --user=root --incremental-backup-dir=/tape/inc2 --start-lsn=2586138 --incremental backup grep end_lsn /tape/inc2/meta/backup_variables.txt end_lsn=2589328 mysqlbackup --user=root --incremental-backup-dir=/tape/inc3 --start-lsn=2589328 --incremental backup grep end_lsn /tape/inc3/meta/backup_variables.txt end_lsn=2592519
Binary Log Backups cp /var/lib/binlog/binlog.* /tape/binlog/
Restore

This step will modify the original full backup!

mysqlbackup --incremental-backup-dir=/tape/inc1 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --incremental-backup-dir=/tape/inc2 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --incremental-backup-dir=/tape/inc3 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --user=root --datadir=/var/lib/mysql --backup-dir=/tape/full copy-back
Point-in-Time-Recovery grep binlog_position /tape/inc3/meta/backup_variables.txt /tape/inc3/meta/backup_variables.txt:binlog_position=binlog.000001:7731 cd /tape/binlog mysqlbinlog --disable-log-bin --start-position=7731 binlog.000001 | mysql -uroot
Incremental Cumulative Backup

Full Backup mysqlbackup --user=root --backup-dir=/tape/full backup-and-apply-log grep end_lsn /tape/full/meta/backup_variables.txt end_lsn=2602954 Incremental Backups mysqlbackup --user=root --incremental-backup-dir=/tape/inc1 --start-lsn=2602954 --incremental backup mysqlbackup --user=root --incremental-backup-dir=/tape/inc2 --start-lsn=2602954 --incremental backup mysqlbackup --user=root --incremental-backup-dir=/tape/inc3 --start-lsn=2602954 --incremental backup
Binary Log Backups cp /home/mysql/database/mysql-5.7/binlog/binlog.* /tape/binlog/
Restore

This step will modify the original full backup!

mysqlbackup --incremental-backup-dir=/tape/inc3 --backup-dir=/tape/full apply-incremental-backup mysqlbackup --user=root --datadir=/var/lib/mysql --backup-dir=/tape/full copy-back
Point-in-Time-Recovery grep binlog_position /tape/*/meta/backup_variables.txt /tape/inc3/meta/backup_variables.txt:binlog_position=binlog.000001:7731 cd /tape/binlog mysqlbinlog --disable-log-bin --start-position=7731 binlog.000001 | mysql -uroot

I very much dislike that during my restore the backup is modified. So if I do a mistake during restore my backup is gone and I am doomed.

Taxonomy upgrade extras: Backup Restore MySQL Enterprise Backup enterprise incremental cumulative differential

Kategorien: FromDual all (en), FromDual TechFeed (en), MySQL Tech-Feed (en)

How the Lack of a Primary Key May Effectively Stop the Slave

Jörg Brühe - Tue, 2017-05-02 09:50

Most (relational) DBAs and DB application developers know the concept of a primary key ("PK") and what it is good for. However, much too often one still encounters table definitions without a PK. True, the relational theory based on sets does not need a PK, and all operations (insert, select, update, delete) can also be done on tables for which no PK was defined. If performance doesn't matter (or the data volume is small, a typical situation in tests), the lack of a PK does not immediately cause negative consequences.

But recently, we had several customers who (independent of each other) had big tables without a PK in a replication setup which they wanted to delete, and they all suffered severely from that: Their replication did not progress, the slave lag grew larger and larger, and all this without reporting any error. When I say "larger and larger", I mean it: I'm not talking about minutes or even hours, I'm talking about days!

Observations

In all cases, the situation could be summarized as:

They were running a traditional, asynchronous replication using the "row" format.
They had a table without PK with many entries, say a million rows.
This table wasn't needed any more, and someone issued a "delete from T" statement on the master.
From this moment, the slave did not show any sign of progress: The output of "show slave status \G"
- listed both the IO and the SQL slave thread as running,
- reported the same log positions without any change,
- did not mention any slave error,
- but the slave lag grew in sync with the passing time.

Looking at the slave's machine load, say using "vmstat", you could see a certain amount of CPU load in user mode, possibly some "waiting for IO", and some read IO (group "io", column "bi" = "block input"). Checking with "top", you could see that the MySQL server process was the only significant CPU load, and closer inspection would reveal that it used roughly one CPU core.

(A side remark: The numbers of "vmstat" and "top" are not directly comparable: While "vmstat" reports the overall CPU usage as a percentage of the total available CPU power, "top" reports the CPU usage of the individual processes as the percentage of a single CPU core. So if you had a 4-core machine with only one active process running an infinite loop, "top" would show this process as using 100%, while "vmstat" would report the CPU as running 25% in user mode and being 75% idle. If you don't know it, look at "/proc/cpuinfo" to find the number of CPU cores.)

What was the Issue? From "top", we could tell that the MySQL server was using one CPU core at full speed. From "vmstat", we could tell that it might access some data from disk, but didn't write any significant amount. This sure looked like an infinite loop or a close relative of it, executed by only one thread.

Some of you may have heard that (under some conditions) replication may do a full table scan: This is it!

When the slave has to apply a change (for this discussion: a delete or an update) which is provided in row format, and the table has no PK (or other suitable index), the SQL thread does a full table scan searching for the matching row. Note that it doesn't stop on the first match, but rather continues the scan throughout the rest of the table.

To be honest: I don't understand why it doesn't stop after the first match. I also ran a modified test where some master rows had two matches on the slave (I had duplicated them by "insert select"): The "delete" was still replicated as before, and these additional slave rows were still present after the delete.

Until now, you may consider that "full table scan" as a claim without proof, and I agree with all demands for a more thorough check.

The Experiment For tests, I have a setup of two VMs, each running a MySQL server process, and they are configured for a traditional master-slave replication. In fact, I have it for the versions MySQL 5.5.44, 5.6.24, and 5.7.17. (Yes, I might add newer versions.)

So I designed an experiment:

Start both master and slave, make sure replication is running.
On the master, create a table without any key / index, neither primary nor secondary: CREATE TABLE for_delete ( inc_num int, # ascending numbers 1 .. 10,000 repeated for larger tables clock timestamp, # now() counted: 129 different values with 200 k rows bubble char(250) # repeat('Abcd', 60) ) DEFAULT CHARACTER SET latin1 ;
Insert into that table the desired number of rows (I had runs with 10 k, 20 k, 100 k, and 200 k)
on both master and slave, run "SELECT ... FROM information_schema.global_status WHERE ..." to get status counters,
Delete all rows of that table
sleep for a time depending on the row count (you will later see why)
get new status counters
compute the status counter differences

This experiment reliably reproduced the symptoms which the customers had reported: When the "delete" had been done on the master, the slave became busy for a long time, depending on the row count, and in this time it did not show any progress indicators. This holds for all three versions I checked.

So what did the slave do? It worked as assumed: For each row, it did a full table scan, continuing even after it had found a matching row. This can be easily seen in the status counters, which you will find in the result tables below.

In this article, I will show the results of tests with 200 k rows only, as they are the most impressive. As described above, the table was created anew for each run. Then, 10,000 rows were inserted, and this was repeated 20 times. Each such group had the value of "inc_num" go up from 1 to 10,000.

In all runs, the status counters "HANDLER_DELETE", "INNODB_ROWS_DELETED", and "INNODB_ROWS_READ" showed the value of 200,000, the table size. So the handler calls did not show any scan, it must be below that interface (= within the handler).

If the server does a table scan, it must access all data pages of the table, so there must be read requests for them to guarantee they are in the buffer pool. And yes, the status counters showed these read requests: Table 1: No Key or Index

200,000 rows, status counters on Master Slave

Version 5.5.44

INNODB_BUFFER_POOL_READ_REQUESTS 2,174,121 380,132,972 Ratio "read_requests" Slave/Master
174.8 "read_requests" per "rows_deleted" 10.9 1,900.7

Version 5.6.24

INNODB_BUFFER_POOL_READ_REQUESTS 2,318,685 379,762,583 Ratio "read_requests" Slave/Master
163.8 "read_requests" per "rows_deleted" 11.6 1,898.8

Version 5.7.17

INNODB_BUFFER_POOL_READ_REQUESTS 1,598,745 379,538,695 Ratio "read_requests" Slave/Master
237.4 "read_requests" per "rows_deleted" 8.0 1,897.7

While master and slave (from a logical point of view) do the same work (delete all rows from a 200 k rows table), the slave accesses many more pages than the master - depending on the version, the factor ranges from 164 to 237. (Note that in version 5.7 the slave didn't do worse than in earlier versions, rather the master did better. This shows clearly in the ratio of "innodb buffer pool read requests" to "rows deleted" on the master: This is about 11 in 5.5 and 5.6, but only 8 in 5.7.)

I won't bother you with the detailed results for 100 k rows. It is enough to say that for half as many rows, the master had half as many read requests, while the slave had a quarter. This means that the effort on the master grows proportional to table size, but on the slave it grows with the square of the table size. Accepting the fact that the slave always scans the full table, this is easy to understand: If the table has twice as many rows, there are twice as many pages to scan, and the number of scans also doubles.

Quadratic growth means that an increase of the table size by a factor ten lets the effort grow by a factor hundred! Now imagine I had tested with a million of rows ...

All this isn't new: In the public MySQL bug database, this is listed as bug #53375. The bug was reported on May 3, 2010, for MySQL 5.1. The bug is closed since December 15, 2011. This does not mean the full table scans were stopped - obviously, they are considered unavoidable, so the bug fix is to write an explaining message into the error log. In my tests, I encountered it only once, here it is:

[Note] The slave is applying a ROW event on behalf of a DELETE statement on table for_delete and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning an index while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance. Avoiding the Issue

Of course, this problem is the consequence of sloppy table design: If there were a primary key, it would not occur. But this may not help those who currently have tables without a PK and dare not add one, because it would require prior testing and a maintenance window to alter the table on the master.

Luckily, there are remedies, and even more than the log message (quoted above) mentions: In my tests, I found that the slave will use any index on that table if one is available. (It even uses an index with low selectivity, which may increase the page request count.) I did 5 tests:

Create an index on "clock" (129 different values).
Create an index on "inc_num" (10,000 different values).
Create an index on both "clock" and "inc_num" (all combinations are distinct).
As before, and declare it as "unique".
Define the combination of "clock" and "inc_num" to be the primary key.

The "alter table" statement for this was always issued on the slave only, after the inserts, but before the delete.

In all three versions, the slave used any index available. The index on "clock" of course had many different rows per value, so it caused many even more page requests than the no-index case: between 2,800 and 3,165 per row deleted. Obviously, thios makes the problem even worse. But all other indexes were huge improvements, the primary key of course was the best choice. Here are the numbers: Table 2: With Added Key or Index

200,000 rows, status counters on Master Slave Slave Slave Slave Slave

"Alter table ..." on slave before "Delete":
Index on "clock" 129 values Index on "inc_num" 10,000 v. Index
on both Unique I.
on both Primary key on both Version 5.5.44

INNODB_BUFFER_POOL_READ_REQUESTS 2,174,121 633,132,032 10,197,554 4,429,243 4,421,357 1,810,734 Ratio "read_requests" Slave/Master
291.2 4.7 2.0 2.0 0.8 "read_requests" per "rows_deleted" 10.9 3,165.7 51.0 22.1 22.1 9.1

Version 5.6.24

INNODB_BUFFER_POOL_READ_REQUESTS 2,318,685 610,482,315 10,327,132 4,375,106 4,073,543 1,794,524 Ratio "read_requests" Slave/Master
342.1 5.0 2.1 2.0 0.9 "read_requests" per "rows_deleted" 11.6 3,052.4 51.6 21.9 20.4 9.0

Version 5.7.17

INNODB_BUFFER_POOL_READ_REQUESTS 1,598,745 559,398,633 9,531,444 3,859,343 3,836,390 1,720,731 Ratio "read_requests" Slave/Master
354.8 6.0 2.3 2.5 1.1 "read_requests" per "rows_deleted" 8.0 2,797.0 47.7 19.3 19.2 8.6

But one very important fact should be mentioned in addition to the numbers: The fact that it worked! By adding an index, I made the slave's schema differ from the master's. The primary key even totally changed the B-tree in which the rows are stored, and it made the slave drop the internal row ID which InnoDB had added on the master. Still, replication using the row format could handle these differences without problems.

The Way Out

So the good message is:

Even if you have a table without indexes or primary key, you can add these on the slave without breaking the replication.
If you suffer from slow replication on such a table, adding a good index or (even better) the PK will solve this problem.
In a replication setup, you can improve the schema on the slave and then do a failover, effectively improving it for all accesses without any maintenance window - just the short failover time.

This might help many DBAs who otherwise don't see a chance to improve a bad schema once it is used in production.

Take care!

Taxonomy upgrade extras: primary key

Kategorien: FromDual all (en), FromDual TechFeed (en)

How the Lack of a Primary Key May Effectively Stop the Slave

Jörg Brühe - Tue, 2017-05-02 09:50

Most (relational) DBAs and DB application developers know the concept of a primary key ("PK") and what it is good for. However, much too often one still encounters table definitions without a PK. True, the relational theory based on sets does not need a PK, and all operations (insert, select, update, delete) can also be done on tables for which no PK was defined. If performance doesn't matter (or the data volume is small, a typical situation in tests), the lack of a PK does not immediately cause negative consequences.

But recently, we had several customers who (independent of each other) had big tables without a PK in a replication setup which they wanted to delete, and they all suffered severely from that: Their replication did not progress, the slave lag grew larger and larger, and all this without reporting any error. When I say "larger and larger", I mean it: I'm not talking about minutes or even hours, I'm talking about days!

Observations

In all cases, the situation could be summarized as:

They were running a traditional, asynchronous replication using the "row" format.
They had a table without PK with many entries, say a million rows.
This table wasn't needed any more, and someone issued a "delete from T" statement on the master.
From this moment, the slave did not show any sign of progress: The output of "show slave status \G"
- listed both the IO and the SQL slave thread as running,
- reported the same log positions without any change,
- did not mention any slave error,
- but the slave lag grew in sync with the passing time.