Logging Galera Cluster conflicts
We typically recommend that our customers use our MySQL/Galera Cluster my.cnf configuration template to avoid MySQL configuration and performance problems.
And because we are paranoid as well, we enable all useful logging:
wsrep_log_conflicts = 1
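If a restart is not desirable just to get this extra logging, the variable can typically also be enabled at runtime on current Galera versions (please verify this on your own version first):

mysql> SET GLOBAL wsrep_log_conflicts = ON;
mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_log_conflicts';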
But this also has the consequence of more visibility...
If you monitor your Galera Cluster carefully, for example with the FromDual Performance Monitor for MySQL and MariaDB, you will probably see some strange values increasing from time to time:
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_%r_s';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| wsrep_local_cert_failures | 42    |
| wsrep_local_bf_aborts     | 13    |
+---------------------------+-------+
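The LIKE pattern above happens to match exactly these two counters. If you prefer to query them by their full names, the standard SHOW STATUS WHERE syntax returns the same information:

mysql> SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_cert_failures', 'wsrep_local_bf_aborts');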
Those values are indicators that some transactions (Galera write sets) did not succeed and were aborted by Galera. In this case the paranoid logging helps to find out what exactly was aborted, and possibly whether this can or should be fixed:
150410  1:44:18 [Note] WSREP: cluster conflict due to certification failure for threads:
150410  1:44:18 [Note] WSREP: Victim thread:
   THD: 151856, mode: local, state: executing, conflict: cert failure, seqno: 30399304
   SQL: UPDATE login SET lTsexpire = UNIX_TIMESTAMP(NOW()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'

*** Priority TRANSACTION:
TRANSACTION 464359568, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 4, OS thread handle 0x7f1c0916c700, query id 8190690 Update_rows_log_event::find_row(30399302)

*** Victim TRANSACTION:
TRANSACTION 464359562, ACTIVE 0 sec
mysql tables in use 1, locked 1
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 151856, OS thread handle 0x7f1c09091700, query id 8190614 172.20.100.11 sam_angiz query end
UPDATE login SET lTsexpire = UNIX_TIMESTAMP(now()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'

*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 835205 page no 3 n bits 72 index `PRIMARY` of table `fromdual`.`login`
trx table locks 1 total table locks 2 trx id 464359562 lock_mode X locks rec but not gap
lock hold time 0 wait time before grant 0

150410  1:44:18 [Note] WSREP: cluster conflict due to high priority abort for threads:
150410  1:44:18 [Note] WSREP: Winning thread:
   THD: 4, mode: applier, state: executing, conflict: no conflict, seqno: 30399302
   SQL: (null)
150410  1:44:18 [Note] WSREP: Victim thread:
   THD: 151856, mode: local, state: committing, conflict: no conflict, seqno: -1
   SQL: UPDATE login SET lTsexpire = UNIX_TIMESTAMP(now()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'
In the Galera conflict above, 2 login transactions were running at the same time. They both came with the same Session ID and wanted to update the expiry timestamp. Now how to solve or fix this:
- First check if this table has a Primary Key (tables without a PK cause full table scans, which can take a long time and increase the chance for conflicts).
- Second check if there is a (UNIQUE?) index on lSessionId. A missing index leads to full table scans, which increases the chance for conflicts. A sketch for checking and fixing these first two points follows after this list.
- Third check WHY 2 logins from the same Session ID can arrive at the same time (within 1 second) on 2 different Galera nodes (Ajax requests, etc.). Try to avoid such situations.
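As a rough sketch of the first two checks (the table and column names are taken from the log above; the added column type and index name are assumptions and must be adapted to the real schema):

SHOW CREATE TABLE login\G

-- Add a surrogate Primary Key if none exists (assumed column name and type):
ALTER TABLE login
  ADD COLUMN lId INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Add a UNIQUE index on lSessionId (this fails if duplicate session IDs already exist):
ALTER TABLE login
  ADD UNIQUE INDEX uk_lSessionId (lSessionId);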
Comments
Galera and tables without a Primary Key
We finally got the results. The table does NOT have a Primary Key and/or an index on lSessionId. Please also read: Disadvantages of explicitly NOT using InnoDB Primary Keys and Limitations of MySQL.
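If you want to find such tables yourself, a query against the information_schema is a quick sketch (it may be slow on installations with very many tables):

SELECT t.table_schema, t.table_name
  FROM information_schema.tables t
  LEFT JOIN information_schema.statistics s
    ON s.table_schema = t.table_schema
   AND s.table_name   = t.table_name
   AND s.index_name   = 'PRIMARY'
 WHERE t.table_type = 'BASE TABLE'
   AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
   AND s.index_name IS NULL;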
The alter_engine.php script of our MySQL BasEnv (MyEnv) can help you find such tables...
Other example
Trx # 86153878218 exists in both Galera Cluster conflicts, which are 46 seconds apart (= long running transaction?). Table `rt3`.`Tickets` seems to be involved in both cases. One should also investigate this query or transaction, which we cannot see here...
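To catch such long running transactions while they are still open, a look at information_schema.innodb_trx can help (a minimal sketch; trx_query only shows the statement the transaction is executing at that moment, not earlier ones):

SELECT trx_id, trx_started, trx_mysql_thread_id, trx_rows_modified, trx_query
  FROM information_schema.innodb_trx
 WHERE trx_started < NOW() - INTERVAL 10 SECOND
 ORDER BY trx_started;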