You are here

News

dbstat for MariaDB (and MySQL)

Shinguz - Thu, 2024-03-14 15:36
Table of contents

An idea that I have been thinking about for a long time and have now, thanks to a customer, finally tackled is dbstat for MariaDB/MySQL. The idea is based on sar/sysstat by Sebastien Godard:

sar - Collect, report, or save system activity information.

and Oracle Statspack:

Statspack is a performance tuning tool ... to quickly gather detailed analysis of the performance of that database instance.

Functionality of dbstat

Although we have had the performance schema for some time, it does not cover some points that we see as a problem in practice and that are requested by customers:

  • The table_size module collects data on the growth of tables. This allows statements to be made about the growth of individual tables, databases, future MariaDB Catalogs or the entire instance. This is interesting for users who are using multi-tenant systems or are otherwise struggling with uncontrolled growth.
  • The processlist module takes a snapshot of the process list at regular intervals and saves it. This information is useful for post-mortem analyses if the user was too slow to save his process list or to understand how a problem has built up.
  • The problem is often caused by long-running transactions, row locks or metadata locks. These are recorded and saved by the trx_and_lck and metadata_lock modules. This means that we can see problems that we did not even notice before or we can see what led to the problem after the accident (analogous to a tachograph in a vehicle).
  • Another question that we sometimes encounter in practice is: When was which database variable changed and what did it look like before? This is covered by the global_variables module. Unfortunately, it is not possible to find out who changed the variable or why. Operational processes are required for this.
  • The last module, global_status, actually covers what sar/sysstat does. It collects the values from SHOW GLOBAL STATUS; and saves them for later analysis purposes or to simply create graphs.

How does dbstat work

dbstat uses the database Event Scheduler as a scheduler. This must first be switched on for MariaDB (event_scheduler = ON). With MySQL it is already switched on by default. The Event Scheduler has the advantage that we can activate the jobs at a finer granularity, for example 10 s, which would not be possible with the crontab.

The Event Scheduler then executes SQL/PSM code to collect the data on the one hand and to delete the data on the other, so that the dbstat database does not grow immeasurably.

The following jobs are currently planned:

ModuleCollectDeleteQuantity structureRemarks table_size1/d at 02:0412/h, 1000 rows, > 31 d1000 tab × 31 d = 31k rowsShould work up to 288k tables. processlist1/min1/min, 1000 rows, > 7 d1000 con × 1440 min × 7 d = 10M rowsShould work up to 1000 concurrent connections. trx_and_lck1/min1/min, 1000 rows, > 7 d100 lck × 1440 min × 7 d = 1M rowsDepends very much on the application. metadata_lock1/min12/h, 1000 rows, > 30 d100 mdl × 1440 × 30 d = 4M rowsDepends very much on the application. global_variables1/minnever1000 rowsNormally this table should not grow. global_status1/min1/min, 1000 rows, > 30 d1000 rows × 1440 × 30 d = 40MRows Can become large?
How to install dbstat

dbstat can be downloaded from Github and is licensed under GPLv2.

The installation is simple: First execute the SQL file create_user_and_db.sql. Then execute the corresponding create_*.sql files for the respective modules in the dbstat database. There are currently no direct dependencies between the modules. If you want to use a different user or a different database than dbstat, you have to take care of this yourself.

Query dbstat

Some possible queries on the data have already been prepared. They can be found in the query_*.sql files. Here are a few examples:

table_size SELECT `table_schema`, `table_name`, `ts`, `table_rows`, `data_length`, `index_length` FROM `table_size` WHERE `table_catalog` = 'def' AND `table_schema` = 'dbstat' AND `table_name` = 'table_size' ORDER BY `ts` ASC ; +--------------+------------+---------------------+------------+-------------+--------------+ | table_schema | table_name | ts | table_rows | data_length | index_length | +--------------+------------+---------------------+------------+-------------+--------------+ | dbstat | table_size | 2024-03-09 20:01:00 | 0 | 16384 | 16384 | | dbstat | table_size | 2024-03-10 17:26:33 | 310 | 65536 | 16384 | | dbstat | table_size | 2024-03-11 08:28:12 | 622 | 114688 | 49152 | | dbstat | table_size | 2024-03-12 08:02:38 | 934 | 114688 | 49152 | | dbstat | table_size | 2024-03-13 08:08:55 | 1247 | 278528 | 81920 | +--------------+------------+---------------------+------------+-------------+--------------+
processlist SELECT connection_id, ts, time, state, SUBSTR(REGEXP_REPLACE(REPLACE(query, "\n", ' '), '\ +', ' '), 1, 64) AS query FROM processlist WHERE command != 'Sleep' AND connection_id = @connection_id ORDER BY ts ASC LIMIT 5 ; +---------------+---------------------+---------+---------------------------------+---------------------------------------------+ | connection_id | ts | time | state | query | +---------------+---------------------+---------+---------------------------------+---------------------------------------------+ | 14956 | 2024-03-09 20:21:12 | 13.042 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:22:12 | 73.045 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:23:12 | 133.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:24:12 | 193.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:25:12 | 253.041 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | +---------------+---------------------+---------+---------------------------------+---------------------------------------------+
trx_and_lck SELECT * FROM trx_and_lck\G *************************** 1. row *************************** machine_name: connection_id: 14815 trx_id: 269766 ts: 2024-03-09 20:05:57 user: root host: localhost db: test command: Query time: 41.000 running_since: 2024-03-09 20:05:16 state: Statistics info: select * from test where id = 6 for update trx_state: LOCK WAIT trx_started: 2024-03-09 20:05:15 trx_requested_lock_id: 269766:821:5:7 trx_tables_in_use: 1 trx_tables_locked: 1 trx_lock_structs: 2 trx_rows_locked: 1 trx_rows_modified: 0 lock_mode: X lock_type: RECORD lock_table_schema: test lock_table_name: test lock_index: PRIMARY lock_space: 821 lock_page: 5 lock_rec: 7 lock_data: 6 *************************** 2. row *************************** machine_name: connection_id: 14817 trx_id: 269760 ts: 2024-03-09 20:05:57 user: root host: localhost db: test command: Sleep time: 60.000 running_since: 2024-03-09 20:04:57 state: info: trx_state: RUNNING trx_started: 2024-03-09 20:04:56 trx_requested_lock_id: NULL trx_tables_in_use: 0 trx_tables_locked: 1 trx_lock_structs: 2 trx_rows_locked: 1 trx_rows_modified: 1 lock_mode: X lock_type: RECORD lock_table_schema: test lock_table_name: test lock_index: PRIMARY lock_space: 821 lock_page: 5 lock_rec: 7 lock_data: 6
metadata_lock SELECT lock_mode, ts, user, host, lock_type, table_schema, table_name, time, started, state, query FROM metadata_lock WHERE connection_id = 14347 ORDER BY started DESC LIMIT 5 ; +-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+ | lock_mode | ts | user | host | lock_type | table_schema | table_name | time | started | state | query | +-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+ | MDL_SHARED_WRITE | 2024-03-13 10:27:33 | root | localhost | Table metadata lock | test | test | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) | | MDL_BACKUP_TRANS_DML | 2024-03-13 10:27:33 | root | localhost | Backup lock | | | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) | | MDL_BACKUP_ALTER_COPY | 2024-03-13 10:22:33 | root | localhost | Backup lock | | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) | | MDL_SHARED_UPGRADABLE | 2024-03-13 10:22:33 | root | localhost | Table metadata lock | test | test | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) | | MDL_INTENTION_EXCLUSIVE | 2024-03-13 10:22:33 | root | localhost | Schema metadata lock | test | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) | +-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
global_variables SELECT variable_name, COUNT(*) AS cnt FROM global_variables GROUP BY variable_name HAVING COUNT(*) > 1 ; +-------------------------+-----+ | variable_name | cnt | +-------------------------+-----+ | innodb_buffer_pool_size | 7 | +-------------------------+-----+ SELECT variable_name, ts, variable_value FROM global_variables WHERE variable_name = 'innodb_buffer_pool_size' ; +-------------------------+---------------------+----------------+ | variable_name | ts | variable_value | +-------------------------+---------------------+----------------+ | innodb_buffer_pool_size | 2024-03-09 21:36:28 | 134217728 | | innodb_buffer_pool_size | 2024-03-09 21:40:25 | 268435456 | | innodb_buffer_pool_size | 2024-03-09 21:48:14 | 134217728 | +-------------------------+---------------------+----------------+
global_status SELECT s1.ts , s1.variable_value AS 'table_open_cache_misses' , s2.variable_value AS 'table_open_cache_hits' FROM global_status AS s1 JOIN global_status AS s2 ON s1.ts = s2.ts WHERE s1.variable_name = 'table_open_cache_misses' AND s2.variable_name = 'table_open_cache_hits' AND s1.ts BETWEEN '2024-03-13 11:55:00' AND '2024-03-13 12:05:00' ORDER BY ts ASC ; +---------------------+-------------------------+-----------------------+ | ts | table_open_cache_misses | table_open_cache_hits | +---------------------+-------------------------+-----------------------+ | 2024-03-13 11:55:47 | 1001 | 60711 | | 2024-03-13 11:56:47 | 1008 | 61418 | | 2024-03-13 11:57:47 | 1015 | 62125 | | 2024-03-13 11:58:47 | 1022 | 62829 | | 2024-03-13 11:59:47 | 1029 | 63533 | | 2024-03-13 12:00:47 | 1036 | 64237 | | 2024-03-13 12:01:47 | 1043 | 64944 | | 2024-03-13 12:02:47 | 1050 | 65651 | | 2024-03-13 12:03:47 | 1057 | 66355 | | 2024-03-13 12:04:47 | 1064 | 67059 | +---------------------+-------------------------+-----------------------+
Testing

We have currently rolled out dbstat on our test and production systems to test it and see whether our assumptions regarding stability and calculations of the quantity structure are correct. In addition, using it ourselves is the best way to find out if something is missing or if the handling is impractical (Eat your own dog food).

Sources
Taxonomy upgrade extras: performancemonitoringperformance monitoringmetadata locklocklockingperformance_schema

MariaDB/MySQL Environment MyEnv 2.1.0 has been released

Shinguz - Wed, 2024-02-28 17:22

FromDual has the pleasure to announce the release of the new version 2.1.0 of its popular MariaDB, Galera Cluster and MySQL multi-instance environment MyEnv.

The new MyEnv can be downloaded here. How to install MyEnv is described in the MyEnv Installation Guide.

In the inconceivable case that you find a bug in the MyEnv please report it to the FromDual bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 1.1.x to 2.0

Please look at the MyEnv 2.0.0 Release Notes.

Upgrade from 2.0.x to 2.1.0 shell> cd ${HOME}/product shell> tar xf /download/myenv-2.1.0.tar.gz shell> rm -f myenv shell> ln -s myenv-2.1.0 myenv
Plug-ins

If you are using plug-ins for showMyEnvStatus create all the links in the new directory structure:

shell> cd ${HOME}/product/myenv shell> ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/
Upgrade of the instance directory structure

From MyEnv 1.0 to 2.0 the directory structure of instances has fundamentally changed. Nevertheless MyEnv 2.0 works fine with MyEnv 1.0 directory structures.

Changes in MyEnv 2.1.0 MyEnv
  • Removed hard coded parts for running MyEnv under O/S user mariadb.
  • Function substitute_path was refactored.
  • Branch guessing improved.
  • Warnings and errors are in color now.
  • MyEnv log file is now touched to avoid problems with O/S user root.
  • O/S user mysql removed in start/stop script.
  • Checks for DB start improved.
  • /var/run replaced by the more modern location /run.
  • Should now be completely MariaDB compatible (mariadbd vs. mysqld).
  • Wrapper mysqld_safe was extended to mariadbd-safe.
  • Replaced getVersionFromMysqld by getVersionAndBranchFromDaemon and extended functionality of this function.
  • LD_LIBRARY_PATH was set to the wrong directory.
  • Reverting Commit: fcc93c5 from v2.0.3 related to CDPATH. Break commands like cd log or cd etc.
  • Database mysql_innodb_cluster_metadata is hidden now.
  • Database #innodb_redo is suppressed now as well for MySQL 8.0, and hideschema is not added to every new instance any more to not overwrite the default.
  • Bug while stopping instance with missing my.cnf fixed.
  • Function getDistribution cleaned-up.
  • MySQL should now also be detected correctly from Ubuntu repository.
  • Function my_exec rewritten.
  • Debian GNU/Linux tag added for distros.
  • Function extractBranch made better to work on Debian and Ubuntu with distribution packages.
  • Oracle Linux is considered as well now.
  • Made scripts ready for new MariaDB behaviour.
  • my.cnf template adapted to newest knowledge.
  • Directory changed from /tmp to /var/tmp, code cleaned-up and renewal, PID file code and message improved in stopInstance.
  • Distributions cleaned-up and cloudlinux, rocky linux and almalinux added as centos compatible distros.

MyEnv Installer
  • Debian 10 and 11 do not support PHP 8.0 yet, fixed.
  • Unit file is copied now correctly.
  • MyEnv instance installation is automatizable now.
  • Instance creation automation added.
  • my.cnf template together with installMyenv should now work without errors or warnings for MariaDB 10.5 - 11.2 and MySQL 8.0 - 8.3.
  • Command yum replaced by dnf.
  • Command apt-get comments replaced by apt.

MyEnv Utilities
  • Client utility adapted in *monitor scripts.
  • InnoDB cluster monitor added.
  • wsrep_last_committed was added in galera_monitor.sh.
  • AWR added, sharding stuff added, lock and trx analysis scripts added.
  • Memory analysis added, NUMA maps output made ready for new variables.
  • connect_maxout utility added.

For subscriptions of commercial use of MyEnv please get in contact with us.

Taxonomy upgrade extras: MyEnvmulti-instancevirtualizationconsolidationSaaSOperationsreleasemysqld_multi

We build a data warehouse from the General Query Log

Shinguz - Wed, 2024-01-31 16:41

The design of a data warehouse differs from relational design. Data warehouses are often designed according to the concept of the star schema.

When building a data warehouse, you usually put the cart before the horse:

  • What questions should my data warehouse be able to answer?
  • How do I have to design my model so that my questions can be answered easily?
  • Where do I get the data to populate the model?
  • How do I fill my model with the data?

For training purposes, we have investigated an issue that arises from time to time with our support team: The system suddenly and unexpectedly starts to behave unusually, nobody has done anything and nobody knows why. Example with a customer last week: The system starts to become unstable at 3 pm, is then restarted hard and then stabilises again from 4 pm...

The easiest thing to do in such a case would be to quickly look at the database with the SHOW PROCESSLIST command and then it often becomes immediately clear where the problem lies. But customers often forget this or they are not fast enough. The General Query Log was already switched on for this customer, so this would be a great case for our General Query Log Data Warehouse!

What questions should my data warehouse be able to answer?

The generic question for this problem should be something like: "Who or what caused my system to behave abnormally?"

In technical terms, the question would be something like:

  • Who: Which user or account was on the database with how many connections at the time in question? What was unusual about it?
  • What: Which queries were running in which schema on the system at the time in question? Which of these queries were unusual?

What should my model look like?

We can already derive some facts and dimensions from the question:

  • User or account (user + host)
  • Time
  • Connections
  • Schema
  • Queries

And this also results in 4 dimensions and the fact table:

Data source

Where the data comes from is relatively easy to answer in this case: The customer provides his General Query Logs or you can also use the General Query Logs of our own systems for testing purposes.

How is the model populated?

Technically, this is known as an ETL process (Extract-Transform-Load). In our case, we have built a General Query Log parser that reads the General Query Log, prepares the data accordingly and saves it in the model.

Checking the model

And then we come to checking the model. We used test data from one of our systems for this:

  • Which user was on the system at the time in question?
  • Which user had how many connections open at the time in question?

SELECT td.time, cd.user, COUNT(*) AS count FROM connection_dim cd JOIN query_fact AS qf ON qf.connection_id = cd.connection_id JOIN time_dim AS td ON td.time_id = qf.time_id JOIN date_dim AS dd ON dd.date_id = qf.date_id WHERE td.time BETWEEN '17:00' AND '18:30' AND dd.date = '2019-08-02' GROUP BY td.time, cd.user ORDER BY td.time ASC, cd.user ; +----------+---------------+-------+ | time | user | count | +----------+---------------+-------+ | 17:58:00 | UNKNOWN USER | 1 | | 17:59:00 | brman | 58 | | 17:59:00 | brman_catalog | 18 | | 17:59:00 | root | 5 | | 18:00:00 | brman | 296 | | 18:00:00 | brman_catalog | 7 | | 18:00:00 | root | 3 | | 18:01:00 | brman_catalog | 18 | | 18:01:00 | root | 3 | | 18:06:00 | brman | 266 | | 18:06:00 | brman_catalog | 6 | | 18:07:00 | brman | 88 | | 18:07:00 | brman_catalog | 7 | | 18:10:00 | brman | 211 | | 18:10:00 | brman_catalog | 18 | | 18:10:00 | root | 4 | | 18:11:00 | brman | 141 | | 18:11:00 | root | 3 | | 18:13:00 | brman | 4 | | 18:14:00 | brman | 348 | | 18:17:00 | brman | 354 | | 18:17:00 | brman_catalog | 12 | | 18:17:00 | root | 1 | +----------+---------------+-------+
  • Which account was on the system at the time in question?
  • Which account had how many connections open at the time in question?

SELECT td.time, cd.user, cd.hostname, COUNT(*) AS count FROM connection_dim cd JOIN query_fact AS qf ON qf.connection_id = cd.connection_id JOIN time_dim AS td ON td.time_id = qf.time_id JOIN date_dim AS dd ON dd.date_id = qf.date_id WHERE td.time BETWEEN '17:00' AND '18:30' AND dd.date = '2019-08-02' GROUP BY td.time, cd.user, cd.hostname ORDER BY td.time ASC, cd.user ; +----------+---------------+--------------+-------+ | time | user | hostname | count | +----------+---------------+--------------+-------+ | 17:58:00 | UNKNOWN USER | UNKNOWN HOST | 1 | | 17:59:00 | brman | localhost | 58 | | 17:59:00 | brman_catalog | localhost | 18 | | 17:59:00 | root | localhost | 5 | | 18:00:00 | brman | localhost | 296 | | 18:00:00 | brman_catalog | localhost | 7 | | 18:00:00 | root | localhost | 3 | | 18:01:00 | brman_catalog | localhost | 18 | | 18:01:00 | root | localhost | 3 | | 18:06:00 | brman | localhost | 266 | | 18:06:00 | brman_catalog | localhost | 6 | | 18:07:00 | brman | localhost | 88 | | 18:07:00 | brman_catalog | localhost | 7 | | 18:10:00 | brman | localhost | 211 | | 18:10:00 | brman_catalog | localhost | 18 | | 18:10:00 | root | localhost | 4 | | 18:11:00 | brman | localhost | 141 | | 18:11:00 | root | localhost | 3 | | 18:13:00 | brman | localhost | 4 | | 18:14:00 | brman | localhost | 348 | | 18:17:00 | brman | localhost | 354 | | 18:17:00 | brman_catalog | localhost | 12 | | 18:17:00 | root | localhost | 1 | +----------+---------------+--------------+-------+
  • What was unusual about it?

SELECT cd.user, td.time, COUNT(*) AS count FROM connection_dim cd JOIN query_fact AS qf ON qf.connection_id = cd.connection_id JOIN time_dim AS td ON td.time_id = qf.time_id JOIN date_dim AS dd ON dd.date_id = qf.date_id WHERE td.time BETWEEN '17:00' AND '18:30' AND dd.date = '2019-08-02' GROUP BY td.time, cd.user ORDER BY cd.user ASC, td.time ASC ; +---------------+----------+-------+ | user | time | count | +---------------+----------+-------+ | brman | 17:59:00 | 58 | | brman | 18:00:00 | 296 | | brman | 18:06:00 | 266 | | brman | 18:07:00 | 88 | | brman | 18:10:00 | 211 | | brman | 18:11:00 | 141 | | brman | 18:13:00 | 4 | | brman | 18:14:00 | 348 | | brman | 18:17:00 | 354 | | brman_catalog | 17:59:00 | 18 | | brman_catalog | 18:00:00 | 7 | | brman_catalog | 18:01:00 | 18 | | brman_catalog | 18:06:00 | 6 | | brman_catalog | 18:07:00 | 7 | | brman_catalog | 18:10:00 | 18 | | brman_catalog | 18:17:00 | 12 | | root | 17:59:00 | 5 | | root | 18:00:00 | 3 | | root | 18:01:00 | 3 | | root | 18:10:00 | 4 | | root | 18:11:00 | 3 | | root | 18:17:00 | 1 | | UNKNOWN USER | 17:58:00 | 1 | +---------------+----------+-------+

One could deduce here, for example, that the user brman had a relatively large number of open connections during the period in question. Whether this is unusual, we have too little data or the time period is too short.

  • Which queries were running on the system at the time in question and in which schema?
  • Which of these queries were unusual?

SELECT sd.schema_name, td.time, SUBSTR(std.statement_text, 1, 128) AS query FROM query_fact AS qf JOIN time_dim AS td ON td.time_id = qf.time_id JOIN schema_dim AS sd ON sd.schema_id = qf.schema_id JOIN statement_dim AS std ON std.statement_id = qf.statement_id WHERE td.time BETWEEN '17:00' AND '18:30' AND sd.schema_name = 'brman_catalog' AND std.command = 'Query' ORDER BY td.time, qf.statement_id LIMIT 10 ; +---------------+----------+----------------------------------------------------------------------------------------------------------------------------------+ | schema_name | time | query | +---------------+----------+----------------------------------------------------------------------------------------------------------------------------------+ | brman_catalog | 17:59:00 | SET NAMES `utf8` | | brman_catalog | 17:59:00 | SELECT COUNT ( * ) AS `cnt` FROM `information_schema` . `tables` WHERE `table_schema` = ? AND TABLE_NAME = ? | | brman_catalog | 17:59:00 | SELECT COUNT ( * ) AS `cnt` FROM `information_schema` . `tables` WHERE `table_schema` = ? AND TABLE_NAME = ? | | brman_catalog | 17:59:00 | CREATE TABLE `metadata` ( `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT , `key` VARCHARACTER (?) NOT NULL , `value` VARCHARACTER | | brman_catalog | 17:59:00 | INSERT INTO `metadata` ( `key` , `value` ) VALUES (...) | | brman_catalog | 17:59:00 | INSERT INTO `metadata` ( `key` , `value` ) VALUES (...) | | brman_catalog | 17:59:00 | CREATE TABLE `backups` ( `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT , `instance_name` VARCHARACTER (?) NOT NULL , `start_ts` | | brman_catalog | 17:59:00 | CREATE TABLE `backup_details` ( `backup_id` INTEGER UNSIGNED NOT NULL , `hostname` VARCHARACTER (?) NULL , `binlog_file` VARCHAR | | brman_catalog | 17:59:00 | CREATE TABLE `files` ( `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT , `schema_name` VARCHARACTER (?) NULL , `original_name` VAR | | brman_catalog | 17:59:00 | CREATE TABLE `binary_logs` ( `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT , `filename` VARCHARACTER (?) NOT NULL , `begin_ts` I | +---------------+----------+----------------------------------------------------------------------------------------------------------------------------------+
Suggestions for improvement

Based on this first iteration of the model, you can already see which questions the model cannot yet answer or where the model is too imprecise. This can then be improved in a second round....

Examples of this are:

  • The granularity of the time dimension may be too coarse with an accuracy of minutes. Would it make more sense to use seconds?
  • The question of how long a connection was open is not so easy to answer. Perhaps a further fact table would be appropriate here? SELECT cd.connection_number, cd.user, cd.hostname, tdf.time AS time_from, tdt.time AS time_to, (UNIX_TIMESTAMP(tdt.time) - UNIX_TIMESTAMP(tdf.time)) AS duration FROM connection_dim AS cd JOIN query_fact AS qf1 ON cd.connection_id = qf1.connection_id JOIN time_dim AS tdf ON tdf.time_id = qf1.time_id JOIN statement_dim AS sdf ON sdf.statement_id = qf1.statement_id JOIN query_fact AS qf2 ON cd.connection_id = qf2.connection_id JOIN time_dim AS tdt ON tdt.time_id = qf2.time_id JOIN statement_dim AS sdt ON sdt.statement_id = qf2.statement_id WHERE tdf.time BETWEEN '17:00' AND '18:30' AND sdf.command = 'Connect' AND sdt.command = 'Quit' AND (UNIX_TIMESTAMP(tdt.time) - UNIX_TIMESTAMP(tdf.time)) > 0 ORDER BY tdf.time ; +-------------------+-------+-----------+-----------+----------+----------+ | connection_number | user | hostname | time_from | time_to | duration | +-------------------+-------+-----------+-----------+----------+----------+ | 211 | brman | localhost | 17:59:00 | 18:00:00 | 60 | | 215 | root | localhost | 18:00:00 | 18:17:00 | 1020 | | 219 | brman | localhost | 18:06:00 | 18:07:00 | 60 | | 225 | brman | localhost | 18:10:00 | 18:11:00 | 60 | | 226 | brman | localhost | 18:13:00 | 18:14:00 | 60 | +-------------------+-------+-----------+-----------+----------+----------+
  • Of course, it would be exciting if an AI were used to solve the problem. How do you train it correctly and does it find the problem once it has been trained?

So much for the little gimmick of building a data warehouse...

Taxonomy upgrade extras: data warehousegeneral query log

FromDual Performance Monitor for MariaDB 2.1.0 has been released

Shinguz - Sat, 2023-12-09 18:27

FromDual has the pleasure to announce the release of the new version 2.1.0 of its popular Database Performance Monitor for MariaDB and Galera Cluster fpmmm.

The FromDual Performance Monitor for MariaDB (fpmmm) enables DBAs and System Administrators to monitor and understand what is going on inside their MariaDB database instances and on the machines where the databases reside.

More detailed information you can find in the fpmmm Installation Guide.

Download

The new FromDual Performance Monitor for MariaDB (fpmmm) can be downloaded from here or you can use our FromDual repositories. How to install and use fpmmm is documented in the fpmmm Installation Guide.

In the inconceivable case that you find a bug in the FromDual Performance Monitor for MariaDB please report it to the FromDual Bug-tracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to us.

Monitoring as a Service (MaaS)

You do not want to set-up your Database monitoring yourself? No problem: Choose our MariaDB Monitoring as a Service (Maas) program to safe time and costs!

Installation of Performance Monitor 2.1.0

A complete guide on how to install FromDual Performance Monitor you can find in the fpmmm Installation Guide.

Upgrade of fpmmm tarball from 1.x to 2.1.0

There are some changes in the configuration file (fpmmm.conf):

  • The access rights should be change as follows: chmod 600 /etc/fpmmm.conf
  • The key Methode was spelled wrong in the configuration file. It was renamed to Method.
  • The key PidFile is ambiguous which could lead to problems and bugs. Thus it was changed to either MyPidFile for fpmmm and DbPidFile for the database.

Upgrade with DEB/RPM packages should happen automatically. For tarballs follow this:

shell> cd /opt shell> tar xf /download/fpmmm-2.1.0.tar.gz shell> rm -f fpmmm shell> ln -s fpmmm-2.1.0 fpmmm
Changes in FromDual Performance Monitor for MariaDB 2.1.0

This release contains new features and various bug fixes.

You can verify your current FromDual Performance Monitor for MariaDB version with the following command:

shell> /opt/fpmmm/bin/fpmmm --version
General
  • Comments and some more debugging information added.
  • All needed logging added for tracking down the problem of bad stat output.
  • New variables fixed.
  • Error messages improved.
  • Fix error output.
  • Version 10.11.1 was not split up correctly, fixed.
  • myEnv updated to fix bad MySQL detection on Ubuntu from repository.
  • Old database version error fixed if connection to API does not work.
  • Disable smart module by default to avoid error messages.
  • Removed /etc/fpmmm folder everywhere.
  • rc made unique, tests fixed.
  • Default path locations brought up to date.
  • DbPidFile location is new under /run.
  • All tests for MariaDB 10.11 passed.
  • CacheFileBase bug fixed.
  • FreeDSx/SNMP upgraded from 0.4 to 0.5.
  • DB connection handling improved and made more OO style.
  • Configuration file parser made more stable for syntax errors.
  • Error log logged to systemd message improved.
  • Option --version is now done before check options.
  • PHP requirement version specified.

Templates
  • Templates improved.
  • Link for triggers fixed.
  • Working period added to all graphs.
  • Zabbix 6 templates added to Makefile.
  • Zabbix 6.0 templates added (6.0.21) and renamed.

Agent
  • All variables from templates removed, test added for fpmmm MRRELEASE.
  • Fix MRRLEASE tag in fpmmm template fixed again.
  • Function writeDataToCache improved for tracking customer problems.
  • CacheFile is now protected with flock, this should resolve issues with lost brman items.
  • fpmmm version problem error message improved.
  • Disabled = True is not recognized correctly and no error was thrown. This is fixed now.
  • Messages were not handled correctly with SNMP output. This is fixed now.
  • mdstat message removed from error log.
  • Make error messages around sending data nicer.
  • apt-get/yum messages replaced by dnf/apt.
  • Error message made more clear if php-cli package is missing.

Server
  • Trigger too many filesystem locks set from MULTIPLE to SINGLE, threshold increased from 10000 to 16384 (mariadb MaxNOFiles) and message improved.
  • Working time added to server graphs.
  • iostat items remove from server template.
  • Available disks are now reported with space in between.
  • Disk sda5 removed from template, network interfaces enp4s0f1 and bond0 added to template, disks vdb1 and md1 added to template.
  • CPU usage details removed, guest_nice item added.
  • Swapping items added.
  • Typo in host screen fixed.

Galera
  • Galera group replication latency added.
  • In addition to old wsrep_causal_reads the new wsrep_sync_wait variable was added including the trigger.
  • Non Galera node is not detected as such and gives ugly error message, fixed.

InnoDB
  • Item innodb log write requests and innodb log writes fixed in innodb template.
  • Item InnoDB Trx Log bytes written renamed to InnoDB Log bytes written, graph InnoDB Log Activity removed because it is redundant now.
  • Item innodb_redo_log_capacity added for MySQL 8.0.
  • InnoDB buffer pool wait free trigger has wrong filter for item. fixed.
  • InnoDB Buffer Pool wait for free pages trigger added link to item value.
  • innodb_file_format also removed from template.
  • InnoDB buffer pool wait free trigger added.
  • FromDual.MySQL.innodb.Innodb_data_pending_fsyncs changed from absolute values to change_per_second to make graph useful.
  • Links fixed in triggers for innodb module.

MySQL
  • DB is soon out of support message downgraded from average to warning.
  • Item name fixed.
  • Link for table open cache trigger adjusted.
  • Trigger for mysql/mariadb support ends was changed from multiple to single to reduce noise.
  • com_call_procedure status counter fixed in module and template.
  • storage_engine item remove from template, processlist item waiting for table level lock fixed.
  • TOC was updated in template and improved and cleaned-up.
  • Unlock table item is not collected any more and trigger was removed, caused useless alerts.
  • Modern TOC handling implemented.

Process
  • Process module refactored, more logging and tests added.
  • Bug in process module fixed: /proc/PID/stat was not parsed correctly.

Security
  • Links fixed in triggers for security module.

Master
  • Binlog event count and binlog avg event size removed from master template because we cannot calculate those values.

Backup
  • Backup template duration URL fixed.

Packaging
  • Packages added for Debian 10, Debian 11, Debian 12, Ubuntu 20.04, Ubuntu 22.04, Redhat 8 and Redhat 9.
  • Package bug with fpmmm.ini fixed.
  • Debian build version increased and package build config error fixed again.
  • Debian package revision introduced.
  • Bug in config copy during postinst fixed.
  • Package installation error overwriting fpmmm.conf fixed.

For subscriptions of commercial use of fpmmm please get in contact with us.

Taxonomy upgrade extras: performancemonitormonitoringfpmmmmaasreleaseobservation

InnoDB Deadlock on SELECT? Not possible! Or Is It?

Shinguz - Sat, 2023-11-25 16:58
Introduction

Two points in advance:

  1. A deadlock is a state in which two different transactions are no longer able to continue working because each transaction holds a lock that the other transaction would need. Because both transactions are now waiting for the other transaction to release their locks, neither transaction will release their respective locks. And that would last forever. To avoid this, the MariaDB instance intervenes and kills the transaction that has done less work. The application then receives a deadlock error message of this type:
    ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
  2. A general mantra in the MariaDB ecosystem is that a SELECT does not cause locks (exception: FOR UPDATE or LOCK IN SHARE MODE) and therefore cannot be part of a deadlock.

The problem

A long-standing customer comes to the FromDual remote DBA team with a request to explain a deadlock situation:

Hello FromDual Team,
I need your expertise on the subject of deadlocks.
When would it suit you?


The situation is as follows: Transaction 1 consists of a simple INSERT. Transaction 2 consists of a SELECT. This should NOT actually cause a deadlock!

We first check the following points

  • Are all tables affected by these queries properly indexed? Yes, they are. The queries are all running perfectly!
  • Is the SELECT query possibly part of a larger transaction (NOT an auto-commit transaction) and therefore not the actual cause of the deadlock? No, it is not. They are auto-commit transactions.

What now? What else needs to be said for clarification: The SELECT is sent with a very high cadence, i.e. approx. every 5 ms!

It is clear that the INSERT generates locks. It is also displayed. But why does the SELECT command generate locks? These are also displayed!

So we try to break the problem down into individual steps.

The approach

The query looks like this:

SQL> SET @id = (SELECT id FROM test WHERE id = 3);

If we pack this query into an explicit transaction, we can even see the locks:

SQL> START TRANSACTION; SQL> SET @id = (SELECT id FROM test WHERE id = 3);

and in a second session:

SQL> SELECT * FROM information_schema.INNODB_TRX\G *************************** 1. row *************************** trx_id: 0 trx_state: RUNNING trx_started: 2023-11-19 15:27:09 trx_requested_lock_id: NULL trx_wait_started: NULL trx_weight: 2 trx_mysql_thread_id: 3765 trx_query: NULL trx_operation_state: trx_tables_in_use: 0 trx_tables_locked: 1 trx_lock_structs: 2 trx_lock_memory_bytes: 1128 trx_rows_locked: 1 trx_rows_modified: 0 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_is_read_only: 0 trx_autocommit_non_locking: 0

Unfortunately, we cannot see what kind of lock (IS) it is, because the view INNODB_LOCKS is empty.

The solution

If we do the same experiment with "normal" SELECTs:

SQL> START TRANSACTION; SELECT id FROM test WHERE id = 3;

or

SQL> START TRANSACTION; SELECT id INTO @id FROM test WHERE id = 3;

we do NOT see any locks:

SQL> SELECT * FROM information_schema.INNODB_TRX\G *************************** 1. row *************************** trx_id: 0 trx_state: RUNNING trx_started: 2023-11-19 15:31:35 trx_requested_lock_id: NULL trx_wait_started: NULL trx_weight: 0 trx_mysql_thread_id: 3765 trx_query: NULL trx_operation_state: trx_tables_in_use: 0 trx_tables_locked: 0 trx_lock_structs: 0 trx_lock_memory_bytes: 1128 trx_rows_locked: 0 trx_rows_modified: 0 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_is_read_only: 0 trx_autocommit_non_locking: 0

So it seems that the construct SET @id = (...) causes this IS lock. The customer rewrites his application and shortly afterwards we receive the following message:

Hello FromDual team,
Your tip was spot on.
No more deadlocks since Friday lunchtime.
Thank you and have a nice weekend.


Further clarified questions

MySQL 8.0 behaves the same? Yes, exactly the same.

Addendum

My dear colleague Matthias gave me a follow-up idea: What about MariaDB Stored Procedures and Stored Functions?

The two tests here:

DELIMITER // CREATE OR REPLACE PROCEDURE locktestsp (INOUT id INT) BEGIN SELECT id INTO id FROM test WHERE id = id LIMIT 1; END; // DELIMITER ; SET @id = 3; START TRANSACTION; CALL locktestsp(@id); SELECT @id; SELECT trx_tables_locked, trx_lock_structs, trx_rows_locked FROM information_schema.INNODB_TRX; +-------------------+------------------+-----------------+ | trx_tables_locked | trx_lock_structs | trx_rows_locked | +-------------------+------------------+-----------------+ | 0 | 0 | 0 | +-------------------+------------------+-----------------+

and here:

DELIMITER // CREATE OR REPLACE FUNCTION locktestsf (IN id INT) RETURNS CHAR(50) DETERMINISTIC BEGIN SELECT id INTO id FROM test WHERE id = id LIMIT 1; RETURN id; END; // DELIMITER ; START TRANSACTION; SELECT locktestsf(3); SELECT trx_tables_locked, trx_lock_structs, trx_rows_locked FROM information_schema.INNODB_TRX; +-------------------+------------------+-----------------+ | trx_tables_locked | trx_lock_structs | trx_rows_locked | +-------------------+------------------+-----------------+ | 0 | 0 | 0 | +-------------------+------------------+-----------------+
Translated by deepl.com

Upgrading MariaDB to the pre-latest minor release

Shinguz - Wed, 2023-02-08 13:45

Scenario: MariaDB Community Server is release every 3 months. The exact date is not known but it is typically about 2 to 3 weeks after the MySQL/Oracle CPU.

We upgrade regularly customer systems, also their MariaDB Galera Cluster systems. Because we made in the past some very bad experience upgrading production MariaDB Galera Clusters short after release date we became a bit more careful. Our new policy is: We wait a few weeks before installing a new release.

This week we had the situation that on Monday a new MariaDB release came out and on Wednesday we had to do the Cluster upgrade. So only 2 days in between. Too short time for my taste. But we did not want to omit the upgrade because it was already more than 6 months since the last upgrade...

So we decided to install the pre-latest MariaDB release. Technically it means: Customer was on 10.6.9, 10.6.12 is out (and already available in the repositories) but we want to upgrade to 10.6.11. Additionally we also wanted to upgrade the O/S (Debian 11). So how to do this quite complex task? In short:

  • Set all MariaDB related packages to hold.
  • Upgrade all other packages including a reboot of the machine.
  • Unhold all MariaDB related packages.
  • Install specific MariaDB packages which are not the newest one.

Set MariaDB related packages to hold dpkg -l | grep mariadb ii libdbd-mariadb-perl 1.21-3 amd64 Perl5 database interface to the MariaDB/MySQL databases ii libmariadb3:amd64 1:10.6.9+maria~deb11 amd64 MariaDB database client library ii libmariadb3-compat 1:10.6.9+maria~deb11 amd64 MariaDB database client library MySQL compat package ii libmariadbclient18 1:10.6.9+maria~deb11 amd64 Virtual package to satisfy external libmariadbclient18 depends ii mariadb-client-10.6 1:10.6.9+maria~deb11 amd64 MariaDB database client binaries ii mariadb-client-core-10.6 1:10.6.9+maria~deb11 amd64 MariaDB database core client binaries ii mariadb-common 1:10.6.9+maria~deb11 all MariaDB common configuration files ii mariadb-server-10.6 1:10.6.9+maria~deb11 amd64 MariaDB database server binaries ii mariadb-server-core-10.6 1:10.6.9+maria~deb11 amd64 MariaDB database core server files apt-mark hold galera-4 libmariadb3-compat libmariadb3 libmariadbclient18 mariadb-client-10.6 mariadb-client-core-10.6 mariadb-common mariadb-server-10.6 mariadb-server-core-10.6 mysql-common apt-mark showhold
Upgrade al other O/S dependent packages apt update apt list --upgradable Listing... Done galera-4/unknown 26.4.14-deb11 amd64 [upgradable from: 26.4.11-0+deb11u1] libmariadb3-compat/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.9+maria~deb11] libmariadb3/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.9+maria~deb11] libmariadbclient18/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.9+maria~deb11] libssl1.1/stable-security 1.1.1n-0+deb11u4 amd64 [upgradable from: 1.1.1n-0+deb11u3] mariadb-client-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.9+maria~deb11] mariadb-client-core-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.9+maria~deb11] mariadb-common/unknown 1:10.6.12+maria~deb11 all [upgradable from: 1:10.6.9+maria~deb11] mariadb-server-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.9+maria~deb11] mariadb-server-core-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.9+maria~deb11] mysql-common/unknown 1:10.6.12+maria~deb11 all [upgradable from: 5.8+1.0.7] openssl/stable-security 1.1.1n-0+deb11u4 amd64 [upgradable from: 1.1.1n-0+deb11u3] apt upgrade The following packages have been kept back: galera-4 libmariadb3 libmariadb3-compat libmariadbclient18 mariadb-client-10.6 mariadb-client-core-10.6 mariadb-common mariadb-server-10.6 mariadb-server-core-10.6 mysql-common The following packages will be upgraded: libssl1.1 openssl 2 upgraded, 0 newly installed, 0 to remove and 10 not upgraded. cat /var/run/reboot-required # reboot [-f]
Unhold MariaDB related packages apt-mark unhold galera-4 libmariadb3-compat libmariadb3 libmariadbclient18 mariadb-client-10.6 mariadb-client-core-10.6 mariadb-common mariadb-server-10.6 mariadb-server-core-10.6 mysql-common apt-mark showhold
Install specific MariaDB version packages apt list -a galera-4 apt install galera-4=26.4.13-bullseye libmariadb3-compat=1:10.6.11+maria~deb11 libmariadb3=1:10.6.11+maria~deb11 libmariadbclient18=1:10.6.11+maria~deb11 mariadb-client-10.6=1:10.6.11+maria~deb11 mariadb-client-core-10.6=1:10.6.11+maria~deb11 mariadb-common=1:10.6.11+maria~deb11 mariadb-server-10.6=1:10.6.11+maria~deb11 mariadb-server-core-10.6=1:10.6.11+maria~deb11 mysql-common=1:10.6.11+maria~deb11 # To avoid 2 database restarts we can also do the reboot here systemctl restart mariadb mariadb-upgrade --user=root apt list --upgradable Listing... Done galera-4/unknown 26.4.14-deb11 amd64 [upgradable from: 26.4.13-bullseye] libmariadb3-compat/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.11+maria~deb11] libmariadb3/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.11+maria~deb11] libmariadbclient18/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.11+maria~deb11] mariadb-client-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.11+maria~deb11] mariadb-client-core-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.11+maria~deb11] mariadb-common/unknown 1:10.6.12+maria~deb11 all [upgradable from: 1:10.6.11+maria~deb11] mariadb-server-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.11+maria~deb11] mariadb-server-core-10.6/unknown 1:10.6.12+maria~deb11 amd64 [upgradable from: 1:10.6.11+maria~deb11] mysql-common/unknown 1:10.6.12+maria~deb11 all [upgradable from: 1:10.6.11+maria~deb11]
Taxonomy upgrade extras: debianpackageupgradelocklockingholdpin

FromDual Ops Center 1.2.1 for MariaDB, MySQL and compatible databases has been released

Shinguz - Tue, 2023-02-07 17:08

FromDual has the pleasure to announce the release of the new version 1.2.1 of its popular FromDual Ops Center focmm, a Graphical User Interface (GUI) for MariaDB, MySQL and compatible databases.

The FromDual Ops Center for MariaDB and MySQL (focmm) helps DBA's and System Administrators to better manage their MariaDB and MySQL databases and Galera Cluster farms. Ops Center makes DBA and Admins life easier!

The main task of Ops Center is to support you in your daily MariaDB and MySQL operation tasks. More information about FromDual Ops Center you can find under General Information.

Download

The new FromDual Ops Center for MariaDB and MySQL (focmm) can be downloaded from our download site. How to install and use focmm is documented in the Ops Center User Guide.

In the inconceivable case that you find a bug in the FromDual Ops Center for MariaDB and MySQL please report it to the FromDual bug tracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Installation of Ops Center 1.2.1

Information on how to install FromDual Ops Center you can find in the Ops Center User Guide.

Upgrade from 0.9.x to 1.2.1

Upgrade from 0.9.x to 1.2.1 should happen automatically. Please do a backup of your Ops Center database instance before you upgrade! Please also check Upgrading.

Changes in Ops Center 1.2.1 Machine
  • ssh "Suspect machine" message fix advice command is now done with the right user and file.
  • Check error fixed and output made nicer.
  • Instance link added in machine overview and code clean-up.
  • Gather machine information added after add instance and before create instance to make sure O/S and distribution information is always there when instance is added or created.
  • Code clean-up in machine refresh.
  • Resource Group tag fixed, and code clean-up in machine show settings.

Instance
  • Repository connection was closed too early in starting and stopping instance. So start/stop failed. Bug fixed.
  • Case is caught properly now where node is started but galera plugin was not activated.
  • Bug in deploy configuration fixed. When " was added the configuration file was cut.
  • Bug in stopping instance fixed. Instance could not be stopped any more.

Cluster
  • Configuration wsrep_on = on is for MariaDB Galera Cluster and made now default. It was missing in some cases.

Load Balancer
  • GLB error handling and error messages improved.
  • Load balancer naming is enforced to unique now and load balancer configuration deployment message added.

Virtual IP (VIP)/Floating IP
  • No changes.

Tools
  • Job: Variable was not initialized correctly (bin/*), fixed.

Configuration
  • No changes.
Database-as-a-Service (DBaaS)
  • More debug information added to catch error in pricing calculations.

Building and Packaging
  • Debian compat level increased from 9 to 13.

Themes / UI
  • Library jquery updated from 3.6.1 to 3.6.3.
  • Field titles made unique Resource Group.

General
  • Some tests improved and frags made more robust.
  • Bugs in catch section fixed.
  • _SERVER[PHP_SELF] in some cases lead to wrong destination, fixed.

Repository
  • No changes.

Taxonomy upgrade extras: OperationsreleaseFromDual Ops Centerops centerdbaasfocmm

FromDual Ops Center 1.2.0 for MariaDB, MySQL and compatible databases has been released

Shinguz - Thu, 2023-01-19 12:01

FromDual has the pleasure to announce the release of the new version 1.2.0 of its popular FromDual Ops Center focmm, a Graphical User Interface (GUI) for MariaDB, MySQL and compatible databases.

The FromDual Ops Center for MariaDB and MySQL (focmm) helps DBA's and System Administrators to better manage their MariaDB and MySQL databases and Galera Cluster farms. Ops Center makes DBA and Admins life easier!

The main task of Ops Center is to support you in your daily MariaDB and MySQL operation tasks. More information about FromDual Ops Center you can find under General Information.

Download

The new FromDual Ops Center for MariaDB and MySQL (focmm) can be downloaded from our download site. How to install and use focmm is documented in the Ops Center User Guide.

In the inconceivable case that you find a bug in the FromDual Ops Center for MariaDB and MySQL please report it to the FromDual bug tracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Installation of Ops Center 1.2.0

Information on how to install FromDual Ops Center you can find in the Ops Center User Guide.

Upgrade from 0.9.x to 1.2.0

Upgrade from 0.9.x to 1.2.0 should happen automatically. Please do a backup of your Ops Center database instance before you upgrade! Please also check Upgrading.

Changes in Ops Center 1.2.0 Machine
  • Delete machine error message improved.
  • Add machine made more user friendly.
  • Delete machine warning message made more verbose.
  • Error handling improved and error messages made more verbose in gathering machine information.

Instance
  • Create instance: InnoDB buffer pool size estimation fixed.
  • Remove -core- packages from installation selection.
  • Access keys added to show instance.
  • Instance and machine names are sorted in selection now.
  • Error handling improved in performance views.
  • Restart instance bug fixed,
  • Status cluster_conf_id added.
  • Galera node bootstrap is shown correctly now.
  • Create schema can do character set now.
  • Start instance refactored.
  • Create instance is also working for Debian now.
  • Restart instance implemented.
  • Repository instance cannot be stopped any more to avoid system outages.
  • Processlist rewritten to P_S.threads.
  • Some more performance views added.
  • Delete instance warning message made more verbose.
  • Create instance: Machine names are sorted now alphabetically.
  • Query Cache remove in create instance because MySQL 8.0 does not support it any more and it is by default off in MariaDB now.
  • Instance backup remembers brman path now.

Cluster
  • Galera safe_to_bootstrap is working now and can be forced.
  • Cluster weight, node weight and segment added for Galera.
  • More information in error message when saving cluster.
  • Master/Slave features are not shown any more if Galera Cluster is chosen.
  • Cluster type cannot be changed any more.
  • Galera disable buttons if instance is stopped.
  • Galera node bootstrap functionality implemented.
  • Galera Cluster configuration deployment implemented.
  • Galera Cluster does not show M/S features any more.
  • Replication operations: SQL thread state was not displayed correctly.
  • GTID related information added to replication operations.
  • Delete instance from cluster made a bit more user friendly.

Load Balancer
  • Some minor bugs in load balancer and VIP failover fixed.
  • Galera Load Balancer balancing policy fixed.
  • Page made more user friendly.
  • Galera Load Balancer restart implemented.
  • Operations for MariaDB MaxScale implemented.
  • Load Balancer socket variable normalized.
  • Delete Load Balancer implemented.
  • Galera Load Balancer configuration is backuped before saving.
  • Galera Load Balancer Load Balancer policy change implemented.
  • Load Balancer policy added.
  • Change weight of load balancer back-end added.
  • Galera Load Balancer configuration persist added.
  • Galera Load Balancer load balancer configuration file parsed and displayed under settings.
  • Galera Load Balancer drain and undrain back-end added.
  • Galera Load Balancer version added to operations.
  • Galera Load Balancer checks refresh every 10 seconds.
  • Galera Load Balancer start and stop implemented
  • Statistics for Galera Load Balancer added.
  • Operations overview for Galera Load Balancer added.
  • Operations menu enabled.

Virtual IP (VIP)/Floating IP
  • VIP failover problem fixed.
  • Bitmask /32 is wrong, was changed to /24.
  • VIP failover made more robust.
  • Deploy standard keepalived configuration implemented.
  • keepalived version added.
  • keepalived failover works fine.
  • keepalived failover is working now.
  • vip keepalived failover started.
  • keepalived stop and start added.
  • Checks for keepalived added.
  • keepalived flag for vip added.
  • VIP can now be kept under keepalived.

Tools
  • Back button remove from crontab, makes no sense here.
  • Read in readJobs fixed and some error messages improved.
  • Display limited to 100 jobs because of memory issues.
  • Bug with remaining my_exec.stderr.* files found and fixed. PID 0 kill was caught, mkdir locking replaced by flock locking.

Configuration
  • No changes.
Database-as-a-Service (DBaaS)
  • No changes.

Building and Packaging
  • .ssh directory is created also on RPM.
  • MariaDB added as default for Rocky8.
  • Syntax error in installation script fixed.
  • On Rocky9 httpd and mysqld is now started during package installation.
  • php-mysql replaced by php-mysqlnd for RHEL package.
  • RPM package building added.
  • Directory .ssh for repository user is created during installation.
  • Package lsb-release added to Debian.
  • Install error message made better for Debian.

Themes / UI
  • jquery updated from 3.6.0 to 3.6.1 and jquery ui themes from 1.12.1 to 1.13.2

General
  • Changed copyright year from 2022 to 2023.
  • Some PHP 8.1 deprecations fixed.
  • Repository DB handle added to all writecheck functions.
  • PHP function exec replaced by my_exec where possible (ssh, scp).
  • whoami fixed, PATH added to my_exec because of Rocky8.
  • Function checkBinary also made working locally on Rocky8.
  • Bug in testEmail fixed.
  • Distribution Debian GNU/Linux is now supported everywhere.
  • Apache mod_rewrite enabled by default.
  • Add Apache SSL to installation.
  • Menu is now controllable via keys.
  • FromDual software versions is added.
  • Error message more verbose in case sendmail is not installed (Debian).
  • myEnv library updated including distro clean-up.
  • Better IP guessing during installation.

Repository
  • Create repository fixed for mariadb 5.5.
  • Code clean-up for focmm configuration file operations.

Taxonomy upgrade extras: OperationsreleaseFromDual Ops Centerops centerdbaasfocmm

Comparing Optimizer Results

Shinguz - Fri, 2022-11-18 10:59

A while ago I read a paper about Optimizer Benchmarks: How Good Are Query Optimizers, Really? by Viktor Leis from Fakultät für Informatik of the Technische Universität München.

Because we have from time to time problems with slow queries of customers especially after upgrading to new database releases it is interesting for me, how different optimizers cope with a query. Sometimes it is not clear to me why the query is slow or how I can make it faster. So the mentioned paper inspired me to compare the same query with the same dataset among different optimizers or optimizer versions. The most recent query from our customer we were testing against MariaDB 10.6, 10.9, 10.10, MySQL 8.0 and PostgreSQL 15.1. The test data

The table we were using is our generic test table which many people already know from our MariaDB and MySQL trainings:

CREATE TABLE `test` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `data` varchar(128) DEFAULT NULL, `ts` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(), PRIMARY KEY (`id`) );
Loading the data from MySQL

How to dump the data from MySQL to load them into PostgreSQL I have already described earlier. This is about how to create the table and load the data into PostgreSQL:

# sudo su - postgres # psql # postgres=# SELECT VERSION(); # postgres=# \l # postgres=# CREATE DATABASE test; CREATE DATABASE postgres=# \c test postgres=# CREATE TABLE test ( id SERIAL NOT NULL PRIMARY KEY, data VARCHAR(128) DEFAULT NULL, ts TIMESTAMP NOT NULL ); test=# \dt+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+----------+-------+----------+-------------+---------------+---------+------------- public | test | table | postgres | permanent | heap | 0 bytes | # psql test < /tmp/test_dump.sql
The Query

The query we had problems with looked something like this:

WITH t AS ( SELECT SUM(id)+0.0 AS c, data AS pn FROM test WHERE ts BETWEEN '2022-11-22 00:00:00' AND '2022-11-22 23:59:59' AND data LIKE 'ab%' GROUP BY data ORDER BY data -- MariaDB 10.6: 0.66s -- MariaDB 10.9: 0.37s -- Competing product: 0.70s ) SELECT 'AAA' AS label, ((SELECT c FROM t WHERE pn = 'ab06d9ca0e5eb27e5bb2e5c0f345b782')+(SELECT c FROM t WHERE pn = 'ab0f8e861348995d1cbf9ec21d4bf52d'))/((SELECT c FROM t WHERE pn = 'ab134910f6808802cf44b587fb825ab5')+(SELECT c FROM t WHERE pn = 'ab1d6682feb41a97054e8c634ecd5552'))*100 AS value -- MariaDB 10.6: 2.04s -- MariaDB 10.9: 0.79s -- Competing product: 0.70s UNION SELECT 'BBB' AS label, (SELECT c FROM t WHERE pn = 'ab2516b576d7463f106b5671bdac6d32')/((SELECT c FROM t WHERE pn = 'ab28e05f4fa736f4738a2ff6cf494f04')+(SELECT c FROM t WHERE pn = 'ab31ab6a7d52fee49990d63dee58371b'))*100 AS value -- MariaDB 10.6: 3.54s -- MariaDB 10.9: 1.38s -- Competing product: 0.70s UNION SELECT 'CCC' AS label, (SELECT c FROM t WHERE pn = 'ab390b211d7bb3cc19fc1f808fb1bc3a')/(SELECT c FROM t WHERE pn = 'ab45af9e4ca10d3a618d40ec3f230349')*100 AS value -- MariaDB 10.6: 4.58s -- MariaDB 10.9: 1.76s -- Competing product: 0.70s UNION SELECT 'DDD' AS label, ((SELECT c FROM t WHERE pn = 'ab531756624d8287d38eb5fe5809bdb4')+(SELECT c FROM t WHERE pn = 'ab5aaa230cc64c15a61f908aea1c3f9c'))/((SELECT c FROM t WHERE pn = 'ab60f7712cd176b6ac26040d69be0e11')+(SELECT c FROM t WHERE pn = 'ab674e464997f4916e7ed23942c53d23'))*100 AS value UNION SELECT 'EEE' AS label, (SELECT c FROM t WHERE pn = 'ab6cc3781441a08d67eb8039521c63bb')/((SELECT c FROM t WHERE pn = 'ab7612bca85db4f6c4638a658fbee646')+(SELECT c FROM t WHERE pn = 'ab7811882f520c2051c6761edf4c306f'))*100 AS value UNION SELECT 'FFF' AS label, (SELECT c FROM t WHERE pn = 'ab7f0bd3758fbdac0ff6abb24db166ad')/(SELECT c FROM t WHERE pn = 'ab87d7d598ad47ca5db3bf9e6cbdee48')*100 AS value -- MariaDB 10.6: 9.15s -- MariaDB 10.9: 3.51s -- Competing product: 0.71s ;
The results

Timing is enabled in PostgreSQL as follows:

postgres=# \timing

Then we were running the different queries against the different databases and versions:

 Q1Q2Q3Q4Q5Q6 MariaDB 10.6.110.14 s0.32 s0.79 s1.39 s1.78 s3.56 s MariaDB 10.9.40.14 s0.33 s0.80 s1.40 s1.80 s3.60 s MariaDB 10.10.20.14 s0.33 s0.80 s1.39 s1.78 s3.55 s MySQL 8.0.310.03 s0.77 s0.77 s0.77 s0.77 s0.77 s PostgreSQL 15.10.03 s0.04 s0.04 s0.04 s0.04 s0.04 s
The Queries Query 1: SELECT COUNT(*) FROM test;
Query 2 SELECT SUM(id)+0.0 AS c, data AS pn FROM test WHERE ts BETWEEN '2022-11-22 00:00:00' AND '2022-11-22 23:59:59' AND data LIKE 'ab%' GROUP BY data ORDER BY data;
Query 3 WITH t AS ( SELECT SUM(id)+0.0 AS c, data AS pn FROM test WHERE ts BETWEEN '2022-11-22 00:00:00' AND '2022-11-22 23:59:59' AND data LIKE 'ab%' GROUP BY data ORDER BY data ) SELECT 'AAA' AS label, ((SELECT c FROM t WHERE pn = 'ab06d9ca0e5eb27e5bb2e5c0f345b782')+(SELECT c FROM t WHERE pn = 'ab0f8e861348995d1cbf9ec21d4bf52d'))/((SELECT c FROM t WHERE pn = 'ab134910f6808802cf44b587fb825ab5')+(SELECT c FROM t WHERE pn = 'ab1d6682feb41a97054e8c634ecd5552'))*100 AS value;
Query 4 WITH t AS ( SELECT SUM(id)+0.0 AS c, data AS pn FROM test WHERE ts BETWEEN '2022-11-22 00:00:00' AND '2022-11-22 23:59:59' AND data LIKE 'ab%' GROUP BY data ORDER BY data ) SELECT 'AAA' AS label, ((SELECT c FROM t WHERE pn = 'ab06d9ca0e5eb27e5bb2e5c0f345b782')+(SELECT c FROM t WHERE pn = 'ab0f8e861348995d1cbf9ec21d4bf52d'))/((SELECT c FROM t WHERE pn = 'ab134910f6808802cf44b587fb825ab5')+(SELECT c FROM t WHERE pn = 'ab1d6682feb41a97054e8c634ecd5552'))*100 AS value UNION SELECT 'BBB' AS label, (SELECT c FROM t WHERE pn = 'ab2516b576d7463f106b5671bdac6d32')/((SELECT c FROM t WHERE pn = 'ab28e05f4fa736f4738a2ff6cf494f04')+(SELECT c FROM t WHERE pn = 'ab31ab6a7d52fee49990d63dee58371b'))*100 AS value;
Query 5 WITH t AS ( SELECT SUM(id)+0.0 AS c, data AS pn FROM test WHERE ts BETWEEN '2022-11-22 00:00:00' AND '2022-11-22 23:59:59' AND data LIKE 'ab%' GROUP BY data ORDER BY data ) SELECT 'AAA' AS label, ((SELECT c FROM t WHERE pn = 'ab06d9ca0e5eb27e5bb2e5c0f345b782')+(SELECT c FROM t WHERE pn = 'ab0f8e861348995d1cbf9ec21d4bf52d'))/((SELECT c FROM t WHERE pn = 'ab134910f6808802cf44b587fb825ab5')+(SELECT c FROM t WHERE pn = 'ab1d6682feb41a97054e8c634ecd5552'))*100 AS value UNION SELECT 'BBB' AS label, (SELECT c FROM t WHERE pn = 'ab2516b576d7463f106b5671bdac6d32')/((SELECT c FROM t WHERE pn = 'ab28e05f4fa736f4738a2ff6cf494f04')+(SELECT c FROM t WHERE pn = 'ab31ab6a7d52fee49990d63dee58371b'))*100 AS value UNION SELECT 'CCC' AS label, (SELECT c FROM t WHERE pn = 'ab390b211d7bb3cc19fc1f808fb1bc3a')/(SELECT c FROM t WHERE pn = 'ab45af9e4ca10d3a618d40ec3f230349')*100 AS value;
Query 6 WITH t AS ( SELECT SUM(id)+0.0 AS c, data AS pn FROM test WHERE ts BETWEEN '2022-11-22 00:00:00' AND '2022-11-22 23:59:59' AND data LIKE 'ab%' GROUP BY data ORDER BY data ) SELECT 'AAA' AS label, ((SELECT c FROM t WHERE pn = 'ab06d9ca0e5eb27e5bb2e5c0f345b782')+(SELECT c FROM t WHERE pn = 'ab0f8e861348995d1cbf9ec21d4bf52d'))/((SELECT c FROM t WHERE pn = 'ab134910f6808802cf44b587fb825ab5')+(SELECT c FROM t WHERE pn = 'ab1d6682feb41a97054e8c634ecd5552'))*100 AS value UNION SELECT 'BBB' AS label, (SELECT c FROM t WHERE pn = 'ab2516b576d7463f106b5671bdac6d32')/((SELECT c FROM t WHERE pn = 'ab28e05f4fa736f4738a2ff6cf494f04')+(SELECT c FROM t WHERE pn = 'ab31ab6a7d52fee49990d63dee58371b'))*100 AS value UNION SELECT 'CCC' AS label, (SELECT c FROM t WHERE pn = 'ab390b211d7bb3cc19fc1f808fb1bc3a')/(SELECT c FROM t WHERE pn = 'ab45af9e4ca10d3a618d40ec3f230349')*100 AS value UNION SELECT 'DDD' AS label, ((SELECT c FROM t WHERE pn = 'ab531756624d8287d38eb5fe5809bdb4')+(SELECT c FROM t WHERE pn = 'ab5aaa230cc64c15a61f908aea1c3f9c'))/((SELECT c FROM t WHERE pn = 'ab60f7712cd176b6ac26040d69be0e11')+(SELECT c FROM t WHERE pn = 'ab674e464997f4916e7ed23942c53d23'))*100 AS value UNION SELECT 'EEE' AS label, (SELECT c FROM t WHERE pn = 'ab6cc3781441a08d67eb8039521c63bb')/((SELECT c FROM t WHERE pn = 'ab7612bca85db4f6c4638a658fbee646')+(SELECT c FROM t WHERE pn = 'ab7811882f520c2051c6761edf4c306f'))*100 AS value UNION SELECT 'FFF' AS label, (SELECT c FROM t WHERE pn = 'ab7f0bd3758fbdac0ff6abb24db166ad')/(SELECT c FROM t WHERE pn = 'ab87d7d598ad47ca5db3bf9e6cbdee48')*100 AS value;
Conclusion

This little tests showed some results I have not expected:

  • MariaDB Optimizer does some bad things on this query. OK, this was expected. Bug is reported: MDEV-30017.
  • The originally very bad execution times from MariaDB 10.6 (see timing notes in the first query) were not reproducible any more the next day. I should consider more to run ANALYZE TABLE. I should have know this because it is not the first time I experienced this.
  • We have a lot of variations between different measurements. This I should investigate a bit more later...
  • In the MariaDB Server Fest 2022 yesterday I heard with MariaDB 11 Optimizer everything will become much better... Wait! there is something more I do not understand and which I did not expect:
  • PostgreSQL response time is constantly dramatic better for these queries. I checked the docu but did not find anything beside their normal cache which would explain this (something similar to the MariaDB Query Cache for example). If somebody has a clue why they manage to respond so fast or why we respond so slow I would be happy for a hint...

Taxonomy upgrade extras: postgresqlOptimizerperformanceresponse timelatency

Migration of your data from one database to another

Shinguz - Thu, 2022-11-17 17:01

Before you consider migrating your data from MySQL to another database you have to know which objects have to be migrated.

With this query you will find the objects to consider:

SELECT TABLE_SCHEMA AS `SCHEMA`, IF(TABLE_TYPE = 'BASE TABLE', 'TABLE', TABLE_TYPE) AS OBJECT_TYPE, TABLE_NAME AS `OBJECT_NAME` , IFNULL(ENGINE, '') AS ENGINE, IFNULL(TABLE_ROWS, '') AS `ROWS` , IFNULL(DATA_LENGTH, '') AS DATA_SIZE, IFNULL(INDEX_LENGTH, '') AS INDEX_SIZE FROM information_schema.tables WHERE TABLE_SCHEMA NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys') UNION SELECT '', IF(is_role = 'Y', 'ROLE', 'USER'), CONCAT("'", user, "'", '@', "'", host, "'") AS OBJECT_TYPE, '', '', '', '' FROM mysql.user UNION SELECT db, type, name, '', '', '', '' FROM mysql.proc WHERE db NOT IN ('sys', 'information_schema', 'performance_schema', 'mysql') UNION SELECT db, 'EVENT', name, '', '', '', '' FROM mysql.event UNION SELECT trigger_schema, 'TRIGGER', trigger_name, '', '', '', '' FROM information_schema.triggers UNION SELECT db, 'EVENT', name, '', '', '', '' FROM mysql.event ORDER BY `SCHEMA` ASC, OBJECT_TYPE ASC, OBJECT_NAME ASC ; +--------+-------------+---------------------------+--------+---------+-----------+------------+ | SCHEMA | OBJECT_TYPE | OBJECT_NAME | ENGINE | ROWS | DATA_SIZE | INDEX_SIZE | +--------+-------------+---------------------------+--------+---------+-----------+------------+ | | ROLE | 'test_r'@'' | | | | | | | USER | 'app'@'%' | | | | | | | USER | 'app'@'127.0.0.1' | | | | | | | USER | 'focmm'@'127.0.0.1' | | | | | | | USER | 'test'@'localhost' | | | | | | sbtest | TABLE | sbtest1 | InnoDB | 9680 | 2637824 | 163840 | | test | EVENT | myevent | | | | | | test | FUNCTION | format_time | | | | | | test | PROCEDURE | diagnostics | | | | | | test | TABLE | dt | InnoDB | 6 | 16384 | 0 | | test | TABLE | test | InnoDB | 1045044 | 63520768 | 0 | | test | TRIGGER | test_trigger | | | | | | test | VIEW | test_v | | | | | +--------+-------------+---------------------------+--------+---------+-----------+------------+

An easy way to dump all the object definitions (except users and roles) is the following command:

mysqldump --user=root --no-data test > /tmp/test_structure_dump.sql

If you want to dump your data for importing them into another SQL database this command can help:

mysqldump --user=root --skip-extended-insert --skip-lock-tables --no-create-info \ --where='id = id ' --skip-add-locks --skip-comments --skip-quote-names test test \ | grep -v '^/\*' | grep -v ^$ > /tmp/test_dump.sql
Taxonomy upgrade extras: migrationdatabase

Linux Container with LXD for focmm unit testing

Shinguz - Thu, 2022-10-13 19:11

Currently we are in the testing phase of the next release of our Ops Center (focmm). This testing is quite complex because it includes testing the interaction of various different components like a MariaDB Galera Cluster, a Galera Load Balancer, a virtual IP, etc.

Recently I was in the Linuxhotel for a Galera Cluster Training and there one of the other trainers was romanticising about Linux Containers and LXD. So I had a short look, if LXD could be useful to simplify our unit testing. Some old knowledge from a previous Docker PoC was quite helpful to start with...

Prepare a LXC container for Galera Load Balancer

This did NOT work as expected because the Debian image was lacking IPv4 addresses and I did not find on the quick how to change that. So I used the Ubuntu 22.04 image.

shell> lxc remote list shell> lxc image list images: ubuntu/22.04 amd64 shell> INSTANCE='qa-glb' shell> lxc launch images:ubuntu/jammy ${INSTANCE} shell> lxc list *glb* +--------+---------+-----------------------+----------------------------------------------+------------+-----------+ | NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | +--------+---------+-----------------------+----------------------------------------------+------------+-----------+ | qa-glb | RUNNING | 10.139.158.183 (eth0) | fd42:1730:178f:78c:216:3eff:fe3f:2948 (eth0) | PERSISTENT | 0 | +--------+---------+-----------------------+----------------------------------------------+------------+-----------+

Now we should remember the IPv4 address for later use. I am sure this can be done more elegant but for now this is fine...

Install Galera Load Balancer in the LXC container shell> lxc exec ${INSTANCE} -- /bin/bash container> apt-get update container> apt-get install wget container> wget https://support.fromdual.com/admin/download/glb_1.0.1-Ubuntu12.04-x86_64.deb container> apt-get install ./glb_1.0.1-Ubuntu12.04-x86_64.deb container> rm -f glb_1.0.1-Ubuntu12.04-x86_64.deb

Then we have to add the unit file:

# # /etc/systemd/system/glb.service # [Unit] Description=Galera Load Balancer Service After=network.target [Service] EnvironmentFile=/etc/default/glbd Type=simple ExecStart=/usr/local/sbin/glbd --daemon --threads $THREADS --max_conn $MAX_CONN $OTHER_OPTIONS --control $CONTROL_ADDR $LISTEN_ADDR $DEFAULT_TARGETS [Install] WantedBy=multi-user.target

enable the unit file, configure the Galera Load Balancer and start it:

container> systemctl enable glb # Galera Load Balancer Configuration # Redhat: /etc/sysconfig/glbd # Debian: /etc/default/glbd LISTEN_ADDR="3306" CONTROL_ADDR="10.139.158.183:8011" CONTROL_FIFO="/var/run/glbd.fifo" THREADS="2" MAX_CONN=151 DEFAULT_TARGETS="10.139.158.1:3330:1 10.139.158.1:3331:1 10.139.158.1:3332:1" OTHER_OPTIONS="--round" container> systemctl start glb container> apt-get remove wget
Container testing

To be sure everything works fine we should do some basic tests:

shell> lxc stop ${INSTANCE} shell> lxc start ${INSTANCE} shell> echo getinfo | nc -q 1 10.139.158.183 8011
Unit testing focmm against Galera Load Balancer in the LXC container

And finally we did the unit testing of focmm against the Galera Local Balancer which is in the LXC container:

shell> ./tst/run_all_tests.php --instance=qamariadb106 --module=LoadBalancer shell>/dev/null Environment is: qamariadb106 Logfile is: /tmp/focmm_tst_qamariadb106.log Tests are: module=LoadBalancer and function=all OK stopLoadBalancerRemote OK startLoadBalancerRemote OK restartLoadBalancerRemote OK readLoadBalancerTypes OK createLoadBalancer OK updateLoadBalancer OK readLoadBalancers OK parseGlbConfiguration OK openSocket OK writeReadSocket OK parseGlbGetInfo OK parseGlbGetStats OK changeLoadBalancerBackendWeightRemote OK deleteLoadBalancer
Taxonomy upgrade extras: containerlxclxdtestingunit testinggaleraload balancer

FromDual Performance Monitor for MariaDB/MySQL allows SNMP monitoring

Shinguz - Thu, 2022-06-23 12:03

The newest release of the Performance Monitor for MariaDB and MySQL v2.0.0 (fpmmm) allows you to monitor your MariaDB and MySQL databases via SNMP (Simple Network Management Protocol).

This feature enables you to report your Database Metrics into Enterprise Monitoring solutions from Microsoft, IBM or into Oracle Cloud Control.

In this article we will show you the few simple steps it needs to send the data from your fpmmm Agent to your SNMP Manager.

Check if SNMP Manager can be reached

To make sure fpmmm can send data to the SNMP Manager at all you can test the connection with the following command to send SNMP Notifications to the SNMP Manager:

# COMMUNITY='public' # MANAGER='192.168.56.102' # PORT='162' # TRAP_OID='1.3.6.1.4.1.57800.1.1.2' # OID='1.3.6.1.4.1.57800.1.1.1' # TYPE='c' # VALUE=$(mariadb --user=root --execute="SELECT variable_value FROM information_schema.global_status WHERE variable_name LIKE 'threads_running'\G" | grep variable_value | cut -d' ' -f2) # snmptrap -v 2c -c ${COMMUNITY} ${MANAGER}:${PORT} '' ${TRAP_OID} ${OID} ${TYPE} "${VALUE}"

Then check in the SNMP Manager Logfile if the traps arrive.

Copy fpmmm MIBs to your fpmmm Agent Machine

You can see where your fpmmm MIBs are expected on your database machine (where your fpmmm Agent is located) with the following command:

# net-snmp-config --default-mibdirs ${HOME}/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf:/usr/share/mibs/site:/usr/share/snmp/mibs:/usr/share/mibs/iana:/usr/share/mibs/ietf:/usr/share/mibs/netsnmp

Copy the fpmmm MIBs as follows (we assume that you work with the same user as your fpmmm Agent works (zabbix or mysql)):

# mkdir -p ${HOME}/.snmp/mibs # cp /opt/fpmmm/tpl/FromDual-fpmmm-MIB.mib ${HOME}/.snmp/mibs

or choose any other location you want to have them.

To check if the FromDual MIBs are recognized correctly run these commands:

# snmptranslate -DFromDual-fpmmm-MIB.mib -m +FromDual-fpmmm-MIB 1.3.6.1.4.1.57800.1.1.1 registered debug token FromDual-fpmmm-MIB.mib, 1 FromDual-fpmmm-MIB::fpmmmLastrun # snmptranslate -On FromDual-fpmmm-MIB::fpmmmLastrun .1.3.6.1.4.1.57800.1.1.1

Do the same for the SNMP Manager according to your installation instructions of your SNMP Manager.

Configure your FromDual Performance Monitor to send SNMP traps

The fpmmm Configuation File (/etc/fpmmm.conf) should contain at least the following lines to make fpmmm sending SNMP traps instead of Zabbix messages:

[default] # SNMP Server SnmpServer = 192.168.56.102 SnmpCommunity = public SnmpTrapPort = 162 OutputFormat = snmp

Do a test run of fpmmm with:

# /opt/fpmmm/bin/fpmmm --config=/etc/fpmmm.conf

Then check again on your SNMP Server machine if the SNMP traps arrive:

snmptrapd: Agent Address: 0.0.0.0 Agent Hostname: Date: 1 - 1 - 4 - 1 - 1 - 1970 Enterprise OID: . Trap Type: Cold Start Trap Sub-Type: 0 Community/Infosec Context: TRAP2, SNMP v2c, community public Uptime: 0 Description: Cold Start PDU Attribute/Value Pair Array:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1215419) 3:22:34.19#012SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.57800.1.1.2#012SNMPv2-SMI::enterprises.57800.1.1.1 = Counter32: 3 snmptrapd: Agent Address: 0.0.0.0 Agent Hostname: Date: 1 - 1 - 4 - 1 - 1 - 1970 Enterprise OID: . Trap Type: Cold Start Trap Sub-Type: 0 Community/Infosec Context: TRAP2, SNMP v2c, community public Uptime: 0 Description: Cold Start PDU Attribute/Value Pair Array:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (60) 0:00:00.60#012SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.57800.1.1.2#012SNMPv2-SMI::enterprises.57800.2.7.1 = Counter32: 1 snmptrapd: Agent Address: 0.0.0.0 Agent Hostname: Date: 1 - 1 - 4 - 1 - 1 - 1970 Enterprise OID: . Trap Type: Cold Start Trap Sub-Type: 0 Community/Infosec Context: TRAP2, SNMP v2c, community public Uptime: 0 Description: Cold Start PDU Attribute/Value Pair Array:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (60) 0:00:00.60#012SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.57800.1.1.2#012SNMPv2-SMI::enterprises.57800.2.7.2 = Counter32: 151 snmptrapd: Agent Address: 0.0.0.0 Agent Hostname: Date: 1 - 1 - 4 - 1 - 1 - 1970 Enterprise OID: . Trap Type: Cold Start Trap Sub-Type: 0 Community/Infosec Context: TRAP2, SNMP v2c, community public Uptime: 0 Description: Cold Start PDU Attribute/Value Pair Array:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (60) 0:00:00.60#012SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.57800.1.1.2#012SNMPv2-SMI::enterprises.57800.2.7.1 = Counter32: 1 snmptrapd: Agent Address: 0.0.0.0 Agent Hostname: Date: 1 - 1 - 4 - 1 - 1 - 1970 Enterprise OID: . Trap Type: Cold Start Trap Sub-Type: 0 Community/Infosec Context: TRAP2, SNMP v2c, community public Uptime: 0 Description: Cold Start PDU Attribute/Value Pair Array:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (60) 0:00:00.60#012SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.57800.1.1.2#012SNMPv2-SMI::enterprises.57800.2.7.3 = Counter32: 1

This is already all about making fpmmm capable sending SNMP traps.

Sources
Taxonomy upgrade extras: performanceperformance monitoringmonitorfpmmmSNMP

FromDual Performance Monitor for MariaDB 2.0.0 has been released

Shinguz - Thu, 2022-05-19 20:09

FromDual has the pleasure to announce the release of the new version 2.0.0 of its popular Database Performance Monitor for MariaDB and Galera Cluster fpmmm.

The FromDual Performance Monitor for MariaDB (fpmmm) enables DBAs and System Administrators to monitor and understand what is going on inside their MariaDB database instances and on the machines where the databases reside.

More detailed information you can find in the fpmmm Installation Guide.

Download

The new FromDual Performance Monitor for MariaDB (fpmmm) can be downloaded from here or you can use our FromDual repositories. How to install and use fpmmm is documented in the fpmmm Installation Guide.

In the inconceivable case that you find a bug in the FromDual Performance Monitor for MariaDB please report it to the FromDual Bug-tracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to us.

Monitoring as a Service (MaaS)

You do not want to set-up your Database monitoring yourself? No problem: Choose our MariaDB Monitoring as a Service (Maas) program to safe time and costs!

Installation of Performance Monitor 2.0.0

A complete guide on how to install FromDual Performance Monitor you can find in the fpmmm Installation Guide.

Upgrade of fpmmm tarball from 1.x to 2.0.0

There are some changes in the configuration file (fpmmm.conf):

  • The access rights should be change as follows: chmod 600 /etc/fpmmm.conf
  • The key Methode was spelled wrong in the configuration file. It was renamed to Method.
  • The key PidFile is ambiguous which could lead to problems and bugs. Thus it was changed to either MyPidFile for fpmmm and DbPidFile for the database.

Upgrade with DEB/RPM packages should happen automatically. For tarballs follow this:

shell> cd /opt shell> tar xf /download/fpmmm-2.0.0.tar.gz shell> rm -f fpmmm shell> ln -s fpmmm-2.0.0 fpmmm
Changes in FromDual Performance Monitor for MariaDB 2.0.0

This release contains new features and various bug fixes.

You can verify your current FromDual Performance Monitor for MariaDB version with the following command:

shell> /opt/fpmmm/bin/fpmmm --version
General
  • Some trigger url links were improved in various templates.
  • Zabbix 5.0 templates were added.
  • New MyEnv library added.
  • oncePerHourRun and oncePerDay run implemented.
  • Code clean-up in various functions: sendData, cacheFile, initValue, mr_version, mylog, extractVersioncomment. This should make fpmmm dramatically faster in some scenarios.
  • Error handling improved, error messages improved, some Linux command replaced by PHP commands, typos fixed.
  • Better handling of non available variables and status information.
  • All old php5 stuff removed.
  • Various fixed of minor bugs.
  • Ambiguous PidFile replaced by MyPidFile for fpmmm and DbPidFile for database PID file. Caution during Upgrade!
  • Lock location changed from /var/run to /run/lock.
  • Several distribution related and new MariaDB privilege related bugs fixed.
  • Variables default, configuration and configuration file sorted out.
  • Lock file handling improved, Nagios functionality added.
  • Config file is checked if it is readable for others.
  • Own locking mechanism replaced by O/S flock.
  • Option checks activated.
  • Newer distros added (Debian 11, Ubuntu 22.04, Rocky Linux 8).
  • PHP 8, MySQL 8 and MariaDB 10.6 tests passed.

Server
  • Item entropy added.
  • Trigger urls fixed.
  • Items added to templates: Server Entropy.
  • NUMA count fixed.
  • Linux md drives are supported now as well.
  • smartctl error handling improved.
  • NUMA node memory balancing graph added and trigger set.
  • CPU time values were far too high. This is fixed with the new CPU measuring variant.
  • Broken md devices is added to server module + trigger.
  • Server disk: minor device is only gathered in extensive, loop device is skipped.
  • Filesystem locks item and trigger added.
  • Path /usr/sbin for tool smartctl added for Debian.
  • Skip loop and snap devices.
  • Server does not wait 1 second any more for CPU info.
  • New Feature GatherData = {normal|minimal|extensive} added.

Data
  • Disabled binary log is caught in data.
  • #innodb_temp schema size added.

Galera
  • tx_isolation made MySQL 8 ready.
  • Bug fixed in case when Galera is disabled but module is activated.
  • Tables without Primary Key and non InnoDB table items and triggers added.
  • Various triggers added to check correct Galera configuration.
  • Change of cluster configuration id fixed.
  • XA trigger fixed.
  • Galera cache item + trigger fixed.
  • Some more items added for Galera compatibility checks.
  • Items wsrep_xa_statements, wsrep_log_bin, wsrep_log_slave_updates and gcache.size added.

User
  • Code refactored and cleaned-up.
  • Problem during max_connections reached fixed.

Agent
  • Trigger links fixed.
  • New fpmmm version items and triggers added.
  • fpmmm version stuff fixed.
  • Lay off of software-release function and implement check for fpmmm version.

InnoDB
  • InnoDB template fixed on deadlock item.
  • All locking related graphs moved together into same InnoDB screen.
  • innodb_buffer_pool_max_dirty_pages_pct added.
  • Fix division by zero bug.
  • Bug in InnoDB status fixed.
  • InnoDB buffer pool dirty pages trigger added and a small number of buffer pool instances trigger removed.
  • Trigger for innodb_flush_log_at_trx_commit set from info to warning because this is security related.
  • Innodb_log_occupancy, Innodb_checkpoint_age, innodb_log_group_capacity and trigger added.
  • innodb_file_format item added.
  • Items innodb_max_dirty_pages_pct added.
  • InnoDB metrics added.
  • Redo log occupancy increased.

MyISAM
  • none

Aria
  • none

Security
  • none

Master
  • Sync_binlog warning change into the opposite. Security first!
  • BINLOG CLIENT privilege replaced by BINLOG MONITOR for MariaDB 10.5.

Slave
  • Template link fixed.
  • Table with missing Primary Key item added.
  • Link to new graphs fixed for 5.0.

Backup
  • Restore items moved from backup to restore template.
  • Items for restore added for automatized restore testing.
  • Trigger links in backup template fixed.
  • Trigger is fired if backup did not happen within last 36 hours.

MySQL
  • Trigger for transaction_cache_size and statement_cache_size was improved.
  • Trigger urls fixed.
  • Table Open Cache and Table Definition Cache triggers link changed to item values.
  • Storage_engine added for compatibility reasons with default_storage_engine.
  • Processlist total threads added.
  • Error log parser added.
  • com_change_db and queries added.
  • Refactoring of code.
  • Processlist graph and items added, code made better.
  • P_S metadata lock warning improved.
  • Metadata lock info plugin installation message added.
  • Storage_engine removed.

Nagios
  • FromDual Nagios plug-ins implemented in fpmmm.

SNMP
  • SNMP implemented for most important MariaDB and Galera metrics.

For subscriptions of commercial use of fpmmm please get in contact with us.

Taxonomy upgrade extras: performancemonitormonitoringfpmmmmaasreleasegraph

MariaDB MaxScale Load Balancer with Master/Master Replication

Shinguz - Fri, 2022-05-13 16:39

For this how-to we were working with a MariaDB 10.6.5 database server and the newest MariaDB MaxScale 6.3.0 (aka 2.6) load balancer/proxy.

As a starting point we had a working MariaDB Master/Master replication with one active Master (Ma) and one passive Master (Mp). Whereas the passive Master was set to read_only. As a replication user we used the user replication.

Creating database accounts for the MaxScale Load Balancer

The MaxScale load balancer connects itself with the application users to the database. To do this it needs to retrieve the available users from the database first. To get these users the MaxScale account needs some privileges:

CREATE USER 'maxscale'@'%' IDENTIFIED BY 'secret'; GRANT SELECT ON mysql.user TO 'maxscale'@'%'; GRANT SELECT ON mysql.db TO 'maxscale'@'%'; GRANT SELECT ON mysql.tables_priv TO 'maxscale'@'%'; GRANT SELECT ON mysql.columns_priv TO 'maxscale'@'%'; GRANT SELECT ON mysql.procs_priv TO 'maxscale'@'%'; GRANT SELECT ON mysql.proxies_priv TO 'maxscale'@'%'; GRANT SELECT ON mysql.roles_mapping TO 'maxscale'@'%'; GRANT SHOW DATABASES ON *.* TO 'maxscale'@'%';

For monitoring the replication and doing proper switchover and failover MaxScale further needs a monitoring account:

CREATE USER 'maxscale_monitor'@'%' IDENTIFIED BY 'secret'; GRANT REPLICATION CLIENT on *.* to 'maxscale_monitor'@'%'; GRANT REPLICATION SLAVE on *.* to 'maxscale_monitor'@'%'; GRANT SUPER, RELOAD on *.* to 'maxscale_monitor'@'%';

And last we need an application account for our test application:

CREATE USER 'app'@'%' IDENTIFIED BY 'secret'; GRANT ALL on test.* to 'app'@'%';
Starting MariaDB MaxScale Load Balancer

Because we do not use the provided DEB/RPM packages on our systems but generic binary tarballs, which are not available, we have to start the MaxScale Load Balancer a bit over-complicated:

# export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/mysql/product/maxscale/usr/lib/x86_64-linux-gnu/maxscale # ldd ./maxscale # ./maxscale --help # ./maxscale --config=/home/mysql/etc/maxscale_mm.cnf --datadir=/home/mysql/tmp --basedir=/home/mysql/product/maxscale --logdir=/home/mysql/log --config-check # ./maxscale --config=/home/mysql/etc/maxscale_mm.cnf --datadir=/home/mysql/tmp --basedir=/home/mysql/product/maxscale --logdir=/home/mysql/log

The MaxScale configuration file we used in these tests looks as follows:

# # maxscale_mm.cnf # [maxscale] threads = auto [master1] type = server address = 192.168.1.11 port = 3306 protocol = MariaDBBackend [master2] type = server address = 192.168.1.12 port = 3306 protocol = MariaDBBackend [MultiMasterMonitor] type = monitor module = mariadbmon servers = master1,master2 user = maxscale_monitor password = secret enforce_read_only_slaves = true auto_rejoin = true # auto_failover = true [WriteService] type = service router = readconnroute router_options = master servers = master1,master2 user = maxscale password = secret [ReadService] type = service router = readconnroute router_options = slave servers = master1,master2 user = maxscale password = secret [WriteListener] type = listener service = WriteService protocol = MariaDBClient port = 3306 [ReadListener] type = listener service = ReadService protocol = MariaDBClient port = 3307

If the start was successful can be seen for example with:

# ps -ef | grep maxscale

If you start the MariaDB MaxScale the first time no user/password is needed. So we can connect the the MaxScale Load Balancer with the maxctrl client:

# ./maxctrl maxctrl> list listeners ┌──────────────────┬──────┬──────┬─────────┬──────────────┐ │ Name │ Port │ Host │ State │ Service │ ├──────────────────┼──────┼──────┼─────────┼──────────────┤ │ WriteListener │ 3306 │ :: │ Running │ WriteService │ ├──────────────────┼──────┼──────┼─────────┼──────────────┤ │ MaxAdminListener │ 3307 │ :: │ Running │ ReadService │ └──────────────────┴──────┴──────┴─────────┴──────────────┘ maxctrl> list services ┌──────────────┬───────────────┬─────────────┬───────────────────┬──────────────────┐ │ Service │ Router │ Connections │ Total Connections │ Servers │ ├──────────────┼───────────────┼─────────────┼───────────────────┼──────────────────┤ │ WriteService │ readconnroute │ 0 │ 0 │ master1, master2 │ ├──────────────┼───────────────┼─────────────┼───────────────────┼──────────────────┤ │ ReadService │ readconnroute │ 0 │ 0 │ master1, master2 │ └──────────────┴───────────────┴─────────────┴───────────────────┴──────────────────┘ maxctrl> list servers ┌─────────┬──────────────┬──────┬─────────────┬──────────────────────────────┬────────────┐ │ Server │ Address │ Port │ Connections │ State │ GTID │ ├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼────────────┤ │ master1 │ 192.168.1.11 │ 3306 │ 0 │ Master, Running │ 0-3308-134 │ ├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼────────────┤ │ master2 │ 192.168.1.12 │ 3306 │ 0 │ Relay Master, Slave, Running │ 0-3308-134 │ └─────────┴──────────────┴──────┴─────────────┴──────────────────────────────┴────────────┘ maxctrl> list sessions ┌────────┬──────┬─────────────────────┬──────────────────────────┬──────┬─────────────┐ │ Id │ User │ Host │ Connected │ Idle │ Service │ ├────────┼──────┼─────────────────────┼──────────────────────────┼──────┼─────────────┤ │ 746468 │ app │ ::ffff:192.168.1.99 │ Fri May 13 15:28:55 2022 │ 0.4 │ ReadService │ └────────┴──────┴─────────────────────┴──────────────────────────┴──────┴─────────────┘

To not allow everybody to access MariaDB MaxScale we set a password for the admin user:

maxctrl> alter user admin secret OK

Then you can connect with username and password:

# ./maxctrl --user=admin --password=secret maxctrl> list users ┌───────┬──────┬────────────┐ │ Name │ Type │ Privileges │ ├───────┼──────┼────────────┤ │ admin │ inet │ admin │ └───────┴──────┴────────────┘
Testing connections over the MaxScale Load Balancer

To test if connections are ending up on the correct database instances we used the following commands:

# mariadb --user=app --host=192.168.1.1 --port=3306 --password=secret test --execute='SELECT @@hostname, @@port' # mariadb --user=app --host=192.168.1.1 --port=3307 --password=secret test --execute='SELECT @@hostname, @@port'
Monitoring of nodes in MariaDB MaxScale

To see what is going on inside the MariaDB MaxScale Load Balancer we used the following command to create a simple real time monitor:

# watch -d -n 1 ./maxctrl --user=admin --password=secret list servers
Switchover and Failover with MariaDB MaxScale

To use MariaDB MaxScales switchover and failover capabilities the Master/Master Replication must be configured to use Global Transaction IDs (GTID). To change to GTID based replication you can use the following commands on both masters:

SQL> STOP SLAVE; SQL> CHANGE MASTER TO MASTER_USE_GTID = SLAVE_POS; SQL> START SLAVE;
Switchover in MariaDB MaxScale

A graceful switchover is used in a controlled situation if we want to switch the roles between the active master and the passive master and vice versa... This can be used for example before a maintenance operation: We do the maintenance operation on the passive Master first, then we switch the roles and then we can do the maintenance operation on the now new passive Master:

maxctrl> call command mariadbmon switchover MultiMasterMonitor

After switching for and back a few times we found out, that the MariaDB MaxScale Monitor has replaced our replication user replication by his own user maxscale_monitor. This is not documented? and I do really not like it, especially if it is done silently...

Failover in MariaDB MaxScale

To provoke/simulate a failover situation we stopped the active Master. Then we see in our monitor that the active Master is down and the passive Master is still running:

maxctrl> list servers ┌─────────┬──────────────┬──────┬─────────────┬────────────────┬────────────┐ │ Server │ Address │ Port │ Connections │ State │ GTID │ ├─────────┼──────────────┼──────┼─────────────┼────────────────┼────────────┤ │ master1 │ 192.168.1.11 │ 3306 │ 0 │ Down │ 0-3309-142 │ ├─────────┼──────────────┼──────┼─────────────┼────────────────┼────────────┤ │ master2 │ 192.168.1.12 │ 3306 │ 1 │ Slave, Running │ 0-3309-142 │ └─────────┴──────────────┴──────┴─────────────┴────────────────┴────────────┘

We further observe, that our application (the famous insert_test.sh) is not working any more but throwing errors... Then we trigger a manual failover:

maxctrl> call command mariadbmon failover MultiMasterMonitor

It looks like an automatic failover is possible (auto_failover = true) but we do not recommend this set-up and thus we did not further investigate in this feature.

After the manual failover the former passive Master becomes active Master:

maxctrl> list servers ┌─────────┬──────────────┬──────┬─────────────┬─────────────────┬───────────────┐ │ Server │ Address │ Port │ Connections │ State │ GTID │ ├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤ │ master1 │ 192.168.1.11 │ 3306 │ 0 │ Down │ 0-3308-365829 │ ├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤ │ master2 │ 192.168.1.12 │ 3306 │ 0 │ Master, Running │ 0-3308-370235 │ └─────────┴──────────────┴──────┴─────────────┴─────────────────┴───────────────┘

Now we simulate the repair of the former failed active Master by just restarting it:

maxctrl> list servers ┌─────────┬──────────────┬──────┬─────────────┬─────────────────┬───────────────┐ │ Server │ Address │ Port │ Connections │ State │ GTID │ ├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤ │ master1 │ 192.168.1.11 │ 3306 │ 0 │ Slave, Running │ 0-3308-401309 │ ├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤ │ master2 │ 192.168.1.12 │ 3306 │ 1 │ Master, Running │ 0-3308-401309 │ └─────────┴──────────────┴──────┴─────────────┴─────────────────┴───────────────┘

It comes back into the MariaDB MaxScale but just as a simple Slave. So a failover seems to break our Master/Master replication for some reasons. This must be repaired afterwards. Whereas a simple switchover seems to work properly. If this is a bug or intended behaviour I do not know... With the following command on master2 the loop is closed again:

SQL> CHANGE MASTER TO master_host='192.168.1.11', master_port=3306, master_user='maxscale_monitor', master_password='secret', master_use_gtid=slave_pos; SQL> START SLAVE;

Now everything is fine and working as expected again:

maxctrl> list servers ┌─────────┬──────────────┬──────┬─────────────┬──────────────────────────────┬───────────────┐ │ Server │ Address │ Port │ Connections │ State │ GTID │ ├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────────┤ │ master1 │ 192.168.1.11 │ 3306 │ 1 │ Relay Master, Slave, Running │ 0-3308-440194 │ ├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────────┤ │ master2 │ 192.168.1.12 │ 3306 │ 0 │ Master, Running │ 0-3308-440194 │ └─────────┴──────────────┴──────┴─────────────┴──────────────────────────────┴───────────────┘

Stopping and starting the passive Master had not effect at all.

Switchover with a lagging passive master

To test this scenario we created an artificial lag of the passive Master by setting innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1. When the passive master was lagging far enough (about 30 seconds) we tried an switchover:

maxctrl> call command mariadbmon switchover MultiMasterMonitor Error: timeout of 10000ms exceeded

So it is not really clear what happens in this case and the error message is not really telling us about the problem. Further this operations somehow breaks Master/Master replication again.

Failover with a lagging passive master

If we try a failover instead of a switchover we get at least a bit a more meaningful error message:

Error: Server at http://127.0.0.1:8989 responded with 403 Forbidden to `POST maxscale/modules/mariadbmon/failover?MultiMasterMonitor` { "links": { "self": "http://127.0.0.1:8989/v1/maxscale/modules/mariadbmon/failover/" }, "meta": { "errors": [ { "detail": "Can not select 'master1' as a demotion target for failover because it is a running master." }, { "detail": "Failover cancelled." } ] } }
Draining a node with MariaDB MaxScalse

If we try to drain a passive Master we get some warnings but it seems like the result is the expected:

maxctrl> drain server master2 Warning: Saving runtime modifications to 'ReadService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/ReadService.cnf'. The modified values will override the values found in the static configuration files.;Saving runtime modifications to 'WriteService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/WriteService.cnf'. The modified values will override the values found in the static configuration files. To hide these warnings, run: export MAXCTRL_WARNINGS=0 Warning: Saving runtime modifications to 'ReadService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/ReadService.cnf'. The modified values will override the values found in the static configuration files.;Saving runtime modifications to 'WriteService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/WriteService.cnf'. The modified values will override the values found in the static configuration files. To hide these warnings, run: export MAXCTRL_WARNINGS=0 OK maxctrl> list servers ┌─────────┬──────────────┬──────┬─────────────┬──────────────────────┬───────────────┐ │ Server │ Address │ Port │ Connections │ State │ GTID │ ├─────────┼──────────────┼──────┼─────────────┼──────────────────────┼───────────────┤ │ master1 │ 192.168.1.11 │ 3306 │ 0 │ Master, Running │ 0-3308-631119 │ ├─────────┼──────────────┼──────┼─────────────┼──────────────────────┼───────────────┤ │ master2 │ 192.168.1.12 │ 3306 │ 0 │ Maintenance, Running │ 0-3308-631119 │ └─────────┴──────────────┴──────┴─────────────┴──────────────────────┴───────────────┘
Un-drain a node with MariaDB MaxScale

For un-draining the node we tried:

maxctrl> clear server master2 drain OK

On the first look everything seems to be OK. But the State of master2 was still in Maintenance. The command:

maxctrl> clear server master2 maintenance

did the job... If this is a bug or should be considered as an operator error I do not know...

Draining the active Master caused a switchover.

Sources
Taxonomy upgrade extras: replicationmaxscaleload balancerproxyactive-activemaster-mastermariadb

Streaming backup with MariaDB Backup

Shinguz - Thu, 2022-05-12 20:48

Because the original MariaDB documentation is a little bit sparse about this topic here are some notes about how to do a streaming MariaDB backup:

# mariadb-backup --user=root --backup --parallel=4 --stream=mbstream | ssh root@target_machine "/usr/bin/mbstream -x -C /mnt/backup/" # mariadb-backup --user=root --prepare --target-dir=/mnt/backup

FromDual Backup and Recovery Manager for MariaDB and MySQL 2.2.5 has been released

Shinguz - Fri, 2022-02-25 14:53

FromDual has the pleasure to announce the release of the new version 2.2.5 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).

The new FromDual Backup and Recovery Manager can be downloaded from here. The FromDual Repositories were updated. How to install and use the Backup and Recovery Manager is described in FromDual Backup and Recovery Manager (brman) installation guide.

In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 2.x to 2.2.5 shell> cd ${HOME}/product shell> tar xf /download/brman-2.2.5.tar.gz shell> rm -f brman shell> ln -s brman-2.2.5 brman
Changes in FromDual Backup and Recovery Manager 2.2.5

This release is a new minor release. It contains mainly bug fixes. We have tried to maintain backward-compatibility with the 1.2, 2.0 and 2.1 release series. But you should test the new release seriously!

You can verify your current FromDual Backup Manager version with the following command:

shell> fromdual_bman --version shell> bman --version shell> rman --version
General
  • Newest myEnv library copied from myEnv project.
  • Some code clean-up.

FromDual Backup Manager (bman>
  • Option --source-data which was introduce in MySQL 8.0.26 caused a problem in the mysqldump command. Fixed.
  • Tool XtraBackup 8.0 does not support --no-timestamp option any more. Fixed.
  • Library bman.inc restructured into smaller files.
  • Requirement checks for myenv and brman separated.
  • Start backup message move much further up in output.
  • Connection handling of bman made more OO style to nicer cope with some special cases (DB down).
  • Directory for fpmmm is created automatically now if it does not exist.
  • Clean-up file size returned to bman and can now be reported correctly to fpmmm.
  • Output of configuration files read improved.
  • Configuration files of brman are now checked if group or other can read the files.
  • Hostname is converted to IP first now before we check if it is a remote location in doBinlogBackup.
  • Check in doSchemabackup for missing RELOAD privilege (rc=4198) was done in a wrong way. Fixed.
  • Error message improved in doBinlogBackup.
  • Various typos fixed.
  • Various bugs in bman fixed found with testing and tests improved.

FromDual Recovery Manager (rman)
  • Rman also checks configuration file for readability for group and others now.
  • Option --target in rman is accepted now as configuration parameter.

FromDual brman Catalog
  • Some bugs bman_catalog fixed found with testing.

Testing
  • XtraBackup version check bug fixed in full physical backup test.

Subscriptions for commercial use of FromDual Backup and Recovery Manager you can get from from us.

Taxonomy upgrade extras: BackupRestoreRecoverypitrbrmanreleasebmanrmanFromDual Backup and Recovery Manager

Sharding do-it-yourself

Shinguz - Tue, 2022-02-15 12:18

As already mentioned earlier, we roughly have a hand full of customers which are playing with the though of sharding solutions. They typically have many different customers (clients, tenants) and the number of customers becomes so huge (thousands to millions) that one machine cannot cope with the load any more.

So splitting the load by customers to different machines makes sense. This is quite easy when customers are separated per schema. In the good old times of Open Source our customers have implemented those solutions themselves. But nowadays it looks like do-it-yourself is not sexy any more. It seems like this core competence of a business advantage must be outsourced. So some vendors have already made some solutions available to solve this need: Sharding Solutions.

My question here is: Can a generic sharding solution build exactly what your business needs? Are you still capable to optimize your business process in the way you need it when you buy a 3rd party solution?

And: If you use another product, you have also to build up the know-how how to use it correctly. So I am really wondering if it is worth the effort? Buy or make is the question here. So we made a little Proof-of-Concept of a sharding solution. And it did not take too long...

The concept

First of all we have 2 different kinds of components:

  • The Fabric Node - this is the database where all the meta information about the shards are stored.
  • The Shards - these are the databases where all the customer data are stored.

And we need all this more or less in a highly available fashion.

The fabric table can look as simple as this:

CREATE TABLE `tenant` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `tenant_id` varchar(128) NOT NULL, `schema` varchar(128) NOT NULL, `machine` varchar(128) NOT NULL, `port` smallint(5) unsigned NOT NULL, `locked` enum('no','yes') NOT NULL DEFAULT 'no', PRIMARY KEY (`id`), UNIQUE KEY `tenant_id` (`tenant_id`), UNIQUE KEY `schema` (`schema`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ; INSERT INTO tenant VALUES (NULL, 'Customer 1', 'customer_1', '192.168.33.21', 3306, 'no') , (null, 'Customer 2', 'customer_2', '192.168.33.22', 3306, 'yes') , (null, 'Customer 3', 'customer_3', '192.168.33.23', 3306, 'no') ; SQL> SELECT * FROM tenant; +----+------------+------------+---------------+------+--------+ | id | tenant_id | schema | machine | port | locked | +----+------------+------------+---------------+------+--------+ | 4 | Customer 1 | customer_1 | 192.168.33.21 | 3306 | no | | 5 | Customer 2 | customer_2 | 192.168.33.22 | 3306 | yes | | 6 | Customer 3 | customer_3 | 192.168.33.23 | 3306 | no | +----+------------+------------+---------------+------+--------+
Connection

Now our application needs to know on which shard the data for a specific customer is stored. This will be found in the fabric. So our application has to do first a connect to the fabric and then a connect to the shard. To make it more transparent for your application you can encapsulate everything in one method. And if you want to optimize the connecting you can store the sharding information in a local cache.

$dbhFabric = getFabricConnection($aFabricConnection); $aShardConnection = readShardConnection($dbhFabric, $aTenant); $dbhShard = getShardConnection($aShardConnection); // Do everything needed for the tenant... $sql = sprintf("SELECT * FROM `customer_data` limit 3"); $result = $dbhShard->query($sql);

For the PoC we create 3 different schemas (customer_1, customer_2 and customer_3) with some customer data in it:

CREATE TABLE `customer_data` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `customer_name` varchar(128) NOT NULL, `customer_data` varchar(128) NOT NULL, `customer_number` INT UNSIGNED NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ; INSERT INTO `customer_data` VALUES (NULL, 'Customer 3', 'Some data', ROUND(RAND()*1000000, 0)) , (NULL, 'Customer 3', 'Some data', ROUND(RAND()*1000000, 0)) , (NULL, 'Customer 3', 'Some data', ROUND(RAND()*1000000, 0)) ;
Resharding

One of the challenges of such a construct is that the intensity of usage in the shards changes over time. All new customers are possibly placed in the same new shard. The new customers start playing around with your application. They do more and more. So a shard becomes overloaded sooner or later. Or the other way around: You have a lot of customers which were very enthusiastic about your product in the beginning and not so much any more now. So you loose customers on the older shards. Or you have to replace an old shard by newer hardware. All this leads to the situation, that your solution is not balanced any more and must be re-balanced. We call this resharding:

This can be done quite easily with a few commands:

-- Lock tenant UPDATE `tenant` SET `locked` = 'yes' WHERE `tenant_id` = 'Customer 3'; -- Wait some time until cache expires or invalidate cache mariadb-dump --user=root --host=192.168.33.23 --port=3306 customer_3 | mariadb --user=root --host=192.168.33.21 --port=3306 # Check error codes very well. Possibly do some other additional checks! -- Update tenant to new shard UPDATE `tenant` SET `machine` = '192.168.33.21', `port` = 3306 WHERE `tenant_id` = 'Customer 3'; -- DROP tenant in old shard mariadb --user=root --host=192.168.33.23 --port=3306 --execute="DROP SCHEMA `customer_3`" -- Unlock tenant UPDATE `tenant` SET `locked` = 'no' WHERE `tenant_id` = 'Customer 3';
Monitoring

To find out which customers cause a lot of load and which shards are over- or under-loaded you need some kind of sophisticated monitoring. You possibly want to know things like:

  • CPU usage (mpstat)
  • I/O usage (iostat)
  • NW usage
  • RAM usage (free)
  • Number of connections per shard and per customer
  • Number of queries per shard and per customer
  • etc.

Those metrics can be found with some queries:

SHOW GLOBAL STATUS LIKE 'Bytes%'; SELECT * FROM information_schema.global_status WHERE variable_name IN ('Threads_connected', 'Threads_running', 'Max_used_connections') ; SELECT processlist_db, COUNT(*) FROM performance_schema.threads WHERE type = 'FOREGROUND' GROUP BY processlist_db ; SELECT * FROM sys.schema_table_lock_waits WHERE object_schema LIKE 'customer%'; SELECT * FROM sys.schema_table_statistics WHERE table_schema LIKE 'customer%'; SELECT * FROM sys.schema_table_statistics_with_buffer WHERE table_schema LIKE 'customer%';; SELECT table_schema, SUM(data_length) + SUM(index_length) AS schema_size FROM information_schema.tables WHERE table_schema LIKE 'customer%' GROUP BY table_schema ;
Missing features

If you really think you need it, you can also make a nice GUI to show all those metrics. But be prepared: This will cost you most of your time!

We assume that all the shards are accessed with the same user as the fabric is accessed. In reality this is possibly not the case. But the tenant table can be easily extended.

To make the whole concept much easier we omitted the idea of number of replicas which is known from other solutions. We think having every shard redundant by a Master/Slave replication is sufficient.

If you find anything which is missing in this concept study or if you experience some problems we did not thought about, we would be happy hearing from you.

Taxonomy upgrade extras: shardingmulti-tenant

Containers and databases

Shinguz - Fri, 2022-02-11 15:44

In the last months we got more and more requests for supporting MariaDB/MySQL/Galera Cluster in (mostly Docker) containers.

Because of its additional layer and added complexity I do not like containers much. Containers are more complicated during troubleshooting and debugging problems.

Other people have already written more than enough about the advantages of containers. What is more difficult to find are the disadvantages of technologies. Thus I focus on those:

Wrong technology?

Container solutions were designed to deal with stateless applications that have ephemeral data. Containers spin up a quick microservice and then destroy it. This includes all the components of that container (including its cache and data). The transient nature of containers is because all of the components and services of that container are considered to be part of the container (essentially it is all or nothing). Serving the container a data volume owned by the underlying O/S by punching a hole through the container can be very challenging.
Most of the development efforts put into the various solutions had one goal in mind: Statelessness. There are solutions that can help keep your data persistent, but they are very quickly evolving. They require a high level of complexity, that negate any efficiency gains due to increased operational complexity (and risk). [1]

Stability

These container solutions are meant for quick development and deployment of applications that are broken into tiny components: microservices. Normally, these applications evolve very quickly in organizations that are very software/developer-driven. That seems to be how these container solutions (again, especially Docker) are developed as well. New features are pushed out with little testing and design. The main focus seems to be the latest feature set and being first to market. They “beg for forgiveness” instead of “ask for permission.” On top of that, backward compatibility is a distant concern (and even that might be an overstatement). This means that you are going to have to have a mature Continuous Delivery and testing environment as well as a known and tested image repository for your containers. [1]

We have seen complaints about Galera Cluster stability issues inside Docker containers. The signs were pointing without doubt to network issues. If these were real network issues or just container network issues we could not find out yet.

Networking can be tricky in containers world when you want to limit the access within containers and also have proper network communications where required. [3]

Docker might even make your application slower. If you are working with it, you should set limits on how much memory, CPU, or block I/O the container can use. Otherwise, if the kernel detects that the host machine’s memory is running too low to perform important system functions, it could start killing important processes. If the wrong process is killed (including the Docker itself), the system will be unstable.

Performance

We have heard reports that performance overhead of Docker containers can be up to 10%. If this is still true with the right configuration and and recent version must be shown. [2]

You should not expect Docker to speed up an application in any way. [5]

Security

Since there is no full operating system people tend to overlook the security aspect of containers, but if you look up online, you will see that hackers are targeting systems that are hosted in containers and not secured properly.
Since the containers use the same kernel, they are not 100 isolated, so you should be aware of the risks if you are using multiple containers in one server, and make sure you know what you are doing and which containers are running on the same kernel along with your stuff! [3]

All containers share access to a single host operating system. You risk running Docker containers with incomplete isolation. Any malicious code can get access to your computer memory. [5]

Running applications with Docker implies running the Docker daemon with root privileges. Any processes that break out of Docker container will have the same privileges on the host as it did in the container. Running your processes inside the containers as a non-privileged user cannot guarantee security. It depends on the capabilities you add or remove. To mitigate the risks of Docker container breakout, you should not download ready-to-use containers from untrusted sources. [5]

Data storage is intricate – By design, all of the data inside a container leaves forever when it closes down except you save it somewhere else first. There are ways to store data tenaciously in Docker, such as Docker Data Capacities, but this is arguably a test that still has yet to be approached in a seamless manner. [6]

Container O/S is the same as host O/S. If the host O/S is upgraded all the containers get also a new O/S.

Popular docker images have many vulnerabilities. So build and harden your images yourself. [10, 11, 12]

One of the most famous Docker security vulnerabilities can be found here: Alpine Linux Docker Images Shipped for 3 Years with Root Accounts Unlocked and Alpine Linux Docker Image Vulnerability CVE-2019-5021 and Docker Image Vulnerability (CVE-2019-5021).

Complexity

Containerization also means consolidation. And as in consolidated systems usually you can have side effects or effects caused by someone you did not expect.

Debugging

Debugging problems in a container environment becomes more complex because the many additional layers added. Then the necessary information and metrics are not there or not available in the way as expected. This makes troubleshooting more complicated.

Policy

Docker implementation is quite complex. A load of technological supports are necessary for Docker implementation including orchestration, container management, app stack, data screenshots, networking of containers, and so on.

The container ecosystem is split – But the core Docker platform is open source, some container products do not work with other ones. [6]

Features

Container technologies require kernel features which were not present in earlier kernels. This made system maintenance more complicated. This problem may have been solved in the meanwhile?

If you are aware of any other disadvantage not mentioned above please let us know. Some of these disadvantages might have been reduced in the recent years.

Literature
Taxonomy upgrade extras: containerdockerkubernetes

Learning from the Bugs Database

Shinguz - Thu, 2022-01-20 15:05

This week I came across an old known issue reported in May 2010: Master/Slave Replication with binlog_format = ROW and tables without a Primary Key is a bad idea! Especially if these tables are huge.

Why this is a bad idea is described in the bug report #53375:

if one runs DML on a table that has no indexes, a full table scan is done. with RBR, the slave might need to scan the full table for *each* row changed.

The consequence of this behaviour is that the Slave starts lagging. It was further mentioned:

Worst part is that PROCESSLIST, etc provide absolutely NO obvious indication what is going on, for something that may take 12 hours, 3 days or even more...

Symptoms of this problem are described as follows:

Observe 78,278 row locks but only 10,045 undo log entries, so many more rows being scanned than changed. Also observer 16 row deletes per second but 600,754 row reads per second, same mismatch between counts suggesting unindexed accesses are happening.

You may also see "invalidating query cache entries (table)" as a symptom in the processlist. If you see that, check to see whether this is the possible root cause instead of giving full blame to only the query cache."

The suggested workaround is: add a primary key to the table.

But some user complain:

in my case, my only decent primary key is a surrogate key - and that's untenable because of the locking and lost concurrency (even with lock_mode = 2). Even if I solved that, I'd have to use the surrogate in partitioning - which more or less defeats the purpose of partitioning by lopsiding the partitions.

and others claim:

Adding an "otherwise usable (i.e. to improve query times)" PK is not really an option for them since there are no short unique columns.

A long composite key is also not an option because:

  • In InnoDB tables, having a long PRIMARY KEY wastes a lot of space.
  • In InnoDB, the records in nonclustered indexes (also called secondary indexes) contain the primary key columns for the row that are not in the secondary index. InnoDB uses this primary key value to search for the row in the clustered index. If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

And then comes a first suggestion for solving the issue:

So, we can create a normal short/auto-increment PK, but this is more or less the same as having the internal/hidden InnoDB PK (which does not seem to be used properly for RBR replication purposes).

As mentioned before, possibly the internal/hidden InnoDB PK can be used to resolve this bug.

Short after we get an important information and the learning of the day:

there's nothing that makes the InnoDB internal key consistent between a master and a slave or before and after backup and restore. Row with internal ID 1 can have completely different end user data values on different servers, so it's useless for the purpose being considered here, unfortunately.

Nor is there any prohibition on a slave having a unique key, which will be promoted to PK, even if there is no unique key on the master. They can even have different PKs and there can be good application reasons for doing that. Though we could require at least a unique key on all slaves that matches a master's PK without it hurting unduly.

It is possible to recommend at least _a_ key (not necessarily unique) on the slave and have replication try key-based lookups to narrow down the number of rows that must examined. That combined with batch processing should cut the pain a lot because we can reasonably ask for at least some non-unique but at least reasonably selective key to use. But this is only recommend, not require. If people want no key, we should let them have no key and be slow.

Then we got some further information why moving back to SBR is a bad idea:

It is my opinion that switching to SBR has way too many trade offs (if even for one table) to call it an acceptable workaround. The main crux for this argument being that just about the only time you run into this bug is when you have tables with a massive amount of rows - which is exactly where you start paying heavy penalties for SBR (Locking)."

And a new potential problem rises up:

As far as how the server should know which key to use - am I correct in assuming that it will use the optimizer to determine the index, and you are asking what would happen if the optimizer picked the wrong one?"

Another suggestion for improvement:

Batching looks promising, at least it would reduce the number of scans. But it would still be very painful if no key could be used. While using a key would be very painful if a high percentage of the rows in the table were being touched. So maybe some mixed solution that depends on the count of rows being touched might be best.

In about 2012 they had an implementation for batch jobs.

And you can force a Primary Key now in MySQL 8.0 since 2021 with sql_require_primary_key.

How MariaDB solves the problem you can find here: Row-based Replication With No Primary Key.

Taxonomy upgrade extras: primary keyreplicationRow Based Replication (RBR)Statement Based Replication (SBR)

MariaDB Deadlocks

Shinguz - Tue, 2022-01-11 15:18

We get ever and ever again customer requests concerning Deadlocks. First of all, Deadlocks are usually an application problem, not a database problem! The database itself manifests the application problem with the following message which is sent to the application as an error:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

If your application receives this error message you know where in your application you have a problem. But a deadlock is always a problem between 2 different connections. So how you can find the other part of the problem?

The full Deadlock situation is shown with the following command:

SQL> SHOW ENGINE InnoDB STATUS\G ... ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2019-12-23 18:55:18 0x7f51045e3700 *** (1) TRANSACTION: TRANSACTION 847, ACTIVE 10 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1 MySQL thread id 46, OS thread handle 139985942054656, query id 839 localhost root Updating delete from t where id = 10 *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 40 page no 3 n bits 80 index PRIMARY of table `test`.`t` trx id 847 lock_mode X locks rec but not gap waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 32 0: len 4; hex 8000000a; asc ;; 1: len 6; hex 00000000034e; asc N;; 2: len 7; hex 760000019c0495; asc v ;; *** (2) TRANSACTION: TRANSACTION 846, ACTIVE 25 sec starting index read mysql tables in use 1, locked 1 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1 MySQL thread id 39, OS thread handle 139985942361856, query id 840 localhost root Updating delete from t where id = 11 *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 40 page no 3 n bits 80 index PRIMARY of table `test`.`t` trx id 846 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 32 0: len 4; hex 8000000a; asc ;; 1: len 6; hex 00000000034e; asc N;; 2: len 7; hex 760000019c0495; asc v ;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 40 page no 3 n bits 80 index PRIMARY of table `test`.`t` trx id 846 lock_mode X locks rec but not gap waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 32 0: len 4; hex 8000000b; asc ;; 1: len 6; hex 00000000034f; asc O;; 2: len 7; hex 770000019d031d; asc w ;; *** WE ROLL BACK TRANSACTION (2) ...

Unfortunately with the SHOW ENGINE INNODB STATUS command you can only show the last Deadlock error. If you want to log all the Deadlock errors in the MariaDB error log you can activate the Deadlock error reporting with the following configuration option in your MariaDB configuration file (my.cnf):

innodb_print_all_deadlocks = ON
How to solve Deadlocks?

Deadlocks always can occur in a transactional database system with fine-grained locking. So your application must be aware of Deadlocks and catch the Deadlock error and retry the transaction. The error message already indicates this. It is an advice for YOU as a developer:

try restarting transaction
How to avoid deadlocks?

Deadlocks are an hot spot on some rows. Hot spot means: Many (at least 2) connections do something (INSERT, UPDATE, REPLACE, DELETE, SELECT FOR UPDATE) on the same rows (which then are locked).

So how can you solve a Deadlock problem? Solving in the meaning of "reduce the probability of a Deadlock close to zero". Zero Deadlocks guarantee you can only achieve in no-concurrency scenarios.

  • Look at the design of your application: Is it necessary that the 2 conflicting transactions are done in the way you are doing it right now? Can you change the behaviour of your application?
  • Reduce locking time: As shorter the locks are hold as smaller the probability is that you run into a Deadlock situation. So make sure your conflicting transactions are running as fast as possible and the locks are hold as short as possible. Check that all the transactions are using perfect indexes.
  • Reduce the amount of locks: The fewer rows your conflicting transactions lock the smaller is the chance for a Deadlock. For example: Several small INSERTs in several transactions instead of one huge INSERT in one big transaction should cause less locks and thus a smaller probability of Deadlocks. Only change (REPLACE, INSERT ON DUPLICATE KEY UPDATE) what is really needed to change. Check before you change, that rows need to be changed.
  • Do your tasks less often. For example polling every 10 seconds instead of every second, or do refreshes of data less often.
  • Timely defer concurrent transactions. For example 2 batch jobs: Do they really need to run at the exact same time?
  • Reduce concurrency (parallelism), if possibly.
  • Catch the Deadlock error and retry the transaction again a few milliseconds later. If this does not happen too often it is not a problem for your database.
  • Change transaction isolation level to a level causing less locks. You can do that per session (SET SESSION ...) to reduce the impact on the whole system or globally (SET GLOBAL ...). But changing the isolation level globally potentially can have also an impact on other parts of your application or even completely different applications running on the same database.

And to be aware: You cannot completely avoid (100%) Deadlocks. They always can happen and you have to cope with it!

Do not mix up Deadlocks with Galera cluster conflicts which look similar at the first look but are not.

Literature
Taxonomy upgrade extras: deadlock

Pages

Subscribe to FromDual aggregator - FromDual all (en)