You are here

dbstat for MariaDB (and MySQL)

Table of contents


An idea that I have been thinking about for a long time and have now, thanks to a customer, finally tackled is dbstat for MariaDB/MySQL. The idea is based on sar/sysstat by Sebastien Godard:

sar - Collect, report, or save system activity information.

and Oracle Statspack:

Statspack is a performance tuning tool ... to quickly gather detailed analysis of the performance of that database instance.

Functionality of dbstat

Although we have had the performance schema for some time, it does not cover some points that we see as a problem in practice and that are requested by customers:

  • The table_size module collects data on the growth of tables. This allows statements to be made about the growth of individual tables, databases, future MariaDB Catalogs or the entire instance. This is interesting for users who are using multi-tenant systems or are otherwise struggling with uncontrolled growth.
  • The processlist module takes a snapshot of the process list at regular intervals and saves it. This information is useful for post-mortem analyses if the user was too slow to save his process list or to understand how a problem has built up.
  • The problem is often caused by long-running transactions, row locks or metadata locks. These are recorded and saved by the trx_and_lck and metadata_lock modules. This means that we can see problems that we did not even notice before or we can see what led to the problem after the accident (analogous to a tachograph in a vehicle).
  • Another question that we sometimes encounter in practice is: When was which database variable changed and what did it look like before? This is covered by the global_variables module. Unfortunately, it is not possible to find out who changed the variable or why. Operational processes are required for this.
  • The last module, global_status, actually covers what sar/sysstat does. It collects the values from SHOW GLOBAL STATUS; and saves them for later analysis purposes or to simply create graphs.

How does dbstat work

dbstat uses the database Event Scheduler as a scheduler. This must first be switched on for MariaDB (event_scheduler = ON). With MySQL it is already switched on by default. The Event Scheduler has the advantage that we can activate the jobs at a finer granularity, for example 10 s, which would not be possible with the crontab.

The Event Scheduler then executes SQL/PSM code to collect the data on the one hand and to delete the data on the other, so that the dbstat database does not grow immeasurably.

The following jobs are currently planned:

ModuleCollectDeleteQuantity structureRemarks
table_size1/d at 02:0412/h, 1000 rows, > 31 d1000 tab × 31 d = 31k rowsShould work up to 288k tables.
processlist1/min1/min, 1000 rows, > 7 d1000 con × 1440 min × 7 d = 10M rowsShould work up to 1000 concurrent connections.
trx_and_lck1/min1/min, 1000 rows, > 7 d100 lck × 1440 min × 7 d = 1M rowsDepends very much on the application.
metadata_lock1/min12/h, 1000 rows, > 30 d100 mdl × 1440 × 30 d = 4M rowsDepends very much on the application.
global_variables1/minnever1000 rowsNormally this table should not grow.
global_status1/min1/min, 1000 rows, > 30 d1000 rows × 1440 × 30 d = 40MRows can become large?

How to install dbstat

dbstat can be downloaded from Github and is licensed under GPLv2.

The installation is simple: First execute the SQL file create_user_and_db.sql. Then execute the corresponding create_*.sql files for the respective modules in the dbstat database. There are currently no direct dependencies between the modules. If you want to use a different user or a different database than dbstat, you have to take care of this yourself.

Query dbstat

Some possible queries on the data have already been prepared. They can be found in the query_*.sql files. Here are a few examples:

table_size
SELECT `table_schema`, `table_name`, `ts`, `table_rows`, `data_length`, `index_length`
  FROM `table_size`
 WHERE `table_catalog` = 'def'
   AND `table_schema` = 'dbstat'
   AND `table_name` = 'table_size'
ORDER BY `ts` ASC
;
+--------------+------------+---------------------+------------+-------------+--------------+
| table_schema | table_name | ts                  | table_rows | data_length | index_length |
+--------------+------------+---------------------+------------+-------------+--------------+
| dbstat       | table_size | 2024-03-09 20:01:00 |          0 |       16384 |        16384 |
| dbstat       | table_size | 2024-03-10 17:26:33 |        310 |       65536 |        16384 |
| dbstat       | table_size | 2024-03-11 08:28:12 |        622 |      114688 |        49152 |
| dbstat       | table_size | 2024-03-12 08:02:38 |        934 |      114688 |        49152 |
| dbstat       | table_size | 2024-03-13 08:08:55 |       1247 |      278528 |        81920 |
+--------------+------------+---------------------+------------+-------------+--------------+

processlist

SELECT connection_id, ts, time, state, SUBSTR(REGEXP_REPLACE(REPLACE(query, "\n", ' '), '\ +', ' '), 1, 64) AS query
  FROM processlist
 WHERE command != 'Sleep'
   AND connection_id = @connection_id
 ORDER BY ts ASC
 LIMIT 5
;
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
| connection_id | ts                  | time    | state                           | query                                       |
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
|         14956 | 2024-03-09 20:21:12 |  13.042 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
|         14956 | 2024-03-09 20:22:12 |  73.045 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
|         14956 | 2024-03-09 20:23:12 | 133.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
|         14956 | 2024-03-09 20:24:12 | 193.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
|         14956 | 2024-03-09 20:25:12 | 253.041 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+

trx_and_lck

SELECT * FROM trx_and_lck\G
*************************** 1. row ***************************
         machine_name:
        connection_id: 14815
               trx_id: 269766
                   ts: 2024-03-09 20:05:57
                 user: root
                 host: localhost
                   db: test
              command: Query
                 time: 41.000
        running_since: 2024-03-09 20:05:16
                state: Statistics
                 info: select * from test where id = 6 for update
            trx_state: LOCK WAIT
          trx_started: 2024-03-09 20:05:15
trx_requested_lock_id: 269766:821:5:7
    trx_tables_in_use: 1
    trx_tables_locked: 1
     trx_lock_structs: 2
      trx_rows_locked: 1
    trx_rows_modified: 0
            lock_mode: X
            lock_type: RECORD
    lock_table_schema: test
      lock_table_name: test
           lock_index: PRIMARY
           lock_space: 821
            lock_page: 5
             lock_rec: 7
            lock_data: 6
*************************** 2. row ***************************
         machine_name:
        connection_id: 14817
               trx_id: 269760
                   ts: 2024-03-09 20:05:57
                 user: root
                 host: localhost
                   db: test
              command: Sleep
                 time: 60.000
        running_since: 2024-03-09 20:04:57
                state:
                 info:
            trx_state: RUNNING
          trx_started: 2024-03-09 20:04:56
trx_requested_lock_id: NULL
    trx_tables_in_use: 0
    trx_tables_locked: 1
     trx_lock_structs: 2
      trx_rows_locked: 1
    trx_rows_modified: 1
            lock_mode: X
            lock_type: RECORD
    lock_table_schema: test
      lock_table_name: test
           lock_index: PRIMARY
           lock_space: 821
            lock_page: 5
             lock_rec: 7
            lock_data: 6

metadata_lock

SELECT lock_mode, ts, user, host, lock_type, table_schema, table_name, time, started, state, query
  FROM metadata_lock
 WHERE connection_id = 14347
 ORDER BY started DESC
 LIMIT 5
;
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
| lock_mode               | ts                  | user | host      | lock_type            | table_schema | table_name | time  | started             | state          | query                                                |
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
| MDL_SHARED_WRITE        | 2024-03-13 10:27:33 | root | localhost | Table metadata lock  | test         | test       | 1.000 | 2024-03-13 10:27:32 | Updating       | UPDATE test set data3 = MD5(id)                      |
| MDL_BACKUP_TRANS_DML    | 2024-03-13 10:27:33 | root | localhost | Backup lock          |              |            | 1.000 | 2024-03-13 10:27:32 | Updating       | UPDATE test set data3 = MD5(id)                      |
| MDL_BACKUP_ALTER_COPY   | 2024-03-13 10:22:33 | root | localhost | Backup lock          |              |            | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
| MDL_SHARED_UPGRADABLE   | 2024-03-13 10:22:33 | root | localhost | Table metadata lock  | test         | test       | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
| MDL_INTENTION_EXCLUSIVE | 2024-03-13 10:22:33 | root | localhost | Schema metadata lock | test         |            | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+

global_variables

SELECT variable_name, COUNT(*) AS cnt
  FROM global_variables
 GROUP BY variable_name
 HAVING COUNT(*) > 1
;
+-------------------------+-----+
| variable_name           | cnt |
+-------------------------+-----+
| innodb_buffer_pool_size |   7 |
+-------------------------+-----+

SELECT variable_name, ts, variable_value
  FROM global_variables
 WHERE variable_name = 'innodb_buffer_pool_size'
;
+-------------------------+---------------------+----------------+
| variable_name           | ts                  | variable_value |
+-------------------------+---------------------+----------------+
| innodb_buffer_pool_size | 2024-03-09 21:36:28 | 134217728      |
| innodb_buffer_pool_size | 2024-03-09 21:40:25 | 268435456      |
| innodb_buffer_pool_size | 2024-03-09 21:48:14 | 134217728      |
+-------------------------+---------------------+----------------+

global_status

SELECT s1.ts
     , s1.variable_value AS 'table_open_cache_misses'
     , s2.variable_value AS 'table_open_cache_hits'
  FROM global_status AS s1
  JOIN global_status AS s2 ON s1.ts = s2.ts
 WHERE s1.variable_name = 'table_open_cache_misses'
   AND s2.variable_name = 'table_open_cache_hits'
   AND s1.ts BETWEEN '2024-03-13 11:55:00' AND '2024-03-13 12:05:00'
 ORDER BY ts ASC
;
+---------------------+-------------------------+-----------------------+
| ts                  | table_open_cache_misses | table_open_cache_hits |
+---------------------+-------------------------+-----------------------+
| 2024-03-13 11:55:47 | 1001                    | 60711                 |
| 2024-03-13 11:56:47 | 1008                    | 61418                 |
| 2024-03-13 11:57:47 | 1015                    | 62125                 |
| 2024-03-13 11:58:47 | 1022                    | 62829                 |
| 2024-03-13 11:59:47 | 1029                    | 63533                 |
| 2024-03-13 12:00:47 | 1036                    | 64237                 |
| 2024-03-13 12:01:47 | 1043                    | 64944                 |
| 2024-03-13 12:02:47 | 1050                    | 65651                 |
| 2024-03-13 12:03:47 | 1057                    | 66355                 |
| 2024-03-13 12:04:47 | 1064                    | 67059                 |
+---------------------+-------------------------+-----------------------+

Testing

We have currently rolled out dbstat on our test and production systems to test it and see whether our assumptions regarding stability and calculations of the quantity structure are correct. In addition, using it ourselves is the best way to find out if something is missing or if the handling is impractical (Eat your own dog food).

Sources