
What can MySQL performance monitoring graphs tell you?

Many of you probably monitor your databases for different purposes. Besides alerting, it is often useful to also draw graphs from the MySQL performance counters to see what is actually happening on your database.

The following graphs were made with our FromDual Performance Monitor for MySQL as a Service (MaaS) set-up. If you do not have the time to install a performance monitor yourself, please feel free to contact us about our MaaS solution.

Overview

First of all it is a good idea to have an overview of all the settings in your different databases and whether they comply with your standards.

Here it looks like two of our databases are still running with Statement-Based Replication (SBR). Further, the MySQL General Query Log is enabled, which is not optimal for write performance, and the default Storage Engine is still set to MyISAM, which is not what we want in our case.

mpm_snapshot_01.png
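If you do not have such an overview graph (yet), the same settings can be cross-checked ad hoc on each server. This is just a sketch; the variable list is an assumption of what this overview tracks, and default_storage_engine was still called storage_engine before MySQL 5.5:

  SHOW GLOBAL VARIABLES
   WHERE Variable_name IN
         ('binlog_format', 'general_log',
          'storage_engine', 'default_storage_engine');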

InnoDB

This server is mostly an InnoDB server. We can see that there is some write traffic because the InnoDB Buffer Pool constantly contains 15 - 20% dirty pages. Further, we see that this level of dirty pages is very constant:

mpm_snapshot_02.png
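The dirty page ratio behind this graph can also be calculated directly from the status counters, roughly like this (MySQL 5.1 - 5.6; newer versions expose the same counters in performance_schema.global_status):

  -- Percentage of dirty pages in the InnoDB Buffer Pool
  SELECT ROUND(100 * d.VARIABLE_VALUE / t.VARIABLE_VALUE, 1) AS pct_dirty_pages
    FROM information_schema.GLOBAL_STATUS d, information_schema.GLOBAL_STATUS t
   WHERE d.VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty'
     AND t.VARIABLE_NAME = 'Innodb_buffer_pool_pages_total';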

Here, my guess is that the database was restarted on Wednesday:

mpm_snapshot_24.png

We can clearly see the positive impact of MySQL partitioning:

buffer_pool_2.png

Reads and writes are more or less equally high here, with some strong spikes:

mpm_snapshot_03.png

If we take a closer look, we can see that one typical spike always occurs at 06:00 in the morning. It involves both reads and writes, so it is possibly NOT a backup but more likely a reporting or maintenance job:

mpm_snapshot_04.png

This system does mostly writes, with a heavy read phase around midnight:

mpm_snapshot_25.png

The reads start at 23:00 and end at 03:30. Could it be some nightly reporting?

mpm_snapshot_26.png

Extreme read and write spikes. Not good for a system that is expected to behave close to real time:

mpm_snapshot_42.png

We can see the consequences immediately: locking:

mpm_snapshot_43.png

When we look at the InnoDB locking, we can see that this job at 06:00 in the morning causes some lock waits of up to 2.5 seconds. If this causes trouble, we really have to investigate what is running at that time:

mpm_snapshot_05.png
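One way to catch the culprit would be a small cron job around 06:00 that captures what is running and waiting at that moment. A rough sketch (information_schema.INNODB_TRX needs MySQL 5.1 with the InnoDB Plugin or newer):

  -- Row lock wait counters since server start
  SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';

  -- Currently open transactions, oldest first
  SELECT trx_mysql_thread_id, trx_started, trx_rows_locked, trx_query
    FROM information_schema.INNODB_TRX
   ORDER BY trx_started;

  -- And what all the connections are doing right now
  SHOW FULL PROCESSLIST;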

If we look at the last 14 days, we can see a huge read spike some time ago. What happened there? It has possibly influenced the whole system as well:

mpm_snapshot_06.png

Here we have a special InnoDB read pattern. Can you see it? It happens every 3 hours at xx:30. We should try to find out what it is:

mpm_snapshot_27.png

From time to time we can see some big transactions:

mpm_snapshot_44.png

MyISAM

Currently, and over the last few days, our MyISAM key buffer was mostly empty. But the high-water mark Key_blocks_used indicates that key blocks were used in the past. So we should try to find out whether this key buffer is still used and, if not, whether we can free this memory (about 2.4 Gbyte):

mpm_snapshot_07.png
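Whether the key buffer is still in use can be checked with the key cache counters. Key_blocks_used is a high-water mark, Key_blocks_unused reflects the current fill level, and Key_read_requests tells us whether anybody is reading from it at all:

  SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
  SHOW GLOBAL STATUS
   WHERE Variable_name IN
         ('Key_blocks_used', 'Key_blocks_unused',
          'Key_read_requests', 'Key_reads');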

From time to time we see some MyISAM key reads from memory. They could be caused by MyISAM table indexes or by temporary tables that went to disk. So there is not really much MyISAM traffic at all, and it mostly happens on weekdays:

mpm_snapshot_08.png

This is a mostly-MyISAM system, but nevertheless we do not have MyISAM table locking issues:

mpm_snapshot_33.png

Connections

We are not far from hitting max_connections. So either we increase this value, or we think about why we have so many connections open concurrently. On February 5th we had many concurrently running threads. That was surely not good for overall system performance:

mpm_snapshot_09.png
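How close we actually got can be checked at any time; Max_used_connections is the high-water mark since the last restart:

  SHOW GLOBAL VARIABLES LIKE 'max_connections';
  SHOW GLOBAL STATUS
   WHERE Variable_name IN
         ('Max_used_connections', 'Threads_connected', 'Threads_running');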

On February 12th there was possibly something wrong with the application: we see many aborted clients, which we never had before:

mpm_snapshot_10.png

It looks like the thread cache was always big enough:

mpm_snapshot_11.png
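A quick sanity check for this is the thread cache miss rate, Threads_created relative to Connections; values close to zero mean the cache is big enough (again MySQL 5.1 - 5.6 syntax):

  SELECT ROUND(100 * tc.VARIABLE_VALUE / c.VARIABLE_VALUE, 2) AS thread_cache_miss_pct
    FROM information_schema.GLOBAL_STATUS tc, information_schema.GLOBAL_STATUS c
   WHERE tc.VARIABLE_NAME = 'Threads_created'
     AND c.VARIABLE_NAME = 'Connections';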

On this server we definitely had a problem and hit the ceiling with the number of connections. I am wondering whether it would not be more useful to even lower max_connections here?

mpm_snapshot_28.png

Network traffic

Network traffic was growing a bit last week but is now stable again. Our network should not have a problem with this load:

mpm_snapshot_12.png

This is a read-mostly database. The pattern comes from sorts and Handler_read_rnd (see below):

mpm_snapshot_30.png

Handler Interface

Here we see that we do many Handler_read_next and Handler_read_rnd_next operations. These correspond to index range scans and full table scans. Our spike on February 11th seems to come from such an index range scan:

mpm_snapshot_13.png
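The raw counters behind this graph are easy to look at yourself. A high Handler_read_rnd_next relative to Handler_read_key is a hint for full table scans; to find the offending queries one could (carefully, it adds some overhead) also log queries that do not use indexes:

  SHOW GLOBAL STATUS LIKE 'Handler_read%';

  -- Optional: log the scanning queries to the slow query log
  SET GLOBAL slow_query_log = ON;
  SET GLOBAL log_queries_not_using_indexes = ON;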

This server does most of its reads via full table scans! There is huge potential for optimization here: the server could handle more load if the queries were written more optimally:

mpm_snapshot_31.png

It seems there were some UPDATEs involved on the 8th:

mpm_snapshot_14.png

We see many INSERTs but few to no DELETEs. This means the database size is steadily growing. Should we think about archiving old data?

mpm_snapshot_15.png

OK. This seems to be a maintenance job. Luckily it was scheduled during off-peak hours (08:00):

mpm_snapshot_45.png

Somebody is doing evil things here: Handler_read_rnd is bad for performance. And it pretty much correlates with the network traffic shown above:

mpm_snapshot_29.png

Sort

Sort behavior seems to have changed significantly on Monday. We should find out whether a new version of the application was released:

mpm_snapshot_16.png

On this server we see some Sort_merge_passes, which is a sign of a too small sort buffer or simply of huge sorts:

mpm_snapshot_32.png
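Whether the sort buffer really is the limiting factor can be judged from the counters below. But be careful: blindly increasing sort_buffer_size can hurt, since it is allocated per sort:

  SHOW GLOBAL VARIABLES LIKE 'sort_buffer_size';
  SHOW GLOBAL STATUS
   WHERE Variable_name IN
         ('Sort_merge_passes', 'Sort_range', 'Sort_scan', 'Sort_rows');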

Queries

The majority of queries sent against this database are SELECTs:

mpm_snapshot_17.png

Temporary Tables

The use of temporary tables has changed. This is again a sign that something in the application was modified:

mpm_snapshot_18.png

If we look a bit closer, we can see that the use of temporary disk tables has increased. We should keep an eye on this:

mpm_snapshot_19.png
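The share of implicit temporary tables that go to disk can be calculated from the counters (MySQL 5.1 - 5.6). Keep in mind that temporary tables containing TEXT or BLOB columns always go to disk, no matter how big tmp_table_size is:

  SELECT ROUND(100 * d.VARIABLE_VALUE / t.VARIABLE_VALUE, 1) AS tmp_disk_pct
    FROM information_schema.GLOBAL_STATUS d, information_schema.GLOBAL_STATUS t
   WHERE d.VARIABLE_NAME = 'Created_tmp_disk_tables'
     AND t.VARIABLE_NAME = 'Created_tmp_tables';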

Here we most probably have a problem with temporary tables on disk. It happens quite periodically and predictably, so we can investigate what causes it:

mpm_snapshot_34.png

If we look closer at it, we can see that it is a similar pattern to the data reads above:

mpm_snapshot_35.png

MySQL Process information

The number of page faults has changed dramatically over the weekend. So what was changed there?

mpm_snapshot_20.png

Query Cache

There are no Query Cache hits, but the Query Cache is enabled! There can be several reasons for this. We should investigate, or disable the Query Cache altogether:

mpm_snapshot_21.png
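A first quick check is whether the Query Cache does anything useful at all. If the hit counter stays at zero, switching it off also removes its mutex overhead (and remember to persist the change in my.cnf):

  SHOW GLOBAL STATUS LIKE 'Qcache%';

  -- If Qcache_hits stays at 0:
  SET GLOBAL query_cache_size = 0;
  SET GLOBAL query_cache_type = OFF;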

We have some low-memory prunes. Should we increase the Query Cache size?

mpm_snapshot_36.png

Hmmm. It is already quite big. Better to defragment it from time to time so the fragmented free memory becomes usable again:

mpm_snapshot_37.png
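Defragmentation can be done online: FLUSH QUERY CACHE defragments the Query Cache without dropping the cached result sets (RESET QUERY CACHE would empty it completely):

  FLUSH QUERY CACHE;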

Table Definition Cache

The Table Definition Cache is much too big here. A value of 256 (the default) would be far more than enough. Better to release the resources:

mpm_snapshot_22.png
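What the cache actually needs can be read from the status counters; Open_table_definitions shows the current usage and Opened_table_definitions how often definitions had to be (re-)opened:

  SHOW GLOBAL VARIABLES LIKE 'table_definition_cache';
  SHOW GLOBAL STATUS
   WHERE Variable_name IN ('Open_table_definitions', 'Opened_table_definitions');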

This is the opposite case. Here the Table Definition Cache is much too small, at least when the job at 23:00 is running:

mpm_snapshot_38.png

Table (Open) Cache

The same situation here with the Table Open Cache. A value of 3 - 4k would be enough. Better to release the resources. There are known cases where a too big table cache causes performance issues:

mpm_snapshot_23.png
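The same kind of check works for the Table Open Cache (the variable is called table_cache on MySQL 5.0 and earlier); a steadily growing Opened_tables would indicate the opposite problem, a too small cache:

  SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
  SHOW GLOBAL STATUS WHERE Variable_name IN ('Open_tables', 'Opened_tables');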

Same here. Too small during the midnight jobs:

mpm_snapshot_39.png

Binary logging (Master)

The Binary Log has a size of about 100 Mbyte (Debian/Ubuntu default?) and is filled up every hour (ca. 25 Kbyte/s). During the night we have somewhat less traffic, during the day somewhat more:

mpm_snapshot_40.png
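The 100 Mbyte limit comes from max_binlog_size (if I remember correctly, the Debian/Ubuntu packages ship a my.cnf with max_binlog_size = 100M, which would explain the value). The limit and the actual file sizes can be verified like this:

  SHOW GLOBAL VARIABLES LIKE 'max_binlog_size';
  SHOW BINARY LOGS;   -- file names and sizes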

From time to time we have some spikes in binary log traffic. But the binary log cache always seems to be big enough:

mpm_snapshot_41.png
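Whether the binary log cache really is big enough can be verified with the ratio of Binlog_cache_disk_use to Binlog_cache_use; a disk use close to zero means we are fine:

  SHOW GLOBAL VARIABLES LIKE 'binlog_cache_size';
  SHOW GLOBAL STATUS
   WHERE Variable_name IN ('Binlog_cache_use', 'Binlog_cache_disk_use');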

Replication Slave

The Slave was not lagging very often, and only for a short time:

mpm_snapshot_46.png
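The value graphed here is Seconds_Behind_Master from the slave status, which can be checked manually at any time:

  SHOW SLAVE STATUS\G
  -- The interesting fields are Seconds_Behind_Master,
  -- Slave_IO_Running and Slave_SQL_Running.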

If you would like such performance graphs for your system as well, just let us know!