You are here

Do not underestimate performance impacts of swapping on NUMA database systems

If your MariaDB or MySQL database system is swapping it can have a significant impact on your database query performance! Further it can also slow down your database shutdown and thus influence the whole reboot of your machine. This is especially painful if you have only short maintenance windows or if you do not want to spend the whole night with operation tasks.

When we do reviews of our customer MariaDB or MySQL database systems one of the items to check is Swap Space and swapping. With the free command you can find if your system has Swap Space enabled at all and how much of your Swap Space is used:

# free
              total        used        free      shared  buff/cache   available
Mem:       16106252     3300424      697284      264232    12108544    12011972
Swap:      31250428     1701792    29548636

With the command:

# swapon --show
NAME      TYPE       SIZE USED PRIO
/dev/sdb2 partition 29.8G 1.6G   -1

you can show on which disk drive your Swap Space is physically located. And with the following 3 commands you can find if your system is currently swapping or not:

# vmstat 1
procs ------------memory------------ ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff    cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 1701784 692580 355716 11757864    2   12   685   601  237  146  9  3 86  2  0
 0  0 1701784 692472 355720 11757840    0    0     0   196  679 2350  2  1 97  1  0
 0  0 1701784 692720 355728 11757332    0    0     0   104  606 2170  0  1 98  1  0

# sar -W 1
15:44:30     pswpin/s pswpout/s
15:44:31         0.00      0.00
15:44:32         0.00      0.00
15:44:33         0.00      0.00

# sar -S 1
15:43:02    kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
15:43:03     29548648   1701780      5.45     41552      2.44
15:43:04     29548648   1701780      5.45     41552      2.44
15:43:05     29548648   1701780      5.45     41552      2.44

Side note: Recent Linux distributions tend to use Swap Files instead of Swap Partitions. The performance impact seems to be negligible compared to the operational advantages of Swap Files... [ 1 ] [ 2 ] [ 3 ] [ 4 ]

What is Swap Space on a Linux system

Modern Operating Systems like Linux manage Virtual Memory (VM) which consists of RAM (fast) and Disk (HDD very slow and SSD slow). If the Operating System is short in fast RAM it tries to write some "old" pages to slow disk to get more free fast RAM for "new" pages and/or for the file system cache. This technique enables the Operating System to keep more and/or bigger processes running than physical RAM is available (overcommitment of RAM).
If one of those "old" pages is needed again it has to be swapped in which technically is a physical random disk read (which is slow, this is also called a major page fault).
If this block is a MariaDB or MySQL database block this disk read to RAM will slow down your SELECT queries but also INSERT, UPDATE and DELETE when you do write queries. This can severely slow down for example your clean-up jobs which have to remove "old" data (located on disk possibly in Swap Space).

Sizing of Swap Space for database systems

A rule of thumb for Swap Space is: Have always Swap Space but never use it (disk is cheap nowadays)!

A reasonable Swap Space sizing for database systems is the following:

Amount of RAMSwap Space
4 GiB of RAM or lessa minimum of 4 GiB of Swap Space, is this really a Database server?
8 GiB to 16 GiB of RAMa minimum of once the amount of RAM of Swap Space
24 GiB to 64 GiB of RAMa minimum of half the amount of RAM of Swap Space
more than 64 GiB of RAMa minimum of 32 GiB of Swap Space

If you have a close look at your Swap usage and if you monitor your Swap Space precisely and if you know exactly what you are doing you can lower these values...

It is NOT recommended to disable Swap Space

Some people tend to disable Swap Space. We see this mainly in virtualized environments (virtual machines) and cloud servers. From the VM/Cloud administrator point of view I can even understand why they disable Swap. But from the MariaDB / MySQL DBA point of view this is a bad idea.

If you do proper MariaDB / MySQL configuration (innodb_buffer_pool_size = 75% of RAM) the server should not swap a lot. But if you exaggerate with memory configuration the system starts swapping heavily. Till to the end the OOM-Killer will be activated by your Linux killing the troublemaker (typically the database process). If you have sufficient Swap Space enabled you get some time to detect a bad database configuration and act accordingly. If you have Swap Space disabled completely you do not get this safety buffer and OOM killer will act immediately and kill your database process when you run out of RAM. This really cannot be in the interest of the DBA.

Some literature to read further about Swap: In defence of swap: common misconceptions

Influence swapping - Swappiness

The Linux kernel documentation tells us the following about swappiness:

swappiness

This control is used to define how aggressive the kernel will swap memory pages. Higher values will increase aggressiveness, lower values decrease the amount of swap. A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.

The default value is 60.

Source: Documentation for /proc/sys/vm/*

A informative article on StackExchange: Why is swappiness set to 60 by default?

To change your swappiness the following commands will help:

# sysctl vm.swappiness 
vm.swappiness = 60

# sysctl vm.swappiness=1

# sysctl vm.swappiness 
vm.swappiness = 1

To make these changes persistent you have to write it to some kind of configuration file dependent on your Operating System:

#
# /etc/sysctl.d/99-sysctl.conf
#
vm.swappiness=1

Who is using the Swap Space?

For further analysing your Swap Space and to find who is using your Swap Space please see our article MariaDB and MySQL swap analysis.

What if your system is still swapping? - NUMA!

If you did everything correctly until here and your system is still swapping you possibly missed one point: NUMA systems behave a bit tricky related to Databases and swapping. The first person who wrote extensively about this problem in the MySQL ecosystem was Jeremy Cole in 2010 in his two well written articles which you can find here:

What NUMA is you can find here: Non-uniform memory access.

If you have spent your money for an expensive NUMA system you can find with the following command:

# lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
Stepping:              1
CPU MHz:               2600.000
CPU max MHz:           2600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5201.37
Virtualization:        VT-x
Hypervisor vendor:     vertical
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55

If you are now in the unfortunate situation of having such a huge box with several sockets you can do different things:

  • Configuring your MariaDB / MySQL database to allocate memory evenly on both sockets with the parameter innodb_numa_interleave. This works since MySQL 5.6.27, MySQL 5.7.9 and MariaDB 10.2.4 but there were various bugs in this area in Debian and CentOS packages (e.g. #80288, #78953, #79354 and MDEV-18660).
  • Disable NUMA support in your BIOS (Node Interleaving = enabled). Then there is no NUMA presentation to the Operating System any more.
  • Start your MariaDB / MySQL database with numactl --interleave all as described here: MySQL and NUMA.
  • Set innodb_buffer_pool_size to 75% of half of your RAM. Sad for having too much of RAM.
  • Playing around with the following Linux settings could help to decrease swapping: vm.zone_reclaim_mode=0 and kernel.numa_balancing=0.

Some NUMA map examples

Download: numa-maps-summary.pl

un-balanced NUMA system balanced NUMA sytem
./numa-maps-summary.pl < /proc/$(pidof mysqld)/numa_maps
N0        :         28978412 (110.54 GB)
N1        :         10777305 ( 41.11 GB)
active    :         35345437 (134.83 GB)
anon      :         39751779 (151.64 GB)
dirty     :         39751779 (151.64 GB)
kernelpagesize_kB:     23320 (  0.09 GB)
mapmax    :              367 (  0.00 GB)
mapped    :             4033 (  0.02 GB)
./numa-maps-summary.pl < /proc/$(pidof mysqld)/numa_maps
N0        :         11819082 ( 45.09 GB)
N1        :          8917524 ( 34.02 GB)
active    :              247 (  0.00 GB)
anon      :         20732465 ( 79.09 GB)
dirty     :         20732465 ( 79.09 GB)
kernelpagesize_kB:     24600 (  0.09 GB)
mapmax    :              398 (  0.00 GB)
mapped    :             4236 (  0.02 GB)

Sources

Some further information about Swap Space you can find here:

Taxonomy upgrade extras: 

Comments

Why does Linux swap out pages when I have many pages cached and vm.swappiness is set to 0? Shouldn't cached pages get resized and no swapping should occur? https://www.quora.com/Why-does-Linux-swap-out-pages-when-I-have-many-pag...
Shinguzcomment

  • swappiness can have a value between 0 and 100.
  • swappiness=0:
    • Kernel version 3.5 and newer: disables swappiness.
    • Kernel version older than 3.5: avoids swapping processes out of physical memory for as long as possible.
  • swappiness=1:
    • Kernel version 3.5 and over: minimum swappiness without disabling it entirely.
  • swappiness=100:
    • Tells the kernel to aggressively swap processes out of physical memory and move them to swap cache.

Source: How do I configure swappiness?

Shinguzcomment

InnoDB NUMA interleave with MariaDB also read here.
Shinguzcomment