How does Galera Cluster behave with many nodes?

Recently I had a whole set of Linux systems (VMs with Rocky Linux 9) from one of our regular Galera Cluster trainings all to myself for a week. MariaDB 11.4.4 with Galera Cluster was already installed on the machines.

Since I had long wanted to find out how a Galera Cluster behaves with an increasing number of nodes, this was the perfect opportunity to do so.

The following questions were to be answered:

  • How does the throughput of a Galera cluster behave depending on the number of Galera nodes?
  • Which configuration gives us the highest throughput?

A total of 5 different test parameters were varied:

  • Number of Galera nodes.
  • Number of client machines (= instances).
  • Number of threads per client (--threads=).
  • Number of Galera threads (wsrep_slave_threads).
  • Runtime of the tests. This parameter was varied because some tests were aborted during the run. It might be possible to eliminate this parameter by using a lower rate (--rate) in the load test. As it turned out, the runtime did have an influence on the measured throughput (e.g. tests 4b and 5, or 18 and 19).

A total of 35 different tests were run. See raw data.
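As a rough illustration, such a series of tests can be driven by a small wrapper around the sysbench commands shown further below. The following is only a sketch (hypothetical script, sweeping two of the five parameters against a single Galera node), not the exact procedure used for these tests:

#!/bin/bash
# Hypothetical sweep over threads per client and runtime against one Galera node.
# The sysbench options correspond to the commands in the benchmark section below.
GALERA_IP=10.0.0.2
DATABASE=sbtest1

for THREADS in 8 16 32 ; do
  for TIME in 120 180 ; do
    sysbench oltp_read_write --time=${TIME} --threads=${THREADS} --rate=1000 \
      --db-driver=mysql --mysql-host=${GALERA_IP} \
      --mysql-user=app --mysql-password=secret --mysql-db=${DATABASE} \
      run > sysbench_${THREADS}_${TIME}.log
    # tps appears in the "transactions: ... (NNN per sec.)" summary line
    grep 'transactions:' sysbench_${THREADS}_${TIME}.log
  done
done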

Throughput as a function of the number of Galera nodes

[Graph: throughput as a function of the number of Galera nodes]

Throughput related to # nodes
Test | # gal nodes | # threads/client | runtime [s] | tps
-----+-------------+------------------+-------------+------
   7 |           1 |                8 |         180 | 596.3
   8 |           2 |                8 |         180 | 567.8
   9 |           3 |                8 |         180 | 531.9
  11 |           4 |                8 |         180 | 495.2
  12 |           5 |                8 |         180 | 492.2
  13 |           6 |                8 |         180 | 502.9
  14 |           7 |                8 |         180 | 459.5
  15 |           8 |                8 |         180 | 458.6
  16 |           9 |                8 |         180 | 429.2

The throughput of the Galera cluster decreased from roughly 600 tps to 430 tps (about 28%) when the number of nodes was increased from 1 to 9.

Throughput as a function of the number of connections

The main variation here was in the number of clients and the number of threads per client. The optimum seems to be 30 to 40 connections in this setup. Varying the number of Galera applier threads (wsrep_slave_threads) did not seem to have much effect in our case. The system does not seem to be able to deliver much more than 1200 tps. In particular, the machines hosting the Galera nodes did not have much CPU idle time left.
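For completeness: wsrep_slave_threads is a dynamic variable, so it can be changed between test runs without restarting the nodes (the value 4 below is just an example):

SQL> SET GLOBAL wsrep_slave_threads = 4;
SQL> SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';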

[Graph: throughput as a function of the total number of connections]

Total # connections vs. throughput
Test | # client nodes | # threads/client | # con tot | # gal threads | runtime [s] | tps
-----+----------------+------------------+-----------+---------------+-------------+-------
  16 |              1 |                8 |         8 |             1 |         180 |  429.2
  17 |              2 |                8 |        16 |             1 |         180 |  684.5
  18 |              3 |                8 |        24 |             1 |         180 |  603.8
  19 |              3 |                8 |        24 |             1 |         120 |  925.2
  20 |              3 |                8 |        24 |             1 |         120 |  919.8
  21 |              4 |                8 |        32 |             1 |         120 | 1081.1
  22 |              5 |                8 |        40 |             1 |         120 | 1196.0
  23 |              5 |                8 |        40 |             4 |         120 | 1132.2
 23b |              5 |                8 |        40 |             8 |         120 | 1106.0
  24 |              5 |               16 |        80 |             4 |         120 | 1233.8
  25 |              5 |               32 |       160 |             4 |         120 | 1095.7

Throughput as a function of all possible parameters

By further varying the parameters, in particular by reducing the number of Galera nodes from 9 to 3, the throughput could be further increased from just under 1200 to just over 1400 tps.

[Graph: throughput as a function of the various test parameters]

Throughput related to various different parameters
Test | # gal nodes | # client nodes | # threads/client | # con tot | tps
-----+-------------+----------------+------------------+-----------+-------
  23 |           9 |              5 |                8 |        40 | 1132.2
 23b |           9 |              5 |                8 |        40 | 1106.0
  24 |           9 |              5 |               16 |        80 | 1233.8
  25 |           9 |              5 |               32 |       160 | 1095.7
  26 |           8 |              5 |               32 |       160 | 1132.4
  27 |           7 |              5 |               32 |       160 | 1207.6
  28 |           6 |              5 |               16 |        80 | 1333.3
  29 |           5 |              5 |                8 |        40 | 1278.6
  30 |           5 |              5 |                8 |        40 | 1281.5
  31 |           4 |              5 |                8 |        40 | 1374.1
  32 |           3 |              5 |                8 |        40 | 1304.3
  33 |           3 |              6 |                8 |        48 | 1428.9

With the given hardware, there seems to be an optimum somewhere around 3 Galera nodes and approx. 40 connections. A more detailed investigation would be interesting here...

Statistical Design of Experiments (DoE)

Here it would be exciting to apply the method of statistical design of experiments (DoE) to determine this optimum more precisely or to find it more quickly.

Hardware specification

VMs from Hetzner: CX22 (2 vCPUs, 4 GiB RAM (effectively only 3.5 GiB, why is that?), 40 GiB disk)

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          40 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   2
  On-line CPU(s) list:    0,1
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         QEMU
  Model name:             Intel Xeon Processor (Skylake, IBRS, no TSX)
    BIOS Model name:      NotSpecified
    CPU family:           6
    Model:                85
    Thread(s) per core:   1
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             4
    BogoMIPS:             4589.21
    Flags:                    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2
                              x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault pti ssbd ibrs ibpb fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw
                              avx512vl xsaveopt xsavec xgetbv1 xsaves arat pku ospke md_clear
Virtualization features:
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    64 KiB (2 instances)
  L1i:                    64 KiB (2 instances)
  L2:                     8 MiB (2 instances)
  L3:                     16 MiB (1 instance)

Benchmark tool / load generator

sysbench was used as a load generator.

# dnf install epel-release
# dnf install sysbench

Each client runs against its own schema to avoid conflicts in the Galera cluster. In reality this is not always the case, but it is the optimal case for Galera.

SQL> CREATE DATABASE sbtest<n>;
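The sysbench commands below connect as the user app with the password secret, so this account needs the corresponding rights on each sbtest<n> schema. A possible way to set this up (sketch; the exact grants used for the tests are not documented here):

SQL> CREATE USER 'app'@'%' IDENTIFIED BY 'secret';
SQL> GRANT ALL PRIVILEGES ON sbtest<n>.* TO 'app'@'%';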

Each client connects to a different Galera node (1 - 6 clients distributed over 1 - 9 Galera nodes).

GALERA_IP=<galera_ip>
DATABASE=sbtest<n>

# sysbench oltp_common --mysql-host=${GALERA_IP} --mysql-user=app --mysql-password=secret --mysql-db=${DATABASE} --db-driver=mysql prepare
# sysbench oltp_read_write --time=180 --db-driver=mysql --mysql-host=${GALERA_IP} --mysql-user=app --mysql-password=secret --mysql-db=${DATABASE} --threads=8 --rate=1000 --report-interval=1 run
# sysbench oltp_common --mysql-host=${GALERA_IP} --mysql-user=app --mysql-password=secret --mysql-db=${DATABASE} --db-driver=mysql cleanup
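To drive several client machines at the same time, the run commands can be started in parallel from a central host, each client pointing at a different Galera node and its own schema. A minimal sketch with hypothetical client host names (client1 ... client3), assuming password-less ssh between the machines:

# Hypothetical wrapper: start one sysbench run per client machine in parallel.
CLIENTS="client1 client2 client3"
NODES=(10.0.0.2 10.0.0.3 10.0.0.4)

i=0
for CLIENT in ${CLIENTS} ; do
  ssh ${CLIENT} "sysbench oltp_read_write --time=180 --threads=8 --rate=1000 \
    --db-driver=mysql --mysql-host=${NODES[$i]} \
    --mysql-user=app --mysql-password=secret --mysql-db=sbtest$((i + 1)) \
    run" > ${CLIENT}.log &
  i=$((i + 1))
done
wait

# The tps per client is reported in the 'transactions: ... (NNN per sec.)' summary line.
grep 'transactions:' client*.log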

MariaDB and Galera configuration

[server]

binlog_format = row
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 2
query_cache_size = 0
query_cache_type = 0

wsrep_on = on
wsrep_provider = /usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_address = "gcomm://10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6,10.0.0.7,10.0.0.8,10.0.0.9,10.0.0.10,10.0.0.11,10.0.0.12,10.0.0.13,10.0.0.14,10.0.0.15,10.0.0.16,10.0.0.17"
wsrep_cluster_name = 'Galera Cluster'
wsrep_node_address = 10.0.0.2
wsrep_sst_method = rsync
wsrep_sst_auth = sst:secret
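Before each test run it is worth checking on one of the nodes that all configured nodes have actually joined the cluster and are synced, for example with:

SQL> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SQL> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';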

Raw data