MariaDB Galera Cluster with Corosync/Pacemaker VIP
Sometimes customers want a very simple Galera Cluster set-up. They do not want to invest in machines and build up the know-how for load balancers in front of the Galera Cluster.
For this type of customer there is the possibility to just run a VIP controlled by Corosync/Pacemaker in front of the Galera Cluster, moving an IP address from one node to the other. But this is just an active/passive/passive set-up, and reads and writes are only possible on one node at a time.
So you lose the read/write scaling and load-balancing functionality and only have the high availability feature left.
Corosync/Pacemaker
A few words upfront about Corosync/Pacemaker:
Pacemaker is a Cluster Resource Manager (CRM) (similar to SysV init or systemd). It "is the thing that starts and stops services (like your database or mail server) and contains logic for ensuring both that they are running, and that they are only running in one location (to avoid data corruption)." [1]
Corosync on the other hand is the thing that provides the messaging layer and talks to instances of itself on the other node(s). Corosync provides reliable communication between nodes, manages cluster membership and determines quorum. Think of Corosync as dbus but between nodes.
The following proof of concept is based on Pacemaker 2.0 and Corosync 3.0. Commands for older versions of Corosync/Pacemaker may vary slightly.
# crmadmin --version
Pacemaker 2.0.1

# corosync -v
Corosync Cluster Engine, version '3.0.1'
Prerequisites
- DNS resolution must work.
- Nodes must be reachable (firewall).
- Nodes must allow traffic between them.
The following steps must be performed on all 3 nodes unless specified otherwise:
DNS resolution
Add the hosts to your /etc/hosts file (or however you do hostname resolution in your set-up):
#
# /etc/hosts
#

192.168.56.103   node1
192.168.56.133   node2
192.168.56.134   node3
Pay special attention to choosing the right IP address if you have different network interfaces: one for inter-cluster communication (192.168.56.*) and one for application traffic (192.168.1.*).
Check all the nodes from all the nodes:
# ping node1
# ping node2
# ping node3
Firewall
Check your firewall settings:
# iptables -L
# systemctl status firewalld
A simple Corosync/Pacemaker Cluster needs the following firewall settings [3] (a sketch for opening these ports follows the list):
- TCP port 2224 for pcsd, Web UI and node-to-node communication.
- TCP port 3121 if the cluster has any Pacemaker Remote nodes.
- TCP port 5403 for quorum device with corosync-qnetd.
- UDP port 5404 for corosync if it is configured for multicast UDP.
- UDP port 5405 for corosync.
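As a sketch of how these ports can be opened (adapt to your own distribution and firewall set-up): with firewalld, the predefined high-availability service should cover the ports listed above; with plain iptables, they have to be opened individually:

# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload

# iptables -A INPUT -p tcp -m multiport --dports 2224,3121,5403 -j ACCEPT
# iptables -A INPUT -p udp -m multiport --dports 5404,5405 -j ACCEPT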
Install Corosync/Pacemaker
Install the Corosync/Pacemaker packages:
# apt-get install pacemaker pcs
The user used by the Corosync/Pacemaker Cluster is the following:
# grep hacluster /etc/passwd
hacluster:x:106:112::/var/lib/pacemaker:/usr/sbin/nologin
Set the password for the Corosync/Pacemaker Cluster user:
# passwd hacluster
New password:
Retype new password:
passwd: password updated successfully
Configuring the Corosync/Pacemaker Cluster
Start the Pacemaker/Corosync Configuration System Daemon (pcsd):
# systemctl enable pcsd
# systemctl start pcsd
# systemctl status pcsd --no-pager
# journalctl -xe -u pcsd --no-pager
Authenticate the nodes in the Cluster (on one node only):
# pcs host auth node1 node2 node3
Username: hacluster
Password:
node1: Authorized
node3: Authorized
node2: Authorized
If something fails, the following command will undo the operation:
# pcs pcsd clear-auth [node]
Create the Corosync/Pacemaker Cluster
To create the Corosync/Pacemaker Cluster run the following command (on one node only):
# pcs cluster setup galera-cluster --start node1 node2 node3 --force
No addresses specified for host 'node1', using 'node1'
No addresses specified for host 'node2', using 'node2'
No addresses specified for host 'node3', using 'node3'
Warning: node1: Cluster configuration files found, the host seems to be in a cluster already
Warning: node3: Cluster configuration files found, the host seems to be in a cluster already
Warning: node2: Cluster configuration files found, the host seems to be in a cluster already
Destroying cluster on hosts: 'node1', 'node2', 'node3'...
node1: Successfully destroyed cluster
node3: Successfully destroyed cluster
node2: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'node1', 'node2', 'node3'
node3: successful removal of the file 'pcsd settings'
node1: successful removal of the file 'pcsd settings'
node2: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'node1', 'node2', 'node3'
node1: successful distribution of the file 'corosync authkey'
node1: successful distribution of the file 'pacemaker authkey'
node3: successful distribution of the file 'corosync authkey'
node3: successful distribution of the file 'pacemaker authkey'
node2: successful distribution of the file 'corosync authkey'
node2: successful distribution of the file 'pacemaker authkey'
Synchronizing pcsd SSL certificates on nodes 'node1', 'node2', 'node3'...
node2: Success
node3: Success
node1: Success
Sending 'corosync.conf' to 'node1', 'node2', 'node3'
node1: successful distribution of the file 'corosync.conf'
node2: successful distribution of the file 'corosync.conf'
node3: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'node1', 'node2', 'node3'...
This command creates the file /etc/corosync/corosync.conf.
The command pcs cluster start will trigger the start of Pacemaker and Corosync in the background:
# systemctl status pacemaker --no-pager
# systemctl status corosync --no-pager
Undo if something fails:
# pcs cluster destroy
Check your Corosync/Pacemaker Cluster:
# pcs status
Cluster name: galera-cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: node3 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Mon Mar 15 15:45:21 2021
Last change: Mon Mar 15 15:40:45 2021 by hacluster via crmd on node3

3 nodes configured
0 resources configured

Online: [ node1 node2 node3 ]

No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
To start the pacemaker and corosync services at system restart, enable them in systemd (on all 3 nodes again):
# systemctl enable pacemaker
# systemctl enable corosync
Add Corosync/Pacemaker Resources
A resource is a service which is managed by the Cluster, for example a web server, a database instance or a virtual IP address.
Add a Virtual IP (VIP) address resource (aka Floating IP, on one node only):
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.1.199 cidr_netmask=32 op monitor interval=5s

# pcs status resources
 VirtualIP      (ocf::heartbeat:IPaddr2):       Stopped

# pcs status cluster
Cluster Status:
 Stack: corosync
 Current DC: node3 (version 2.0.1-9e909a5bdd) - partition with quorum
 Last updated: Mon Mar  8 16:54:03 2021
 Last change: Mon Mar  8 16:52:32 2021 by root via cibadmin on node1
 3 nodes configured
 1 resource configured

PCSD Status:
  node2: Online
  node3: Online
  node1: Online

# pcs status nodes
Pacemaker Nodes:
 Online: node1 node2 node3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

# pcs resource enable VirtualIP

# pcs status
Cluster name: galera-cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: node3 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Mon Mar 15 15:53:07 2021
Last change: Mon Mar 15 15:51:29 2021 by root via cibadmin on node2

3 nodes configured
1 resource configured

Online: [ node1 node2 node3 ]

Full list of resources:
 VirtualIP      (ocf::heartbeat:IPaddr2):       Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
As we can see, the resource VirtualIP is still stopped. To get more information, you can run the following command:
# crm_verify -L -V
(unpack_resources)  error: Resource start-up disabled since no STONITH resources have been defined
(unpack_resources)  error: Either configure some or disable STONITH with the stonith-enabled option
(unpack_resources)  error: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
Because we do NOT have shared data (Galera Cluster is a shared-nothing architecture) we do not need STONITH:
# pcs property set stonith-enabled=false
After stonith-enabled is set to false, the VIP will be started:
# pcs resource status
 VirtualIP      (ocf::heartbeat:IPaddr2):       Started node1

# ip -f inet addr show enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.1.122/24 brd 192.168.1.255 scope global dynamic enp0s8
       valid_lft 84918sec preferred_lft 84918sec
    inet 192.168.1.199/32 brd 192.168.1.255 scope global enp0s8
       valid_lft forever preferred_lft forever
Because quorum and fencing are also done by Galera Cluster itself, we do not want interference by Corosync/Pacemaker. Thus we set the no-quorum-policy to ignore:
# pcs property set no-quorum-policy=ignore
Graceful manual switchover
The rudest variant of moving a resource away from a node is to take it offline:
# pcs cluster stop node2
node2: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (corosync)...

# pcs cluster start node2
node2: Starting Cluster...
A softer possibility for moving a resource away from a node is to put the node into standby:
# pcs node standby node2
Taking the node out of standby again will move the resource back to it:
# pcs node unstandby node2
Both methods have in common that the resource is moved back when the node is online again. This is possibly not what you want. The nicest way to move a resource away is the move command:
# pcs resource status
 VirtualIP      (ocf::heartbeat:IPaddr2):       Started node2

# pcs resource move VirtualIP node3

# pcs resource status
 VirtualIP      (ocf::heartbeat:IPaddr2):       Started node3
Prevent Resources from Moving back after Recovery
To prevent a resource from moving around, we can define a stickiness for the resource:
# pcs resource defaults
No defaults set

# pcs resource defaults resource-stickiness=100
Warning: Defaults do not apply to resources which override them with their own defined values

# pcs resource defaults
resource-stickiness: 100
In later tests I have seen that a resource stickiness of INFINITY gave somewhat better, but still not perfect, results.
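The corresponding command follows the same pattern as the one above:

# pcs resource defaults resource-stickiness=INFINITY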
Graphical Web User Interface
Pacemaker/Corosync also provides a Graphical Web User Interface. It can be reached via all IP addresses/interfaces of each node:
# netstat -tlpn | grep -e python -e PID
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      16550/python3
tcp6       0      0 :::2224                 :::*                    LISTEN      16550/python3
It can simply be reached via the following link: https://127.0.0.1:2224/login
The user and password are the same as those used above when setting up the Cluster.
If you plan NOT to use the Web GUI, you can disable it on all nodes in the file /etc/default/pcsd (Debian, Ubuntu) or /etc/sysconfig/pcsd (CentOS), followed by a restart of the pcsd process.
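The relevant switch should look something like this (assuming your pcsd version provides the PCSD_DISABLE_GUI variable):

#
# /etc/default/pcsd (Debian/Ubuntu) resp. /etc/sysconfig/pcsd (CentOS)
#
PCSD_DISABLE_GUI=true

# systemctl restart pcsd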
Improvements
There is still some room for improvement: If a Galera node becomes not Synced (also including Donor/Desynced?) the VIP address should also move somewhere else. One possibility is to hook this into the wsrep_notify_command variable:
[mysqld]

wsrep_notify_command = pcs_standby_node.sh
The script pcs_standby_node.sh should cover the following scenarios (a sketch of the script follows the table and its footnotes):
| Scenario                             | w/o script | with script |
|--------------------------------------|------------|-------------|
| Machine halts suddenly (power off)   | OK         | OK          |
| Machine reboots/restarts             | OK         | OK          |
| Split brain                          | OK***      | OK***       |
| Instance restarts                    | NOK*       | OK          |
| Instance goes non-synced             | NOK*       | OK          |
| Instance dies (crash, OOM, kill -9)  | NOK*       | NOK**       |
| Max connections reached              | NOK*       | NOK**       |
* Your application will experience errors such as:
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.1.199' (111)
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
** For these last cases we need some more tooling...
*** Not tested but should work.
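A minimal sketch of such a script could look as follows. This is just my illustration, not a finished implementation: it assumes the usual --status argument which Galera passes to wsrep_notify_command and that the Pacemaker node name matches the short hostname:

#!/bin/bash
#
# pcs_standby_node.sh -- sketch of a wsrep_notify_command hook
#
# Galera calls this script with arguments like:
#   --status Synced --uuid <uuid> --primary yes --members <list> --index 0

NODE=$(hostname --short)
STATUS=''

# Pick out the --status argument and ignore everything else.
while [ $# -gt 0 ] ; do
    case "$1" in
        --status) STATUS="$2" ; shift 2 ;;
        *)        shift ;;
    esac
done

# Membership-only notifications come without --status: nothing to do.
if [ -z "${STATUS}" ] ; then
    exit 0
fi

case "${STATUS}" in
    Synced)
        # Node is usable again: allow the VIP to come back.
        sudo /usr/sbin/pcs node unstandby ${NODE}
        ;;
    *)
        # Node is not Synced (Donor/Desynced, Joiner, ...): push the VIP away.
        sudo /usr/sbin/pcs node standby ${NODE}
        ;;
esac

exit 0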
If you let Galera run the script now you will get some errors:
sudo /usr/sbin/pcs node unstandby node1
sudo: unable to change to root gid: Operation not permitted
sudo: unable to initialize policy plugin
ret=1
To make the script work we have to add the mysql user to the haclient group and add some ACLs [11]:
# grep haclient /etc/group
haclient:x:112:

# usermod -a -G haclient mysql

# pcs acl enable

# pcs acl role create standby_r description="Put node to standby" write xpath /cib

# pcs acl user create mysql standby_r

# pcs acl
Now the failover works quite smoothly and I have not seen any errors any more. Just sometimes the connections hang. I tried to reduce the hang by lowering tcp_retries2 to 3 as suggested here [10] but it did not help. If anybody has a hint please let me know!
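For reference, changing this kernel parameter works via the usual sysctl interface (the file name below is just an example):

# sysctl -w net.ipv4.tcp_retries2=3
net.ipv4.tcp_retries2 = 3

# echo 'net.ipv4.tcp_retries2 = 3' > /etc/sysctl.d/99-tcp-retries2.conf
# sysctl --system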
General thoughts
- A Corosync/Pacemaker Cluster is IMHO too complicated (!= KISS) for a simple VIP failover solution!
- Probably keepalived is the simpler solution. See also: [4, 5 and 6]
Literature
- [1] Pacemaker, Heartbeat, Corosync, WTF?
- [2] Corosync vs Pacemaker: wrong usage of "Corosync"
- [3] Configuring the iptables Firewall to Allow Cluster Components
- [4] Unbreakable MySQL Cluster with Galera and Linux Virtual Server (LVS)
- [5] Making HAProxy High Available for MySQL Galera Cluster
- [6] MariaDB master/master GTID based replication with keepalived VIP
- [7] Clusters from Scratch
- [8] Perform a Failover
- [9] Prevent Resources from Moving after Recovery
- [10] What value should I set for the tcp_retries2 parameter?
- [11] Setting user permissions for a Pacemaker cluster