MySQL Cluster restore

Thu, 2007-03-22 19:05 — Shinguz

Recently the question came up whether it is faster to restore a MySQL Cluster with all nodes up, or with only ONE node from each node group up during the restore.

The answer from our gurus was: All nodes up during restore! I wanted to find out why. So I set up the following cluster and started to measure:

MySQL Cluster set up

Cluster set-up

MySQL Cluster backup

The backup is not that interesting. But I made the drawing for possible future use :-) :

Backup

MySQL Cluster restore

For the restore, four different ways are conceivable:

Restore with all nodes up, all 4 backup pieces restored in sequence. (1a)
Restore with all nodes up, all 4 backup pieces restored in parallel. (1b)
Restore with 1 node of each node group down, all 4 backup pieces restored in sequence. (2a)
Restore with 1 node of each node group down, all 4 backup pieces restored in parallel. (2b)
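
The difference between the sequential and the parallel variants can be sketched as a small command planner. This is only an illustration, not the script used for the measurements: the connect string, backup id, backup path and node ids are made-up values, and the ndb_restore options (`-c`, `-n`, `-b`, `-m`, `-r`, `--backup_path`) should be checked against the installed version. It assumes the metadata is restored once from a single backup piece before any data is restored.

```python
import subprocess
from typing import List

# Hypothetical values for illustration only; none of them come from the test set-up.
CONNECT_STRING = "mgmhost:1186"   # management server (assumed)
BACKUP_ID = 1                     # id of the backup to restore (assumed)
BACKUP_PATH = "/backup/BACKUP-1"  # directory holding the backup pieces (assumed)
DATA_NODE_IDS = [2, 3, 4, 5]      # one backup piece per data node (assumed ids)


def restore_cmd(node_id: int, meta: bool = False, data: bool = False) -> List[str]:
    """Build one ndb_restore call for the backup piece written by node_id."""
    cmd = ["ndb_restore", "-c", CONNECT_STRING,
           "-n", str(node_id), "-b", str(BACKUP_ID)]
    if meta:
        cmd.append("-m")  # restore metadata (table definitions)
    if data:
        cmd.append("-r")  # restore table data
    cmd.append(f"--backup_path={BACKUP_PATH}")
    return cmd


def plan(parallel: bool) -> List[List[List[str]]]:
    """Return stages; all commands inside one stage may run at the same time."""
    meta_stage = [restore_cmd(DATA_NODE_IDS[0], meta=True)]
    data_cmds = [restore_cmd(n, data=True) for n in DATA_NODE_IDS]
    if parallel:
        return [meta_stage, data_cmds]             # ways 1b / 2b
    return [meta_stage] + [[c] for c in data_cmds]  # ways 1a / 2a


def run(stages: List[List[List[str]]]) -> None:
    """Start every command of a stage, then wait for the whole stage to finish."""
    for stage in stages:
        procs = [subprocess.Popen(cmd) for cmd in stage]
        for proc in procs:
            if proc.wait() != 0:
                raise RuntimeError(f"restore step failed: {proc.args}")
```

Calling `plan(parallel=False)` gives five stages of one command each, while `plan(parallel=True)` gives the metadata stage followed by a single stage with all four data restores; `run(plan(...))` would then execute them against a real cluster.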

Restore test

And we got the following times:

Way	Initial	Stop	Meta	Data	Exit	Total	Start
1a	~ 30 - 35 s	n.a.	~ 80 - 85 s	~ 170 s	< 1 s	~ 280 - 290 s	n.a.
1b	~ 30 - 35 s	n.a.	~ 80 - 85 s	~ 130 s	< 1 s	~ 240 - 250 s	n.a.
2a	~ 30 - 35 s	~ 20 - 25 s	~ 115 s ¹⁾	~ 220 s	< 1 s	~ 385 - 395 s	~ 215 - 225 s
2b	~ 30 - 35 s	~ 20 - 25 s	~ 115 s ¹⁾	~ 115 s ¹⁾ ²⁾	< 1 s	~ 280 - 290 s	~ 215 - 225 s
Comments	The same for all.		30 s sleep?		negligible	Now we know it!	not in critical path
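
As a quick cross-check, the Total column appears to be the sum of the Initial, Stop, Meta and Data phases, with Start left out because it is not in the critical path. The sketch below recomputes the totals from the table values; treating "n.a." and "< 1 s" as 0 is my simplification.

```python
# Phase durations in seconds as (low, high), copied from the table.
# "n.a." is treated as (0, 0); the "< 1 s" exit phase is ignored.
PHASES = {
    "1a": {"initial": (30, 35), "stop": (0, 0),   "meta": (80, 85),   "data": (170, 170)},
    "1b": {"initial": (30, 35), "stop": (0, 0),   "meta": (80, 85),   "data": (130, 130)},
    "2a": {"initial": (30, 35), "stop": (20, 25), "meta": (115, 115), "data": (220, 220)},
    "2b": {"initial": (30, 35), "stop": (20, 25), "meta": (115, 115), "data": (115, 115)},
}


def total(way):
    """Sum the lower and upper bounds of all phases for one restore way."""
    low = sum(lo for lo, _ in PHASES[way].values())
    high = sum(hi for _, hi in PHASES[way].values())
    return low, high


for way in sorted(PHASES):
    low, high = total(way)
    print(f"{way}: ~ {low} - {high} s")
# 1a: ~ 280 - 290 s
# 1b: ~ 240 - 250 s
# 2a: ~ 385 - 395 s
# 2b: ~ 280 - 290 s
```

The recomputed ranges match the Total column, so way 1b beats 2b by about 40 s even before the 215 - 225 s for starting the remaining nodes is taken into account.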

Remarks:

1. It seems the client waits for around 30 seconds in this configuration.
2. I got the following error message. According to our support this is OK and does not lose any data!

Temporary error: 4006: Connect failure - out of connection objects (increase MaxNoOfConcurrentTransactions)
Restore successful, but encountered temporary error, please look at configuration.
NDBT_ProgramExit: 0 - OK
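
When several restores run unattended, it can help to scan the output for such temporary errors while still checking the final exit line. The following is a minimal sketch that assumes the output always follows the format of the three lines quoted above; other versions of ndb_restore may print something different.

```python
import re

# Patterns derived only from the three output lines quoted above (assumed format).
TEMP_ERROR = re.compile(r"^Temporary error: (\d+): (.*)$")
EXIT_LINE = re.compile(r"^NDBT_ProgramExit: (\d+) - (\w+)")


def summarize(output: str):
    """Return (exit_code, [(error_code, message), ...]) for one restore log."""
    exit_code = None
    temp_errors = []
    for raw in output.splitlines():
        line = raw.strip()
        match = TEMP_ERROR.match(line)
        if match:
            temp_errors.append((int(match.group(1)), match.group(2)))
            continue
        match = EXIT_LINE.match(line)
        if match:
            exit_code = int(match.group(1))
    return exit_code, temp_errors


sample = """\
Temporary error: 4006: Connect failure - out of connection objects (increase MaxNoOfConcurrentTransactions)
Restore successful, but encountered temporary error, please look at configuration.
NDBT_ProgramExit: 0 - OK
"""

exit_code, temp_errors = summarize(sample)
print(exit_code, [code for code, _ in temp_errors])
```

An exit code of 0 together with a non-empty list of temporary errors corresponds to the situation described in remark 2: the restore succeeded, but the configuration deserves a look.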

Conclusion

As announced, the fastest way to restore a MySQL Cluster is with all nodes up and the backup pieces restored in parallel. In addition, with all nodes up the cluster stays crash-resistant during the restore, and the handling is easier.
One thing to consider: if you restore in parallel, you cannot run the cluster in SINGLE USER MODE, because single user mode admits only one API node while each parallel ndb_restore process connects as its own API node. You have to protect your cluster in some other way in this situation.
