GlusterFS Split-Brain Recovery Made Easy
Posted by Joe Julian 1 year, 2 months ago
Split brain. It sounds like something from a B movie about zombies, but it's probably more terrifying to data storage people than the flesh-eating undead would be.
Split-brain occurs when two or more replicated copies of a file diverge independently of each other. This can happen during a network partition, where some clients write to one server while other clients write to another. It can also happen through partitions over time: server1 is taken out of service and writes happen on server2; server1 is then returned to service and server2 is removed before the files have been healed; writes occur on server1, and when server2 returns to service, each server holds writes the other has never seen.
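To make the "partitions over time" case concrete, here's a toy Python 3 model of that sequence. It's only an illustration: the Replica class and its pending counter are hypothetical stand-ins I made up for this sketch, not GlusterFS's actual data structures.

```python
class Replica:
    """Toy stand-in for one server's copy of a file (hypothetical, not GlusterFS code)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = []     # writes this copy has received
        self.pending = 0   # writes this copy took while another copy was offline


def write(replicas, payload):
    """Apply a write to every online replica and note that offline ones missed it."""
    online = [r for r in replicas if r.online]
    missed = len(replicas) - len(online)
    for r in online:
        r.data.append(payload)
        if missed:
            r.pending += 1


def is_split_brain(a, b):
    """Both copies hold writes the other never saw, so neither is an obvious heal source."""
    return a.pending > 0 and b.pending > 0


server1, server2 = Replica("server1"), Replica("server2")

server1.online = False                 # server1 is taken out of service
write([server1, server2], "write A")   # lands on server2 only
server1.online = True                  # server1 returns, but nothing is healed
server2.online = False                 # server2 is removed before the heal
write([server1, server2], "write B")   # lands on server1 only
server2.online = True                  # server2 returns

print(is_split_brain(server1, server2))  # prints True
```

If a heal had run between the two outages, server1 would have caught up on "write A" and there would be a single good copy to heal from.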
Prior to this post, fixing split-brain files in clustered systems required finding the file that needed to be healed on whichever brick it happened to be on, reading its extended attributes, extrapolating path and file locations, and removing them from one (or more, depending on the replica count) of the bricks.
Recently, however, I tried splitting the volume definition such that the translator graph was split to produce separate mounts for each replica. This maintains the distribute properties and allows you the same single namespace you would have with a normal mounted volume. Thus was born splitmount.
What can I do with it?
Take a file, /life/lessons/chocolate/gump.txt, on volume myvol1 that shows up as split-brain in the output of "gluster volume heal myvol1 info split-brain". We simply mount the volume with splitmount, check both versions of the file, pick a good one and delete the other.
# splitmount server1 myvol1 /tmp/sbfix
Your split replicas are mounted under /tmp/sbfix in directories r1 through r2
Obviously if you have more than replica 2, those will be r1 through however many replicas you have.
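If you'd rather not type out every rN path by hand, a small Python 3 helper can list where each replica's copy of a file lives. This is a minimal sketch that assumes only the r1 through rN directory layout shown above; replica_copies is a hypothetical helper name, not part of splitmount.

```python
import os
import re


def replica_copies(mount_root, relative_path):
    """Map each rN directory under mount_root to its copy of relative_path.

    Assumes the r1..rN layout that splitmount reports. A replica that has
    no copy of the file is mapped to None.
    """
    names = [n for n in os.listdir(mount_root) if re.fullmatch(r"r\d+", n)]
    copies = {}
    # Sort numerically so r10 comes after r2, not after r1.
    for name in sorted(names, key=lambda n: int(n[1:])):
        candidate = os.path.join(mount_root, name, relative_path.lstrip("/"))
        copies[name] = candidate if os.path.lexists(candidate) else None
    return copies
```

For the example above, replica_copies("/tmp/sbfix", "/life/lessons/chocolate/gump.txt") would return the r1 and r2 paths to gump.txt.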
Compare your files using stat, diff, or whatever tool works for the file you're checking. In this demonstration case, it turns out the two copies differ only in their permissions. We'll keep the one on the second replica.
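The same comparison can be scripted. Here's a rough Python 3 sketch that checks the things stat and diff would show you: permissions, ownership, size and content. describe_differences is a hypothetical helper for illustration, and deciding which copy is the good one is still up to you.

```python
import filecmp
import os
import stat


def describe_differences(path_a, path_b):
    """Return a list of human-readable differences between two copies of a file."""
    st_a, st_b = os.stat(path_a), os.stat(path_b)
    differences = []

    mode_a, mode_b = stat.S_IMODE(st_a.st_mode), stat.S_IMODE(st_b.st_mode)
    if mode_a != mode_b:
        differences.append("permissions: %s vs %s" % (oct(mode_a), oct(mode_b)))

    if (st_a.st_uid, st_a.st_gid) != (st_b.st_uid, st_b.st_gid):
        differences.append("ownership")

    if st_a.st_size != st_b.st_size:
        differences.append("size: %d vs %d" % (st_a.st_size, st_b.st_size))
    # shallow=False forces a byte-by-byte comparison instead of trusting stat data.
    elif not filecmp.cmp(path_a, path_b, shallow=False):
        differences.append("content")

    return differences
```

Called on the r1 and r2 copies of gump.txt from this example, it would report only a permissions difference.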
# rm /tmp/sbfix/r1/life/lessons/chocolate/gump.txt
Then just trigger the heal again:
# gluster volume heal myvol1
If that's all you have to heal, just unmount and clean up:
# umount /tmp/sbfix/r*
# rm -rf /tmp/sbfix
That's all there is to it.
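For reference, here is the whole walkthrough collected in one place as a Python 3 sketch. It only builds the command strings and doesn't run anything, because picking the bad copy still needs a human. recovery_commands is a hypothetical helper, and the splitmount argument order is simply copied from the example above.

```python
import shlex


def recovery_commands(server, volume, mountpoint, bad_replica, relative_path):
    """Return the shell commands from the walkthrough, in order, without running them."""
    mnt = shlex.quote(mountpoint)
    bad_copy = shlex.quote(
        "%s/%s/%s" % (mountpoint, bad_replica, relative_path.lstrip("/")))
    return [
        "splitmount %s %s %s" % (shlex.quote(server), shlex.quote(volume), mnt),
        "rm %s" % bad_copy,                       # delete the copy you decided is bad
        "gluster volume heal %s" % shlex.quote(volume),
        "umount %s/r*" % mnt,                     # glob left unquoted for the shell
        "rm -rf %s" % mnt,
    ]


for command in recovery_commands("server1", "myvol1", "/tmp/sbfix", "r1",
                                 "/life/lessons/chocolate/gump.txt"):
    print(command)
```

Printing the plan first gives you a chance to read the rm lines before anything gets deleted.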
Where is it?
You can grab this from https://github.com/joejulian/glusterfs-splitbrain
Building and Installing splitmount
Download the source:
git clone https://github.com/joejulian/glusterfs-splitbrain.git splitmount
cd splitmount
To install splitmount in your home directory:
python setup.py install --user
To install splitmount system wide:
python setup.py install