1.1 Key differences between versions 1.6.2x and 2.0.x

1. Master host configuration is done solely via DNS: it is no longer possible to list master IP addresses in the clients’ and Chunkservers’ configuration. The default name for the master domain is mfsmaster; it can be changed in the configuration files.


2. In the Pro version, Metaloggers become optional, since they can be replaced by additional Master Servers; in the Community Edition it is still strongly recommended to set up Metaloggers.


3. The mfsmetarestore tool is no longer present in the system; instead, it is enough to start the master process with the -a switch.

4. Configuration files now sit in the mfs subdirectory of the /etc directory (this change was introduced in 1.6.27).
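
To illustrate the first point, the sketch below shows one way to check which addresses a master name resolves to. It is only an illustration, not part of MooseFS: the function name resolve_master_addresses and its resolver parameter are hypothetical, and it assumes that the DNS name carries one address record per master host.

```python
import socket


def resolve_master_addresses(name="mfsmaster", resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that `name` resolves to.

    `resolver` is a hypothetical hook so the lookup can be replaced in tests;
    by default the system resolver (socket.getaddrinfo) is used.
    """
    # Restrict the lookup to stream sockets so each address is reported once.
    infos = resolver(name, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_master_addresses() with no arguments queries the system resolver for the default name mfsmaster and raises socket.gaierror if that name does not exist in DNS.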


1.2 Many Master Servers – how does it work?


In previous MooseFS versions there was only one master process and any number of Metaloggers. In the event of a master failure, the system administrator could retrieve the metadata from a Metalogger and start a new master (on a new machine, if necessary) to get the file system up and running again. However, this always made the system unavailable to clients for a period of time and required manual work to bring it back up.


The new MooseFS Pro version introduces multiple Master Servers working together in different roles. One role is “leader”. The leader master acts towards the Chunkservers and clients as the single master used to. There is never more than one leader in any working system.


The other role is “follower”. A follower master does what Metaloggers used to do: it downloads metadata from the leader master and keeps it. Unlike a Metalogger, however, a follower master is immediately ready to take on the role of leader if the leader master stops working. If the leader master fails, a new candidate for leader is chosen from among the followers. The candidate assumes the role of “elect”, which automatically converts to “leader” as soon as more than half of the Chunkservers connect to the elect. There can be more than one follower in the system.
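
The promotion rule for the elect can be sketched as follows. This is a toy model rather than MooseFS code: the class ElectMaster and its fields are hypothetical, and it assumes the elect knows the total number of Chunkservers in the system.

```python
from dataclasses import dataclass, field


@dataclass
class ElectMaster:
    """A follower that has been chosen as the candidate ("elect") for leader."""

    total_chunkservers: int                      # assumed to be known to the elect
    connected: set = field(default_factory=set)  # IDs of connected Chunkservers
    role: str = "elect"

    def chunkserver_connected(self, chunkserver_id: str) -> str:
        """Record a connection and promote once a strict majority is reached."""
        self.connected.add(chunkserver_id)
        # "More than half" is a strict majority: 3 of 5, but also 3 of 4.
        if self.role == "elect" and 2 * len(self.connected) > self.total_chunkservers:
            self.role = "leader"
        return self.role


elect = ElectMaster(total_chunkservers=5)
roles = [elect.chunkserver_connected(cs) for cs in ("cs1", "cs2", "cs3")]
print(roles)  # ['elect', 'elect', 'leader']
```

With five Chunkservers the elect stays in its role after the first two connections and becomes leader on the third.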


The whole switching operation is almost invisible to the users of the system, as it usually takes from a couple of seconds to a dozen or so. When the former leader master starts working again, it assumes the role of follower. The failure of a follower master has no effect on the system as a whole; if such a master starts working again, it again assumes the role of follower.
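
The role changes described in this section can be summarised as a small transition table. The sketch below is only a reading aid: the roles are the ones named in the text, while the event names, the extra “down” state and the helper functions are hypothetical.

```python
# (current role, event) -> next role. Event names are illustrative assumptions.
TRANSITIONS = {
    ("leader", "fail"): "down",
    ("follower", "fail"): "down",
    ("down", "restart"): "follower",            # a returning master rejoins as a follower
    ("follower", "chosen"): "elect",            # picked as candidate after the leader failed
    ("elect", "majority_connected"): "leader",  # more than half of the Chunkservers connected
}


def next_role(role: str, event: str) -> str:
    """Return the role after `event`; unknown pairs leave the role unchanged."""
    return TRANSITIONS.get((role, event), role)


def replay(role: str, events) -> str:
    """Apply a sequence of events to a starting role and return the final role."""
    for event in events:
        role = next_role(role, event)
    return role


print(replay("leader", ["fail", "restart"]))                 # follower
print(replay("follower", ["chosen", "majority_connected"]))  # leader
```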

