This is one of the most daunting problems I have ever solved in my life. Another one with a similar complex nature that comes in to my mind is the extended regular expression parser I implemented. For the latter I still not 100% sure if it works perfectly as expected, it passed some typical tests though.
This one can have a test set that is relatively easy to create and later stage of the work was mainly based on testing and debugging iterations.
The problem itself is easy to describe, work out the minimum spanning tree of a synchronised network.
A bit of explanation of the term synchronised network here, which is defined in distributed algorithm roughly as,
A network that is composed of nodes (processes) connected by links, where the links have different weight (distances), the nodes communicate with each other by sending messages, according the following rules, in each step
- first, each node should send messages to some of its neighbours according to its current status (status is deliberately used in place of state which refers to the context a node is in, or equivalently all the data that describes the node the algorithm is working on)
- second, each node should receive messages from its neighbours and according to the messages it receives and transfer its status to another
(Note: duplex is enabled and the nodes on the ends of each link are equal, therefore in the message sending phase of each step messages are sent through links in both-direction. So the graph is a typical bi-directional/undirected graph.)
The lecture notes of Distributed Algorithm by MIT proposed a sketchy algorithm which it states that it has some issues with deducing the timing for synchronisation to solve. However the implementation out of weeks of debugging I finally put forward here doesn't have much of this at all. And I have a feeling it has already got to the point where it's not faraway to be generalised to non-synchronised network which I so far haven't read about and thus don't have not much idea of though.
The general idea is based on the original solution by the notes split the tree discovering process into sub-tree forming units starting from sub-trees of single nodes. Each sub-tree forming unit goes through a complete set of the main designed statuses, Sub-tree forming units are demarcated by a few signalling statuses that merge two sub-trees into one.
The algorithm has been highly optimised at the processes of minimal outbound link election and the deduction of its completion, and some statuses are compressed in time slots consumption and therefore certain subroutines are shared by multiple statuses.
The implementation is considerably complicated and involving subtle details, and it's not of much point to post it here without comprehensive illustration. So only the link to the source code is provided, and the code is reasonably commented.
Ch3_2 folder of https://damit.codeplex.com/