Replica Selection/Propagation Strategy

Selection order: local node -> local rack -> remote rack
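
As a rough illustration of this preference order, here is a minimal sketch. It assumes nodes are identified by "/rack/host" strings; TargetOrder, rackOf and chooseTarget are hypothetical names invented for this example and are not part of Hadoop.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not a Hadoop class: picks one target by preference tier.
final class TargetOrder {

    // Assumes node names look like "/rack/host"; returns the "/rack" part.
    static String rackOf(String node) {
        return node.substring(0, node.lastIndexOf('/'));
    }

    static Optional<String> chooseTarget(String writer, List<String> candidates) {
        // Tier 1: the writer's own node, if it is a candidate.
        if (candidates.contains(writer)) {
            return Optional.of(writer);
        }
        // Tier 2: any candidate on the writer's rack.
        String localRack = rackOf(writer);
        for (String candidate : candidates) {
            if (rackOf(candidate).equals(localRack)) {
                return Optional.of(candidate);
            }
        }
        // Tier 3: whatever is left must be on a remote rack.
        return candidates.stream().findFirst();
    }
}
```

For example, with writer "/r1/a" and candidates "/r2/x" and "/r1/b", this sketch returns "/r1/b", because the local node is not available but a node on the local rack is.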

A selected replica node must be a "good" node. The following are checked:
1> whether the node is decommissioned or being decommissioned
2> the remaining capacity of the target machine
3> the communication traffic of the target machine,
 based on its current connection count: the node is rejected if that count is more than twice the cluster average (total connections / total number of machines), i.e. node.getXceiverCount() > (2.0 * avgLoad)
4> whether the target rack already holds too many of the chosen nodes
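
The four checks can be sketched as a single predicate. This is a simplified illustration, not the Hadoop implementation: NodeInfo, GoodTargetCheck, requiredBytes, chosenPerRack and maxPerRack are hypothetical names, and only the 2.0 * avgLoad threshold comes from the text above.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a datanode; not the Hadoop class.
final class NodeInfo {
    final String rack;
    final boolean decommissioning; // decommissioned or decommission in progress
    final long remainingBytes;     // free capacity on the machine
    final int xceiverCount;        // current number of transfer connections

    NodeInfo(String rack, boolean decommissioning, long remainingBytes, int xceiverCount) {
        this.rack = rack;
        this.decommissioning = decommissioning;
        this.remainingBytes = remainingBytes;
        this.xceiverCount = xceiverCount;
    }
}

final class GoodTargetCheck {
    static boolean isGoodTarget(NodeInfo node, long requiredBytes, double avgLoad,
                                Map<String, Integer> chosenPerRack, int maxPerRack) {
        // 1> skip nodes that are (being) decommissioned
        if (node.decommissioning) {
            return false;
        }
        // 2> skip nodes without enough remaining capacity (assumed threshold)
        if (node.remainingBytes < requiredBytes) {
            return false;
        }
        // 3> skip nodes carrying more than twice the average load
        if (node.xceiverCount > 2.0 * avgLoad) {
            return false;
        }
        // 4> skip nodes whose rack already holds the assumed per-rack maximum
        int alreadyOnRack = chosenPerRack.getOrDefault(node.rack, 0);
        return alreadyOnRack < maxPerRack;
    }
}
```

Here avgLoad would be computed as total connections divided by the number of machines, as described in item 3.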

Data replication topology:
first replica machine -> second replica -> ...

After the list of replicas is selected, it is sorted into pipeline order.
The aim of this sort is to find a short path: data travels from the first replica machine to the last one, so the distance between consecutive machines should be kept small.
This is basically a traveling salesman problem, which the code below approximates greedily by always moving on to the nearest remaining node:

Java code:
  // Greedy ordering: at each position, pick the remaining node that is
  // closest to the previous hop (initially the writer).
  int index = 0;
  for ( ; index < nodes.size(); index++) {
    DatanodeDescriptor shortestNode = null;
    int shortestDistance = Integer.MAX_VALUE;
    int shortestIndex = index;
    // scan the not-yet-placed nodes for the one nearest to the writer
    for (int i = index; i < nodes.size(); i++) {
      DatanodeDescriptor currentNode = nodes.get(i);
      int currentDistance = clusterMap.getDistance(writer, currentNode);
      if (shortestDistance > currentDistance) {
        shortestDistance = currentDistance;
        shortestNode = currentNode;
        shortestIndex = i;
      }
    }
    // swap positions index and shortestIndex
    if (index != shortestIndex) {
      nodes.set(shortestIndex, nodes.get(index));
      nodes.set(index, shortestNode);
    }
    // the next hop is measured from the node just placed
    writer = shortestNode;
  }
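
To try the same greedy ordering outside of Hadoop, here is a self-contained sketch. It assumes nodes are "/rack/host" strings and uses a made-up distance function (0 for the same node, 2 for the same rack, 4 otherwise) in place of clusterMap.getDistance. PipelineSorter, distance and sortPipeline are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone version of the greedy pipeline sort.
final class PipelineSorter {

    // Assumed distance over "/rack/host" names: 0 same node, 2 same rack, 4 otherwise.
    static int distance(String a, String b) {
        if (a.equals(b)) {
            return 0;
        }
        String rackA = a.substring(0, a.lastIndexOf('/'));
        String rackB = b.substring(0, b.lastIndexOf('/'));
        return rackA.equals(rackB) ? 2 : 4;
    }

    // Returns a new list ordered so that each node is the nearest remaining
    // one to the previous hop, starting from the writer.
    static List<String> sortPipeline(String writer, List<String> nodes) {
        List<String> pipeline = new ArrayList<>(nodes);
        String previous = writer;
        for (int index = 0; index < pipeline.size(); index++) {
            int shortestIndex = index;
            int shortestDistance = Integer.MAX_VALUE;
            for (int i = index; i < pipeline.size(); i++) {
                int d = distance(previous, pipeline.get(i));
                if (d < shortestDistance) {
                    shortestDistance = d;
                    shortestIndex = i;
                }
            }
            // move the nearest node into position and continue from it
            Collections.swap(pipeline, index, shortestIndex);
            previous = pipeline.get(index);
        }
        return pipeline;
    }
}
```

With writer "/r1/a" and nodes "/r2/x", "/r1/b", "/r2/y", "/r1/a", the sketch places "/r1/a" first, then "/r1/b" on the same rack, and the two "/r2" nodes last. Being greedy, it does not guarantee the shortest overall path in general.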
