BlockPlacementPolicyDefault is an implementation of BlockPlacementPolicy. You can write your own implementation and select it with the dfs.block.replicator.classname configuration parameter.
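To make the plug-in idea concrete, here is a minimal, self-contained sketch of how a class name read from configuration can be turned into a policy instance via reflection. It is only an illustration: apart from the key dfs.block.replicator.classname, every name in it (PolicySketch, DefaultPolicySketch, CustomPolicySketch, PolicyLoaderSketch) is hypothetical, java.util.Properties stands in for the Hadoop configuration object, and it does not claim to reproduce Hadoop's actual loading code.

```java
import java.util.Properties;

/** Hypothetical stand-in for the abstract policy class; not the Hadoop type. */
abstract class PolicySketch {
    abstract String name();
}

/** Hypothetical default implementation, used when nothing is configured. */
class DefaultPolicySketch extends PolicySketch {
    @Override String name() { return "default"; }
}

/** Hypothetical custom implementation that a user might plug in. */
class CustomPolicySketch extends PolicySketch {
    @Override String name() { return "custom"; }
}

final class PolicyLoaderSketch {
    // The key comes from the text above; the rest of this class is illustrative.
    static final String KEY = "dfs.block.replicator.classname";

    /** Instantiates the class named under KEY, or the default when KEY is unset. */
    static PolicySketch load(Properties conf) {
        String className = conf.getProperty(KEY, DefaultPolicySketch.class.getName());
        try {
            Class<? extends PolicySketch> cls =
                Class.forName(className).asSubclass(PolicySketch.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load policy " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(load(conf).name());
        conf.setProperty(KEY, CustomPolicySketch.class.getName());
        System.out.println(load(conf).name());
    }
}
```

The point of the design is that the caller only depends on the abstract type, so swapping the policy is a configuration change rather than a code change.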
Let's first look at the interface description:
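The following method chooses numOfReplicas datanodes for the writer to store replicas of a block of size blocksize. If fewer than numOfReplicas nodes are available, it returns as many as it can.
@param srcPath the file that the block being placed belongs to.
@param numOfReplicas the number of additional replicas wanted.
@param writer the machine the writer runs on; null if it is not a machine in the cluster.
@param chosen datanodes that have already been chosen as targets.
@param returnChosenNodes if true, the already chosen datanodes are returned as well.
@param excludedNodes nodes in this list must be excluded and cannot be chosen as targets.
@param blocksize the size of the data to be written.
@return an array of DatanodeDescriptor instances chosen as targets for this block, sorted as a pipeline.
Note that the Javadoc still says DatanodeDescriptor, while the signature returns DatanodeStorageInfo[] and takes an extra storagePolicy argument that the Javadoc does not describe.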
/**
* choose numOfReplicas data nodes for writer
* to re-replicate a block with size blocksize
* If not, return as many as we can.
*
* @param srcPath the file to which this chooseTargets is being invoked.
* @param numOfReplicas additional number of replicas wanted.
* @param writer the writer's machine, null if not in the cluster.
* @param chosen datanodes that have been chosen as targets.
* @param returnChosenNodes decide if the chosenNodes are returned.
* @param excludedNodes datanodes that should not be considered as targets.
* @param blocksize size of the data to be written.
* @return array of DatanodeDescriptor instances chosen as target
* and sorted as a pipeline.
*/
public abstract DatanodeStorageInfo[] chooseTarget(String srcPath,
    int numOfReplicas,
    Node writer,
    List<DatanodeStorageInfo> chosen,
    boolean returnChosenNodes,
    Set<Node> excludedNodes,
    long blocksize,
    BlockStoragePolicy storagePolicy);
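The contract described above (skip excluded and already chosen nodes, return as many targets as possible, optionally include the already chosen ones) can be sketched in a few lines. This is a deliberately simplified illustration, not the default policy: nodes are plain strings, the candidate list is a hypothetical extra parameter, and rack topology, block size and the storage policy are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ChooseTargetSketch {
    /**
     * Picks up to numOfReplicas additional nodes from candidates, skipping
     * nodes that are excluded or already chosen. Returns fewer nodes when not
     * enough candidates remain. Node names are plain strings (an assumption).
     */
    static List<String> chooseTarget(int numOfReplicas,
                                     List<String> candidates,
                                     List<String> chosen,
                                     boolean returnChosenNodes,
                                     Set<String> excludedNodes) {
        // Everything in "skip" is off limits: excluded nodes plus prior choices.
        Set<String> skip = new LinkedHashSet<>(excludedNodes);
        skip.addAll(chosen);

        List<String> picked = new ArrayList<>();
        for (String node : candidates) {
            if (picked.size() == numOfReplicas) {
                break;
            }
            // add() returns false for nodes already in skip, so each node is picked once.
            if (skip.add(node)) {
                picked.add(node);
            }
        }

        List<String> result = new ArrayList<>();
        if (returnChosenNodes) {
            result.addAll(chosen);
        }
        result.addAll(picked);
        return result;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        List<String> chosen = Arrays.asList("dn1");
        Set<String> excluded = new HashSet<>(Arrays.asList("dn3"));
        // Asking for 5 replicas yields only the 2 nodes that are still eligible.
        System.out.println(chooseTarget(5, candidates, chosen, false, excluded)); // [dn2, dn4]
    }
}
```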
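Verifies whether the placement of this block's replicas satisfies the placement policy, that is, whether the replicas are placed on no fewer than minRacks racks in the system.
@param srcPath the full path of the file to be verified.
@param lBlk the block together with its location information.
@param numOfReplicas the replica count of the file to be verified.
@return the result of the verification.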
/**
* Verify if the block's placement meets requirement of placement policy,
* i.e. replicas are placed on no less than minRacks racks in the system.
*
* @param srcPath the full pathname of the file to be verified
* @param lBlk block with locations
* @param numOfReplicas replica number of file to be verified
* @return the result of verification
*/
abstract public BlockPlacementStatus verifyBlockPlacement(String srcPath,
LocatedBlock lBlk,
int numOfReplicas);
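The rack check itself boils down to counting distinct racks among the replica locations. The sketch below shows only that idea: racks are plain strings, minRacks is passed in directly (how a real implementation derives it is not covered by the text), and the method name additionalRacksNeeded is a hypothetical stand-in for the BlockPlacementStatus result.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class VerifyPlacementSketch {
    /**
     * Returns how many more racks the replicas would have to span in order to
     * reach minRacks; 0 means the placement already satisfies the requirement.
     */
    static int additionalRacksNeeded(List<String> replicaRacks, int minRacks) {
        Set<String> distinctRacks = new HashSet<>(replicaRacks);
        return Math.max(0, minRacks - distinctRacks.size());
    }

    public static void main(String[] args) {
        // Three replicas on two racks satisfy minRacks = 2.
        System.out.println(additionalRacksNeeded(Arrays.asList("/r1", "/r1", "/r2"), 2)); // 0
        // Three replicas on a single rack are one rack short.
        System.out.println(additionalRacksNeeded(Arrays.asList("/r1", "/r1", "/r1"), 2)); // 1
    }
}
```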
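Decides whether deleting a specific replica of a block still leaves the block conforming to the configured block placement policy, and returns the replica that is the best candidate for deletion.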
/**
* Decide whether deleting the specified replica of the block still makes
* the block conform to the configured block placement policy.
*
* @param srcBC block collection of file to which block-to-be-deleted belongs
* @param block The block to be deleted
* @param replicationFactor The required number of replicas for this block
* @param moreThanOne The replica locations of this block that are present
* on more than one unique racks.
* @param exactlyOne Replica locations of this block that are present
* on exactly one unique racks.
* @param excessTypes The excess {@link StorageType}s according to the
* {@link BlockStoragePolicy}.
* @return the replica that is the best candidate for deletion
*/
abstract public DatanodeStorageInfo chooseReplicaToDelete(
    BlockCollection srcBC,
    Block block,
    short replicationFactor,
    Collection<DatanodeStorageInfo> moreThanOne,
    Collection<DatanodeStorageInfo> exactlyOne,
    List<StorageType> excessTypes);
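One plausible way to use the two collections is sketched below. It is an assumption for illustration, not something the Javadoc above specifies: prefer a replica from moreThanOne, because removing it does not reduce the number of racks holding the block, and break ties by the least free space. Nodes are plain strings and the freeBytes map is a hypothetical input; excess storage types are ignored.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

final class ChooseReplicaToDeleteSketch {
    /**
     * Picks a replica to delete. Replicas on racks with more than one replica
     * are preferred; among those, the node with the least free space wins.
     * Returns null when both collections are empty.
     */
    static String choose(Collection<String> moreThanOne,
                         Collection<String> exactlyOne,
                         Map<String, Long> freeBytes) {
        Collection<String> pool = moreThanOne.isEmpty() ? exactlyOne : moreThanOne;
        return pool.stream()
            .min(Comparator.comparingLong((String n) -> freeBytes.getOrDefault(n, 0L)))
            .orElse(null);
    }

    public static void main(String[] args) {
        Map<String, Long> freeBytes = new HashMap<>();
        freeBytes.put("a", 100L);
        freeBytes.put("b", 50L);
        freeBytes.put("c", 10L);
        // "c" has the least space overall, but it is the only replica on its rack.
        System.out.println(choose(Arrays.asList("a", "b"), Arrays.asList("c"), freeBytes)); // b
        System.out.println(choose(Collections.<String>emptyList(), Arrays.asList("c"), freeBytes)); // c
    }
}
```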
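This method sets up a block placement policy object. Every implementation of BlockPlacementPolicy should define it.
@param conf the configuration object.
@param stats the source from which cluster status can be retrieved.
@param clusterMap the topology of the cluster.
The signature also takes a host2datanodeMap argument, which the Javadoc does not describe.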
/**
* Used to setup a BlockPlacementPolicy object. This should be defined by
* all implementations of a BlockPlacementPolicy.
*
* @param conf the configuration object
* @param stats retrieve cluster status from here
* @param clusterMap cluster topology
*/
abstract protected void initialize(Configuration conf, FSClusterStats stats,
NetworkTopology clusterMap,
Host2NodesMap host2datanodeMap);
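After the replica on cur has been removed, adjusts rackMap, moreThanOne and exactlyOne.
@param rackMap a map from rack to the replicas on that rack.
@param moreThanOne the replica nodes on racks that hold more than one replica.
@param exactlyOne the replica nodes on racks that hold exactly one replica.
@param cur the replica that is currently being removed.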
/**
* Adjust rackmap, moreThanOne, and exactlyOne after removing replica on cur.
*
* @param rackMap a map from rack to replica
* @param moreThanOne The List of replica nodes on rack which has more than
* one replica
* @param exactlyOne The List of replica nodes on rack with only one replica
* @param cur current replica to remove
*/
public void adjustSetsWithChosenReplica(
    final Map<String, List<DatanodeStorageInfo>> rackMap,
    final List<DatanodeStorageInfo> moreThanOne,
    final List<DatanodeStorageInfo> exactlyOne,
    final DatanodeStorageInfo cur) {
  final String rack = getRack(cur.getDatanodeDescriptor());
  final List<DatanodeStorageInfo> storages = rackMap.get(rack);
  storages.remove(cur);
  if (storages.isEmpty()) {
    rackMap.remove(rack);
  }
  if (moreThanOne.remove(cur)) {
    if (storages.size() == 1) {
      final DatanodeStorageInfo remaining = storages.get(0);
      moreThanOne.remove(remaining);
      exactlyOne.add(remaining);
    }
  } else {
    exactlyOne.remove(cur);
  }
}
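To see what this bookkeeping does, the same logic can be replayed on a toy cluster. The sketch below mirrors the method body above but uses plain strings for replicas and takes the rack name as an explicit argument instead of calling getRack; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AdjustSetsSketch {
    /** String-based copy of the adjustment logic; rack is passed in explicitly. */
    static void adjust(Map<String, List<String>> rackMap,
                       List<String> moreThanOne,
                       List<String> exactlyOne,
                       String rack,
                       String cur) {
        List<String> storages = rackMap.get(rack);
        storages.remove(cur);
        if (storages.isEmpty()) {
            rackMap.remove(rack);
        }
        if (moreThanOne.remove(cur)) {
            // The rack dropped from two replicas to one: move the survivor over.
            if (storages.size() == 1) {
                String remaining = storages.get(0);
                moreThanOne.remove(remaining);
                exactlyOne.add(remaining);
            }
        } else {
            exactlyOne.remove(cur);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> rackMap = new HashMap<>();
        rackMap.put("/r1", new ArrayList<>(Arrays.asList("a", "b")));
        rackMap.put("/r2", new ArrayList<>(Arrays.asList("c")));
        List<String> moreThanOne = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> exactlyOne = new ArrayList<>(Arrays.asList("c"));

        adjust(rackMap, moreThanOne, exactlyOne, "/r1", "a");
        System.out.println(moreThanOne + " " + exactlyOne); // [] [c, b]
    }
}
```

Removing "a" leaves rack /r1 with a single replica, so "b" migrates from moreThanOne to exactlyOne.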
/**
* Get rack string from a data node
* @return rack of data node
*/
protected String getRack(final DatanodeInfo datanode) {
return datanode.getNetworkLocation();
}
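Splits the data nodes into two sets: one holds the nodes on racks with more than one replica, the other holds the remaining nodes.
@param dataNodes all the nodes to be split into two sets (the parameter is actually named storages in the signature).
@param rackMap a map from rack to the replicas on that rack.
@param moreThanOne the nodes on racks that hold more than one replica.
@param exactlyOne the nodes on racks that hold exactly one replica.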
/**
* Split data nodes into two sets, one set includes nodes on rack with
* more than one replica, the other set contains the remaining nodes.
*
* @param dataNodes datanodes to be split into two sets
* @param rackMap a map from rack to datanodes
* @param moreThanOne contains nodes on rack with more than one replica
* @param exactlyOne remains contains the remaining nodes
*/
public void splitNodesWithRack(
    final Iterable<DatanodeStorageInfo> storages,
    final Map<String, List<DatanodeStorageInfo>> rackMap,
    final List<DatanodeStorageInfo> moreThanOne,
    final List<DatanodeStorageInfo> exactlyOne) {
  for (DatanodeStorageInfo s : storages) {
    final String rackName = getRack(s.getDatanodeDescriptor());
    List<DatanodeStorageInfo> storageList = rackMap.get(rackName);
    if (storageList == null) {
      storageList = new ArrayList<DatanodeStorageInfo>();
      rackMap.put(rackName, storageList);
    }
    storageList.add(s);
  }
  // split nodes into two sets
  for (List<DatanodeStorageInfo> storageList : rackMap.values()) {
    if (storageList.size() == 1) {
      // exactlyOne contains nodes on rack with only one replica
      exactlyOne.add(storageList.get(0));
    } else {
      // moreThanOne contains nodes on rack with more than one replica
      moreThanOne.addAll(storageList);
    }
  }
}
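A small run of the same grouping logic shows how the two sets come out. As before, this is a string-based sketch under stated assumptions: replicas are plain strings, the rackOf map is a hypothetical replacement for getRack, and computeIfAbsent is used in place of the explicit null check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitNodesSketch {
    /** Groups nodes by rack, then splits them by how many replicas each rack holds. */
    static void split(Iterable<String> nodes,
                      Map<String, String> rackOf,
                      Map<String, List<String>> rackMap,
                      List<String> moreThanOne,
                      List<String> exactlyOne) {
        // First pass: build the rack -> replicas map.
        for (String node : nodes) {
            rackMap.computeIfAbsent(rackOf.get(node), r -> new ArrayList<>()).add(node);
        }
        // Second pass: racks with a single replica go to exactlyOne, the rest to moreThanOne.
        for (List<String> onRack : rackMap.values()) {
            if (onRack.size() == 1) {
                exactlyOne.add(onRack.get(0));
            } else {
                moreThanOne.addAll(onRack);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("a", "/r1");
        rackOf.put("b", "/r1");
        rackOf.put("c", "/r2");

        Map<String, List<String>> rackMap = new LinkedHashMap<>();
        List<String> moreThanOne = new ArrayList<>();
        List<String> exactlyOne = new ArrayList<>();
        split(Arrays.asList("a", "b", "c"), rackOf, rackMap, moreThanOne, exactlyOne);
        System.out.println(moreThanOne + " " + exactlyOne); // [a, b] [c]
    }
}
```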