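BlockPlacementPolicyDefault is an implementation of BlockPlacementPolicy. You can write your own implementation and tell HDFS to use it through the dfs.block.replicator.classname parameter.
Let's first look at what the API documentation says: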
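The following method chooses numOfReplicas datanodes for the writer to store replicas of a block of size blocksize. If fewer than numOfReplicas nodes are available, it returns as many as it can.
@param srcPath the file for which targets are being chosen.
@param numOfReplicas the number of additional replicas wanted.
@param writer the machine the writer runs on; null if it is not a node in the cluster.
@param chosen datanodes that have already been chosen as targets.
@param returnChosenNodes if true, the already-chosen datanodes are included in the result.
@param excludedNodes datanodes that must be excluded and cannot be chosen as targets.
@param blocksize the size of the data to be written.
@return an array of the nodes chosen as targets for this block, sorted as a pipeline. The comment says DatanodeDescriptor, but the signature declares DatanodeStorageInfo[] as the return type; it also takes a storagePolicy parameter that the comment does not describe.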
/**
 * choose <i>numOfReplicas</i> data nodes for <i>writer</i>
 * to re-replicate a block with size <i>blocksize</i>
 * If not, return as many as we can.
 *
 * @param srcPath the file to which this chooseTargets is being invoked.
 * @param numOfReplicas additional number of replicas wanted.
 * @param writer the writer's machine, null if not in the cluster.
 * @param chosen datanodes that have been chosen as targets.
 * @param returnChosenNodes decide if the chosenNodes are returned.
 * @param excludedNodes datanodes that should not be considered as targets.
 * @param blocksize size of the data to be written.
 * @return array of DatanodeDescriptor instances chosen as target
 * and sorted as a pipeline.
 */
public abstract DatanodeStorageInfo[] chooseTarget(String srcPath,
    int numOfReplicas,
    Node writer,
    List<DatanodeStorageInfo> chosen,
    boolean returnChosenNodes,
    Set<Node> excludedNodes,
    long blocksize,
    BlockStoragePolicy storagePolicy);
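The contract described above (never pick excluded or already-chosen nodes, return fewer than numOfReplicas targets when the cluster cannot supply more, optionally include the chosen nodes in the result) can be illustrated with a small self-contained sketch. It is not the Hadoop implementation: nodes are plain strings, and rack awareness, block size and storage policy are ignored. The class name ChooseTargetSketch and the candidates parameter are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a placement policy; node names replace DatanodeStorageInfo.
final class ChooseTargetSketch {

    /**
     * Picks up to numOfReplicas new targets from candidates, in candidate order.
     * Nodes in chosen or excludedNodes are never picked. If fewer eligible
     * nodes exist, as many as possible are returned.
     */
    static List<String> chooseTarget(int numOfReplicas,
                                     List<String> candidates,
                                     List<String> chosen,
                                     boolean returnChosenNodes,
                                     Set<String> excludedNodes) {
        // Work on a copy so the caller's exclusion set is left untouched.
        Set<String> excluded = new LinkedHashSet<>(excludedNodes);
        excluded.addAll(chosen);

        List<String> results = new ArrayList<>();
        if (returnChosenNodes) {
            results.addAll(chosen);
        }
        int picked = 0;
        for (String node : candidates) {
            if (picked == numOfReplicas) {
                break;
            }
            // add() returns false when the node was already excluded or picked.
            if (excluded.add(node)) {
                results.add(node);
                picked++;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> cluster = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        Set<String> excluded = new HashSet<>(Arrays.asList("dn3"));
        // Three more replicas are requested, but only dn2 and dn4 are eligible.
        System.out.println(chooseTarget(3, cluster, Arrays.asList("dn1"), true, excluded));
    }
}
```

In the main method three additional replicas are requested, yet only two nodes are neither chosen nor excluded, so the result is shorter than requested. That is the "return as many as we can" case.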
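Verifies whether the placement of this block's replicas satisfies the placement policy, that is, whether the replicas are placed on no fewer than minRacks racks in the system.
@param srcPath the full path of the file being verified.
@param lBlk the block together with its location information.
@param numOfReplicas the replica count of the file being verified.
@return the result of the verification.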
/**
 * Verify if the block's placement meets requirement of placement policy,
 * i.e. replicas are placed on no less than minRacks racks in the system.
 *
 * @param srcPath the full pathname of the file to be verified
 * @param lBlk block with locations
 * @param numOfReplicas replica number of file to be verified
 * @return the result of verification
 */
abstract public BlockPlacementStatus verifyBlockPlacement(String srcPath,
    LocatedBlock lBlk,
    int numOfReplicas);
Decides whether the block still conforms to the configured block placement policy after a specified replica of it is deleted.
/**
 * Decide whether deleting the specified replica of the block still makes
 * the block conform to the configured block placement policy.
 *
 * @param srcBC block collection of file to which block-to-be-deleted belongs
 * @param block The block to be deleted
 * @param replicationFactor The required number of replicas for this block
 * @param moreThanOne The replica locations of this block that are present
 * on more than one unique racks.
 * @param exactlyOne Replica locations of this block that are present
 * on exactly one unique racks.
 * @param excessTypes The excess {@link StorageType}s according to the
 * {@link BlockStoragePolicy}.
 * @return the replica that is the best candidate for deletion
 */
abstract public DatanodeStorageInfo chooseReplicaToDelete(
    BlockCollection srcBC,
    Block block,
    short replicationFactor,
    Collection<DatanodeStorageInfo> moreThanOne,
    Collection<DatanodeStorageInfo> exactlyOne,
    List<StorageType> excessTypes);
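To make the roles of moreThanOne and exactlyOne concrete, here is one plausible selection rule, sketched under stated assumptions. It prefers a replica from moreThanOne, because removing it leaves the number of racks holding the block unchanged, and falls back to exactlyOne only when moreThanOne is empty. Within the selected pool it picks the replica whose node has the least remaining space. That tie-break, the Replica class, and the omission of srcBC, block, replicationFactor and excessTypes are simplifications made for illustration; this is not the actual Hadoop code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

final class ChooseReplicaToDeleteSketch {

    // Hypothetical replica record: a node name plus the node's remaining bytes.
    static final class Replica {
        final String node;
        final long remaining;

        Replica(String node, long remaining) {
            this.node = node;
            this.remaining = remaining;
        }

        @Override
        public String toString() {
            return node + "(" + remaining + ")";
        }
    }

    /**
     * Returns the preferred replica to delete, or null if both sets are empty.
     */
    static Replica chooseReplicaToDelete(Collection<Replica> moreThanOne,
                                         Collection<Replica> exactlyOne) {
        // Deleting from a rack that holds several replicas keeps rack diversity.
        Collection<Replica> pool = moreThanOne.isEmpty() ? exactlyOne : moreThanOne;

        Replica best = null;
        for (Replica r : pool) {
            // Assumption of this sketch: free up the fullest node first.
            if (best == null || r.remaining < best.remaining) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Collection<Replica> moreThanOne =
            Arrays.asList(new Replica("dn1", 500L), new Replica("dn2", 100L));
        Collection<Replica> exactlyOne =
            Collections.singletonList(new Replica("dn3", 10L));
        System.out.println(chooseReplicaToDelete(moreThanOne, exactlyOne));
    }
}
```

Note that dn3 has the least free space overall, but it is not chosen while moreThanOne is non-empty: it is the only replica on its rack, so deleting it would reduce the number of racks that hold the block.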
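This method is used to set up a block placement policy object. Every implementation of BlockPlacementPolicy should define it.
@param conf the configuration object.
@param stats the source from which cluster status can be retrieved.
@param clusterMap the topology of the cluster.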
/**
 * Used to setup a BlockPlacementPolicy object. This should be defined by
 * all implementations of a BlockPlacementPolicy.
 *
 * @param conf the configuration object
 * @param stats retrieve cluster status from here
 * @param clusterMap cluster topology
 */
abstract protected void initialize(Configuration conf,
    FSClusterStats stats,
    NetworkTopology clusterMap,
    Host2NodesMap host2datanodeMap);
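Adjusts rackMap, moreThanOne and exactlyOne after the replica on cur has been removed.
@param rackMap a map from rack to the replicas on that rack.
@param moreThanOne the replica nodes located on racks that hold more than one replica.
@param exactlyOne the replica nodes located on racks that hold only one replica.
@param cur the replica that is currently being removed.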
/**
 * Adjust rackmap, moreThanOne, and exactlyOne after removing replica on cur.
 *
 * @param rackMap a map from rack to replica
 * @param moreThanOne The List of replica nodes on rack which has more than
 * one replica
 * @param exactlyOne The List of replica nodes on rack with only one replica
 * @param cur current replica to remove
 */
public void adjustSetsWithChosenReplica(
    final Map<String, List<DatanodeStorageInfo>> rackMap,
    final List<DatanodeStorageInfo> moreThanOne,
    final List<DatanodeStorageInfo> exactlyOne,
    final DatanodeStorageInfo cur) {

  final String rack = getRack(cur.getDatanodeDescriptor());
  final List<DatanodeStorageInfo> storages = rackMap.get(rack);
  storages.remove(cur);
  if (storages.isEmpty()) {
    rackMap.remove(rack);
  }
  if (moreThanOne.remove(cur)) {
    if (storages.size() == 1) {
      final DatanodeStorageInfo remaining = storages.get(0);
      moreThanOne.remove(remaining);
      exactlyOne.add(remaining);
    }
  } else {
    exactlyOne.remove(cur);
  }
}
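The bookkeeping in adjustSetsWithChosenReplica is easier to follow on a tiny example. The sketch below repeats the same steps with plain strings in place of DatanodeStorageInfo, and takes the rack as an explicit argument instead of calling getRack. The class name AdjustSetsSketch and the extra rack parameter are assumptions made so that the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AdjustSetsSketch {

    // Same steps as the method above; node names stand in for DatanodeStorageInfo.
    static void adjustSetsWithChosenReplica(Map<String, List<String>> rackMap,
                                            List<String> moreThanOne,
                                            List<String> exactlyOne,
                                            String rack,
                                            String cur) {
        List<String> storages = rackMap.get(rack);
        storages.remove(cur);
        if (storages.isEmpty()) {
            // The rack no longer holds any replica of the block.
            rackMap.remove(rack);
        }
        if (moreThanOne.remove(cur)) {
            if (storages.size() == 1) {
                // The rack dropped from two replicas to one, so the survivor
                // moves from moreThanOne to exactlyOne.
                String remaining = storages.get(0);
                moreThanOne.remove(remaining);
                exactlyOne.add(remaining);
            }
        } else {
            exactlyOne.remove(cur);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> rackMap = new HashMap<>();
        rackMap.put("/r1", new ArrayList<>(Arrays.asList("a", "b")));
        rackMap.put("/r2", new ArrayList<>(Arrays.asList("c")));
        List<String> moreThanOne = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> exactlyOne = new ArrayList<>(Arrays.asList("c"));

        // Remove replica "a" from rack /r1, which held two replicas.
        adjustSetsWithChosenReplica(rackMap, moreThanOne, exactlyOne, "/r1", "a");
        System.out.println("moreThanOne=" + moreThanOne + " exactlyOne=" + exactlyOne);
    }
}
```

After "a" is removed, rack /r1 holds only "b", so "b" leaves moreThanOne and joins exactlyOne. Removing "c" afterwards would empty rack /r2 and drop it from rackMap altogether.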
/**
 * Get rack string from a data node
 * @return rack of data node
 */
protected String getRack(final DatanodeInfo datanode) {
  return datanode.getNetworkLocation();
}
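Splits the datanodes into two sets: one contains the nodes on racks that hold more than one replica, the other contains the remaining nodes.
@param dataNodes all the nodes to be split into the two sets (the parameter is named storages in the signature).
@param rackMap a map from rack to the replicas on that rack.
@param moreThanOne the nodes located on racks that hold more than one replica.
@param exactlyOne the nodes located on racks that hold only one replica.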
/**
 * Split data nodes into two sets, one set includes nodes on rack with
 * more than one replica, the other set contains the remaining nodes.
 *
 * @param dataNodes datanodes to be split into two sets
 * @param rackMap a map from rack to datanodes
 * @param moreThanOne contains nodes on rack with more than one replica
 * @param exactlyOne remains contains the remaining nodes
 */
public void splitNodesWithRack(
    final Iterable<DatanodeStorageInfo> storages,
    final Map<String, List<DatanodeStorageInfo>> rackMap,
    final List<DatanodeStorageInfo> moreThanOne,
    final List<DatanodeStorageInfo> exactlyOne) {
  for(DatanodeStorageInfo s: storages) {
    final String rackName = getRack(s.getDatanodeDescriptor());
    List<DatanodeStorageInfo> storageList = rackMap.get(rackName);
    if (storageList == null) {
      storageList = new ArrayList<DatanodeStorageInfo>();
      rackMap.put(rackName, storageList);
    }
    storageList.add(s);
  }

  // split nodes into two sets
  for(List<DatanodeStorageInfo> storageList : rackMap.values()) {
    if (storageList.size() == 1) {
      // exactlyOne contains nodes on rack with only one replica
      exactlyOne.add(storageList.get(0));
    } else {
      // moreThanOne contains nodes on rack with more than one replica
      moreThanOne.addAll(storageList);
    }
  }
}
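The two passes of splitNodesWithRack (group the replicas by rack, then sort each rack's nodes into one of the two lists) can be reproduced with plain strings. In the sketch below the input is a node-to-rack map rather than an Iterable of DatanodeStorageInfo, and computeIfAbsent replaces the explicit null check. The class name SplitNodesSketch and the nodeToRack parameter are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitNodesSketch {

    // nodeToRack maps a node name to its rack; it stands in for getRack(...).
    static void splitNodesWithRack(Map<String, String> nodeToRack,
                                   Map<String, List<String>> rackMap,
                                   List<String> moreThanOne,
                                   List<String> exactlyOne) {
        // Pass 1: group the nodes by rack.
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            rackMap.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                   .add(e.getKey());
        }
        // Pass 2: a rack with a single replica feeds exactlyOne,
        // every other rack feeds moreThanOne.
        for (List<String> nodes : rackMap.values()) {
            if (nodes.size() == 1) {
                exactlyOne.add(nodes.get(0));
            } else {
                moreThanOne.addAll(nodes);
            }
        }
    }

    public static void main(String[] args) {
        // Insertion-ordered maps keep the printed output stable.
        Map<String, String> nodeToRack = new LinkedHashMap<>();
        nodeToRack.put("a", "/r1");
        nodeToRack.put("b", "/r1");
        nodeToRack.put("c", "/r2");

        Map<String, List<String>> rackMap = new LinkedHashMap<>();
        List<String> moreThanOne = new ArrayList<>();
        List<String> exactlyOne = new ArrayList<>();
        splitNodesWithRack(nodeToRack, rackMap, moreThanOne, exactlyOne);
        System.out.println("rackMap=" + rackMap);
        System.out.println("moreThanOne=" + moreThanOne + " exactlyOne=" + exactlyOne);
    }
}
```

Nodes "a" and "b" share rack /r1 and therefore land in moreThanOne, while "c" is alone on /r2 and lands in exactlyOne. These are the lists that chooseReplicaToDelete and adjustSetsWithChosenReplica later operate on.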