Hadoop 2.6.3 BlockPlacementPolicy Analysis

BlockPlacementPolicyDefault is the default implementation of BlockPlacementPolicy. You can write your own implementation and select it with the dfs.block.replicator.classname configuration parameter.
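The pluggable-class idea can be sketched in plain Java. This is only an illustration, not Hadoop's own loading code: PlacementPolicySketch, DefaultPolicySketch, CustomPolicySketch and PolicyLoaderSketch are hypothetical stand-ins, and a plain Map replaces Hadoop's Configuration object. Only the key name dfs.block.replicator.classname comes from the text above.

```java
import java.util.Map;

// Hypothetical stand-in for BlockPlacementPolicy; not the Hadoop class.
abstract class PlacementPolicySketch {
  boolean initialized = false;

  // Mirrors the idea of initialize(conf, ...): set the policy up after construction.
  void initialize(Map<String, String> conf) {
    initialized = true;
  }

  abstract String name();
}

// Plays the role of the default implementation.
class DefaultPolicySketch extends PlacementPolicySketch {
  @Override
  String name() {
    return "default";
  }
}

// Plays the role of a user-supplied implementation.
class CustomPolicySketch extends PlacementPolicySketch {
  @Override
  String name() {
    return "custom";
  }
}

final class PolicyLoaderSketch {
  // The configuration key named in the text.
  static final String KEY = "dfs.block.replicator.classname";

  // Looks up the class name, instantiates it reflectively, then initializes it.
  static PlacementPolicySketch load(Map<String, String> conf) {
    String className = conf.containsKey(KEY)
        ? conf.get(KEY)
        : DefaultPolicySketch.class.getName();
    try {
      Class<? extends PlacementPolicySketch> clazz =
          Class.forName(className).asSubclass(PlacementPolicySketch.class);
      PlacementPolicySketch policy = clazz.getDeclaredConstructor().newInstance();
      policy.initialize(conf);
      return policy;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot load policy " + className, e);
    }
  }
}
```

With an empty map the sketch falls back to DefaultPolicySketch; putting the class name of CustomPolicySketch under the key switches the implementation without touching the calling code.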


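Let's first go through the API documentation:

The following method chooses numOfReplicas datanodes for the writer to store replicas of a block of size blocksize. If fewer than numOfReplicas nodes are available, it returns as many as it can.

@param srcPath the file that the block belongs to, i.e. the file for which this method is invoked.

@param numOfReplicas the number of additional replicas wanted.

@param writer the writer's machine, or null if it is not a node in the cluster.

@param chosen the datanodes that have already been chosen as targets.

@param returnChosenNodes if true, the already chosen nodes are included in the result.

@param excludedNodes nodes in this set must not be selected as targets.

@param blocksize the size of the data to be written.

@return an array of the storages chosen as targets for this block, sorted as a pipeline. (The Javadoc says DatanodeDescriptor, but the signature returns DatanodeStorageInfo[].)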

/**
   * choose <i>numOfReplicas</i> data nodes for <i>writer</i> 
   * to re-replicate a block with size <i>blocksize</i> 
   * If not, return as many as we can.
   *
   * @param srcPath the file to which this chooseTargets is being invoked.
   * @param numOfReplicas additional number of replicas wanted.
   * @param writer the writer's machine, null if not in the cluster.
   * @param chosen datanodes that have been chosen as targets.
   * @param returnChosenNodes decide if the chosenNodes are returned.
   * @param excludedNodes datanodes that should not be considered as targets.
   * @param blocksize size of the data to be written.
   * @return array of DatanodeDescriptor instances chosen as target
   * and sorted as a pipeline.
   */
  public abstract DatanodeStorageInfo[] chooseTarget(String srcPath,
                                             int numOfReplicas,
                                             Node writer,
                                             List<DatanodeStorageInfo> chosen,
                                             boolean returnChosenNodes,
                                             Set<Node> excludedNodes,
                                             long blocksize,
                                             BlockStoragePolicy storagePolicy);
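The contract of chooseTarget (pick up to numOfReplicas more nodes, skip excluded ones, return fewer if not enough are available, optionally include the already chosen nodes) can be sketched with plain strings. This is a deliberately simplified illustration: ChooseTargetSketch and its candidates parameter are hypothetical, and the sketch ignores rack awareness, storage policy, and pipeline sorting, all of which a real policy has to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ChooseTargetSketch {
  /**
   * Picks up to numOfReplicas additional nodes from candidates,
   * skipping excluded nodes and nodes that were already chosen.
   * The candidates list is an assumption of this sketch; the real
   * policy draws its candidates from the cluster topology.
   */
  static List<String> chooseTarget(int numOfReplicas,
                                   List<String> candidates,
                                   List<String> chosen,
                                   boolean returnChosenNodes,
                                   Set<String> excludedNodes) {
    List<String> picked = new ArrayList<String>();
    for (String node : candidates) {
      if (picked.size() == numOfReplicas) {
        break; // enough targets found
      }
      if (excludedNodes.contains(node) || chosen.contains(node)) {
        continue; // not eligible
      }
      picked.add(node);
    }
    // If fewer than numOfReplicas were found, return as many as we have.
    List<String> result = new ArrayList<String>();
    if (returnChosenNodes) {
      result.addAll(chosen);
    }
    result.addAll(picked);
    return result;
  }
}
```

For example, with candidates dn1 to dn4, dn1 already chosen, dn2 excluded, and three more replicas requested, only dn3 and dn4 can be returned, preceded by dn1 when returnChosenNodes is true.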

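Verifies whether the placement of this block's replicas satisfies the placement policy, i.e. that the replicas are placed on no fewer than minRacks racks in the system.

@param srcPath the full path of the file being verified.

@param lBlk the block together with its locations.

@param numOfReplicas the replica count of the file being verified.

@return the result of the verification.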

/**
   * Verify if the block's placement meets requirement of placement policy,
   * i.e. replicas are placed on no less than minRacks racks in the system.
   * 
   * @param srcPath the full pathname of the file to be verified
   * @param lBlk block with locations
   * @param numOfReplicas replica number of file to be verified
   * @return the result of verification
   */
  abstract public BlockPlacementStatus verifyBlockPlacement(String srcPath,
      LocatedBlock lBlk,
      int numOfReplicas);
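The core check described above, counting the distinct racks that hold a replica and comparing that with minRacks, can be sketched as follows. VerifyPlacementSketch is a hypothetical helper: it takes the rack of each replica as a string and returns a boolean rather than a BlockPlacementStatus, and how minRacks is derived is left to the caller.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class VerifyPlacementSketch {
  /**
   * Returns true when the replicas span at least minRacks distinct racks.
   * replicaRacks holds one rack string per replica location.
   */
  static boolean isPlacementSatisfied(List<String> replicaRacks, int minRacks) {
    Set<String> distinctRacks = new HashSet<String>(replicaRacks);
    return distinctRacks.size() >= minRacks;
  }
}
```

Three replicas on /rack1, /rack1 and /rack2 satisfy minRacks = 2, while two replicas that both sit on /rack1 do not.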
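Decides whether deleting a specified replica of a block still leaves the block conforming to the configured block placement policy, and returns the replica that is the best candidate for deletion.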

  /**
   * Decide whether deleting the specified replica of the block still makes 
   * the block conform to the configured block placement policy.
   * 
   * @param srcBC block collection of file to which block-to-be-deleted belongs
   * @param block The block to be deleted
   * @param replicationFactor The required number of replicas for this block
   * @param moreThanOne The replica locations of this block that are present
   *                    on more than one unique racks.
   * @param exactlyOne Replica locations of this block that  are present
   *                    on exactly one unique racks.
   * @param excessTypes The excess {@link StorageType}s according to the
   *                    {@link BlockStoragePolicy}.
   * @return the replica that is the best candidate for deletion
   */
  abstract public DatanodeStorageInfo chooseReplicaToDelete(
      BlockCollection srcBC,
      Block block, 
      short replicationFactor,
      Collection<DatanodeStorageInfo> moreThanOne,
      Collection<DatanodeStorageInfo> exactlyOne,
      List<StorageType> excessTypes);
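One plausible way to use the two collections is sketched below: prefer a replica from moreThanOne, because removing it does not reduce the number of racks holding the block, and among those pick the node with the least remaining space. Both rules are assumptions made for this illustration; ReplicaSketch and ChooseReplicaToDeleteSketch are hypothetical, and the sketch ignores excessTypes entirely.

```java
import java.util.Collection;

// Hypothetical replica descriptor used only by this sketch.
final class ReplicaSketch {
  final String name;
  final long remainingBytes;

  ReplicaSketch(String name, long remainingBytes) {
    this.name = name;
    this.remainingBytes = remainingBytes;
  }
}

final class ChooseReplicaToDeleteSketch {
  /**
   * Prefers replicas on racks that hold more than one replica, so that the
   * rack count of the block does not drop; falls back to exactlyOne.
   * Within the preferred pool, picks the node with the least remaining space.
   * Returns null when both collections are empty.
   */
  static ReplicaSketch choose(Collection<ReplicaSketch> moreThanOne,
                              Collection<ReplicaSketch> exactlyOne) {
    Collection<ReplicaSketch> pool = moreThanOne.isEmpty() ? exactlyOne : moreThanOne;
    ReplicaSketch best = null;
    for (ReplicaSketch r : pool) {
      if (best == null || r.remainingBytes < best.remainingBytes) {
        best = r;
      }
    }
    return best;
  }
}
```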


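This method is used to set up a block placement policy object. Every implementation of BlockPlacementPolicy should define it.

@param conf the configuration object.

@param stats the object from which cluster status can be retrieved.

@param clusterMap the cluster topology.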

 /**
   * Used to setup a BlockPlacementPolicy object. This should be defined by 
   * all implementations of a BlockPlacementPolicy.
   * 
   * @param conf the configuration object
   * @param stats retrieve cluster status from here
   * @param clusterMap cluster topology
   */
  abstract protected void initialize(Configuration conf,  FSClusterStats stats, 
                                     NetworkTopology clusterMap, 
                                     Host2NodesMap host2datanodeMap);

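After the replica on cur has been removed, adjusts rackMap, moreThanOne, and exactlyOne.

@param rackMap a map from rack to the replicas on that rack.

@param moreThanOne the replica nodes on racks that hold more than one replica.

@param exactlyOne the replica nodes on racks that hold exactly one replica.

@param cur the replica that is being removed.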

/**
   * Adjust rackmap, moreThanOne, and exactlyOne after removing replica on cur.
   *
   * @param rackMap a map from rack to replica
   * @param moreThanOne The List of replica nodes on rack which has more than 
   *        one replica
   * @param exactlyOne The List of replica nodes on rack with only one replica
   * @param cur current replica to remove
   */
  public void adjustSetsWithChosenReplica(
      final Map<String, List<DatanodeStorageInfo>> rackMap,
      final List<DatanodeStorageInfo> moreThanOne,
      final List<DatanodeStorageInfo> exactlyOne,
      final DatanodeStorageInfo cur) {
    
    final String rack = getRack(cur.getDatanodeDescriptor());
    final List<DatanodeStorageInfo> storages = rackMap.get(rack);
    storages.remove(cur);
    if (storages.isEmpty()) {
      rackMap.remove(rack);
    }
    if (moreThanOne.remove(cur)) {
      if (storages.size() == 1) {
        final DatanodeStorageInfo remaining = storages.get(0);
        moreThanOne.remove(remaining);
        exactlyOne.add(remaining);
      }
    } else {
      exactlyOne.remove(cur);
    }
  }
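To make the bookkeeping easier to follow, here is the same logic transcribed onto plain strings. AdjustSetsSketch and rackOf are hypothetical; the sketch assumes node names of the form "/rackA/dn1", where everything before the last slash is the rack.

```java
import java.util.List;
import java.util.Map;

final class AdjustSetsSketch {
  // Assumed naming scheme for this sketch: "/rackA/dn1" lives on rack "/rackA".
  static String rackOf(String node) {
    return node.substring(0, node.lastIndexOf('/'));
  }

  static void adjust(Map<String, List<String>> rackMap,
                     List<String> moreThanOne,
                     List<String> exactlyOne,
                     String cur) {
    String rack = rackOf(cur);
    List<String> storages = rackMap.get(rack);
    storages.remove(cur);
    if (storages.isEmpty()) {
      rackMap.remove(rack); // the rack no longer holds any replica
    }
    if (moreThanOne.remove(cur)) {
      // cur's rack had several replicas; if only one is left,
      // that survivor now belongs in exactlyOne.
      if (storages.size() == 1) {
        String remaining = storages.get(0);
        moreThanOne.remove(remaining);
        exactlyOne.add(remaining);
      }
    } else {
      // cur was the only replica on its rack.
      exactlyOne.remove(cur);
    }
  }
}
```

Starting from rackMap = {/rackA: [/rackA/dn1, /rackA/dn2], /rackB: [/rackB/dn3]}, removing /rackA/dn1 moves /rackA/dn2 from moreThanOne to exactlyOne; removing /rackB/dn3 afterwards drops /rackB from rackMap altogether.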

The getRack method returns the datanode's network location, which serves as its rack string.

  /**
   * Get rack string from a data node
   * @return rack of data node
   */
  protected String getRack(final DatanodeInfo datanode) {
    return datanode.getNetworkLocation();
  }

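Splits the datanodes into two sets: one contains the nodes on racks that hold more than one replica, the other contains the remaining nodes.

@param storages all the nodes to be split into the two sets (named dataNodes in the Javadoc).

@param rackMap a map from rack to the replicas on that rack.

@param moreThanOne the nodes on racks that hold more than one replica.

@param exactlyOne the nodes on racks that hold exactly one replica.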

 /**
   * Split data nodes into two sets, one set includes nodes on rack with
   * more than one  replica, the other set contains the remaining nodes.
   * 
   * @param dataNodes datanodes to be split into two sets
   * @param rackMap a map from rack to datanodes
   * @param moreThanOne contains nodes on rack with more than one replica
   * @param exactlyOne remains contains the remaining nodes
   */
  public void splitNodesWithRack(
      final Iterable<DatanodeStorageInfo> storages,
      final Map<String, List<DatanodeStorageInfo>> rackMap,
      final List<DatanodeStorageInfo> moreThanOne,
      final List<DatanodeStorageInfo> exactlyOne) {
    for(DatanodeStorageInfo s: storages) {
      final String rackName = getRack(s.getDatanodeDescriptor());
      List<DatanodeStorageInfo> storageList = rackMap.get(rackName);
      if (storageList == null) {
        storageList = new ArrayList<DatanodeStorageInfo>();
        rackMap.put(rackName, storageList);
      }
      storageList.add(s);
    }
    
    // split nodes into two sets
    for(List<DatanodeStorageInfo> storageList : rackMap.values()) {
      if (storageList.size() == 1) {
        // exactlyOne contains nodes on rack with only one replica
        exactlyOne.add(storageList.get(0));
      } else {
        // moreThanOne contains nodes on rack with more than one replica
        moreThanOne.addAll(storageList);
      }
    }
  }
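The two passes (group by rack, then split by group size) can be tried out on plain strings. SplitNodesSketch and rackOf are hypothetical, and the node naming scheme "/rackA/dn1" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SplitNodesSketch {
  // Assumed naming scheme for this sketch: "/rackA/dn1" lives on rack "/rackA".
  static String rackOf(String node) {
    return node.substring(0, node.lastIndexOf('/'));
  }

  static void split(Iterable<String> nodes,
                    Map<String, List<String>> rackMap,
                    List<String> moreThanOne,
                    List<String> exactlyOne) {
    // Pass 1: group the nodes by rack.
    for (String node : nodes) {
      String rack = rackOf(node);
      List<String> onRack = rackMap.get(rack);
      if (onRack == null) {
        onRack = new ArrayList<String>();
        rackMap.put(rack, onRack);
      }
      onRack.add(node);
    }
    // Pass 2: racks with a single replica go to exactlyOne, the rest to moreThanOne.
    for (List<String> onRack : rackMap.values()) {
      if (onRack.size() == 1) {
        exactlyOne.add(onRack.get(0));
      } else {
        moreThanOne.addAll(onRack);
      }
    }
  }
}
```

For the nodes /rackA/dn1, /rackA/dn2 and /rackB/dn3, moreThanOne ends up with the two /rackA nodes and exactlyOne with /rackB/dn3. Passing a LinkedHashMap as rackMap keeps the iteration order of the second pass predictable.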






