Hadoop 3.1.3 Study Notes 3

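This part describes how Hadoop chooses source nodes and target nodes when it reconstructs lost data.

Study Notes 2 walked through the reconstruction of lost data; the core method there is reconstruct in StripedBlockReconstructor.

As the previous part showed, readMinimumSources picks as many datanodes as there are data blocks to serve as reconstruction sources, and reconstructTargets then rebuilds the data.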

private void reconstructTargets(int toReconstructLen) throws IOException {
  ByteBuffer[] inputs = getStripedReader().getInputBuffers(toReconstructLen);

  int[] erasedIndices = stripedWriter.getRealTargetIndices();
  ByteBuffer[] outputs = stripedWriter.getRealTargetBuffers(toReconstructLen);

  long start = System.nanoTime();
  getDecoder().decode(inputs, erasedIndices, outputs);
  long end = System.nanoTime();
  this.getDatanode().getMetrics().incrECDecodingTime(end - start);

  stripedWriter.updateRealTargetBuffers(toReconstructLen);
}

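Here inputs holds the source data described above, erasedIndices holds the indices of the lost blocks, and outputs receives the reconstructed data. The smallest unit of one reconstruction pass is therefore a ByteBuffer. Decoding goes through the decode interface, which dispatches to a different doDecode implementation depending on the chosen codec. For an LRC code, for example: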

protected void doDecode(ByteBufferDecodingState decodingState) throws IOException {
    CoderUtil.resetOutputBuffers(decodingState.outputs,
            decodingState.decodeLength);
    prepareDecoding(decodingState.inputs, decodingState.erasedIndexes);

    ByteBuffer[] realInputs = new ByteBuffer[numRealInputUnits];
    for (int i = 0; i < numRealInputUnits; i++) {
        realInputs[i] = decodingState.inputs[validIndexes[i]];
    }
    LRCUtil.encodeData(gfTables, realInputs, decodingState.outputs);
}
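
To make the inputs / erasedIndexes / outputs convention concrete, here is a minimal sketch that uses a toy single-parity XOR code instead of LRC. The class XorDecodeSketch and its decode method are hypothetical and only mimic the calling convention (a null slot marks a unit that is lost or was not read); this is not the Hadoop decoder.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Toy single-parity XOR code; hypothetical, for illustrating the calling convention only. */
class XorDecodeSketch {

  /**
   * inputs: one slot per unit (data units first, then the parity unit);
   *         null marks a unit that is erased or was not read.
   * erasedIndexes: the units to rebuild (this toy code handles exactly one).
   * outputs: one buffer per erased index, filled with the rebuilt bytes.
   */
  static void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs) {
    if (erasedIndexes.length != 1 || outputs.length != 1) {
      throw new IllegalArgumentException("XOR can rebuild exactly one unit");
    }
    ByteBuffer out = outputs[0];
    int len = out.remaining();
    for (int pos = 0; pos < len; pos++) {
      byte acc = 0;
      for (int unit = 0; unit < inputs.length; unit++) {
        if (unit == erasedIndexes[0]) {
          continue; // the erased unit contributes nothing
        }
        if (inputs[unit] == null) {
          throw new IllegalArgumentException("missing surviving unit " + unit);
        }
        // Absolute get: does not move the buffer position.
        acc ^= inputs[unit].get(inputs[unit].position() + pos);
      }
      out.put(out.position() + pos, acc);
    }
  }

  public static void main(String[] args) {
    ByteBuffer d0 = ByteBuffer.wrap(new byte[]{1, 2, 3});
    ByteBuffer parity = ByteBuffer.wrap(new byte[]{5, 7, 5}); // d0 XOR d1
    ByteBuffer rebuilt = ByteBuffer.allocate(3);
    // Unit 1 is lost, so its input slot is null.
    decode(new ByteBuffer[]{d0, null, parity}, new int[]{1}, new ByteBuffer[]{rebuilt});
    System.out.println(Arrays.toString(rebuilt.array())); // prints [4, 5, 6]
  }
}
```

The point of the sketch is only the shape of the call: the decoder receives every slot, learns which ones are missing from erasedIndexes, and writes into caller-supplied output buffers.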

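prepareDecoding determines which units actually serve as sources. For LRC, when a single block is lost, the other blocks of its local group are enough to recover it. The selection logic is in the following method: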

private <T> void prepareDecoding(T[] inputs, int[] erasedIndexes) throws IOException {
    int[] tmpValidIndexes = CoderUtil.getValidIndexes(inputs);

    // Initialize the number of input units for global recover use
    this.numRealInputUnits = getNumDataUnits();
    int k = getNumDataUnits();
    int l = getNumLocalParityUnits();
    int r = getNumParityUnits();
    int[] tmpRealValidIndexes = new int[getNumDataUnits()];
    // Verify if we need to recover locally or globally
    // when erasedIndexes.length = 1 we only need k / l units to recover <==> local recover
    if (erasedIndexes.length == 1) {
        if (erasedIndexes[0] < k + l) {
            // We only need k / l units (half of the data units when l == 2) to recover data
            this.numRealInputUnits = k / l;

            // Create a candidate
            int[] localIndexes = new int[this.numRealInputUnits + 1];
            if (erasedIndexes[0] < k / 2 || erasedIndexes[0] == k) {
                this.localXFlag = true;

                // Generate a candidate list for local X indexes
                for (int j = 0; j < this.numRealInputUnits; j++) {
                    localIndexes[j] = j;
                }
                localIndexes[this.numRealInputUnits] = k;
            } // end if the first erased index is in local X part.
            else {
                this.localYFlag = true;

                // Generate a candidate list for local Y indexes
                for (int j = 0; j < this.numRealInputUnits; j++) {
                    localIndexes[j] = j + k / 2;
                }
                localIndexes[this.numRealInputUnits] = k + 1;
            }
            // Select the local indexes from the candidate list
            tmpRealValidIndexes = new int[this.numRealInputUnits];
            int cur = 0;
            for (int j = 0; j < localIndexes.length; j++) {
                if (localIndexes[j] != erasedIndexes[0]) {
                    tmpRealValidIndexes[cur++] = localIndexes[j];
                }
            }
        } // end if erasedIndexes[0] < k + l
        else {
            // The single erased unit is a global parity: recover globally
            this.numRealInputUnits = getNumDataUnits();
            tmpRealValidIndexes = tmpValidIndexes;
        }
    } // end if erasedIndexes.length == 1
    else if (erasedIndexes.length < r + l) {
        this.numRealInputUnits = getNumDataUnits();
        int erasedFlag = 0;
        if (erasedIndexes[0] < k / 2 || erasedIndexes[0] == k) {
            // X region has at least one erased unit
            erasedFlag = 0;
        }
        else if (erasedIndexes[0] < k || erasedIndexes[0] == k + 1) {
            // Y region has at least one erased unit
            erasedFlag = 1;
        }
        else {
            erasedFlag = 2; // All erased units are in the global parity region
        }

        tmpRealValidIndexes = getGlobalValidIndexes(tmpValidIndexes, this.numRealInputUnits, erasedFlag);
    } // end if erasedIndexes.length < r + l
    else {
        if (erasedIndexesInLocal(erasedIndexes)) {
            throw new HadoopIllegalArgumentException(
                    "Too many erased in a local part, data not recoverable");
        }
        else {
            this.numRealInputUnits = getNumDataUnits();
            tmpRealValidIndexes = tmpValidIndexes;
        }
    }

    if (Arrays.equals(this.cachedErasedIndexes, erasedIndexes) &&
            Arrays.equals(this.validIndexes, tmpRealValidIndexes)) {
        return; // Optimization. Nothing to do
    }
    this.cachedErasedIndexes =
            Arrays.copyOf(erasedIndexes, erasedIndexes.length);
    this.validIndexes =
            Arrays.copyOf(tmpRealValidIndexes, tmpRealValidIndexes.length);

    processErasures(erasedIndexes);
}
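
The single-erasure branch above can be sketched in isolation. The helper below is hypothetical (LocalGroupSketch and localHelpers are not part of the coder) and assumes the layout the method implies: data units at 0..k-1, one local parity per group at k..k+l-1, and global parities after that. Unlike the listing, which hard-codes two groups via k / 2, the sketch generalizes to l groups; that generalization is my assumption.

```java
import java.util.Arrays;

/** Hypothetical helper: picks the surviving units of the erased unit's local group. */
class LocalGroupSketch {

  /**
   * Assumed layout: data units 0..k-1, local parities k..k+l-1 (one per group),
   * global parities from k+l on. Group g holds data units g*(k/l)..(g+1)*(k/l)-1
   * plus local parity k+g.
   *
   * @return the k/l units to read for a local repair, or null when the erased
   *         unit is a global parity and a global repair is needed
   */
  static int[] localHelpers(int k, int l, int erased) {
    if (l <= 0 || k % l != 0) {
      throw new IllegalArgumentException("k must be a positive multiple of l");
    }
    if (erased < 0) {
      throw new IllegalArgumentException("erased index must be non-negative");
    }
    int groupSize = k / l;
    int group;
    if (erased < k) {
      group = erased / groupSize;   // erased unit is a data unit
    } else if (erased < k + l) {
      group = erased - k;           // erased unit is a local parity
    } else {
      return null;                  // global parity: no local repair
    }
    int[] helpers = new int[groupSize];
    int cur = 0;
    for (int j = 0; j < groupSize; j++) {
      int unit = group * groupSize + j;
      if (unit != erased) {
        helpers[cur++] = unit;
      }
    }
    if (erased != k + group) {
      helpers[cur++] = k + group;   // the group's local parity
    }
    return helpers;
  }

  public static void main(String[] args) {
    // k = 6 data units, l = 2 local groups: X = {0,1,2,6}, Y = {3,4,5,7}
    System.out.println(Arrays.toString(localHelpers(6, 2, 1))); // prints [0, 2, 6]
    System.out.println(Arrays.toString(localHelpers(6, 2, 7))); // prints [3, 4, 5]
  }
}
```

For k = 6 and l = 2 this returns the same index sets that the X and Y branches of prepareDecoding store in validIndexes.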

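The rest follows the usual LRC encoding and decoding procedure and is not repeated here. One point deserves emphasis: this LRC implementation does not reduce the amount of data read, because the reader still fetches k ByteBuffers (as many as there are data blocks) even when the decoder only uses k / l of them. This needs to be fixed later.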
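
To put rough numbers on that read volume, here is a small arithmetic sketch. The class, the method names, and the 64 KiB buffer size are illustrative assumptions, not values taken from the Hadoop configuration.

```java
/** Hypothetical arithmetic helper comparing read volume per reconstruction pass. */
class ReadVolumeSketch {

  /** Bytes fetched when unitsRead buffers of bufferSize bytes are read. */
  static long bytesRead(int unitsRead, int bufferSize) {
    return (long) unitsRead * bufferSize;
  }

  /** Units an ideal local repair would need: the rest of one local group. */
  static int localRepairUnits(int k, int l) {
    return k / l;
  }

  public static void main(String[] args) {
    int k = 6;                    // data units (assumed)
    int l = 2;                    // local groups (assumed)
    int bufferSize = 64 * 1024;   // assumed buffer size in bytes
    System.out.println(bytesRead(k, bufferSize));                      // prints 393216
    System.out.println(bytesRead(localRepairUnits(k, l), bufferSize)); // prints 196608
  }
}
```

Under these assumptions a local repair that read only its own group would fetch half as much data per pass as the current behavior.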

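The following explains how the repaired data reaches the designated datanodes.

After reconstructTargets, the reconstructor calls stripedWriter's transferData2Targets method to send the repaired data to the designated datanodes.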

int transferData2Targets() {
  int nSuccess = 0;
  for (int i = 0; i < targets.length; i++) {
    if (targetsStatus[i]) {
      boolean success = false;
      try {
        writers[i].transferData2Target(packetBuf);
        nSuccess++;
        success = true;
      } catch (IOException e) {
        LOG.warn(e.getMessage());
      }
      targetsStatus[i] = success;
    }
  }
  return nSuccess;
}

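In this method, transferData2Target evidently sends a packetBuf-sized chunk of data to a target node; its details are left for later. Whether a given target receives data at all depends on targetsStatus, the per-target status array, which is set in stripedWriter's initTargetStreams: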

int initTargetStreams() {
  int nSuccess = 0;
  for (short i = 0; i < targets.length; i++) {
    try {
      writers[i] = createWriter(i);
      nSuccess++;
      targetsStatus[i] = true;
    } catch (Throwable e) {
      LOG.warn(e.getMessage());
    }
  }
  return nSuccess;
}

As this method shows, whether each entry of targetsStatus is true or false depends on createWriter, that is, on whether the write stream can be set up:

private StripedBlockWriter createWriter(short index) throws IOException {
  return new StripedBlockWriter(this, datanode, conf,
      reconstructor.getBlock(targetIndices[index]), targets[index],
      targetStorageTypes[index], targetStorageIds[index]);
}
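
The interplay between initTargetStreams and transferData2Targets can be modeled without any Hadoop classes. In the sketch below, Writer and WriterFactory are hypothetical stand-ins for StripedBlockWriter and createWriter, and the sketch catches IOException where the original init loop catches Throwable.

```java
import java.io.IOException;
import java.util.Arrays;

/** Toy model of the targetsStatus bookkeeping; all types here are hypothetical. */
class TargetStatusSketch {
  interface Writer {
    void transfer(byte[] packet) throws IOException;
  }

  interface WriterFactory {
    Writer create(int index) throws IOException;
  }

  final Writer[] writers;
  final boolean[] targetsStatus; // every element starts as false

  TargetStatusSketch(int numTargets) {
    writers = new Writer[numTargets];
    targetsStatus = new boolean[numTargets];
  }

  /** Marks a target as usable only if its writer could be created. */
  int initTargetStreams(WriterFactory factory) {
    int nSuccess = 0;
    for (int i = 0; i < writers.length; i++) {
      try {
        writers[i] = factory.create(i);
        targetsStatus[i] = true;
        nSuccess++;
      } catch (IOException e) {
        // targetsStatus[i] keeps its default value, false.
      }
    }
    return nSuccess;
  }

  /** Sends to usable targets only; a failed send removes the target from later rounds. */
  int transferData2Targets(byte[] packet) {
    int nSuccess = 0;
    for (int i = 0; i < writers.length; i++) {
      if (!targetsStatus[i]) {
        continue;
      }
      try {
        writers[i].transfer(packet);
        nSuccess++;
      } catch (IOException e) {
        targetsStatus[i] = false;
      }
    }
    return nSuccess;
  }

  public static void main(String[] args) {
    TargetStatusSketch s = new TargetStatusSketch(3);
    // Target 1 cannot be connected; target 2 connects but fails on write.
    int opened = s.initTargetStreams(i -> {
      if (i == 1) {
        throw new IOException("connect failed");
      }
      return packet -> {
        if (i == 2) {
          throw new IOException("write failed");
        }
      };
    });
    int sent = s.transferData2Targets(new byte[]{42});
    System.out.println(opened + " opened, " + sent + " sent"); // prints 2 opened, 1 sent
    System.out.println(Arrays.toString(s.targetsStatus));      // prints [true, false, false]
  }
}
```

A target therefore drops out either at setup time or at the first failed transfer, and it is never retried within the same reconstruction task.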

Now look at the init method called from the StripedBlockWriter constructor:

private void init() throws IOException {
  Socket socket = null;
  DataOutputStream out = null;
  DataInputStream in = null;
  boolean success = false;
  try {
    InetSocketAddress targetAddr =
        stripedWriter.getSocketAddress4Transfer(target);
    socket = datanode.newSocket();
    NetUtils.connect(socket, targetAddr,
        datanode.getDnConf().getSocketTimeout());
    socket.setTcpNoDelay(
        datanode.getDnConf().getDataTransferServerTcpNoDelay());
    socket.setSoTimeout(datanode.getDnConf().getSocketTimeout());

    Token<BlockTokenIdentifier> blockToken =
        datanode.getBlockAccessToken(block,
            EnumSet.of(BlockTokenIdentifier.AccessMode.WRITE),
            new StorageType[]{storageType}, new String[]{storageId});

    long writeTimeout = datanode.getDnConf().getSocketWriteTimeout();
    OutputStream unbufOut = NetUtils.getOutputStream(socket, writeTimeout);
    InputStream unbufIn = NetUtils.getInputStream(socket);
    DataEncryptionKeyFactory keyFactory =
        datanode.getDataEncryptionKeyFactoryForBlock(block);
    IOStreamPair saslStreams = datanode.getSaslClient().socketSend(
        socket, unbufOut, unbufIn, keyFactory, blockToken, target);

    unbufOut = saslStreams.out;
    unbufIn = saslStreams.in;

    out = new DataOutputStream(new BufferedOutputStream(unbufOut,
        DFSUtilClient.getSmallBufferSize(conf)));
    in = new DataInputStream(unbufIn);

    DatanodeInfo source = new DatanodeInfoBuilder()
        .setNodeID(datanode.getDatanodeId()).build();
    new Sender(out).writeBlock(block, storageType,
        blockToken, "", new DatanodeInfo[]{target},
        new StorageType[]{storageType}, source,
        BlockConstructionStage.PIPELINE_SETUP_CREATE, 0, 0, 0, 0,
        stripedWriter.getChecksum(), stripedWriter.getCachingStrategy(),
        false, false, null, storageId, new String[]{storageId});

    targetSocket = socket;
    targetOutputStream = out;
    targetInputStream = in;
    success = true;
  } finally {
    if (!success) {
      IOUtils.closeStream(out);
      IOUtils.closeStream(in);
      IOUtils.closeStream(socket);
    }
  }
}

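This method opens a socket to the target datanode. If the connection cannot be established, it throws an exception, so the corresponding entry of targetsStatus is never set to true (elements of a Java boolean array default to false). So where do the target datanodes come from?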
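
The success flag plus finally block in init is a general cleanup-on-failure pattern. Here is a minimal sketch of it with a hypothetical Resource class in place of the socket and streams, and a boolean parameter in place of a real connection failure.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical illustration of the success-flag cleanup used in init. */
class InitCleanupSketch {

  /** Stand-in for a socket or stream; records whether it was closed. */
  static class Resource implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  /**
   * Runs a setup step on the resource. If the step throws, the resource is
   * closed before the exception reaches the caller; on success it stays open.
   */
  static Resource open(Resource r, boolean setupFails) throws IOException {
    boolean success = false;
    try {
      if (setupFails) {
        throw new IOException("setup failed");
      }
      success = true;
      return r;
    } finally {
      if (!success) {
        r.close();
      }
    }
  }

  public static void main(String[] args) {
    boolean[] status = new boolean[2]; // both entries default to false
    Resource[] resources = {new Resource(), new Resource()};
    for (int i = 0; i < resources.length; i++) {
      try {
        open(resources[i], i == 1); // the second setup fails
        status[i] = true;
      } catch (IOException e) {
        // status[i] keeps its default value, false.
      }
    }
    System.out.println(status[0] + " " + status[1]);                     // prints true false
    System.out.println(resources[0].closed + " " + resources[1].closed); // prints false true
  }
}
```

The caller never has to clean up after a failed setup; it only needs to observe the exception and leave the status entry untouched.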

The targets of stripedWriter come from stripedReconInfo and are ultimately decided by the chooseTarget method of the block placement policy (BlockPlacementPolicy).
