HDFS Write Path: Source Code Analysis

  • I. Client
    • (1) File creation and pipeline setup
    • (2) Writing data
    • (3) Closing the output stream
  • II. NameNode
    • (1) create

The environment is Hadoop 3.1.3.

I. Client

The following code creates a file and writes data to it.

public void create() throws URISyntaxException, IOException, InterruptedException {
        // configuration object
        Configuration conf = new Configuration();
        // get the file system
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.157.128:9000"), conf, "root");
        // create the file and write data to it
        FSDataOutputStream out = fs.create(new Path("/root/test3.txt"));
        out.write("Hello, HDFS".getBytes());
        out.flush();
        // close the stream
        fs.close();
    }

Configuration loads Hadoop's configuration. Below is its static initializer block, where you can see some familiar configuration file names.

static{
    //print deprecation warning if hadoop-site.xml is found in classpath
    ClassLoader cL = Thread.currentThread().getContextClassLoader();
    if (cL == null) {
      cL = Configuration.class.getClassLoader();
    }
    if(cL.getResource("hadoop-site.xml")!=null) {
      LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
          "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
          + "mapred-site.xml and hdfs-site.xml to override properties of " +
          "core-default.xml, mapred-default.xml and hdfs-default.xml " +
          "respectively");
    }
    addDefaultResource("core-default.xml");
    addDefaultResource("core-site.xml");
  }
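As a quick, hypothetical illustration of how the resources loaded above back property lookups (fs.defaultFS is a standard key shipped in core-default.xml; the override value here is just an example, not taken from this article's cluster):

import org.apache.hadoop.conf.Configuration;

// Minimal sketch: core-default.xml / core-site.xml back get(), and set() overrides them.
public class ConfDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();                 // triggers the static block shown above
    System.out.println(conf.get("fs.defaultFS"));             // value from core-default.xml unless a site file overrides it
    conf.set("fs.defaultFS", "hdfs://192.168.157.128:9000");  // programmatic override (example value)
    System.out.println(conf.get("fs.defaultFS"));
  }
}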

FileSystem is the abstract class for Hadoop file systems and has many implementations (shown in the class-hierarchy figure below); HDFS is its distributed file system implementation.
[Figure: FileSystem implementation hierarchy]
The one we focus on here is DistributedFileSystem.
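As a minimal sanity check (reusing the NameNode address from the example above, and requiring the Hadoop client libraries on the classpath), you can print the concrete class that FileSystem.get returns for an hdfs:// URI:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Prints the concrete FileSystem implementation chosen for the hdfs:// scheme.
public class WhichFs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(
        new URI("hdfs://192.168.157.128:9000"), new Configuration(), "root");
    System.out.println(fs.getClass().getName()); // org.apache.hadoop.hdfs.DistributedFileSystem
    fs.close();
  }
}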

(1) File creation and pipeline setup

FSDataOutputStream out = fs.create(new Path("/root/test3.txt"));

The line above creates the file /root/test3.txt and obtains an output stream for it. After several hops we end up in DistributedFileSystem's create method.

public FSDataOutputStream create(final Path f, final FsPermission permission,
    final EnumSet<CreateFlag> cflags, final int bufferSize,
    final short replication, final long blockSize, final Progressable progress,
    final ChecksumOpt checksumOpt) throws IOException {
    statistics.incrementWriteOps(1);  // metric
    Path absF = fixRelativePart(f);  // resolve to an absolute path
    return new FileSystemLinkResolver<FSDataOutputStream>() {
      @Override
      public FSDataOutputStream doCall(final Path p)
          throws IOException, UnresolvedLinkException {
        /*
        * Main work:
        * 1. Make the create RPC call to the NameNode to create the file
        * 2. Start the DataStreamer that handles the subsequent data transfer
        */
        final DFSOutputStream dfsos = dfs.create(getPathName(p), permission,
                cflags, replication, blockSize, progress, bufferSize,
                checksumOpt);
        return dfs.createWrappedOutputStream(dfsos, statistics);  // wrap and return an HdfsDataOutputStream
      }

      // retry path, used when the call must be retried on another FileSystem (e.g. after resolving a symlink)
      @Override
      public FSDataOutputStream next(final FileSystem fs, final Path p)
          throws IOException {
        return fs.create(p, permission, cflags, bufferSize,
            replication, blockSize, progress, checksumOpt);
      }
    }.resolve(this, absF);
  }

DFSClient's create method does two main things: it makes the create RPC call to the NameNode to create the file, and it starts the DataStreamer that will handle the subsequent data transfer.

public DFSOutputStream create(String src, 
                             FsPermission permission,
                             EnumSet<CreateFlag> flag, 
                             boolean createParent,
                             short replication,
                             long blockSize,
                             Progressable progress,
                             int buffersize,
                             ChecksumOpt checksumOpt,
                             InetSocketAddress[] favoredNodes) throws IOException {
    // check that the client is still open
    checkOpen();
    // build the permission information (default rw-r--r--)
    if (permission == null) {
      permission = FsPermission.getFileDefault();
    }
    FsPermission masked = permission.applyUMask(dfsClientConf.uMask);
    if(LOG.isDebugEnabled()) {
      LOG.debug(src + ": masked=" + masked);
    }
    // nodes that are preferred as DataNodes for this file's blocks
    String[] favoredNodeStrs = null;
    if (favoredNodes != null) {
      favoredNodeStrs = new String[favoredNodes.length];
      for (int i = 0; i < favoredNodes.length; i++) {
        favoredNodeStrs[i] = 
            favoredNodes[i].getHostName() + ":" 
                         + favoredNodes[i].getPort();
      }
    }
    /*
    * 1. Make the create RPC call
    * 2. Start the DataStreamer
    */
    final DFSOutputStream result = DFSOutputStream.newStreamForCreate(this,
        src, masked, flag, createParent, replication, blockSize, progress,
        buffersize, dfsClientConf.createChecksum(checksumOpt),
        favoredNodeStrs);
    // begin the file lease (renewed periodically while the file is open for writing)
    beginFileLease(result.getFileId(), result);
    return result;
  }

Here we focus on the DFSOutputStream.newStreamForCreate() method.

static DFSOutputStream newStreamForCreate(DFSClient dfsClient, String src,
      FsPermission masked, EnumSet<CreateFlag> flag, boolean createParent,
      short replication, long blockSize, Progressable progress, int buffersize,
      DataChecksum checksum, String[] favoredNodes) throws IOException {
    HdfsFileStatus stat = null;

    // Retry the create if we get a RetryStartFileException up to a maximum
    // number of times
    boolean shouldRetry = true;
    int retryCount = CREATE_RETRY_COUNT;
    while (shouldRetry) {
      shouldRetry = false;
      try {
        // the create RPC call
        stat = dfsClient.namenode.create(src, masked, dfsClient.clientName,
            new EnumSetWritable<CreateFlag>(flag), createParent, replication,
            blockSize, SUPPORTED_CRYPTO_VERSIONS);
        break;
      } catch (RemoteException re) {
        IOException e = re.unwrapRemoteException(
            AccessControlException.class,
            DSQuotaExceededException.class,
            FileAlreadyExistsException.class,
            FileNotFoundException.class,
            ParentNotDirectoryException.class,
            NSQuotaExceededException.class,
            RetryStartFileException.class,
            SafeModeException.class,
            UnresolvedPathException.class,
            SnapshotAccessControlException.class,
            UnknownCryptoProtocolVersionException.class);
        if (e instanceof RetryStartFileException) {
          if (retryCount > 0) {
            shouldRetry = true;
            retryCount--;
          } else {
            throw new IOException("Too many retries because of encryption" +
                " zone operations", e);
          }
        } else {
          throw e;
        }
      }
    }
    Preconditions.checkNotNull(stat, "HdfsFileStatus should not be null!");
    final DFSOutputStream out = new DFSOutputStream(dfsClient, src, stat,
        flag, progress, checksum, favoredNodes);
    // start the DataStreamer
    out.start();
    return out;
  }

It first makes the create RPC call. Next we focus on DFSOutputStream.start():

  private synchronized void start() {
    streamer.start();
  }

This starts the DataStreamer thread, which sends data to the DataNodes, so we focus on DataStreamer.run():

public void run() {
      long lastPacket = Time.now();
      TraceScope traceScope = null;
      if (traceSpan != null) {
        traceScope = Trace.continueSpan(traceSpan);
      }
      while (!streamerClosed && dfsClient.clientRunning) {

        // if the Responder encountered an error, shutdown Responder
        if (hasError && response != null) {
          try {
            // the ResponseProcessor handles responses from the downstream DataNodes
            response.close();
            response.join();
            response = null;
          } catch (InterruptedException  e) {
            DFSClient.LOG.warn("Caught exception ", e);
          }
        }

        Packet one;
        try {
          // process datanode IO errors if any
          boolean doSleep = false;
          if (hasError && (errorIndex >= 0 || restartingNodeIndex >= 0)) {
            doSleep = processDatanodeError();
          }

          // dataQueue holds Packets (Block (128MB) -> Packet (64KB) -> Chunk (512B data + 4B checksum))
          // Block: storage unit; Packet: transfer unit; Chunk: checksum unit
          synchronized (dataQueue) {
            // wait for a packet to be sent.
            long now = Time.now();
            while ((!streamerClosed && !hasError && dfsClient.clientRunning 
                && dataQueue.size() == 0 && 
                (stage != BlockConstructionStage.DATA_STREAMING || // DATA_STREAMING means the pipeline is established and data is being transferred
                 stage == BlockConstructionStage.DATA_STREAMING && 
                 now - lastPacket < dfsClient.getConf().socketTimeout/2)) || doSleep ) {
              long timeout = dfsClient.getConf().socketTimeout/2 - (now-lastPacket);
              timeout = timeout <= 0 ? 1000 : timeout;
              timeout = (stage == BlockConstructionStage.DATA_STREAMING)?
                 timeout : 1000;
              try {
                dataQueue.wait(timeout);  // wait to be woken up (a Packet has been filled and dataQueue is no longer empty)
              } catch (InterruptedException  e) {
                DFSClient.LOG.warn("Caught exception ", e);
              }
              doSleep = false;
              now = Time.now();
            }
            
            if (streamerClosed || hasError || !dfsClient.clientRunning) {
              continue;
            }
            // get packet to be sent.
            if (dataQueue.isEmpty()) {
              // heartbeat packet
              one = createHeartbeatPacket();
            } else {
              one = dataQueue.getFirst(); // regular data packet
            }
          }
          assert one != null;

          // get new block from namenode.
          if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) {
            if(DFSClient.LOG.isDebugEnabled()) {
              DFSClient.LOG.debug("Allocating new block");
            }
            /*
            * Main work:
            * nextBlockOutputStream()
            * 1. Send an addBlock RPC to the NameNode to add a new block to the file; the NameNode allocates and returns the DataNodes that will store it
            * 2. Connect to the first DataNode (chain replication)
            * setPipeline()
            * 3. Record the nodes participating in the replication chain and related information
            */
            setPipeline(nextBlockOutputStream());
            // start the ResponseProcessor (which receives acks from the first DataNode in the pipeline) and switch the stream state to DATA_STREAMING
            initDataStreaming();
          } else if (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND) {
            if(DFSClient.LOG.isDebugEnabled()) {
              DFSClient.LOG.debug("Append to block " + block);
            }
            setupPipelineForAppendOrRecovery();
            initDataStreaming();
          }

          // the last Packet would exceed the block size limit
          long lastByteOffsetInBlock = one.getLastByteOffsetBlock();
          if (lastByteOffsetInBlock > blockSize) {
            throw new IOException("BlockSize " + blockSize +
                " is smaller than data size. " +
                " Offset of packet in block " + 
                lastByteOffsetInBlock +
                " Aborting file " + src);
          }

          // before sending the last Packet of the block, wait until all other Packets have been received by the DataNodes,
          // so the block is received in full (before the complete() RPC) and Packets from different blocks never wait for acks at the same time
          if (one.lastPacketInBlock) {
            // wait for all data packets have been successfully acked
            synchronized (dataQueue) {
              while (!streamerClosed && !hasError && 
                  ackQueue.size() != 0 && dfsClient.clientRunning) {
                try {
                  // wait for acks to arrive from datanodes
                  dataQueue.wait(1000);
                } catch (InterruptedException  e) {
                  DFSClient.LOG.warn("Caught exception ", e);
                }
              }
            }
            if (streamerClosed || hasError || !dfsClient.clientRunning) {
              continue;
            }
            // switch the stream to the pipeline-close state
            // the next block may be stored on different DataNodes, so this pipeline is no longer useful and can simply be closed
            stage = BlockConstructionStage.PIPELINE_CLOSE;
          }
          
          // send the packet
          synchronized (dataQueue) {
            // move packet from dataQueue to ackQueue
            // if this is not a heartbeat packet, move it from the to-send queue (dataQueue) to the pending-ack queue (ackQueue) and wait for the DataNode's response
            if (!one.isHeartbeatPacket()) {
              dataQueue.removeFirst();
              ackQueue.addLast(one);
              dataQueue.notifyAll();
            }
          }

          if (DFSClient.LOG.isDebugEnabled()) {
            DFSClient.LOG.debug("DataStreamer block " + block +
                " sending packet " + one);
          }

          // write out data to remote datanode
          try {
            // send the Packet to the DataNode
            one.writeTo(blockStream);
            blockStream.flush();   
          } catch (IOException e) {
            // HDFS-3398 treat primary DN is down since client is unable to 
            // write to primary DN. If a failed or restarting node has already
            // been recorded by the responder, the following call will have no 
            // effect. Pipeline recovery can handle only one node error at a
            // time. If the primary node fails again during the recovery, it
            // will be taken out then.
            tryMarkPrimaryDatanodeFailed();
            throw e;
          }
          lastPacket = Time.now();
          
          // update bytesSent
          // update the offset within the block of the data sent so far
          long tmpBytesSent = one.getLastByteOffsetBlock();
          if (bytesSent < tmpBytesSent) {
            bytesSent = tmpBytesSent;
          }

          if (streamerClosed || hasError || !dfsClient.clientRunning) {
            continue;
          }

          // Is this block full?
          // if the Packet just sent was the last one in the block, keep waiting until it has been acked;
          // at that point every Packet of the block has been acked (the last packet is only sent after all earlier ones were acked)
          if (one.lastPacketInBlock) {
            // wait for the close packet has been acked
            synchronized (dataQueue) {
              while (!streamerClosed && !hasError && 
                  ackQueue.size() != 0 && dfsClient.clientRunning) {
                dataQueue.wait(1000);// wait for acks to arrive from datanodes
              }
            }
            if (streamerClosed || hasError || !dfsClient.clientRunning) {
              continue;
            }

            /*
            * 1. Stop the ResponseProcessor
            * 2. Close the streams
            * 3. Reset the pipeline
            * 4. Set the stream state back to PIPELINE_SETUP_CREATE so a connection is set up again when the next block is transferred
            */
            endBlock();
          }
          if (progress != null) { progress.progress(); }

          // This is used by unit test to trigger race conditions.
          if (artificialSlowdown != 0 && dfsClient.clientRunning) {
            Thread.sleep(artificialSlowdown); 
          }
        } catch (Throwable e) {
          // Log warning if there was a real error.
          if (restartingNodeIndex == -1) {
            DFSClient.LOG.warn("DataStreamer Exception", e);
          }
          if (e instanceof IOException) {
            setLastException((IOException)e);
          } else {
            setLastException(new IOException("DataStreamer Exception: ",e));
          }
          hasError = true;
          if (errorIndex == -1 && restartingNodeIndex == -1) {
            // Not a datanode issue
            streamerClosed = true;
          }
        }
      }
      if (traceScope != null) {
        traceScope.close();
      }

      // release any resources that could not be released earlier
      closeInternal();
    }

Here we focus mainly on nextBlockOutputStream(), which establishes the connection, and initDataStreaming(), which starts the ResponseProcessor.

    private LocatedBlock nextBlockOutputStream() throws IOException {
      LocatedBlock lb = null;
      DatanodeInfo[] nodes = null;
      StorageType[] storageTypes = null;
      int count = dfsClient.getConf().nBlockWriteRetry;
      boolean success = false;
      ExtendedBlock oldBlock = block;
      do {
        hasError = false;
        lastException.set(null);
        errorIndex = -1;
        success = false;

        long startTime = Time.now();
        // DataNodes that must not be chosen to store this block
        DatanodeInfo[] excluded =
            excludedNodes.getAllPresent(excludedNodes.asMap().keySet())
            .keySet()
            .toArray(new DatanodeInfo[0]);
        block = oldBlock;
        /* Send the addBlock RPC to the NameNode to:
        * 1. Create a new block and add it to the file
        * 2. Choose the DataNodes that will store the block, order them, and return them (lb)
        */
        lb = locateFollowingBlock(startTime,
            excluded.length > 0 ? excluded : null);
        block = lb.getBlock();
        block.setNumBytes(0);
        bytesSent = 0;
        accessToken = lb.getBlockToken();
        nodes = lb.getLocations();
        storageTypes = lb.getStorageTypes();

        //
        // Connect to first DataNode in the list.
        //
        // connect to the first DataNode in the chain
        success = createBlockOutputStream(nodes, storageTypes, 0L, false);

        if (!success) {
          DFSClient.LOG.info("Abandoning " + block);
          dfsClient.namenode.abandonBlock(block, fileId, src,
              dfsClient.clientName);
          block = null;
          DFSClient.LOG.info("Excluding datanode " + nodes[errorIndex]);
          excludedNodes.put(nodes[errorIndex], nodes[errorIndex]);
        }
      } while (!success && --count >= 0);

      if (!success) {
        throw new IOException("Unable to create new block.");
      }
      return lb;
    }

nextBlockOutputStream() asks the NameNode for a new block, obtains the chain of DataNodes that will store it, and connects to the first DataNode in the chain. Here we look at createBlockOutputStream().

    private boolean createBlockOutputStream(DatanodeInfo[] nodes,
        StorageType[] nodeStorageTypes, long newGS, boolean recoveryFlag) {
      if (nodes.length == 0) {
        DFSClient.LOG.info("nodes are empty for write pipeline of block "
            + block);
        return false;
      }
      Status pipelineStatus = SUCCESS;
      String firstBadLink = "";
      boolean checkRestart = false;
      if (DFSClient.LOG.isDebugEnabled()) {
        for (int i = 0; i < nodes.length; i++) {
          DFSClient.LOG.debug("pipeline = " + nodes[i]);
        }
      }

      // persist blocks on namenode on next flush
      persistBlocks.set(true);

      int refetchEncryptionKey = 1;
      while (true) {
        boolean result = false;
        DataOutputStream out = null;
        try {
          assert null == s : "Previous socket unclosed";
          assert null == blockReplyStream : "Previous blockReplyStream unclosed";
          // open the socket connection
          s = createSocketForPipeline(nodes[0], nodes.length, dfsClient);
          long writeTimeout = dfsClient.getDatanodeWriteTimeout(nodes.length);
          
          OutputStream unbufOut = NetUtils.getOutputStream(s, writeTimeout);
          InputStream unbufIn = NetUtils.getInputStream(s);
          IOStreamPair saslStreams = dfsClient.saslClient.socketSend(s,
            unbufOut, unbufIn, dfsClient, accessToken, nodes[0]);
          unbufOut = saslStreams.out;
          unbufIn = saslStreams.in;
          out = new DataOutputStream(new BufferedOutputStream(unbufOut,
              HdfsConstants.SMALL_BUFFER_SIZE));
          blockReplyStream = new DataInputStream(unbufIn);
  
          //
          // Xmit header info to datanode
          //
  
          BlockConstructionStage bcs = recoveryFlag? stage.getRecoveryStage(): stage;

          // We cannot change the block length in 'block' as it counts the number
          // of bytes ack'ed.
          ExtendedBlock blockCopy = new ExtendedBlock(block);
          blockCopy.setNumBytes(blockSize);

          // send the request
          // send the request to set up the pipeline (a WRITE_BLOCK operation)
          new Sender(out).writeBlock(blockCopy, nodeStorageTypes[0], accessToken,
              dfsClient.clientName, nodes, nodeStorageTypes, null, bcs, 
              nodes.length, block.getNumBytes(), bytesSent, newGS,
              checksum4WriteBlock, cachingStrategy.get(), isLazyPersistFile);
  
          // receive ack for connect
          // receive the ack for the connection setup
          BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
              PBHelper.vintPrefixed(blockReplyStream));
          pipelineStatus = resp.getStatus();
          firstBadLink = resp.getFirstBadLink();
          
          // Got an restart OOB ack.
          // If a node is already restarting, this status is not likely from
          // the same node. If it is from a different node, it is not
          // from the local datanode. Thus it is safe to treat this as a
          // regular node error.
          if (PipelineAck.isRestartOOBStatus(pipelineStatus) &&
            restartingNodeIndex == -1) {
            checkRestart = true;
            throw new IOException("A datanode is restarting.");
          }
          if (pipelineStatus != SUCCESS) {
            if (pipelineStatus == Status.ERROR_ACCESS_TOKEN) {
              throw new InvalidBlockTokenException(
                  "Got access token error for connect ack with firstBadLink as "
                      + firstBadLink);
            } else {
              throw new IOException("Bad connect ack with firstBadLink as "
                  + firstBadLink);
            }
          }
          assert null == blockStream : "Previous blockStream unclosed";
          blockStream = out;
          result =  true; // success
          restartingNodeIndex = -1;
          hasError = false;
        } catch (IOException ie) {
          if (restartingNodeIndex == -1) {
            DFSClient.LOG.info("Exception in createBlockOutputStream", ie);
          }
          if (ie instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) {
            DFSClient.LOG.info("Will fetch a new encryption key and retry, " 
                + "encryption key was invalid when connecting to "
                + nodes[0] + " : " + ie);
            // The encryption key used is invalid.
            refetchEncryptionKey--;
            dfsClient.clearDataEncryptionKey();
            // Don't close the socket/exclude this node just yet. Try again with
            // a new encryption key.
            continue;
          }
  
          // find the datanode that matches
          if (firstBadLink.length() != 0) {
            for (int i = 0; i < nodes.length; i++) {
              // NB: Unconditionally using the xfer addr w/o hostname
              if (firstBadLink.equals(nodes[i].getXferAddr())) {
                errorIndex = i;
                break;
              }
            }
          } else {
            assert checkRestart == false;
            errorIndex = 0;
          }
          // Check whether there is a restart worth waiting for.
          if (checkRestart && shouldWaitForRestart(errorIndex)) {
            restartDeadline = dfsClient.getConf().datanodeRestartTimeout +
                Time.now();
            restartingNodeIndex = errorIndex;
            errorIndex = -1;
            DFSClient.LOG.info("Waiting for the datanode to be restarted: " +
                nodes[restartingNodeIndex]);
          }
          hasError = true;
          setLastException(ie);
          result =  false;  // error
        } finally {
          if (!result) {
            IOUtils.closeSocket(s);
            s = null;
            IOUtils.closeStream(out);
            out = null;
            IOUtils.closeStream(blockReplyStream);
            blockReplyStream = null;
          }
        }
        return result;
      }
    }

When the DataNode receives this connection request, it creates a DataXceiver to handle the block write; that will be covered later when we get to the DataNode side. Now back to the initDataStreaming() method.

    private void initDataStreaming() {
      this.setName("DataStreamer for file " + src +
          " block " + block);
      // create the ResponseProcessor (telling it which DataNodes to collect acks from)
      response = new ResponseProcessor(nodes);
      // start the ResponseProcessor
      response.start();
      // switch the stream state to DATA_STREAMING
      stage = BlockConstructionStage.DATA_STREAMING;
    }

This method mainly starts the ResponseProcessor thread, which collects acks from the DataNodes, so we focus on ResponseProcessor.run().

      public void run() {

        setName("ResponseProcessor for block " + block);
        // used to deserialize ack messages
        PipelineAck ack = new PipelineAck();

        while (!responderClosed && dfsClient.clientRunning && !isLastPacketInBlock) {
          // process responses from datanodes.
          try {
            // read an ack from the pipeline
            long begin = Time.monotonicNow();
            // deserialize the ack message
            ack.readFields(blockReplyStream);
            long duration = Time.monotonicNow() - begin;
            if (duration > dfsclientSlowLogThresholdMs
                && ack.getSeqno() != Packet.HEART_BEAT_SEQNO) {
              DFSClient.LOG
                  .warn("Slow ReadProcessor read fields took " + duration
                      + "ms (threshold=" + dfsclientSlowLogThresholdMs + "ms); ack: "
                      + ack + ", targets: " + Arrays.asList(targets));
            } else if (DFSClient.LOG.isDebugEnabled()) {
              DFSClient.LOG.debug("DFSClient " + ack);
            }

            // get the packet sequence number from the ack
            long seqno = ack.getSeqno();
            // processes response status from datanodes.
            // because HDFS uses chain replication, the ack from the first node in the chain aggregates the acks of all nodes in the chain
            // here we check them one by one to see whether any DataNode failed to replicate
            for (int i = ack.getNumOfReplies()-1; i >=0  && dfsClient.clientRunning; i--) {
              final Status reply = ack.getReply(i);
              // Restart will not be treated differently unless it is
              // the local node or the only one in the pipeline.
              if (PipelineAck.isRestartOOBStatus(reply) &&
                  shouldWaitForRestart(i)) {
                restartDeadline = dfsClient.getConf().datanodeRestartTimeout +
                    Time.now();
                setRestartingNodeIndex(i);
                String message = "A datanode is restarting: " + targets[i];
                DFSClient.LOG.info(message);
               throw new IOException(message);
              }
              // node error
              if (reply != SUCCESS) {
                setErrorIndex(i); // first bad datanode
                throw new IOException("Bad response " + reply +
                    " for block " + block +
                    " from datanode " + 
                    targets[i]);
              }
            }
            
            assert seqno != PipelineAck.UNKOWN_SEQNO : 
              "Ack for unknown seqno should be a failed ack: " + ack;
            if (seqno == Packet.HEART_BEAT_SEQNO) {  // a heartbeat ack
              continue;
            }

            // a success ack for a data packet
            Packet one;
            synchronized (dataQueue) {
              one = ackQueue.getFirst();
            }
            /*
            * Each block is transferred by a single-threaded DataStreamer and handled on the DataNode
            * by a matching single-threaded DataXceiver; because the connection guarantees FIFO delivery,
            * the sequence numbers should match and messages cannot arrive out of order
            */
            if (one.seqno != seqno) {
              throw new IOException("ResponseProcessor: Expecting seqno " +
                                    " for block " + block +
                                    one.seqno + " but received " + seqno);
            }
            isLastPacketInBlock = one.lastPacketInBlock;

            // Fail the packet write for testing in order to force a
            // pipeline recovery.
            if (DFSClientFaultInjector.get().failPacket() &&
                isLastPacketInBlock) {
              failPacket = true;
              throw new IOException(
                    "Failing the last packet for testing.");
            }
              
            // update bytesAcked
            block.setNumBytes(one.getLastByteOffsetBlock());

            // remove the Packet from the ack queue and wake up threads waiting on dataQueue
            synchronized (dataQueue) {
              lastAckedSeqno = seqno;
              ackQueue.removeFirst();
              dataQueue.notifyAll();

              one.releaseBuffer(byteArrayManager);
            }
          } catch (Exception e) {
            if (!responderClosed) {
              if (e instanceof IOException) {
                setLastException((IOException)e);
              }
              hasError = true;
              // If no explicit error report was received, mark the primary
              // node as failed.
              tryMarkPrimaryDatanodeFailed();
              synchronized (dataQueue) {
                dataQueue.notifyAll();
              }
              if (restartingNodeIndex == -1) {
                DFSClient.LOG.warn("DFSOutputStream ResponseProcessor exception "
                     + " for block " + block, e);
              }
              responderClosed = true;
            }
          }
        }
      }

(2) Writing data

out.write("Hello, HDFS".getBytes());

We first enter FilterOutputStream.write():

    public void write(byte b[]) throws IOException {
        write(b, 0, b.length);
    }

then DataOutputStream.write():

    public synchronized void write(byte b[], int off, int len)
        throws IOException
    {
        out.write(b, off, len);
        incCount(len);
    }

then FSDataOutputStream.write():

    public void write(byte b[], int off, int len) throws IOException {
      out.write(b, off, len);
      position += len;                            // update position
      if (statistics != null) {
        statistics.incrementBytesWritten(len);
      }
    }

then FSOutputSummer.write():

  public synchronized void write(byte b[], int off, int len)
      throws IOException {
    
    checkClosed();
    
    if (off < 0 || len < 0 || off > b.length - len) {
      throw new ArrayIndexOutOfBoundsException();
    }

    for (int n=0;n<len;n+=write1(b, off+n, len-n)) {  // note write1() here
    }
  }

and finally FSOutputSummer.write1():

  private int write1(byte b[], int off, int len) throws IOException {
    if(count==0 && len>=buf.length) {
      // local buffer is empty and user buffer size >= local buffer size, so
      // simply checksum the user buffer and send it directly to the underlying
      // stream
      // if the local buffer is empty and the data to write is at least as large as the buffer, compute the checksums and write the chunks directly
      final int length = buf.length;
      writeChecksumChunks(b, off, length);
      return length;
    }
    
    // copy user data to local buffer
    // otherwise, work out how much space is left in the buffer
    // and copy that much data into it
    int bytesToCopy = buf.length-count;
    bytesToCopy = (len<bytesToCopy) ? len : bytesToCopy;
    System.arraycopy(b, off, buf, count, bytesToCopy);
    count += bytesToCopy;
    if (count == buf.length) {
      // local buffer is full
      // buffer is full, flush it
      flushBuffer();
    } 
    return bytesToCopy;
  }

This method writes chunks; the key call is writeChecksumChunks(), which is also the key call inside flushBuffer().

  private void writeChecksumChunks(byte b[], int off, int len)
  throws IOException {
    // compute the checksums
    sum.calculateChunkedSums(b, off, len, checksum, 0);
    // sum.getBytesPerChecksum() is typically 512 bytes
    for (int i = 0; i < len; i += sum.getBytesPerChecksum()) {
      // length of this chunk
      int chunkLen = Math.min(sum.getBytesPerChecksum(), len - i);
      // offset of the checksum; with CRC32C each checksum is 4 bytes
      // checksums and chunk data are stored separately in the Packet, so this offset locates where the checksum should be written
      int ckOffset = i / sum.getBytesPerChecksum() * getChecksumSize();
      // write the chunk
      writeChunk(b, off + i, chunkLen, checksum, ckOffset, getChecksumSize());
    }
  }

The key method to look at is writeChunk():

  protected synchronized void writeChunk(byte[] b, int offset, int len,
      byte[] checksum, int ckoff, int cklen) throws IOException {
    dfsClient.checkOpen();
    checkClosed();

    if (len > bytesPerChecksum) {
      throw new IOException("writeChunk() buffer size is " + len +
                            " is larger than supported  bytesPerChecksum " +
                            bytesPerChecksum);
    }
    if (cklen != 0 && cklen != getChecksumSize()) {
      throw new IOException("writeChunk() checksum size is supposed to be " +
                            getChecksumSize() + " but found to be " + cklen);
    }

    // if there is no current Packet, create one
    if (currentPacket == null) {
      // packetSize is 65532 (~64K) and chunksPerPacket is 127 (one Packet holds 127 chunks)
      currentPacket = createPacket(packetSize, chunksPerPacket, 
          bytesCurBlock, currentSeqno++);
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("DFSClient writeChunk allocating new packet seqno=" + 
            currentPacket.seqno +
            ", src=" + src +
            ", packetSize=" + packetSize +
            ", chunksPerPacket=" + chunksPerPacket +
            ", bytesCurBlock=" + bytesCurBlock);
      }
    }

    // write the checksum (checksums occupy bytes 33 to 33 + 4 * 127, i.e. checksum size * number of chunks)
    currentPacket.writeChecksum(checksum, ckoff, cklen);
    // write the chunk data (the data occupies bytes 33 + 4 * 127 to 65532)
    currentPacket.writeData(b, offset, len);
    currentPacket.numChunks++;
    bytesCurBlock += len;

    // If packet is full, enqueue it for transmission
    // if the Packet or the block is full, enqueue the current Packet onto dataQueue
    if (currentPacket.numChunks == currentPacket.maxChunks ||
        bytesCurBlock == blockSize) {
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("DFSClient writeChunk packet full seqno=" +
            currentPacket.seqno +
            ", src=" + src +
            ", bytesCurBlock=" + bytesCurBlock +
            ", blockSize=" + blockSize +
            ", appendChunk=" + appendChunk);
      }

      // add the Packet to dataQueue
      waitAndQueueCurrentPacket();

      // If the reopened file did not end at chunk boundary and the above
      // write filled up its partial chunk. Tell the summer to generate full 
      // crc chunks from now on.
      if (appendChunk && bytesCurBlock%bytesPerChecksum == 0) {
        appendChunk = false;
        resetChecksumBufSize();
      }

      if (!appendChunk) {
        int psize = Math.min((int)(blockSize-bytesCurBlock), dfsClient.getConf().writePacketSize);
        computePacketChunkSize(psize, bytesPerChecksum);
      }
      //
      // if encountering a block boundary, send an empty packet to 
      // indicate the end of block and reset bytesCurBlock.
      //
      if (bytesCurBlock == blockSize) {
        currentPacket = createPacket(0, 0, bytesCurBlock, currentSeqno++);
        currentPacket.lastPacketInBlock = true;
        currentPacket.syncBlock = shouldSyncBlock;
        waitAndQueueCurrentPacket();
        bytesCurBlock = 0;
        lastFlushOffset = 0;
      }
    }
  }
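As a rough, back-of-the-envelope sketch of the sizes mentioned in the comments above (512-byte chunks, 4-byte CRC32C checksums, 127 chunks per packet, 128 MB blocks; all of these are configuration defaults and can differ in practice), the following standalone snippet reproduces the packet arithmetic:

public class PacketMath {
  public static void main(String[] args) {
    int bytesPerChecksum = 512;              // dfs.bytes-per-checksum default
    int checksumSize = 4;                    // CRC32C produces a 4-byte checksum per chunk
    int chunkSize = bytesPerChecksum + checksumSize;       // 516 bytes per chunk on the wire
    int chunksPerPacket = 127;               // as noted in the writeChunk() comments
    int packetBody = chunksPerPacket * chunkSize;          // 65532 bytes, i.e. the packetSize above
    long blockSize = 128L * 1024 * 1024;     // dfs.blocksize default
    long packetsPerBlock = blockSize / (long) (chunksPerPacket * bytesPerChecksum); // ~2064 data packets
    System.out.printf("chunk=%dB, packet body=%dB, ~%d packets per 128MB block%n",
        chunkSize, packetBody, packetsPerBlock);
  }
}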

Focus on waitAndQueueCurrentPacket():

  private void waitAndQueueCurrentPacket() throws IOException {
    synchronized (dataQueue) {
      try {
      // If queue is full, then wait till we have enough space
      while (!closed && dataQueue.size() + ackQueue.size()  > dfsClient.getConf().writeMaxPackets) {
        try {
          dataQueue.wait();
        } catch (InterruptedException e) {
          // If we get interrupted while waiting to queue data, we still need to get rid
          // of the current packet. This is because we have an invariant that if
          // currentPacket gets full, it will get queued before the next writeChunk.
          //
          // Rather than wait around for space in the queue, we should instead try to
          // return to the caller as soon as possible, even though we slightly overrun
          // the MAX_PACKETS length.
          Thread.currentThread().interrupt();
          break;
        }
      }
      checkClosed();
      // add the current Packet to dataQueue
      queueCurrentPacket();
      } catch (ClosedChannelException e) {
      }
    }
  }

Next, queueCurrentPacket():

  private void queueCurrentPacket() {
    synchronized (dataQueue) {
      if (currentPacket == null) return;
      // append the current Packet to dataQueue
      dataQueue.addLast(currentPacket);
      lastQueuedSeqno = currentPacket.seqno;
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("Queued packet " + currentPacket.seqno);
      }
      currentPacket = null;
      dataQueue.notifyAll();
    }
  }

At this point the transfer path is complete: write() fills Packets into dataQueue, and the DataStreamer analyzed earlier ships them down the pipeline.
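To summarize the hand-off just traced, here is a heavily simplified, hypothetical model (not the real HDFS classes) of the producer/consumer relationship between the writing thread, the DataStreamer, and the ResponseProcessor:

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of DFSOutputStream's two queues. Real HDFS adds pipeline setup,
// heartbeats, error recovery, and acks coming back from DataNodes.
public class MiniStreamer {
  private final Deque<String> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
  private final Deque<String> ackQueue = new ArrayDeque<>();  // packets sent, waiting for ack

  // Called by the user thread (write() -> writeChunk() -> waitAndQueueCurrentPacket()).
  public void queuePacket(String packet) {
    synchronized (dataQueue) {
      dataQueue.addLast(packet);
      dataQueue.notifyAll();               // wake the streamer thread
    }
  }

  // What the DataStreamer thread does in each loop iteration.
  public void streamOnePacket() throws InterruptedException {
    String packet;
    synchronized (dataQueue) {
      while (dataQueue.isEmpty()) {
        dataQueue.wait(1000);              // real code also sends heartbeat packets here
      }
      packet = dataQueue.removeFirst();
      ackQueue.addLast(packet);            // keep it in ackQueue until the pipeline acks it
    }
    System.out.println("send to pipeline: " + packet);
  }

  // What the ResponseProcessor thread does when an ack arrives.
  public void onAck() {
    synchronized (dataQueue) {
      ackQueue.removeFirst();
      dataQueue.notifyAll();               // unblock writers waiting for queue space
    }
  }

  public static void main(String[] args) throws InterruptedException {
    MiniStreamer s = new MiniStreamer();
    s.queuePacket("packet-0");
    s.streamOnePacket();
    s.onAck();
  }
}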

(3) Closing the output stream

fs.close();

Finally, when the output stream is closed (DistributedFileSystem.close() closes the streams it owns), a complete() RPC is issued to tell the NameNode that the file's last block is finished. Start with DFSOutputStream.close():

  public synchronized void close() throws IOException {
    if (closed) {
      IOException e = lastException.getAndSet(null);
      if (e == null)
        return;
      else
        throw e;
    }

    try {
      flushBuffer();       // flush from all upper layers

      if (currentPacket != null) { 
        waitAndQueueCurrentPacket();
      }

      if (bytesCurBlock != 0) {
        // send an empty packet to mark the end of the block
        currentPacket = createPacket(0, 0, bytesCurBlock, currentSeqno++);
        currentPacket.lastPacketInBlock = true;
        currentPacket.syncBlock = shouldSyncBlock;
      }

      flushInternal();             // flush all data to Datanodes
      // get last block before destroying the streamer
      ExtendedBlock lastBlock = streamer.getBlock();
      closeThreads(false);
      // the complete() RPC call
      completeFile(lastBlock);
      dfsClient.endFileLease(fileId);
    } catch (ClosedChannelException e) {
    } finally {
      closed = true;
    }
  }

This method releases all resources associated with the stream, e.g. it finishes transferring any remaining chunks and stops the DataStreamer and ResponseProcessor threads. Focus on completeFile():

  private void completeFile(ExtendedBlock last) throws IOException {
    long localstart = Time.now();
    long localTimeout = 400;
    boolean fileComplete = false;
    int retries = dfsClient.getConf().nBlockWriteLocateFollowingRetry;
    while (!fileComplete) {
      // the complete() RPC
      fileComplete =
          dfsClient.namenode.complete(src, dfsClient.clientName, last, fileId);
      if (!fileComplete) {
        final int hdfsTimeout = dfsClient.getHdfsTimeout();
        if (!dfsClient.clientRunning ||
              (hdfsTimeout > 0 && localstart + hdfsTimeout < Time.now())) {
            String msg = "Unable to close file because dfsclient " +
                          " was unable to contact the HDFS servers." +
                          " clientRunning " + dfsClient.clientRunning +
                          " hdfsTimeout " + hdfsTimeout;
            DFSClient.LOG.info(msg);
            throw new IOException(msg);
        }
        try {
          if (retries == 0) {
            throw new IOException("Unable to close file because the last block"
                + " does not have enough number of replicas.");
          }
          retries--;
          Thread.sleep(localTimeout);
          localTimeout *= 2;
          if (Time.now() - localstart > 5000) {
            DFSClient.LOG.info("Could not complete " + src + " retrying...");
          }
        } catch (InterruptedException ie) {
          DFSClient.LOG.warn("Caught exception ", ie);
        }
      }
    }
  }

Note that complete() only has to deal with the file's last block: each addBlock call already tries to complete the file's previously uncompleted block, and since blocks are committed in order, by the time the next block is requested the earlier blocks should already have reached the minimum replication.

II. NameNode

(1) create

First, we locate the NameNode's RPC server side and enter NameNodeRpcServer.create():

  public HdfsFileStatus create(String src, FsPermission masked,
      String clientName, EnumSetWritable<CreateFlag> flag,
      boolean createParent, short replication, long blockSize,
      CryptoProtocolVersion[] supportedVersions, String ecPolicyName)
      throws IOException {
    checkNNStartup();
    // IP of the client that made the request
    String clientMachine = getClientMachine();  
    if (stateChangeLog.isDebugEnabled()) {
      stateChangeLog.debug("*DIR* NameNode.create: file "
          +src+" for "+clientName+" at "+clientMachine);
    }
    // check whether the path length (8000) or depth (1000) exceeds the limits
    if (!checkPathLength(src)) {  
      throw new IOException("create: Pathname too long.  Limit "
          + MAX_PATH_LENGTH + " characters, " + MAX_PATH_DEPTH + " levels.");
    }
    // check whether the current NameNode state (active, backup, standby) supports this operation
    namesystem.checkOperation(OperationCategory.WRITE);  
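    // The RetryCache guards this non-idempotent RPC: if a client retry arrives after the
    // original call already succeeded, the cached result is replayed instead of creating again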
    CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null);
    if (cacheEntry != null && cacheEntry.isSuccess()) {
      return (HdfsFileStatus) cacheEntry.getPayload();
    }

    HdfsFileStatus status = null;
    try {
      PermissionStatus perm = new PermissionStatus(getRemoteUser()
          .getShortUserName(), null, masked);
      // create the file
      status = namesystem.startFile(src, perm, clientName, clientMachine,  
          flag.get(), createParent, replication, blockSize, supportedVersions,
          ecPolicyName, cacheEntry != null);
    } finally {
      RetryCache.setState(cacheEntry, status != null, status);
    }

    metrics.incrFilesCreated();
    metrics.incrCreateFileOps();
    return status;
  }

Here we focus on FSNamesystem.startFile():

  HdfsFileStatus startFile(String src, PermissionStatus permissions,
      String holder, String clientMachine, EnumSet<CreateFlag> flag,
      boolean createParent, short replication, long blockSize,
      CryptoProtocolVersion[] supportedVersions, String ecPolicyName,
      boolean logRetryCache) throws IOException {

    HdfsFileStatus status;
    try {
      // create the file
      status = startFileInt(src, permissions, holder, clientMachine, flag,  
          createParent, replication, blockSize, supportedVersions, ecPolicyName,
          logRetryCache);
    } catch (AccessControlException e) {
      logAuditEvent(false, "create", src);
      throw e;
    }
    logAuditEvent(true, "create", src, status);
    return status;
  }

We can ignore the ecPolicy-related parameters; they enable striped storage via erasure coding, which reduces the amount of data stored, and we do not consider them here. Continuing with startFileInt():

  private HdfsFileStatus startFileInt(String src,
      PermissionStatus permissions, String holder, String clientMachine,
      EnumSet<CreateFlag> flag, boolean createParent, short replication,
      long blockSize, CryptoProtocolVersion[] supportedVersions,
      String ecPolicyName, boolean logRetryCache) throws IOException {
    if (NameNode.stateChangeLog.isDebugEnabled()) {
      StringBuilder builder = new StringBuilder();
      builder.append("DIR* NameSystem.startFile: src=").append(src)
          .append(", holder=").append(holder)
          .append(", clientMachine=").append(clientMachine)
          .append(", createParent=").append(createParent)
          .append(", replication=").append(replication)
          .append(", createFlag=").append(flag)
          .append(", blockSize=").append(blockSize)
          .append(", supportedVersions=")
          .append(Arrays.toString(supportedVersions));
      NameNode.stateChangeLog.debug(builder.toString());
    }
    if (!DFSUtil.isValidName(src) ||                       // is the path valid
        FSDirectory.isExactReservedName(src) ||            // is the path an exact reserved name
        (FSDirectory.isReservedName(src)                   // same as above
            && !FSDirectory.isReservedRawName(src)         // is it a reserved raw path
            && !FSDirectory.isReservedInodesName(src))) {  // is it a reserved inode path
      throw new InvalidPathException(src);
    }

    boolean shouldReplicate = flag.contains(CreateFlag.SHOULD_REPLICATE);
    if (shouldReplicate &&
        (!org.apache.commons.lang.StringUtils.isEmpty(ecPolicyName))) {
      throw new HadoopIllegalArgumentException("SHOULD_REPLICATE flag and " +
          "ecPolicyName are exclusive parameters. Set both is not allowed!");
    }

    INodesInPath iip = null;
    boolean skipSync = true; // until we do something that might create edits
    HdfsFileStatus stat = null;
    BlocksMapUpdateInfo toRemoveBlocks = null;

    checkOperation(OperationCategory.WRITE);
    final FSPermissionChecker pc = getPermissionChecker();
    writeLock();
    try {
      checkOperation(OperationCategory.WRITE);
      checkNameNodeSafeMode("Cannot create file" + src);

      // resolve the inodes along the path; INodesInPath holds the inodes from the root down to this file
      iip = FSDirWriteFileOp.resolvePathForStartFile(  
          dir, pc, src, flag, createParent);


      if (blockSize < minBlockSize) {
        throw new IOException("Specified block size is less than configured" +
            " minimum value (" + DFSConfigKeys.DFS_NAMENODE_MIN_BLOCK_SIZE_KEY
            + "): " + blockSize + " < " + minBlockSize);
      }

      if (shouldReplicate) {
        blockManager.verifyReplication(src, replication, clientMachine);
      } else {
        final ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp
            .getErasureCodingPolicy(this, ecPolicyName, iip);
        if (ecPolicy != null && (!ecPolicy.isReplicationPolicy())) {
          checkErasureCodingSupported("createWithEC");
          if (blockSize < ecPolicy.getCellSize()) {
            throw new IOException("Specified block size (" + blockSize
                + ") is less than the cell size (" + ecPolicy.getCellSize()
                +") of the erasure coding policy (" + ecPolicy + ").");
          }
        } else {
          // check that the replication factor is within the configured limits
          blockManager.verifyReplication(src, replication, clientMachine);
        }
      }

      FileEncryptionInfo feInfo = null;
      if (!iip.isRaw() && provider != null) {
        EncryptionKeyInfo ezInfo = FSDirEncryptionZoneOp.getEncryptionKeyInfo(
            this, iip, supportedVersions);
        // if the path has an encryption zone, the lock was released while
        // generating the EDEK.  re-resolve the path to ensure the namesystem
        // and/or EZ has not mutated
        if (ezInfo != null) {
          checkOperation(OperationCategory.WRITE);
          iip = FSDirWriteFileOp.resolvePathForStartFile(
              dir, pc, iip.getPath(), flag, createParent);
          feInfo = FSDirEncryptionZoneOp.getFileEncryptionInfo(
              dir, iip, ezInfo);
        }
      }

      skipSync = false; // following might generate edits
      toRemoveBlocks = new BlocksMapUpdateInfo();
      dir.writeLock();
      try {
        // create the file
        stat = FSDirWriteFileOp.startFile(this, iip, permissions, holder,
            clientMachine, flag, createParent, replication, blockSize, feInfo,
            toRemoveBlocks, shouldReplicate, ecPolicyName, logRetryCache);
      } catch (IOException e) {
        skipSync = e instanceof StandbyException;
        throw e;
      } finally {
        dir.writeUnlock();
      }
    } finally {
      writeUnlock("create");
      // There might be transactions logged while trying to recover the lease.
      // They need to be sync'ed even when an exception was thrown.
      if (!skipSync) {
        // sync the edit log, which is essentially a write-ahead log
        getEditLog().logSync();
        if (toRemoveBlocks != null) {
          removeBlocks(toRemoveBlocks);
          toRemoveBlocks.clear();
        }
      }
    }

    return stat;
  }

To be continued.
