Kafka Source Code Analysis - Server - Network Layer (1)

AbstractServerThread

Both Acceptor and Processor extend AbstractServerThread, an abstract class that implements the Runnable interface. AbstractServerThread provides the startup and shutdown control methods shared by Acceptor and Processor.


Figure: AbstractServerThread class diagram

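Important fields of AbstractServerThread:

  • alive: indicates whether the thread is still running. It is initialized to true and set to false in shutdown().
  • shutdownLatch: a CountDownLatch with a count of 1 that signals whether the thread's shutdown has completed.
  • startupLatch: a CountDownLatch with a count of 1 that signals whether the thread's startup has completed.
  • connectionQuotas: tracks connection counts. close() closes the connection identified by the given connectionId (or the given SocketChannel) and decrements the count recorded in connectionQuotas.

awaitStartup() and shutdown() call CountDownLatch.await() and block until startup or shutdown has finished. startupComplete() and shutdownComplete() call CountDownLatch.countDown() to wake the blocked threads.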

The commonly used methods of AbstractServerThread:

/**
 * A base class with some helper variables and methods
 */
private[kafka] abstract class AbstractServerThread(connectionQuotas: ConnectionQuotas) extends Runnable with Logging {

  private val startupLatch = new CountDownLatch(1)
  private val shutdownLatch = new CountDownLatch(1)
  private val alive = new AtomicBoolean(true)
  // abstract method; each subclass supplies its own way to wake the thread
  def wakeup(): Unit

  /**
   * Initiates a graceful shutdown by signaling to stop and waiting for the shutdown to complete
   */
  def shutdown(): Unit = {
    alive.set(false) // mark the thread as no longer running
    wakeup() // wake up this AbstractServerThread
    shutdownLatch.await() // block until shutdown has completed
  }

  /**
   * Wait for the thread to completely start up
   * (blocks until startupComplete() is called)
   */
  def awaitStartup(): Unit = startupLatch.await

  /**
   * Record that the thread startup is complete
   * and wake up any thread blocked in awaitStartup()
   */
  protected def startupComplete() = {
    startupLatch.countDown()
  }

  /**
   * Record that the thread shutdown is complete
   * and wake up any thread blocked in shutdown()
   */
  protected def shutdownComplete() = shutdownLatch.countDown()

  /**
   * Is the server still running?
   */
  protected def isRunning = alive.get

  /**
   * Close the connection identified by `connectionId` and decrement the connection count.
   */
  def close(selector: KSelector, connectionId: String) {
    val channel = selector.channel(connectionId)
    if (channel != null) {
      debug(s"Closing selector connection $connectionId")
      val address = channel.socketAddress
      if (address != null)
        connectionQuotas.dec(address) // decrement the connection count recorded in connectionQuotas
      selector.close(connectionId) // close the connection
    }
  }

  /**
   * Close `channel` and decrement the connection count.
   */
  def close(channel: SocketChannel) {
    if (channel != null) {
      debug("Closing connection from " + channel.socket.getRemoteSocketAddress())
      connectionQuotas.dec(channel.socket.getInetAddress)
      swallowError(channel.socket().close())
      swallowError(channel.close())
    }
  }
}
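
The startup/shutdown handshake built on the two latches can be tried in isolation. The following is a minimal sketch, not Kafka's code: `LifecycleThread` and `SleepyWorker` are hypothetical names, and the worker parks on a plain monitor instead of a selector so that the example needs no sockets.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified stand-in for AbstractServerThread (no sockets, no quotas).
abstract class LifecycleThread extends Runnable {
  private val startupLatch = new CountDownLatch(1)
  private val shutdownLatch = new CountDownLatch(1)
  private val alive = new AtomicBoolean(true)

  // Subclasses interrupt whatever blocking call run() is parked in.
  def wakeup(): Unit

  // Blocks the caller until run() has called startupComplete().
  def awaitStartup(): Unit = startupLatch.await()

  // Flips the flag, wakes the worker, then blocks until run() has called shutdownComplete().
  def shutdown(): Unit = {
    alive.set(false)
    wakeup()
    shutdownLatch.await()
  }

  protected def startupComplete(): Unit = startupLatch.countDown()
  protected def shutdownComplete(): Unit = shutdownLatch.countDown()
  protected def isRunning: Boolean = alive.get
}

// Hypothetical worker that waits on a monitor where Acceptor/Processor would wait on a selector.
class SleepyWorker extends LifecycleThread {
  private val lock = new Object
  @volatile var finished = false

  def wakeup(): Unit = lock.synchronized { lock.notifyAll() }

  def run(): Unit = {
    startupComplete()
    try {
      while (isRunning) {
        // Re-check the flag under the lock so a wakeup() cannot be missed.
        lock.synchronized { if (isRunning) lock.wait(50) }
      }
    } finally {
      finished = true
      shutdownComplete()
    }
  }
}
```

A caller would start the worker on a thread, call `awaitStartup()`, and later call `shutdown()`. Because `shutdown()` returns only after the latch is counted down in the `finally` block, the caller can rely on the worker's cleanup having run.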

Acceptor

The Acceptor accepts connection requests from clients, creates the socket connections, and hands them to the Processors.

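Important fields

  • nioSelector: the Java NIO Selector.
  • serverChannel: the ServerSocketChannel used to accept client connections.

Both fields are initialized when the Acceptor is created. The constructor also creates and starts the threads for the Processors it manages.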
/**
 * Thread that accepts and configures new connections. There is one of these per endpoint.
 */
private[kafka] class Acceptor(val endPoint: EndPoint,
                              val sendBufferSize: Int,
                              val recvBufferSize: Int,
                              brokerId: Int,
                              processors: Array[Processor],
                              connectionQuotas: ConnectionQuotas) extends AbstractServerThread(connectionQuotas) with KafkaMetricsGroup {
  // create the nioSelector
  private val nioSelector = NSelector.open()
  // create the ServerSocketChannel
  val serverChannel = openServerSocket(endPoint.host, endPoint.port)

  this.synchronized {
    // create and start a thread for each Processor assigned to this Acceptor
    processors.foreach { processor =>
      Utils.newThread("kafka-network-thread-%d-%s-%d".format(brokerId, endPoint.protocolType.toString, processor.id), processor, false).start()
    }
  }

Acceptor.run() holds the Acceptor's core logic: it handles OP_ACCEPT events. The code is as follows:

 /**
   * Accept loop that checks for new connection attempts
   */
  def run() {
    // register for OP_ACCEPT events
    serverChannel.register(nioSelector, SelectionKey.OP_ACCEPT)
    startupComplete() // mark this thread's startup as complete
    try {
      var currentProcessor = 0
      while (isRunning) { // check whether the thread should keep running
        try {
          val ready = nioSelector.select(500) // wait up to 500 ms for registered events
          if (ready > 0) {
            val keys = nioSelector.selectedKeys()
            val iter = keys.iterator()
            while (iter.hasNext && isRunning) {
              try {
                val key = iter.next
                iter.remove()
                if (key.isAcceptable) // handle the OP_ACCEPT event via accept()
                  accept(key, processors(currentProcessor))
                else // anything other than OP_ACCEPT is an error
                  throw new IllegalStateException("Unrecognized key state for acceptor thread.")

                // round robin to the next processor thread
                currentProcessor = (currentProcessor + 1) % processors.length
              } catch {
                case e: Throwable => error("Error while accepting connection", e)
              }
            }
          }
        }
        catch {
          // We catch all the throwables to prevent the acceptor thread from exiting on exceptions due
          // to a select operation on a specific channel or a bad request. We don't want the
          // the broker to stop responding to requests from other clients in these scenarios.
          case e: ControlThrowable => throw e
          case e: Throwable => error("Error occurred", e)
        }
      }
    } finally {
      debug("Closing server socket and selector.")
      swallowError(serverChannel.close())
      swallowError(nioSelector.close())
      shutdownComplete() // mark this thread's shutdown as complete
    }
  }
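
The round-robin choice of Processor in the loop above comes down to one modulo step. A small sketch of just that step, with `RoundRobin` as a hypothetical helper name that does not exist in Kafka:

```scala
// Hypothetical helper: cycles through processor indices the same way
// Acceptor.run() advances currentProcessor after each accepted key.
final class RoundRobin(size: Int) {
  require(size > 0, "need at least one processor")
  private var current = 0

  // Returns the index to use now and advances to the next one, wrapping at `size`.
  def next(): Int = {
    val chosen = current
    current = (current + 1) % size
    chosen
  }
}
```

With three processors, successive calls to `next()` yield 0, 1, 2, 0, 1, 2 and so on, so new connections are spread evenly regardless of how busy each Processor is. The counter is not thread-safe, which matches its use from a single Acceptor thread.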

Acceptor.accept() handles the OP_ACCEPT event: it creates the SocketChannel, increments the connection count recorded in ConnectionQuotas, and passes the channel to Processor.accept(). The code of accept() is as follows:

/*
   * Accept a new connection
   */
  def accept(key: SelectionKey, processor: Processor) {
    val serverSocketChannel = key.channel().asInstanceOf[ServerSocketChannel]
    val socketChannel = serverSocketChannel.accept() // create the SocketChannel
    try {
      // increment the connection count recorded in connectionQuotas
      connectionQuotas.inc(socketChannel.socket().getInetAddress)
      socketChannel.configureBlocking(false)
      socketChannel.socket().setTcpNoDelay(true)
      socketChannel.socket().setKeepAlive(true)
      socketChannel.socket().setSendBufferSize(sendBufferSize)

      debug("Accepted connection from %s on %s and assigned it to processor %d, sendBufferSize [actual|requested]: [%d|%d] recvBufferSize [actual|requested]: [%d|%d]"
            .format(socketChannel.socket.getRemoteSocketAddress, socketChannel.socket.getLocalSocketAddress, processor.id,
                  socketChannel.socket.getSendBufferSize, sendBufferSize,
                  socketChannel.socket.getReceiveBufferSize, recvBufferSize))
      // hand the SocketChannel over to the processor
      processor.accept(socketChannel)
    } catch {
      case e: TooManyConnectionsException =>
        info("Rejected connection from %s, address already has the configured maximum of %d connections.".format(e.ip, e.count))
        close(socketChannel) // close the socketChannel
    }
  }
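
The per-address bookkeeping that accept() and close() rely on can be pictured as a synchronized counter map. The sketch below is an assumption-laden simplification, not Kafka's ConnectionQuotas: `SimpleConnectionQuotas`, `QuotaExceeded`, and the single `maxPerIp` limit are hypothetical. In this sketch a rejected connection is not counted, which is a choice made here for simplicity.

```scala
import java.net.InetAddress
import scala.collection.mutable

// Hypothetical exception playing the role of TooManyConnectionsException.
class QuotaExceeded(val ip: InetAddress, val limit: Int)
  extends RuntimeException(s"Too many connections from $ip (max = $limit)")

// Simplified, hypothetical per-IP counter; not Kafka's ConnectionQuotas class.
class SimpleConnectionQuotas(maxPerIp: Int) {
  private val counts = mutable.Map.empty[InetAddress, Int]

  // Record one more connection from `address`, or reject it if the limit is reached.
  def inc(address: InetAddress): Unit = counts.synchronized {
    val updated = counts.getOrElse(address, 0) + 1
    if (updated > maxPerIp) throw new QuotaExceeded(address, maxPerIp)
    counts(address) = updated
  }

  // Record that one connection from `address` has gone away.
  def dec(address: InetAddress): Unit = counts.synchronized {
    val remaining = counts.getOrElse(address,
      throw new IllegalArgumentException(s"No connections recorded for $address")) - 1
    if (remaining == 0) counts -= address else counts(address) = remaining
  }

  def get(address: InetAddress): Int = counts.synchronized { counts.getOrElse(address, 0) }
}
```

The map is guarded by a lock because the counter is incremented on the Acceptor thread and decremented on Processor threads.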

Processor

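The Processor reads requests and writes responses back; it takes no part in the actual business logic. Its fields, listed below, are initialized when the Processor object is created.

  • newConnections: a ConcurrentLinkedQueue[SocketChannel] holding the newly created SocketChannels that this Processor is to handle.
  • inflightResponses: holds responses that have not yet been completely sent. It resembles the client-side InFlightRequests, with one difference: the client does not acknowledge the responses it receives, so a response is removed from inflightResponses as soon as it has been sent successfully, whereas a request in InFlightRequests is removed only once its response arrives.
  • selector: of type KSelector; manages the network connections.
  • requestChannel: the queue through which the Processor and the Handler threads exchange data.

A SocketChannel created in Acceptor.accept() is handed to the Processor through Processor.accept(). When Processor.accept() receives a new SocketChannel, it first puts it on the newConnections queue and then wakes up the Processor thread to process that queue. Because the Acceptor thread and the Processor thread operate on newConnections concurrently, a ConcurrentLinkedQueue is used. The code of accept() is as follows: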
/**
   * Queue up a new connection for reading
   */
  def accept(socketChannel: SocketChannel) {
    // put the SocketChannel on the newConnections queue
    newConnections.add(socketChannel)
    // wakeup() ultimately calls wakeup() on the Java NIO Selector
    wakeup()
  }
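
The queue-then-wakeup handoff can be reproduced with nothing but a java.nio Selector and a ConcurrentLinkedQueue. The following sketch uses a hypothetical `QueueDrainer` class with no channels registered, so the selector serves only as something to block on and be woken from:

```scala
import java.nio.channels.Selector
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical consumer: parks in select() and drains a queue whenever it is woken.
class QueueDrainer[A <: AnyRef] extends Runnable {
  private val selector = Selector.open()
  private val pending = new ConcurrentLinkedQueue[A]()
  private val drained = new ConcurrentLinkedQueue[A]()
  @volatile private var running = true

  // Called from another thread (the role the Acceptor plays for Processor.accept()).
  def offer(item: A): Unit = {
    pending.add(item)
    selector.wakeup() // a blocked select(), or else the next one, returns immediately
  }

  def stop(): Unit = {
    running = false
    selector.wakeup()
  }

  // Snapshot of everything the worker thread has taken off the queue so far.
  def received: Vector[A] = {
    val builder = Vector.newBuilder[A]
    val it = drained.iterator()
    while (it.hasNext) builder += it.next()
    builder.result()
  }

  private def drain(): Unit = {
    var item = pending.poll()
    while (item != null) {
      drained.add(item)
      item = pending.poll()
    }
  }

  def run(): Unit = {
    try {
      while (running) {
        drain()
        selector.select(300) // returns early when wakeup() is called
      }
      drain() // pick up anything queued just before stop()
    } finally selector.close()
  }
}
```

Without the `wakeup()` call in `offer`, a new item could sit in the queue for up to the full select timeout before the worker noticed it.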

Processor.run() implements reading from and writing to the network connections. The flow of run() is:


Figure: Processor.run() flow

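1) First, call startupComplete() to mark the Processor's initialization as finished and wake up any thread blocked waiting for it.
2) Process the new SocketChannels in the newConnections queue. Each SocketChannel in the queue is registered for OP_READ on the nioSelector. The SocketChannel is wrapped in a KafkaChannel, which is attached to the SelectionKey, so that when OP_READ later fires, the object retrieved from the SelectionKey is the KafkaChannel. The code of configureNewConnections() is as follows: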

/**
   * Register any new connections that have been queued up
   */
  private def configureNewConnections() {
    while (!newConnections.isEmpty) { // drain the newConnections queue
      val channel = newConnections.poll()
      try {
        debug(s"Processor $id listening to new connection from ${channel.socket.getRemoteSocketAddress}")
        val localHost = channel.socket().getLocalAddress.getHostAddress
        val localPort = channel.socket().getLocalPort
        val remoteHost = channel.socket().getInetAddress.getHostAddress
        val remotePort = channel.socket().getPort
        // build the connectionId from localHost, localPort, remoteHost and remotePort
        val connectionId = ConnectionId(localHost, localPort, remoteHost, remotePort).toString
        selector.register(connectionId, channel) // register for OP_READ
      } catch {
        // We explicitly catch all non fatal exceptions and close the socket to avoid a socket leak. The other
        // throwables will be caught in processor and logged as uncaught exceptions.
        case NonFatal(e) =>
          // need to close the channel here to avoid a socket leak.
          close(channel)
          error(s"Processor $id closed connection from ${channel.getRemoteAddress}", e)
      }
    }
  }

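3) Fetch this Processor's responseQueue from the RequestChannel and process the responses buffered in it.
If the response is a SendAction, it has to be sent to the client: the Processor looks up the corresponding KafkaChannel, registers OP_WRITE for it, and points KafkaChannel.send at the Response to be sent. The response is also taken off the responseQueue and placed in inflightResponses. Once a complete response has been sent, the connection's OP_WRITE registration is cancelled.
If the response is a NoOpAction, there is nothing to send on that connection for now, so OP_READ is registered for the KafkaChannel and it can go on reading requests.
If the response is a CloseConnectionAction, the corresponding connection is closed.
The code of processNewResponses() is as follows: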

private def processNewResponses() {
    /*
     The RequestChannel maps each Processor's id to its responseQueue;
     fetch the next response from this Processor's responseQueue
     */
    var curr = requestChannel.receiveResponse(id)
    while (curr != null) {
      try {
        curr.responseAction match {
          // there is no response to send to the client
          case RequestChannel.NoOpAction =>
            // There is no response to send to the client, we need to read more pipelined requests
            // that are sitting in the server's socket buffer
            curr.request.updateRequestMetrics
            trace("Socket server received empty response to send, registering for read: " + curr)
            // re-register for OP_READ
            selector.unmute(curr.request.connectionId)
          // the response has to be sent to the client
          case RequestChannel.SendAction =>
            // calls KSelector.send() and caches the response in inflightResponses
            sendResponse(curr)
          case RequestChannel.CloseConnectionAction =>
            curr.request.updateRequestMetrics
            trace("Closing socket connection actively according to the response code.")
            close(selector, curr.request.connectionId)
        }
      } finally {
        curr = requestChannel.receiveResponse(id) // continue with the rest of the responseQueue
      }
    }
  }
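
The three-way dispatch on the response action can be modelled with a sealed trait and a pattern match. In this sketch all types are simplified, hypothetical stand-ins declared locally (they are not Kafka's RequestChannel classes), and the selector calls are replaced by strings describing what the Processor would do:

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical stand-ins for the three response actions.
sealed trait ResponseAction
case object SendAction extends ResponseAction
case object NoOpAction extends ResponseAction
case object CloseConnectionAction extends ResponseAction

final case class Response(connectionId: String, action: ResponseAction)

object ResponseDispatch {
  // Drains the queue the way processNewResponses() loops on receiveResponse(id),
  // recording the step taken for each response instead of touching a real selector.
  def drain(queue: ConcurrentLinkedQueue[Response]): Vector[String] = {
    val steps = Vector.newBuilder[String]
    var curr = queue.poll()
    while (curr != null) {
      curr.action match {
        case NoOpAction            => steps += s"unmute ${curr.connectionId}"
        case SendAction            => steps += s"send to ${curr.connectionId}"
        case CloseConnectionAction => steps += s"close ${curr.connectionId}"
      }
      curr = queue.poll()
    }
    steps.result()
  }
}
```

Declaring the trait as sealed lets the compiler warn when a match omits one of the actions, which is useful if a fourth action is ever added.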

4) Call the Processor's poll() method to read requests and send responses. poll() delegates to KSelector.poll().

private def poll() {
    try selector.poll(300)
    catch {
      case e @ (_: IllegalStateException | _: IOException) =>
        error(s"Closing processor $id due to illegal state or IO exception")
        swallow(closeAll())
        shutdownComplete()
        throw e
    }
  }

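Each call to KSelector.poll() places the requests that were read, the responses that were sent successfully, and the connections that were closed into completedReceives, completedSends, and disconnected, respectively. The following steps process these collections.
5) Call processCompletedReceives() to process KSelector.completedReceives. For each entry, the NetworkReceive, the Processor's id, and the authentication information are wrapped in a RequestChannel.Request, which is placed on RequestChannel.requestQueue to await a Handler thread. The OP_READ registration of the corresponding KafkaChannel is then cancelled, so the connection reads no further requests until the response to this one has been sent.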

private def processCompletedReceives() {
    // iterate over completedReceives
    selector.completedReceives.asScala.foreach { receive =>
      try {
        // look up the corresponding KafkaChannel
        val channel = selector.channel(receive.source)
        // create the session for this KafkaChannel; it is used for authorization
        val session = RequestChannel.Session(new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channel.principal.getName),
          channel.socketAddress)
        // wrap the NetworkReceive, processor id and authentication info in a RequestChannel.Request
        val req = RequestChannel.Request(processor = id, connectionId = receive.source, session = session, buffer = receive.payload, startTimeMs = time.milliseconds, securityProtocol = protocol)
        // put the RequestChannel.Request on RequestChannel.requestQueue to await processing
        requestChannel.sendRequest(req)
        // cancel the OP_READ registration; the connection stops reading data
        selector.mute(receive.source)
      } catch {
        case e @ (_: InvalidRequestException | _: SchemaException) =>
          // note that even though we got an exception, we can assume that receive.source is valid. Issues with constructing a valid receive object were handled earlier
          error(s"Closing socket for ${receive.source} because of error", e)
          close(selector, receive.source)
      }
    }
  }

6) Call processCompletedSends() to process KSelector.completedSends. First, the corresponding Response is removed from inflightResponses. Then OP_READ is registered again for the connection so that data can be read from it.

private def processCompletedSends() {
    // iterate over completedSends
    selector.completedSends.asScala.foreach { send =>
      // this response has been sent, so remove it from inflightResponses
      val resp = inflightResponses.remove(send.destination).getOrElse {
        throw new IllegalStateException(s"Send for ${send.destination} completed, but not in `inflightResponses`")
      }
      resp.request.updateRequestMetrics()
      selector.unmute(send.destination) // re-register OP_READ so the connection can be read again
    }
  }
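
Cancelling and restoring the OP_READ registration, as mute() and unmute() are described above, amounts to clearing and setting one bit in a key's interest set. The sketch below shows only that bit arithmetic on plain integers. `InterestOps` is a hypothetical helper; a real implementation would apply the result through `SelectionKey.interestOps(int)`.

```scala
import java.nio.channels.SelectionKey

// Hypothetical helper working on raw interest-set integers instead of a live SelectionKey.
object InterestOps {
  // Clear the OP_READ bit and leave every other bit untouched.
  def mute(ops: Int): Int = ops & ~SelectionKey.OP_READ

  // Set the OP_READ bit and leave every other bit untouched.
  def unmute(ops: Int): Int = ops | SelectionKey.OP_READ

  def canRead(ops: Int): Boolean = (ops & SelectionKey.OP_READ) != 0
}
```

Muting a connection while its request is being handled means each connection has at most one request in progress on the broker, so responses go back in the order the requests arrived.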

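7) Call processDisconnected() to process KSelector.disconnected. First, any Response belonging to the connection is removed from inflightResponses. Then the connection count recorded in ConnectionQuotas is decremented, making room for new connections.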

private def processDisconnected() {
    // iterate over selector.disconnected
    selector.disconnected.asScala.foreach { connectionId =>
      val remoteHost = ConnectionId.fromString(connectionId).getOrElse {
        throw new IllegalStateException(s"connectionId has unexpected format: $connectionId")
      }.remoteHost
      // remove this connection's Response from inflightResponses
      inflightResponses.remove(connectionId).foreach(_.request.updateRequestMetrics())
      // the channel has been closed by the selector but the quotas still need to be updated
      // decrement the connection count recorded in ConnectionQuotas
      connectionQuotas.dec(InetAddress.getByName(remoteHost))
    }
  }
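
processDisconnected() recovers the remote host by parsing the connection id string back into its four parts. The sketch below shows one way such a round trip could work. The `localHost:localPort-remoteHost:remotePort` layout and the `ConnId` type are assumptions made for illustration, not a statement of Kafka's exact format.

```scala
import scala.util.Try

// Hypothetical connection id; the string layout below is an assumption for this sketch.
final case class ConnId(localHost: String, localPort: Int, remoteHost: String, remotePort: Int) {
  override def toString: String = s"$localHost:$localPort-$remoteHost:$remotePort"
}

object ConnId {
  // Split "host:port" at the last colon so that hosts containing colons still parse.
  private def hostPort(s: String): Option[(String, Int)] = {
    val i = s.lastIndexOf(':')
    if (i <= 0) None
    else Try(s.substring(i + 1).toInt).toOption.map(port => (s.substring(0, i), port))
  }

  // Returns None instead of throwing when the string does not have the expected shape.
  def fromString(s: String): Option[ConnId] = s.split("-") match {
    case Array(local, remote) =>
      for {
        l <- hostPort(local)
        r <- hostPort(remote)
      } yield ConnId(l._1, l._2, r._1, r._2)
    case _ => None
  }
}
```

Returning an Option lets the caller decide how to treat a malformed id, for example with `getOrElse` and an exception as processDisconnected() does. Note that this sketch would misparse a host name containing a hyphen, so it assumes hosts are given as IP addresses.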

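8) When SocketServer.shutdown() is called to shut down the whole SocketServer, alive is set to false and the loop above ends. closeAll() then closes every connection managed by the Processor and decrements the counts recorded in ConnectionQuotas, and shutdownComplete() marks the shutdown as finished and wakes up the thread blocked waiting for this Processor to stop.
The code of run() is as follows: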

override def run() {
    startupComplete() // 1. mark initialization as finished and wake up threads waiting for this Processor to start
    while (isRunning) { // check the running state held in the alive field
      try {
        // setup any new connections that have been queued up
        // 2. register each queued SocketChannel for OP_READ on the nioSelector
        configureNewConnections()
        // register any new responses for writing
        // 3. process the responses buffered in this Processor's responseQueue
        processNewResponses()
        // 4. read requests and send responses; delegates to KSelector.poll()
        poll()
        // 5. process KSelector.completedReceives
        processCompletedReceives()
        // 6. process KSelector.completedSends
        processCompletedSends()
        // 7. process KSelector.disconnected
        processDisconnected()
      } catch {
        // We catch all the throwables here to prevent the processor thread from exiting. We do this because
        // letting a processor exit might cause a bigger impact on the broker. Usually the exceptions thrown would
        // be either associated with a specific socket channel or a bad request. We just ignore the bad socket channel
        // or request. This behavior might need to be reviewed if we see an exception that need the entire broker to stop.
        case e: ControlThrowable => throw e
        case e: Throwable =>
          error("Processor got uncaught exception.", e)
      }
    }

    debug("Closing selector - processor " + id)
    swallowError(closeAll())
    shutdownComplete() // 8. signal that the shutdown is complete
  }
