Kafka Source Code Analysis (1): core.kafka.server.KafkaServer

Preface: Back in April I spent half a month reading the Kafka source code, submitted a few patches, and wrote a KIP (whether it gets accepted is another matter). Three months have passed since then, and the insights from that reading are starting to fade, so it is time to write things down to consolidate them. I plan to write a series, starting with Kafka 0.8.2.2: its code base is small and bare, and later versions are built on the same design ideas. The articles will start from broker startup and gradually expand to producing and consuming.

1. The Kafka startup command

To start a Kafka broker we use the kafka-server-start.sh script.

All of the shell scripts under Kafka's bin directory end up calling kafka-run-class.sh, and kafka-server-start.sh is no exception:

exec $base_dir/kafka-run-class.sh $EXTRA_ARGS kafka.Kafka $@

So starting Kafka boils down to running the kafka.Kafka class.

The following lines in kafka-server-start.sh are also worth noting:

if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi

which means a Kafka broker gets a 1G JVM heap by default.

Compare this with the corresponding lines in kafka-run-class.sh:

if [ -z "$KAFKA_HEAP_OPTS" ]; then
  KAFKA_HEAP_OPTS="-Xmx256M"
fi

So for every other command, such as creating a topic, kafka-run-class.sh falls back to a 256M heap; the broker gets 1G because kafka-server-start.sh exports KAFKA_HEAP_OPTS before kafka-run-class.sh performs this check.

2. core.kafka.Kafka

The code in core.kafka.Kafka is short:

object Kafka extends Logging {

  def main(args: Array[String]): Unit = {
    if (args.length != 1) {
      println("USAGE: java [options] %s server.properties".format(classOf[KafkaServer].getSimpleName()))
      System.exit(1)
    }

    try {
      val props = Utils.loadProps(args(0))
      val serverConfig = new KafkaConfig(props)
      KafkaMetricsReporter.startReporters(serverConfig.props)
      val kafkaServerStartable = new KafkaServerStartable(serverConfig)

      // attach shutdown handler to catch control-c
      Runtime.getRuntime().addShutdownHook(new Thread() {
        override def run() = {
          kafkaServerStartable.shutdown
        }
      })

      kafkaServerStartable.startup
      kafkaServerStartable.awaitShutdown
    }
    catch {
      case e: Throwable => fatal(e)
    }
    System.exit(0)
  }
}

It first checks that exactly one argument (the server.properties path) was given, loads it, and wraps it in a KafkaConfig object called serverConfig.

It then creates a KafkaServerStartable with that config.

A shutdown hook registered via Runtime.getRuntime().addShutdownHook calls kafkaServerStartable.shutdown when the process is interrupted with Ctrl-C (or otherwise terminated normally).

Finally it calls kafkaServerStartable.startup and kafkaServerStartable.awaitShutdown.

Nothing remarkable here, so let's move on to KafkaServerStartable.

3. core.kafka.server.KafkaServerStartable

There is not much to this class: it wraps a single KafkaServer object, and every method on KafkaServerStartable simply delegates to the corresponding KafkaServer method.
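
A rough sketch of the class, simplified from the 0.8.2 source (the exact catch blocks and exit handling may differ slightly in the version you are reading):

class KafkaServerStartable(val serverConfig: KafkaConfig) extends Logging {
  // the single KafkaServer instance that everything delegates to
  private val server = new KafkaServer(serverConfig)

  def startup() {
    try {
      server.startup()
    } catch {
      case e: Throwable =>
        fatal("Fatal error during KafkaServerStartable startup. Prepare to shutdown", e)
        System.exit(1)
    }
  }

  def shutdown() {
    try {
      server.shutdown()
    } catch {
      case e: Throwable =>
        fatal("Fatal error during KafkaServerStartable shutdown. Prepare to halt", e)
        System.exit(1)
    }
  }

  def awaitShutdown() = server.awaitShutdown
}

The only behaviour it adds on top of KafkaServer is turning a fatal startup or shutdown error into a process exit.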

4. core.kafka.server.KafkaServer

This class is the heart of the broker.

First, the fields it defines:

  private var isShuttingDown = new AtomicBoolean(false)
  private var shutdownLatch = new CountDownLatch(1)
  private var startupComplete = new AtomicBoolean(false)
  val brokerState: BrokerState = new BrokerState
  val correlationId: AtomicInteger = new AtomicInteger(0)
  var socketServer: SocketServer = null
  var requestHandlerPool: KafkaRequestHandlerPool = null
  var logManager: LogManager = null
  var offsetManager: OffsetManager = null
  var kafkaHealthcheck: KafkaHealthcheck = null
  var topicConfigManager: TopicConfigManager = null
  var replicaManager: ReplicaManager = null
  var apis: KafkaApis = null
  var kafkaController: KafkaController = null
  // scheduler for background tasks; background.threads defaults to 10
  val kafkaScheduler = new KafkaScheduler(config.backgroundThreads)
  var zkClient: ZkClient = null

Every one of these fields leads into a deep subsystem of its own; refer back to this list as each one comes up.

One worth explaining now is shutdownLatch, a CountDownLatch whose count is set to 1 in startup(). Recall this line in core.kafka.Kafka:

kafkaServerStartable.awaitShutdown

awaitShutdown() blocks on that latch, which is what keeps the main thread from exiting. When shutdown() runs, it calls shutdownLatch.countDown(), dropping the count to 0; awaitShutdown() then returns and the main thread finishes.
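
To make the latch behaviour concrete, here is a minimal, self-contained sketch of the same pattern (illustrative only, not Kafka code):

import java.util.concurrent.CountDownLatch

object LatchDemo {
  private val shutdownLatch = new CountDownLatch(1)

  // blocks the calling thread while the count is still above zero
  def awaitShutdown(): Unit = shutdownLatch.await()

  // drops the count from 1 to 0, releasing every thread blocked in await()
  def shutdown(): Unit = shutdownLatch.countDown()

  def main(args: Array[String]): Unit = {
    // simulate a shutdown hook firing one second later
    new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(1000); shutdown() }
    }).start()

    awaitShutdown() // returns once shutdown() has counted the latch down
    println("latch released, main thread exits")
  }
}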

4.1 startup

Let's look at the code first:

def startup() {
    try {
      info("starting")
      brokerState.newState(Starting)
      isShuttingDown = new AtomicBoolean(false)
      shutdownLatch = new CountDownLatch(1)
      /* start scheduler */
      kafkaScheduler.startup()
      /* setup zookeeper */
      zkClient = initZk()
      /* start log manager */
      logManager = createLogManager(zkClient, brokerState)
      logManager.startup()

      socketServer = new SocketServer(config.brokerId,
                                      config.hostName,
                                      config.port,
                                      config.numNetworkThreads,
                                      config.queuedMaxRequests,
                                      config.socketSendBufferBytes,
                                      config.socketReceiveBufferBytes,
                                      config.socketRequestMaxBytes,
                                      config.maxConnectionsPerIp,
                                      config.connectionsMaxIdleMs,
                                      config.maxConnectionsPerIpOverrides)
      socketServer.startup()
      replicaManager = new ReplicaManager(config, time, zkClient, kafkaScheduler, logManager, isShuttingDown)
      /* start offset manager */
      offsetManager = createOffsetManager()
      kafkaController = new KafkaController(config, zkClient, brokerState)    
      /* start processing requests */
      apis = new KafkaApis(socketServer.requestChannel, replicaManager, offsetManager, zkClient, config.brokerId, config, kafkaController)
      requestHandlerPool = new KafkaRequestHandlerPool(config.brokerId, socketServer.requestChannel, apis, config.numIoThreads)
      brokerState.newState(RunningAsBroker)   
      Mx4jLoader.maybeLoad()
      replicaManager.startup()
      kafkaController.startup()
      topicConfigManager = new TopicConfigManager(zkClient, logManager)
      topicConfigManager.startup()
      /* tell everyone we are alive */
      kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, config.advertisedHostName, config.advertisedPort, config.zkSessionTimeoutMs, zkClient)
      kafkaHealthcheck.startup()
      registerStats()
      startupComplete.set(true)
      info("started")
    }
    catch {
      case e: Throwable =>
        fatal("Fatal error during KafkaServer startup. Prepare to shutdown", e)
        shutdown()
        throw e
    }
  }

Let's see what startup() actually does.

It first sets brokerState to Starting. brokerState is a BrokerState object with seven possible states, shown in the diagram below (taken from the source comments):

 *
 *                +-----------+
 *                |Not Running|
 *                +-----+-----+
 *                      |
 *                      v
 *                +-----+-----+
 *                |Starting   +--+
 *                +-----+-----+  | +----+------------+
 *                      |        +>+RecoveringFrom   |
 *                      v          |UncleanShutdown  |
 * +----------+     +-----+-----+  +-------+---------+
 * |RunningAs |     |RunningAs  |            |
 * |Controller+<--->+Broker     +<-----------+
 * +----------+     +-----+-----+
 *        |              |
 *        |              v
 *        |       +-----+------------+
 *        |-----> |PendingControlled |
 *                |Shutdown          |
 *                +-----+------------+
 *                      |
 *                      v
 *               +-----+----------+
 *               |BrokerShutting  |
 *               |Down            |
 *               +-----+----------+
 *                     |
 *                     v
 *               +-----+-----+
 *               |Not Running|
 *               +-----------+
 *

The seven states are (a minimal code sketch follows the list):

NotRunning: the broker is not running
Starting: the broker is starting up
RecoveringFromUncleanShutdown: recovering the logs after an unclean shutdown
RunningAsBroker: running as a broker
RunningAsController: running as the controller
PendingControlledShutdown: a controlled shutdown has been announced to the controller and is pending
BrokerShuttingDown: the broker is shutting down
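
In the source these states are case objects carrying a state byte, roughly as follows (the byte values are from my reading of the 0.8.2 BrokerStates.scala, so treat them as illustrative):

sealed trait BrokerStates { def state: Byte }

case object NotRunning extends BrokerStates { val state: Byte = 0 }
case object Starting extends BrokerStates { val state: Byte = 1 }
case object RecoveringFromUncleanShutdown extends BrokerStates { val state: Byte = 2 }
case object RunningAsBroker extends BrokerStates { val state: Byte = 3 }
case object RunningAsController extends BrokerStates { val state: Byte = 4 }
case object PendingControlledShutdown extends BrokerStates { val state: Byte = 6 }
case object BrokerShuttingDown extends BrokerStates { val state: Byte = 7 }

brokerState.newState(...) simply records the current state byte.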

After that it creates and starts each subsystem in turn: kafkaScheduler, logManager, socketServer, replicaManager, kafkaController, apis (KafkaApis), topicConfigManager, and kafkaHealthcheck. Each of these will be introduced in its own article.

registerStats() merely registers JMX metrics and is not interesting here.

Finally, startupComplete is set to true.

4.2 Connecting to ZooKeeper

During startup a ZooKeeper client is created by calling initZk(); the code is as follows:

private def initZk(): ZkClient = {
    info("Connecting to zookeeper on " + config.zkConnect)

    val chroot = {
      // if zookeeper.connect contains a "/", keep everything from the "/" on: that is Kafka's chroot directory in ZooKeeper
      if (config.zkConnect.indexOf("/") > 0)
        config.zkConnect.substring(config.zkConnect.indexOf("/"))
      else
        ""
    }
    // if a chroot is configured, connect without it first and make sure the chroot path exists
    if (chroot.length > 1) {
      val zkConnForChrootCreation = config.zkConnect.substring(0, config.zkConnect.indexOf("/"))
      val zkClientForChrootCreation = new ZkClient(zkConnForChrootCreation, config.zkSessionTimeoutMs, config.zkConnectionTimeoutMs, ZKStringSerializer)
      ZkUtils.makeSurePersistentPathExists(zkClientForChrootCreation, chroot)
      info("Created zookeeper path " + chroot)
      zkClientForChrootCreation.close()
    }

    val zkClient = new ZkClient(config.zkConnect, config.zkSessionTimeoutMs, config.zkConnectionTimeoutMs, ZKStringSerializer)
    ZkUtils.setupCommonPaths(zkClient)
    zkClient
  }

The connection code itself is unremarkable; the only special handling is for a zookeeper.connect value that puts Kafka under its own chroot directory, e.g.:

127.0.0.1:2181/kafka
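
To make the chroot handling concrete, here is a small standalone sketch (not Kafka code) that applies the same substring logic to that example:

object ChrootDemo {
  // mirrors the extraction in initZk(): everything from the first "/" onward
  def chrootOf(zkConnect: String): String =
    if (zkConnect.indexOf("/") > 0) zkConnect.substring(zkConnect.indexOf("/"))
    else ""

  def main(args: Array[String]): Unit = {
    println(chrootOf("127.0.0.1:2181/kafka")) // "/kafka": initZk() pre-creates this path
    println(chrootOf("127.0.0.1:2181"))       // "": no chroot, Kafka's znodes live at the ZooKeeper root
  }
}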

4.3 shutdown

def shutdown() {
    try {
      info("shutting down")
      // make sure shutdown runs only once
      val canShutdown = isShuttingDown.compareAndSet(false, true)
      if (canShutdown) {
        // Utils.swallow runs the given function and, if it throws, catches and logs the exception
        Utils.swallow(controlledShutdown())
        brokerState.newState(BrokerShuttingDown)
        if(socketServer != null)
          Utils.swallow(socketServer.shutdown())
        if(requestHandlerPool != null)
          Utils.swallow(requestHandlerPool.shutdown())
        if(offsetManager != null)
          offsetManager.shutdown()
        Utils.swallow(kafkaScheduler.shutdown())
        if(apis != null)
          Utils.swallow(apis.close())
        if(replicaManager != null)
          Utils.swallow(replicaManager.shutdown())
        if(logManager != null)
          Utils.swallow(logManager.shutdown())
        if(kafkaController != null)
          Utils.swallow(kafkaController.shutdown())
        if(zkClient != null)
          Utils.swallow(zkClient.close())

        brokerState.newState(NotRunning)
        shutdownLatch.countDown()
        startupComplete.set(false)
        info("shut down completed")
      }
    }
    catch {
      case e: Throwable =>
        fatal("Fatal error during KafkaServer shutdown.", e)
        throw e
    }
  }

It calls each subsystem's shutdown method in order; those methods will be explained together with the individual subsystems.

4.4 controlledShutdown

Before tearing anything down, shutdown() first runs controlledShutdown(). This method sends a request to the controller telling it that this broker is about to shut down, so that partition leadership can be moved off the broker first.
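
The retry behaviour described in the comments below is driven by three broker settings. As a hedged reference sketch (the key names are real Kafka configs; the defaults are what I recall for 0.8.2, so verify against your own broker):

object ControlledShutdownConfigNotes {
  val settings = Map(
    "controlled.shutdown.enable"           -> "true", // config.controlledShutdownEnable: gates the whole mechanism
    "controlled.shutdown.max.retries"      -> "3",    // config.controlledShutdownMaxRetries: attempts before giving up
    "controlled.shutdown.retry.backoff.ms" -> "5000"  // config.controlledShutdownRetryBackoffMs: sleep between attempts
  )
}

With those settings in mind, here is the method itself: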

private def controlledShutdown() {
  // If notifying the controller fails, wait for the configured backoff and retry up to the configured number of times; if it still fails, give up on the controlled shutdown.
    if (startupComplete.get() && config.controlledShutdownEnable) {
      // retry count from controlled.shutdown.max.retries
      var remainingRetries = config.controlledShutdownMaxRetries
      info("Starting controlled shutdown")
      var channel : BlockingChannel = null
      var prevController : Broker = null
      var shutdownSuceeded : Boolean = false
      try {
        // update the broker state
        brokerState.newState(PendingControlledShutdown)
        // loop until the request succeeds or the retries are exhausted
        while (!shutdownSuceeded && remainingRetries > 0) {
          remainingRetries = remainingRetries - 1
          // read the current controller's broker id from ZooKeeper
          val controllerId = ZkUtils.getController(zkClient)
          ZkUtils.getBrokerInfo(zkClient, controllerId) match {
            case Some(broker) =>
            // (re)connect if there is no channel yet, no controller has been recorded yet, or the recorded controller is stale;
            // in other words, if we already have a connection to an unchanged controller, reuse it.
              if (channel == null || prevController == null || !prevController.equals(broker)) {
                if (channel != null) {
                  channel.disconnect()
                }
                channel = new BlockingChannel(broker.host, broker.port,
                  BlockingChannel.UseDefaultBufferSize,
                  BlockingChannel.UseDefaultBufferSize,
                  config.controllerSocketTimeoutMs)
                channel.connect()
                prevController = broker
              }
            case None=>
              // controller not found in ZooKeeper; ignore and retry
          }
          // send the ControlledShutdownRequest to the controller
          if (channel != null) {
            var response: Receive = null
            try {
              val request = new ControlledShutdownRequest(correlationId.getAndIncrement, config.brokerId)
              channel.send(request)

              response = channel.receive()
              val shutdownResponse = ControlledShutdownResponse.readFrom(response.buffer)
              // no error and no partitions left to move: the controlled shutdown succeeded
              if (shutdownResponse.errorCode == ErrorMapping.NoError && shutdownResponse.partitionsRemaining != null &&
                  shutdownResponse.partitionsRemaining.size == 0) {
                shutdownSuceeded = true
                info ("Controlled shutdown succeeded")
              }
              else {
                info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
                info("Error code from controller: %d".format(shutdownResponse.errorCode))
              }
            }
            catch {
              case ioe: java.io.IOException =>
                channel.disconnect()
                channel = null
                warn("Error during controlled shutdown, possibly because leader movement took longer than the configured socket.timeout.ms: %s".format(ioe.getMessage))
            }
          }
          // on failure, sleep for controlled.shutdown.retry.backoff.ms before retrying
          if (!shutdownSuceeded) {
            Thread.sleep(config.controlledShutdownRetryBackoffMs)
            warn("Retrying controlled shutdown after the previous attempt failed...")
          }
        }
      }
      finally {
        if (channel != null) {
          channel.disconnect()
          channel = null
        }
      }
      if (!shutdownSuceeded) {
        warn("Proceeding to do an unclean shutdown as all the controlled shutdown attempts failed")
      }
    }
  }

What the controller actually does with a ControlledShutdownRequest will be covered later in this series.
