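Preface: In April I spent about half a month reading the Kafka source code, submitted a few patches and wrote a KIP (whether it gets accepted is another matter). Three months have passed since then, and some of the insights I gained while reading the code are starting to fade, so it is time to write things down and consolidate them. I plan to write this up as a series. I will start with Kafka 0.8.2.2 because its codebase is relatively small and bare-bones, and later versions are built on the design ideas of this release. The series begins with how Kafka starts up and then gradually extends to producing and consuming.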
To start a Kafka broker, we use the kafka-server-start.sh script.
Every shell script in Kafka's bin directory ends up calling kafka-run-class.sh, and kafka-server-start.sh is no exception:
exec $base_dir/kafka-run-class.sh $EXTRA_ARGS kafka.Kafka $@
So on startup, Kafka runs the kafka.Kafka class.
The following code in kafka-server-start.sh is also worth noting:
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi
This means that when KAFKA_HEAP_OPTS is not set, the broker's JVM heap defaults to 1 GB (-Xmx1G -Xms1G).
Now look at this snippet in kafka-run-class.sh:
if [ -z "$KAFKA_HEAP_OPTS" ]; then
KAFKA_HEAP_OPTS="-Xmx256M"
fi
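For other commands, such as creating a topic, kafka-run-class.sh therefore starts the JVM with a default max heap of 256 MB. The broker never falls back to this default, because kafka-server-start.sh has already set KAFKA_HEAP_OPTS before it calls kafka-run-class.sh.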
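To make the precedence between the two defaults concrete, here is a small Scala model of the two shell checks. It is only a sketch under assumptions: `HeapOpts` and its methods are hypothetical names, the environment is modeled as a plain `Map`, and it captures nothing beyond the "fill in a default when the variable is unset or empty" rule shown above.

```scala
// Hypothetical model of the two shell snippets; not part of Kafka.
object HeapOpts {
  private val Key = "KAFKA_HEAP_OPTS"

  // kafka-server-start.sh: [ "x$KAFKA_HEAP_OPTS" = "x" ] => default to 1G
  def serverStart(env: Map[String, String]): Map[String, String] =
    if (env.getOrElse(Key, "").isEmpty) env + (Key -> "-Xmx1G -Xms1G") else env

  // kafka-run-class.sh: [ -z "$KAFKA_HEAP_OPTS" ] => default to 256M
  def runClass(env: Map[String, String]): Map[String, String] =
    if (env.getOrElse(Key, "").isEmpty) env + (Key -> "-Xmx256M") else env

  // The broker passes through both scripts in order.
  def brokerHeap(env: Map[String, String]): String = runClass(serverStart(env))(Key)

  // Other tools go through kafka-run-class.sh only.
  def toolHeap(env: Map[String, String]): String = runClass(env)(Key)
}
```

Because the start script runs first, the run-class default only ever applies to tools; a user-supplied value wins in both paths.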
The code of kafka.Kafka (in the core module) is quite simple:
object Kafka extends Logging {
  def main(args: Array[String]): Unit = {
    if (args.length != 1) {
      println("USAGE: java [options] %s server.properties".format(classOf[KafkaServer].getSimpleName()))
      System.exit(1)
    }
    try {
      val props = Utils.loadProps(args(0))
      val serverConfig = new KafkaConfig(props)
      KafkaMetricsReporter.startReporters(serverConfig.props)
      val kafkaServerStartable = new KafkaServerStartable(serverConfig)
      // attach shutdown handler to catch control-c
      Runtime.getRuntime().addShutdownHook(new Thread() {
        override def run() = {
          kafkaServerStartable.shutdown
        }
      })
      kafkaServerStartable.startup
      kafkaServerStartable.awaitShutdown
    }
    catch {
      case e: Throwable => fatal(e)
    }
    System.exit(0)
  }
}
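It first checks that a server.properties path was passed in, loads it into the config object serverConfig, starts the configured metrics reporters, and creates a kafkaServerStartable object.
A hook registered with Runtime.getRuntime().addShutdownHook calls kafkaServerStartable.shutdown when the JVM shuts down (for example on Ctrl-C).
Finally it calls kafkaServerStartable.startup and then kafkaServerStartable.awaitShutdown.
There is nothing else worth examining here, so let's move on to KafkaServerStartable.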
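This class is not very interesting either: it declares a KafkaServer object, and every KafkaServerStartable method simply calls the corresponding KafkaServer method.
KafkaServer, on the other hand, is very important.
Let's first look at the fields it defines: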
private var isShuttingDown = new AtomicBoolean(false)
private var shutdownLatch = new CountDownLatch(1)
private var startupComplete = new AtomicBoolean(false)
val brokerState: BrokerState = new BrokerState
val correlationId: AtomicInteger = new AtomicInteger(0)
var socketServer: SocketServer = null
var requestHandlerPool: KafkaRequestHandlerPool = null
var logManager: LogManager = null
var offsetManager: OffsetManager = null
var kafkaHealthcheck: KafkaHealthcheck = null
var topicConfigManager: TopicConfigManager = null
var replicaManager: ReplicaManager = null
var apis: KafkaApis = null
var kafkaController: KafkaController = null
// number of threads that run background tasks, 10 by default
val kafkaScheduler = new KafkaScheduler(config.backgroundThreads)
var zkClient: ZkClient = null
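Each of these fields leads deep into the codebase; come back to this list as each one comes up.
For now, let's look at shutdownLatch. It is a CountDownLatch that startup() initializes with a count of 1. kafka.Kafka contains this line: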
kafkaServerStartable.awaitShutdown
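This keeps the main thread from exiting: awaitShutdown() calls shutdownLatch.await(), which blocks while the count is 1. Near the end of shutdown(), shutdownLatch.countDown() drops the count to 0; await() then returns, and the main thread continues to System.exit(0) and exits.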
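The await/countDown handshake can be reproduced in isolation. The sketch below is an illustration, not Kafka code: `LatchedServer`, `LatchDemo` and the timed `awaitShutdownFor` helper are hypothetical names, and a plain thread stands in for the shutdown hook. It only relies on the standard `java.util.concurrent.CountDownLatch` API.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-in for KafkaServer's latch handling.
class LatchedServer {
  // Count of 1: a single countDown() releases every waiter.
  private val shutdownLatch = new CountDownLatch(1)

  // Mirrors the end of shutdown(): drop the count to 0.
  def shutdown(): Unit = shutdownLatch.countDown()

  // Mirrors awaitShutdown(): block until shutdown() has run.
  def awaitShutdown(): Unit = shutdownLatch.await()

  // Timed variant for the demo: false if the latch is still closed after timeoutMs.
  def awaitShutdownFor(timeoutMs: Long): Boolean =
    shutdownLatch.await(timeoutMs, TimeUnit.MILLISECONDS)
}

object LatchDemo {
  // Returns (blockedBeforeShutdown, releasedAfterShutdown).
  def run(): (Boolean, Boolean) = {
    val server = new LatchedServer
    // Nobody has called shutdown() yet, so the timed wait runs out.
    val blockedBefore = !server.awaitShutdownFor(50)
    // Plays the role of the shutdown-hook thread.
    val hook = new Thread(new Runnable {
      def run(): Unit = server.shutdown()
    })
    hook.start()
    server.awaitShutdown() // the "main thread" is released here
    hook.join()
    (blockedBefore, server.awaitShutdownFor(0))
  }
}
```

Once the count reaches 0 the latch stays open, so any later await returns immediately.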
Here is the code of startup():
def startup() {
  try {
    info("starting")
    brokerState.newState(Starting)
    isShuttingDown = new AtomicBoolean(false) // reset: not shutting down
    shutdownLatch = new CountDownLatch(1)
    /* start scheduler */
    kafkaScheduler.startup()
    /* setup zookeeper */
    zkClient = initZk()
    /* start log manager */
    logManager = createLogManager(zkClient, brokerState)
    logManager.startup()
    socketServer = new SocketServer(config.brokerId,
                                    config.hostName,
                                    config.port,
                                    config.numNetworkThreads,
                                    config.queuedMaxRequests,
                                    config.socketSendBufferBytes,
                                    config.socketReceiveBufferBytes,
                                    config.socketRequestMaxBytes,
                                    config.maxConnectionsPerIp,
                                    config.connectionsMaxIdleMs,
                                    config.maxConnectionsPerIpOverrides)
    socketServer.startup()
    replicaManager = new ReplicaManager(config, time, zkClient, kafkaScheduler, logManager, isShuttingDown)
    /* start offset manager */
    offsetManager = createOffsetManager()
    kafkaController = new KafkaController(config, zkClient, brokerState)
    /* start processing requests */
    apis = new KafkaApis(socketServer.requestChannel, replicaManager, offsetManager, zkClient, config.brokerId, config, kafkaController)
    requestHandlerPool = new KafkaRequestHandlerPool(config.brokerId, socketServer.requestChannel, apis, config.numIoThreads)
    brokerState.newState(RunningAsBroker)
    Mx4jLoader.maybeLoad()
    replicaManager.startup()
    kafkaController.startup()
    topicConfigManager = new TopicConfigManager(zkClient, logManager)
    topicConfigManager.startup()
    /* tell everyone we are alive */
    kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, config.advertisedHostName, config.advertisedPort, config.zkSessionTimeoutMs, zkClient)
    kafkaHealthcheck.startup()
    registerStats()
    startupComplete.set(true)
    info("started")
  }
  catch {
    case e: Throwable =>
      fatal("Fatal error during KafkaServer startup. Prepare to shutdown", e)
      shutdown()
      throw e
  }
}
Let's see what startup() does.
First it sets brokerState to Starting. brokerState is a BrokerState object, and a broker can be in one of seven states:
               +-----------+
               |Not Running|
               +-----+-----+
                     |
                     v
               +-----+-----+
               |Starting   +--+
               +-----+-----+  | +-----------------+
                     |        +>+RecoveringFrom   |
                     v          |UncleanShutdown  |
+----------+   +-----+-----+    +--------+--------+
|RunningAs |   |RunningAs  |             |
|Controller+<->+Broker     +<------------+
+----+-----+   +-----+-----+
     |               |
     |               v
     |         +-----+------------+
     +-------->+PendingControlled |
               |Shutdown          |
               +-----+------------+
                     |
                     v
               +-----+----------+
               |BrokerShutting  |
               |Down            |
               +-----+----------+
                     |
                     v
               +-----+-----+
               |Not Running|
               +-----------+
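The seven states are:
NotRunning: not running
Starting: starting up
RecoveringFromUncleanShutdown: recovering from an unclean shutdown
RunningAsBroker: running as a broker
RunningAsController: running as the controller
PendingControlledShutdown: controlled shutdown in progress (the broker is asking the controller to let it shut down)
BrokerShuttingDown: shutting down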
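The diagram can be read as a transition table. The Scala sketch below encodes it; the state names follow the list above, but the `allowed` map and the `canMove`/`walk` helpers are hypothetical and are read off the diagram only. It does not claim that Kafka's BrokerState enforces these transitions (the code above just calls `brokerState.newState(...)`), and the real shutdown path can skip states: when controlled shutdown is disabled, shutdown() goes straight to BrokerShuttingDown.

```scala
// States from the diagram; an illustration, not Kafka's BrokerStates.
sealed trait State
case object NotRunning extends State
case object Starting extends State
case object RecoveringFromUncleanShutdown extends State
case object RunningAsBroker extends State
case object RunningAsController extends State
case object PendingControlledShutdown extends State
case object BrokerShuttingDown extends State

object BrokerStateMachine {
  // Edges read off the ASCII diagram (hypothetical table).
  val allowed: Map[State, Set[State]] = Map(
    NotRunning -> Set[State](Starting),
    Starting -> Set[State](RecoveringFromUncleanShutdown, RunningAsBroker),
    RecoveringFromUncleanShutdown -> Set[State](RunningAsBroker),
    RunningAsBroker -> Set[State](RunningAsController, PendingControlledShutdown),
    RunningAsController -> Set[State](RunningAsBroker, PendingControlledShutdown),
    PendingControlledShutdown -> Set[State](BrokerShuttingDown),
    BrokerShuttingDown -> Set[State](NotRunning)
  )

  def canMove(from: State, to: State): Boolean =
    allowed.getOrElse(from, Set.empty[State]).contains(to)

  // True if every consecutive pair in the path is an allowed edge.
  def walk(path: List[State]): Boolean =
    path.zip(path.drop(1)).forall { case (a, b) => canMove(a, b) }
}
```

A normal lifecycle is then `NotRunning -> Starting -> RunningAsBroker -> PendingControlledShutdown -> BrokerShuttingDown -> NotRunning`.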
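After that, the various components are created and started: kafkaScheduler, logManager, socketServer, replicaManager, offsetManager, kafkaController, apis, requestHandlerPool, topicConfigManager and kafkaHealthcheck. Each will be covered in a later post.
registerStats() only registers JMX-related metrics and can be skipped here.
Finally, startupComplete is set to true.
During startup a ZooKeeper client is created by calling initZk(), whose code is: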
private def initZk(): ZkClient = {
  info("Connecting to zookeeper on " + config.zkConnect)
  val chroot = {
    // If zookeeper.connect contains "/", take everything from it onward:
    // this is the chroot directory Kafka uses inside ZooKeeper
    if (config.zkConnect.indexOf("/") > 0)
      config.zkConnect.substring(config.zkConnect.indexOf("/"))
    else
      ""
  }
  // A chroot was given: connect without it and make sure the chroot path exists
  if (chroot.length > 1) {
    val zkConnForChrootCreation = config.zkConnect.substring(0, config.zkConnect.indexOf("/"))
    val zkClientForChrootCreation = new ZkClient(zkConnForChrootCreation, config.zkSessionTimeoutMs, config.zkConnectionTimeoutMs, ZKStringSerializer)
    ZkUtils.makeSurePersistentPathExists(zkClientForChrootCreation, chroot)
    info("Created zookeeper path " + chroot)
    zkClientForChrootCreation.close()
  }
  val zkClient = new ZkClient(config.zkConnect, config.zkSessionTimeoutMs, config.zkConnectionTimeoutMs, ZKStringSerializer)
  ZkUtils.setupCommonPaths(zkClient)
  zkClient
}
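This is ordinary ZooKeeper connection code. The only special case is a connect string that specifies a chroot directory for Kafka, such as 127.0.0.1:2181/kafka: the chroot path is first created through a separate client that connects without it.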
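The string handling in initZk() is easy to check in isolation. Here is a minimal sketch that mirrors its `indexOf("/")`/`substring` logic; `ZkChroot` and its methods are hypothetical names, and no ZooKeeper client is involved.

```scala
// Mirrors initZk's parsing only; does not talk to ZooKeeper.
object ZkChroot {
  // Returns (connect string used to create the chroot, chroot path or "").
  def split(zkConnect: String): (String, String) = {
    val idx = zkConnect.indexOf("/")
    if (idx > 0) (zkConnect.substring(0, idx), zkConnect.substring(idx))
    else (zkConnect, "")
  }

  // initZk only creates the chroot when it is longer than a bare "/".
  def needsChrootCreation(zkConnect: String): Boolean =
    split(zkConnect)._2.length > 1
}
```

Note that the chroot is appended once, after the last host:port pair, and applies to the whole ensemble (e.g. `h1:2181,h2:2181/kafka`).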
Next, let's look at shutdown():
def shutdown() {
  try {
    info("shutting down")
    // make sure shutdown runs only once
    val canShutdown = isShuttingDown.compareAndSet(false, true)
    if (canShutdown) {
      // swallow runs the given expression; if it throws, the exception is caught and logged
      Utils.swallow(controlledShutdown())
      brokerState.newState(BrokerShuttingDown)
      if(socketServer != null)
        Utils.swallow(socketServer.shutdown())
      if(requestHandlerPool != null)
        Utils.swallow(requestHandlerPool.shutdown())
      if(offsetManager != null)
        offsetManager.shutdown()
      Utils.swallow(kafkaScheduler.shutdown())
      if(apis != null)
        Utils.swallow(apis.close())
      if(replicaManager != null)
        Utils.swallow(replicaManager.shutdown())
      if(logManager != null)
        Utils.swallow(logManager.shutdown())
      if(kafkaController != null)
        Utils.swallow(kafkaController.shutdown())
      if(zkClient != null)
        Utils.swallow(zkClient.close())
      brokerState.newState(NotRunning)
      shutdownLatch.countDown()
      startupComplete.set(false)
      info("shut down completed")
    }
  }
  catch {
    case e: Throwable =>
      fatal("Fatal error during KafkaServer shutdown.", e)
      throw e
  }
}
shutdown() calls the shutdown method of each module; each of these will be explained when its module is covered.
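Two ideas in shutdown() are worth isolating: `compareAndSet(false, true)` lets exactly one caller through, and wrapping each step in a swallow-style helper keeps one failing module from skipping the rest. The sketch below combines both. `OnceShutdown` and its local `swallow` are hypothetical stand-ins (this is not Kafka's `Utils.swallow`), and it uses `NonFatal` where Kafka catches `Throwable`.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ListBuffer
import scala.util.control.NonFatal

// Hypothetical sketch: run named shutdown steps at most once, tolerating failures.
class OnceShutdown(steps: List[(String, () => Unit)]) {
  private val isShuttingDown = new AtomicBoolean(false)
  val log: ListBuffer[String] = ListBuffer.empty[String]

  // Runs `action`; on a non-fatal error records it instead of rethrowing.
  private def swallow(name: String)(action: => Unit): Unit =
    try { action; log += s"$name ok" }
    catch { case NonFatal(e) => log += s"$name failed: ${e.getMessage}" }

  // Returns true only for the call that actually performed the shutdown.
  def shutdown(): Boolean = {
    val canShutdown = isShuttingDown.compareAndSet(false, true)
    if (canShutdown) steps.foreach { case (name, step) => swallow(name)(step()) }
    canShutdown
  }
}
```

For example, with steps `socketServer`, `scheduler` (which throws "boom") and `zkClient`, the first `shutdown()` returns true and logs `socketServer ok`, `scheduler failed: boom`, `zkClient ok`; a second call returns false and does nothing.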
Before anything else, shutdown() runs controlledShutdown(). This method sends a request to the controller telling it that this broker is about to shut down:
private def controlledShutdown() {
  // If notifying the controller fails, wait for the configured backoff and retry up to the
  // configured number of times; if every attempt fails, give up on controlled shutdown
  if (startupComplete.get() && config.controlledShutdownEnable) {
    // number of attempts allowed by the config
    var remainingRetries = config.controlledShutdownMaxRetries
    info("Starting controlled shutdown")
    var channel : BlockingChannel = null
    var prevController : Broker = null
    var shutdownSuceeded : Boolean = false
    try {
      // update the broker state
      brokerState.newState(PendingControlledShutdown)
      // leave the loop once the request succeeds or the retries are used up
      while (!shutdownSuceeded && remainingRetries > 0) {
        remainingRetries = remainingRetries - 1
        // read the current controller's broker id from ZooKeeper
        val controllerId = ZkUtils.getController(zkClient)
        ZkUtils.getBrokerInfo(zkClient, controllerId) match {
          case Some(broker) =>
            // (Re)connect if there is no channel, no controller has been recorded yet, or the
            // recorded controller is no longer the current one. In other words, if we are
            // already connected and the controller has not changed, this block is skipped.
            if (channel == null || prevController == null || !prevController.equals(broker)) {
              if (channel != null) {
                channel.disconnect()
              }
              channel = new BlockingChannel(broker.host, broker.port,
                                            BlockingChannel.UseDefaultBufferSize,
                                            BlockingChannel.UseDefaultBufferSize,
                                            config.controllerSocketTimeoutMs)
              channel.connect()
              prevController = broker
            }
          case None =>
            // ignore and retry
        }
        // send the request to the controller
        if (channel != null) {
          var response: Receive = null
          try {
            val request = new ControlledShutdownRequest(correlationId.getAndIncrement, config.brokerId)
            channel.send(request)
            response = channel.receive()
            val shutdownResponse = ControlledShutdownResponse.readFrom(response.buffer)
            // no error and no partitions left to move: mark the attempt as successful
            if (shutdownResponse.errorCode == ErrorMapping.NoError && shutdownResponse.partitionsRemaining != null &&
                shutdownResponse.partitionsRemaining.size == 0) {
              shutdownSuceeded = true
              info ("Controlled shutdown succeeded")
            }
            else {
              info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
              info("Error code from controller: %d".format(shutdownResponse.errorCode))
            }
          }
          catch {
            case ioe: java.io.IOException =>
              channel.disconnect()
              channel = null
              warn("Error during controlled shutdown, possibly because leader movement took longer than the configured socket.timeout.ms: %s".format(ioe.getMessage))
          }
        }
        // on failure, sleep for the configured backoff before retrying
        if (!shutdownSuceeded) {
          Thread.sleep(config.controlledShutdownRetryBackoffMs)
          warn("Retrying controlled shutdown after the previous attempt failed...")
        }
      }
    }
    finally {
      if (channel != null) {
        channel.disconnect()
        channel = null
      }
    }
    if (!shutdownSuceeded) {
      warn("Proceeding to do an unclean shutdown as all the controlled shutdown attempts failed")
    }
  }
}
What the controller actually does with a ControlledShutdownRequest will be covered in a later post.
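Stripped of the networking, controlledShutdown() is a bounded retry loop plus a reconnect rule. The sketch below reproduces just that skeleton under assumptions: `ControlledShutdownRetry` is a hypothetical name, the `attempt` function stands in for one send/receive round trip that returns true when the controller reports no remaining partitions, and controllers are identified by plain `Int` ids. As in the original loop, it also sleeps after the final failed attempt.

```scala
// Hypothetical skeleton of the retry/reconnect logic; no real channel is used.
object ControlledShutdownRetry {
  // maxRetries / backoffMs play the roles of controlledShutdownMaxRetries /
  // controlledShutdownRetryBackoffMs. Returns (succeeded, attemptsUsed).
  def run(maxRetries: Int, backoffMs: Long)(attempt: Int => Boolean): (Boolean, Int) = {
    var remainingRetries = maxRetries
    var succeeded = false
    var attempts = 0
    while (!succeeded && remainingRetries > 0) {
      remainingRetries -= 1
      attempts += 1
      succeeded = attempt(attempts)
      if (!succeeded) Thread.sleep(backoffMs) // back off before the next try
    }
    (succeeded, attempts)
  }

  // Reconnect rule from the loop: open a new channel if there is none, no controller
  // has been recorded yet, or the controller changed since the previous attempt.
  def needsNewChannel(hasChannel: Boolean, prevController: Option[Int], current: Int): Boolean =
    !hasChannel || !prevController.contains(current)
}
```

If all attempts fail, the caller simply falls through to the unclean shutdown, which is why the method only logs a warning at the end.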