If the broker-side parameter auto.create.topics.enable is set to true, a topic is created automatically the first time a producer or consumer operates on a topic that does not yet exist. (It is generally recommended to set this parameter to false, since auto-created topics make topic management and maintenance harder.)
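For reference, a minimal sketch of the relevant broker-side settings in server.properties (the values shown are illustrative); topics that are auto-created use the broker defaults num.partitions and default.replication.factor:
# server.properties (broker side) -- values are illustrative
auto.create.topics.enable=false
# defaults applied to auto-created topics
num.partitions=1
default.replication.factor=1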
The Kafka command for creating a topic:
kafka-topics.sh --zookeeper localhost:2181/kafka --create --topic topic-create --partitions 4 --replication-factor 2
# Output: Created topic "topic-create".
With the parameters set above, the topic topic-create has 4 partitions and each partition has 2 replicas, i.e. 8 replicas in total. Accordingly, 8 folders will appear across the brokers of the cluster, one folder per replica.
The folder names follow the pattern <topic>-<partition>, e.g. topic-create-0, topic-create-1, topic-create-2 and topic-create-3.
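As a rough illustration, assuming the broker's log.dirs points to /tmp/kafka-logs (which partition folders a given broker holds depends on the actual replica assignment):
# list this topic's log folders on one broker (path and output are illustrative)
ls /tmp/kafka-logs | grep topic-create
# topic-create-0
# topic-create-1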
Once the topic has been created, the related metadata can be viewed in ZooKeeper:
A child node named topic-create is created under the /kafka/brokers/topics path
zkCli.sh
get /kafka/brokers/topics/topic-create
# The partition layout is returned as JSON; "2":[1,2] means the replicas of partition 2 are on the brokers with brokerId 1 and 2
{"version","partitions":{"2":[1,2],"1":[0,1],"3":[2,1],"0":[2,0]}}
To assign partitions and replicas to specific brokers yourself, use the --replica-assignment parameter:
kafka-topics.sh --zookeeper localhost:2181/kafka --create --topic topic-create-same --replica-assignment 2:0,0:1,1:2,2:1
# The --partitions and --replication-factor parameters are no longer needed. 2:0,0:1,1:2,2:1 means partition 0 goes to brokers 2 and 0, partition 1 to brokers 0 and 1, and so on: ":" separates the brokerIds of the different replicas of one partition, "," separates different partitions, and the partitions are listed in order
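The resulting assignment can then be verified with --describe (the exact output columns depend on the Kafka version):
kafka-topics.sh --zookeeper localhost:2181/kafka --describe --topic topic-create-same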
When creating a topic we can also override topic-level configuration with --config k=v; to set several entries, pass --config multiple times
kafka-topics.sh --zookeeper localhost:2181/kafka --create --topic topic-config --partitions 1 --replication-factor 1 --config cleanup.policy=compact --config max.message.bytes=10000
Once --config has been set, the corresponding information can be seen in ZooKeeper
zkCli.sh
get /kafka/config/topics/topic-config
{"version","config":{"max.message.bytes":"10000","cleanup.policy":"compact"}}
When creating topics, name clashes must be avoided; the --if-not-exists parameter helps here: the topic is created if it does not exist yet, and nothing is done (and no error is reported) if it already does.
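For example, re-running the creation of topic-create, which already exists, becomes a no-op:
kafka-topics.sh --zookeeper localhost:2181/kafka --create --topic topic-create --partitions 4 --replication-factor 2 --if-not-exists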
If the cluster should be rack aware, broker.rack=<rack name> needs to be configured on each broker; replica placement is then influenced by the racks. To ignore rack information when creating a topic, add the --disable-rack-aware parameter. (By default assignment is rack unaware; it only becomes rack aware once the brokers have been labelled with rack information.)
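A minimal sketch (the rack name and topic name below are made up for the example):
# server.properties on each broker
broker.rack=RACK1
# create a topic while ignoring the configured rack information
kafka-topics.sh --zookeeper localhost:2181/kafka --create --topic topic-ignore-rack --partitions 4 --replication-factor 2 --disable-rack-aware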
Partition-replica assignment comes in two flavours, rack aware and rack unaware; both can be found in the source file core/src/main/scala/kafka/admin/AdminUtils.scala:
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package kafka.admin
import java.util.Random
import kafka.utils.Logging
import org.apache.kafka.common.errors.{InvalidPartitionsException, InvalidReplicationFactorException}
import collection.{Map, mutable, _}
object AdminUtils extends Logging {
val rand = new Random
val AdminClientId = "__admin_client"
/**
* There are 3 goals of replica assignment:
*
*
* - Spread the replicas evenly among brokers.
* - For partitions assigned to a particular broker, their other replicas are spread over the other brokers.
* - If all brokers have rack information, assign the replicas for each partition to different racks if possible
*
*
* To achieve this goal for replica assignment without considering racks, we:
*
* - Assign the first replica of each partition by round-robin, starting from a random position in the broker list.
* - Assign the remaining replicas of each partition with an increasing shift.
*
*
* Here is an example of assigning
*
* broker-0 broker-1 broker-2 broker-3 broker-4
* p0 p1 p2 p3 p4 (1st replica)
* p5 p6 p7 p8 p9 (1st replica)
* p4 p0 p1 p2 p3 (2nd replica)
* p8 p9 p5 p6 p7 (2nd replica)
* p3 p4 p0 p1 p2 (3nd replica)
* p7 p8 p9 p5 p6 (3nd replica)
*
*
*
* To create rack aware assignment, this API will first create a rack alternated broker list. For example,
* from this brokerID -> rack mapping:
* 0 -> "rack1", 1 -> "rack3", 2 -> "rack3", 3 -> "rack2", 4 -> "rack2", 5 -> "rack1"
*
*
* The rack alternated list will be:
*
* 0, 3, 1, 5, 4, 2
*
*
* Then an easy round-robin assignment can be applied. Assume 6 partitions with replication factor of 3, the assignment
* will be:
*
* 0 -> 0,3,1
* 1 -> 3,1,5
* 2 -> 1,5,4
* 3 -> 5,4,2
* 4 -> 4,2,0
* 5 -> 2,0,3
*
*
* Once it has completed the first round-robin, if there are more partitions to assign, the algorithm will start
* shifting the followers. This is to ensure we will not always get the same set of sequences.
* In this case, if there is another partition to assign (partition #6), the assignment will be:
*
* 6 -> 0,4,2 (instead of repeating 0,3,1 as partition 0)
*
*
* The rack aware assignment always chooses the 1st replica of the partition using round robin on the rack alternated
* broker list. For rest of the replicas, it will be biased towards brokers on racks that do not have
* any replica assignment, until every rack has a replica. Then the assignment will go back to round-robin on
* the broker list.
*
*
*
* As the result, if the number of replicas is equal to or greater than the number of racks, it will ensure that
* each rack will get at least one replica. Otherwise, each rack will get at most one replica. In a perfect
* situation where the number of replicas is the same as the number of racks and each rack has the same number of
* brokers, it guarantees that the replica distribution is even across brokers and racks.
*
* @return a Map from partition id to replica ids
* @throws AdminOperationException If rack information is supplied but it is incomplete, or if it is not possible to
* assign each replica to a unique rack.
*
*/
def assignReplicasToBrokers(brokerMetadatas: Seq[BrokerMetadata],
nPartitions: Int,
replicationFactor: Int,
fixedStartIndex: Int = -1,
startPartitionId: Int = -1): Map[Int, Seq[Int]] = {
if (nPartitions <= 0)
throw new InvalidPartitionsException("Number of partitions must be larger than 0.")
if (replicationFactor <= 0)
throw new InvalidReplicationFactorException("Replication factor must be larger than 0.")
if (replicationFactor > brokerMetadatas.size)
throw new InvalidReplicationFactorException(s"Replication factor: $replicationFactor larger than available brokers: ${brokerMetadatas.size}.")
if (brokerMetadatas.forall(_.rack.isEmpty))
assignReplicasToBrokersRackUnaware(nPartitions, replicationFactor, brokerMetadatas.map(_.id), fixedStartIndex,
startPartitionId)
else {
if (brokerMetadatas.exists(_.rack.isEmpty))
throw new AdminOperationException("Not all brokers have rack information for replica rack aware assignment.")
assignReplicasToBrokersRackAware(nPartitions, replicationFactor, brokerMetadatas, fixedStartIndex,
startPartitionId)
}
}
private def assignReplicasToBrokersRackUnaware(nPartitions: Int,
replicationFactor: Int,
brokerList: Seq[Int],
fixedStartIndex: Int,
startPartitionId: Int): Map[Int, Seq[Int]] = {
val ret = mutable.Map[Int, Seq[Int]]()
val brokerArray = brokerList.toArray
val startIndex = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(brokerArray.length)
var currentPartitionId = math.max(0, startPartitionId)
var nextReplicaShift = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(brokerArray.length)
for (_ <- 0 until nPartitions) {
if (currentPartitionId > 0 && (currentPartitionId % brokerArray.length == 0))
nextReplicaShift += 1
val firstReplicaIndex = (currentPartitionId + startIndex) % brokerArray.length
val replicaBuffer = mutable.ArrayBuffer(brokerArray(firstReplicaIndex))
for (j <- 0 until replicationFactor - 1)
replicaBuffer += brokerArray(replicaIndex(firstReplicaIndex, nextReplicaShift, j, brokerArray.length))
ret.put(currentPartitionId, replicaBuffer)
currentPartitionId += 1
}
ret
}
private def assignReplicasToBrokersRackAware(nPartitions: Int,
replicationFactor: Int,
brokerMetadatas: Seq[BrokerMetadata],
fixedStartIndex: Int,
startPartitionId: Int): Map[Int, Seq[Int]] = {
val brokerRackMap = brokerMetadatas.collect { case BrokerMetadata(id, Some(rack)) =>
id -> rack
}.toMap
val numRacks = brokerRackMap.values.toSet.size
val arrangedBrokerList = getRackAlternatedBrokerList(brokerRackMap)
val numBrokers = arrangedBrokerList.size
val ret = mutable.Map[Int, Seq[Int]]()
val startIndex = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(arrangedBrokerList.size)
var currentPartitionId = math.max(0, startPartitionId)
var nextReplicaShift = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(arrangedBrokerList.size)
for (_ <- 0 until nPartitions) {
if (currentPartitionId > 0 && (currentPartitionId % arrangedBrokerList.size == 0))
nextReplicaShift += 1
val firstReplicaIndex = (currentPartitionId + startIndex) % arrangedBrokerList.size
val leader = arrangedBrokerList(firstReplicaIndex)
val replicaBuffer = mutable.ArrayBuffer(leader)
val racksWithReplicas = mutable.Set(brokerRackMap(leader))
val brokersWithReplicas = mutable.Set(leader)
var k = 0
for (_ <- 0 until replicationFactor - 1) {
var done = false
while (!done) {
val broker = arrangedBrokerList(replicaIndex(firstReplicaIndex, nextReplicaShift * numRacks, k, arrangedBrokerList.size))
val rack = brokerRackMap(broker)
// Skip this broker if
// 1. there is already a broker in the same rack that has assigned a replica AND there is one or more racks
// that do not have any replica, or
// 2. the broker has already assigned a replica AND there is one or more brokers that do not have replica assigned
if ((!racksWithReplicas.contains(rack) || racksWithReplicas.size == numRacks)
&& (!brokersWithReplicas.contains(broker) || brokersWithReplicas.size == numBrokers)) {
replicaBuffer += broker
racksWithReplicas += rack
brokersWithReplicas += broker
done = true
}
k += 1
}
}
ret.put(currentPartitionId, replicaBuffer)
currentPartitionId += 1
}
ret
}
/**
* Given broker and rack information, returns a list of brokers alternated by the rack. Assume
* this is the rack and its brokers:
*
* rack1: 0, 1, 2
* rack2: 3, 4, 5
* rack3: 6, 7, 8
*
* This API would return the list of 0, 3, 6, 1, 4, 7, 2, 5, 8
*
* This is essential to make sure that the assignReplicasToBrokers API can use such list and
* assign replicas to brokers in a simple round-robin fashion, while ensuring an even
* distribution of leader and replica counts on each broker and that replicas are
* distributed to all racks.
*/
private[admin] def getRackAlternatedBrokerList(brokerRackMap: Map[Int, String]): IndexedSeq[Int] = {
val brokersIteratorByRack = getInverseMap(brokerRackMap).map { case (rack, brokers) =>
(rack, brokers.iterator)
}
val racks = brokersIteratorByRack.keys.toArray.sorted
val result = new mutable.ArrayBuffer[Int]
var rackIndex = 0
while (result.size < brokerRackMap.size) {
val rackIterator = brokersIteratorByRack(racks(rackIndex))
if (rackIterator.hasNext)
result += rackIterator.next()
rackIndex = (rackIndex + 1) % racks.length
}
result
}
private[admin] def getInverseMap(brokerRackMap: Map[Int, String]): Map[String, Seq[Int]] = {
brokerRackMap.toSeq.map { case (id, rack) => (rack, id) }
.groupBy { case (rack, _) => rack }
.map { case (rack, rackAndIdList) => (rack, rackAndIdList.map { case (_, id) => id }.sorted) }
}
private def replicaIndex(firstReplicaIndex: Int, secondReplicaShift: Int, replicaIndex: Int, nBrokers: Int): Int = {
val shift = 1 + (secondReplicaShift + replicaIndex) % (nBrokers - 1)
(firstReplicaIndex + shift) % nBrokers
}
}
For the rack-unaware case, the method to study is assignReplicasToBrokersRackUnaware(): it picks a start index and an initial replica shift (both random unless fixedStartIndex is given), places the first replica of each partition round-robin starting from the start index, and places the remaining replicas via replicaIndex(), incrementing the shift after every full pass over the broker list.
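To make the round-robin-plus-shift behaviour concrete, here is a small standalone Scala sketch (not Kafka code; the broker list, partition count, replication factor and the fixed start index/shift are all chosen just for this example) that mirrors the core of assignReplicasToBrokersRackUnaware():
// Standalone sketch: 3 brokers (ids 0,1,2), 4 partitions, replication factor 2.
// startIndex and nextReplicaShift are fixed at 0 so the output is deterministic;
// in AdminUtils they are random unless fixedStartIndex is supplied.
object RackUnawareDemo extends App {
  val brokers = Vector(0, 1, 2)
  val nPartitions = 4
  val replicationFactor = 2
  val startIndex = 0
  var nextReplicaShift = 0

  // same helper as AdminUtils.replicaIndex
  def replicaIndex(first: Int, shift: Int, replica: Int, nBrokers: Int): Int = {
    val s = 1 + (shift + replica) % (nBrokers - 1)
    (first + s) % nBrokers
  }

  for (p <- 0 until nPartitions) {
    // bump the shift after every full pass over the broker list
    if (p > 0 && p % brokers.length == 0) nextReplicaShift += 1
    val first = (p + startIndex) % brokers.length
    val replicas = first +: (0 until replicationFactor - 1).map(j =>
      replicaIndex(first, nextReplicaShift, j, brokers.length))
    println(s"partition $p -> ${replicas.map(brokers).mkString(",")}")
  }
}
Running it prints 0,1 / 1,2 / 2,0 for partitions 0 through 2 and 0,2 for partition 3: because nextReplicaShift is incremented after the first full pass over the broker list, partition 3 does not simply repeat partition 0's assignment.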
Next, let's look at the command for viewing the details of a specific topic.
# multiple topic names are separated by commas
kafka-topics.sh --zookeeper localhost:2181/kafka --describe --topic topic-create,topic-demo
The --under-replicated-partitions parameter shows the partitions that are not fully replicated, i.e. partitions whose ISR set is smaller than their AR set.
The --unavailable-partitions parameter shows the partitions of a topic that currently have no leader replica.
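For example:
# only show partitions whose ISR is smaller than their AR
kafka-topics.sh --zookeeper localhost:2181/kafka --describe --topic topic-create --under-replicated-partitions
# only show partitions that have no leader replica
kafka-topics.sh --zookeeper localhost:2181/kafka --describe --topic topic-create --unavailable-partitions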
The --alter parameter of kafka-topics.sh is used to modify the partitions of a topic as well as some related parameters, but Kafka only supports increasing the partition count, never decreasing it. In general, changing the partitions or replicas of a live topic is not recommended, since it can easily disrupt downstream applications (for example, messages with the same key may be routed to a different partition after the change).
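For example, to grow topic-config from 1 partition to 3:
kafka-topics.sh --zookeeper localhost:2181/kafka --alter --topic topic-config --partitions 3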
Configuration changes are better made with the dedicated kafka-configs.sh script; changes made this way are applied dynamically, i.e. while Kafka is running.
kafka-configs.sh --zookeeper localhost:2181/kafka --describe --entity-type topics --entity-name topic-config
The --entity-type parameter specifies the kind of object to operate on; it can be topics, brokers, clients, or even users
entity-type | entity-name |
---|---|
topics (topic configs) | the topic name, e.g. topic-config above |
brokers (Kafka brokers) | the brokerId |
clients (producer / consumer clients) | the client.id set in the producer or consumer |
users (users) | the user name |
Adding configuration entries:
kafka-configs.sh --zookeeper localhost:2181/kafka --alter --entity-type topics --entity-name topic-config --add-config cleanup.policy=compact,max.message.bytes=10000
To delete configuration entries, use the --delete-config parameter; this resets the listed parameters back to their default values
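For example, removing both of the overrides added above:
kafka-configs.sh --zookeeper localhost:2181/kafka --alter --entity-type topics --entity-name topic-config --delete-config cleanup.policy,max.message.bytes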
When kafka-configs.sh is used to change configuration, a node of the form /config/<entity-type>/<entity-name> is created in ZooKeeper (here, under the /kafka chroot, e.g. /kafka/config/topics/topic-config), and the changed configuration is written into it.
The content of the node has the form: {"version":1,"config":{"<property-name>":"<property-value>", ...}}
A topic is deleted with the --delete parameter of kafka-topics.sh; if you are not sure whether the topic exists, also add the --if-exists parameter
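For example (the topic name here is just illustrative):
kafka-topics.sh --zookeeper localhost:2181/kafka --delete --topic topic-delete --if-exists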
Deleting a topic with the kafka-topics.sh script essentially just creates a node with the same name as the topic to be deleted under the /kafka/admin/delete_topics path in ZooKeeper, marking the topic for deletion; the actual deletion is then performed by the controller.
This means the deletion can also be triggered manually by creating that node yourself, for example with zkCli.sh, as sketched below.
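A minimal sketch, assuming the same /kafka chroot and a hypothetical topic named topic-delete:
zkCli.sh
create /kafka/admin/delete_topics/topic-delete ""
# the controller then picks up the marker node and deletes the topic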