Introduction
When we consume a single Kafka topic, the Kafka partitions map one-to-one onto the parallelism we configure.
That is, if the topic has 12 partitions and we set the parallelism to 12, every subtask receives data and the data is spread evenly.
If we instead set the parallelism to 15, then 3 subtasks will never receive any data. You can confirm this in the web UI: open the source operator and check Bytes Sent for its SubTasks, and you will find that three SubTasks stay at 0.
When we consume multiple Kafka topics, say two topics with 24 partitions in total, and set the parallelism to 24, the same one-partition-per-subtask intuition suggests that all 24 SubTasks should receive data. In practice that is not what happens: we find that more than 10 SubTasks consume nothing at all.
So, with this question in mind, how exactly is the data distributed?
Source Code
① The entry point is the object we create in our program to connect Flink to Kafka:
new FlinkKafkaConsumer011(topics, new SimpleStringSchema(), kafkaPro)
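For reference, a minimal, self-contained version of that setup might look like the sketch below; the topic names, bootstrap servers, group id and parallelism are illustrative placeholders, not values from the original job.
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class KafkaSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // e.g. two topics with 12 partitions each, parallelism equal to the total partition count
        env.setParallelism(24);

        List<String> topics = Arrays.asList("topic-a", "topic-b"); // placeholder topic names
        Properties kafkaPro = new Properties();
        kafkaPro.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        kafkaPro.setProperty("group.id", "demo-group");              // placeholder group id

        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer011<>(topics, new SimpleStringSchema(), kafkaPro));
        stream.print();
        env.execute("kafka-partition-assignment-demo");
    }
}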
② Creating a FlinkKafkaConsumer011 ultimately ends up in the FlinkKafkaConsumerBase constructor, where we can see that the topic list we passed in is wrapped into a KafkaTopicsDescriptor object:
/**
* Base constructor.
*
* @param topics fixed list of topics to subscribe to (null, if using topic pattern)
* @param topicPattern the topic pattern to subscribe to (null, if using fixed topics)
* @param deserializer The deserializer to turn raw byte messages into Java/Scala objects.
* @param discoveryIntervalMillis the topic / partition discovery interval, in
* milliseconds (0 if discovery is disabled).
*/
public FlinkKafkaConsumerBase(
List<String> topics,
Pattern topicPattern,
KafkaDeserializationSchema<T> deserializer,
long discoveryIntervalMillis,
boolean useMetrics) {
// wrap the topic list into a KafkaTopicsDescriptor object
this.topicsDescriptor = new KafkaTopicsDescriptor(topics, topicPattern);
this.deserializer = checkNotNull(deserializer, "valueDeserializer");
checkArgument(
discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED || discoveryIntervalMillis >= 0,
"Cannot define a negative value for the topic / partition discovery interval.");
this.discoveryIntervalMillis = discoveryIntervalMillis;
this.useMetrics = useMetrics;
}
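A side note on the discoveryIntervalMillis parameter above: it controls dynamic partition discovery and, if I read the connector correctly, is normally set through the consumer property FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS ("flink.partition-discovery.interval-millis"); it does not change the assignment logic discussed here, only how often new partitions are looked for. For example:
// Hypothetical example: re-run partition discovery every 30 seconds (the interval value is arbitrary).
kafkaPro.setProperty(FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, "30000");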
③ Next, look at FlinkKafkaConsumerBase.open; the first few lines are all we need. The KafkaTopicsDescriptor from ②, together with the index of the current subtask and the total number of subtasks (the parallelism), is wrapped into an AbstractPartitionDiscoverer object, partitionDiscoverer, which then calls its discoverPartitions method:
@Override
public void open(Configuration configuration) throws Exception {
// determine the offset commit mode
this.offsetCommitMode = OffsetCommitModes.fromConfiguration(
getIsAutoCommitEnabled(),
enableCommitOnCheckpoints,
((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled());
// create the partition discoverer
this.partitionDiscoverer = createPartitionDiscoverer(
topicsDescriptor,
getRuntimeContext().getIndexOfThisSubtask(),
getRuntimeContext().getNumberOfParallelSubtasks());
this.partitionDiscoverer.open();
subscribedPartitionsToStartOffsets = new HashMap<>();
final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
......
}
④ Continue into partitionDiscoverer.discoverPartitions. It first checks whether we subscribed with a fixed topic list or with a topic pattern; since we passed a list, getAllPartitionsForTopics is called (jump to ⑤), which returns a collection of KafkaTopicPartition objects covering every partition of every topic.
Further down, each element of that collection is passed to setAndCheckDiscoveredPartition (jump to ⑥), and elements for which it returns false are removed, so only the partitions this subtask should subscribe to remain.
/**
* Execute a partition discovery attempt for this subtask.
* This method lets the partition discoverer update what partitions it has discovered so far.
*
* @return List of discovered new partitions that this subtask should subscribe to.
*/
public List<KafkaTopicPartition> discoverPartitions() throws WakeupException, ClosedException {
if (!closed && !wakeup) {
try {
List<KafkaTopicPartition> newDiscoveredPartitions;
// (1) get all possible partitions, based on whether we are subscribed to fixed topics or a topic pattern
if (topicsDescriptor.isFixedTopics()) {
newDiscoveredPartitions = getAllPartitionsForTopics(topicsDescriptor.getFixedTopics());
} else {
List<String> matchedTopics = getAllTopics();
// retain topics that match the pattern
Iterator<String> iter = matchedTopics.iterator();
while (iter.hasNext()) {
if (!topicsDescriptor.isMatchingTopic(iter.next())) {
iter.remove();
}
}
if (matchedTopics.size() != 0) {
// get partitions only for matched topics
newDiscoveredPartitions = getAllPartitionsForTopics(matchedTopics);
} else {
newDiscoveredPartitions = null;
}
}
// (2) eliminate partition that are old partitions or should not be subscribed by this subtask
if (newDiscoveredPartitions == null || newDiscoveredPartitions.isEmpty()) {
throw new RuntimeException("Unable to retrieve any partitions with KafkaTopicsDescriptor: " + topicsDescriptor);
} else {
Iterator<KafkaTopicPartition> iter = newDiscoveredPartitions.iterator();
KafkaTopicPartition nextPartition;
while (iter.hasNext()) {
nextPartition = iter.next();
if (!setAndCheckDiscoveredPartition(nextPartition)) {
iter.remove();
}
}
}
return newDiscoveredPartitions;
} catch (WakeupException e) {
// the actual topic / partition metadata fetching methods
// may be woken up midway; reset the wakeup flag and rethrow
wakeup = false;
throw e;
}
} else if (!closed && wakeup) {
// may have been woken up before the method call
wakeup = false;
throw new WakeupException();
} else {
throw new ClosedException();
}
}
⑤ In getAllPartitionsForTopics we can see that every partition of every topic is wrapped into a KafkaTopicPartition object and appended to a list, which is then returned. For example, with the two topics of 12 partitions each that we started with, this list ends up holding 24 elements.
@Override
protected List<KafkaTopicPartition> getAllPartitionsForTopics(List<String> topics) throws WakeupException, RuntimeException {
List<KafkaTopicPartition> partitions = new LinkedList<>();
try {
for (String topic : topics) {
final List<PartitionInfo> kafkaPartitions = kafkaConsumer.partitionsFor(topic);
if (kafkaPartitions == null) {
throw new RuntimeException("Could not fetch partitions for %s. Make sure that the topic exists.".format(topic));
}
for (PartitionInfo partitionInfo : kafkaPartitions) {
partitions.add(new KafkaTopicPartition(partitionInfo.topic(), partitionInfo.partition()));
}
}
} catch (org.apache.kafka.common.errors.WakeupException e) {
// rethrow our own wakeup exception
throw new WakeupException();
}
return partitions;
}
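To make the 24-element example in ⑤ concrete, here is a tiny sketch (with two made-up topic names and a hard-coded partition count) of what the returned list conceptually contains:
// Conceptual content of the list for two hypothetical topics with 12 partitions each.
List<KafkaTopicPartition> partitions = new LinkedList<>();
for (String topic : Arrays.asList("topic-a", "topic-b")) {
    for (int p = 0; p < 12; p++) {
        partitions.add(new KafkaTopicPartition(topic, p));
    }
}
// partitions.size() == 24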
⑥ setAndCheckDiscoveredPartition shows how a topic partition is "claimed" by a subtask:
it first checks whether the partition has already been discovered, and if not, calls KafkaTopicPartitionAssigner.assign and compares the result to this subtask's index.
/**
* Sets a partition as discovered. Partitions are considered as new
* if its partition id is larger than all partition ids previously
* seen for the topic it belongs to. Therefore, for a set of
* discovered partitions, the order that this method is invoked with
* each partition is important.
*
* If the partition is indeed newly discovered, this method also returns
* whether the new partition should be subscribed by this subtask.
*
* @param partition the partition to set and check
*
* @return {@code true}, if the partition wasn't seen before and should
* be subscribed by this subtask; {@code false} otherwise
*/
public boolean setAndCheckDiscoveredPartition(KafkaTopicPartition partition) {
if (isUndiscoveredPartition(partition)) {
discoveredPartitions.add(partition);
return KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks) == indexOfThisSubtask;
}
return false;
}
⑦ KafkaTopicPartitionAssigner.assign returns a subtask index: the subtask to which the given topic partition is assigned.
At this point it is clear how a partition ends up on a specific subtask:
the topic name's hash code (times 31) is taken modulo the parallelism to get a start index, the partition id is added to that start index, and the sum is taken modulo the parallelism again.
/**
* Returns the index of the target subtask that a specific Kafka partition should be
* assigned to.
*
* The resulting distribution of partitions of a single topic has the following contract:
*
* - 1. Uniformly distributed across subtasks
* - 2. Partitions are round-robin distributed (strictly clockwise w.r.t. ascending
* subtask indices) by using the partition id as the offset from a starting index
* (i.e., the index of the subtask which partition 0 of the topic will be assigned to,
* determined using the topic name).
*
*
* The above contract is crucial and cannot be broken. Consumer subtasks rely on this
* contract to locally filter out partitions that it should not subscribe to, guaranteeing
* that all partitions of a single topic will always be assigned to some subtask in a
* uniformly distributed manner.
*
* @param partition the Kafka partition
* @param numParallelSubtasks total number of parallel subtasks
*
* @return index of the target subtask that the Kafka partition should be assigned to.
*/
public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
int startIndex = ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
// here, the assumption is that the id of Kafka partitions are always ascending
// starting from 0, and therefore can be used directly as the offset clockwise from the start index
return (startIndex + partition.getPartition()) % numParallelSubtasks;
}
Conclusion
From the source code we now know why, when Flink consumes a single topic with parallelism equal to the number of partitions, the partitions are distributed evenly, but when it consumes multiple topics with parallelism equal to the total number of partitions, the distribution becomes uneven: each topic gets its own topic-name-based start index and its partitions are assigned round-robin from there, so the per-topic assignments can overlap on some subtasks (which then own 2 partitions) while other subtasks own none.
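To check this reasoning, here is a small standalone simulation that inlines the assign formula from ⑦ for two hypothetical topics ("topic-a" and "topic-b") with 12 partitions each and a parallelism of 24; the printed histogram shows how many partitions each subtask owns, and any subtask with 0 will never receive data:
import java.util.Arrays;
import java.util.List;

public class AssignmentSimulation {

    // Same computation as KafkaTopicPartitionAssigner.assign, inlined for the simulation.
    static int assign(String topic, int partition, int numParallelSubtasks) {
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        return (startIndex + partition) % numParallelSubtasks;
    }

    public static void main(String[] args) {
        int parallelism = 24;
        int partitionsPerTopic = 12;
        List<String> topics = Arrays.asList("topic-a", "topic-b"); // hypothetical topic names

        int[] partitionsPerSubtask = new int[parallelism];
        for (String topic : topics) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                partitionsPerSubtask[assign(topic, p, parallelism)]++;
            }
        }

        // Unless the two start indices happen to be exactly 12 apart, some subtasks
        // end up with 2 partitions and others with 0.
        for (int subtask = 0; subtask < parallelism; subtask++) {
            System.out.println("subtask " + subtask + " -> " + partitionsPerSubtask[subtask] + " partition(s)");
        }
    }
}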