How Flink assigns Kafka topic partitions to parallel subtasks: a source-code walkthrough

Introduction

When consuming a single Kafka topic, the topic's partitions map one-to-one onto the parallelism we configure: with a 12-partition topic and a parallelism of 12, every subtask receives data and the data is evenly spread.
If we instead set the parallelism to 15, three subtasks will never receive any data. You can confirm this in the web UI: open the source operator and look at each SubTask's Bytes Sent — three SubTasks stay at 0.

Things change when we consume multiple topics. Say we have two topics with 24 partitions in total and set the parallelism to 24. Following the same one-partition-per-subtask intuition, all 24 subtasks should receive data. In practice that is not what happens: more than ten subtasks end up consuming nothing at all.

So let's take this question to the source code: how exactly are partitions assigned to subtasks?

Source code

① The entry point. In our job we create an object like this to connect Flink to Kafka:

new FlinkKafkaConsumer011(topics, new SimpleStringSchema(), kafkaPro)
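For context, here is a minimal, self-contained sketch of how such a consumer is typically wired into a job. It is not taken from the original code: the broker address, group id, and topic names are placeholders, chosen to match the two-topics / 24-partitions scenario from the introduction.

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// in older Flink versions SimpleStringSchema lives under org.apache.flink.streaming.util.serialization
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class KafkaSourceJob {
	public static void main(String[] args) throws Exception {
		Properties kafkaPro = new Properties();
		kafkaPro.setProperty("bootstrap.servers", "broker1:9092"); // placeholder broker address
		kafkaPro.setProperty("group.id", "demo-group");            // placeholder group id

		// two hypothetical topics, 12 partitions each -> 24 partitions in total
		List<String> topics = Arrays.asList("topic-a", "topic-b");

		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		env.setParallelism(24); // parallelism = total number of partitions

		FlinkKafkaConsumer011<String> consumer =
				new FlinkKafkaConsumer011<>(topics, new SimpleStringSchema(), kafkaPro);

		env.addSource(consumer).print();
		env.execute("kafka-partition-assignment-demo");
	}
}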

② Constructing a FlinkKafkaConsumer011 ultimately ends up in the FlinkKafkaConsumerBase constructor, where the topic list we passed in is wrapped in a KafkaTopicsDescriptor:

/**
	 * Base constructor.
	 *
	 * @param topics fixed list of topics to subscribe to (null, if using topic pattern)
	 * @param topicPattern the topic pattern to subscribe to (null, if using fixed topics)
	 * @param deserializer The deserializer to turn raw byte messages into Java/Scala objects.
	 * @param discoveryIntervalMillis the topic / partition discovery interval, in
	 *                                milliseconds (0 if discovery is disabled).
	 */
	public FlinkKafkaConsumerBase(
			List<String> topics,
			Pattern topicPattern,
			KafkaDeserializationSchema<T> deserializer,
			long discoveryIntervalMillis,
			boolean useMetrics) {
		// the topic list is wrapped in a KafkaTopicsDescriptor
		this.topicsDescriptor = new KafkaTopicsDescriptor(topics, topicPattern);
		this.deserializer = checkNotNull(deserializer, "valueDeserializer");

		checkArgument(
			discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED || discoveryIntervalMillis >= 0,
			"Cannot define a negative value for the topic / partition discovery interval.");
		this.discoveryIntervalMillis = discoveryIntervalMillis;

		this.useMetrics = useMetrics;
	}
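A side note on discoveryIntervalMillis: it is derived from the consumer properties, and periodic partition discovery is disabled by default. A hedged one-liner, assuming the connector version reads the flink.partition-discovery.interval-millis key (as FlinkKafkaConsumerBase does in the 1.x connectors):

// enable periodic partition discovery so partitions added later are picked up at runtime;
// by default discoveryIntervalMillis stays at PARTITION_DISCOVERY_DISABLED
kafkaPro.setProperty("flink.partition-discovery.interval-millis", "30000"); // example: every 30s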

③ Next, look at FlinkKafkaConsumerBase.open — the first few lines are all we need. The KafkaTopicsDescriptor from ②, together with the index of the current subtask and the total number of subtasks (the parallelism), is used to build an AbstractPartitionDiscoverer instance, partitionDiscoverer, whose discoverPartitions method is then called:

@Override
public void open(Configuration configuration) throws Exception {
	// determine the offset commit mode
	this.offsetCommitMode = OffsetCommitModes.fromConfiguration(
			getIsAutoCommitEnabled(),
			enableCommitOnCheckpoints,
			((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled());

	// create the partition discoverer
	this.partitionDiscoverer = createPartitionDiscoverer(
			topicsDescriptor,
			getRuntimeContext().getIndexOfThisSubtask(),
			getRuntimeContext().getNumberOfParallelSubtasks());
	this.partitionDiscoverer.open();

	subscribedPartitionsToStartOffsets = new HashMap<>();
	final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
	......
	}
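The offset commit mode determined at the top of open() depends on whether checkpointing is enabled. Continuing the sketch from step ①, a hedged example of the checkpoint-driven setup (the interval value is illustrative):

// with checkpointing on, offsets are committed back to Kafka when a checkpoint completes
env.enableCheckpointing(60_000);              // example: checkpoint every 60 seconds
consumer.setCommitOffsetsOnCheckpoints(true); // this is the default; shown for clarity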

④ Now into partitionDiscoverer.discoverPartitions. It first checks whether we subscribed with a fixed topic list or a regex pattern. Since we passed a list, getAllPartitionsForTopics is called (see ⑤), returning a collection of KafkaTopicPartition objects — one entry per partition of every topic.
Further down, each element of that collection is passed to setAndCheckDiscoveredPartition (see ⑥).

/**
	 * Execute a partition discovery attempt for this subtask.
	 * This method lets the partition discoverer update what partitions it has discovered so far.
	 *
	 * @return List of discovered new partitions that this subtask should subscribe to.
	 */
	public List<KafkaTopicPartition> discoverPartitions() throws WakeupException, ClosedException {
		if (!closed && !wakeup) {
			try {
				List<KafkaTopicPartition> newDiscoveredPartitions;

				// (1) get all possible partitions, based on whether we are subscribed to fixed topics or a topic pattern
				if (topicsDescriptor.isFixedTopics()) {
					newDiscoveredPartitions = getAllPartitionsForTopics(topicsDescriptor.getFixedTopics());
				} else {
					List<String> matchedTopics = getAllTopics();

					// retain topics that match the pattern
					Iterator<String> iter = matchedTopics.iterator();
					while (iter.hasNext()) {
						if (!topicsDescriptor.isMatchingTopic(iter.next())) {
							iter.remove();
						}
					}

					if (matchedTopics.size() != 0) {
						// get partitions only for matched topics
						newDiscoveredPartitions = getAllPartitionsForTopics(matchedTopics);
					} else {
						newDiscoveredPartitions = null;
					}
				}

				// (2) eliminate partition that are old partitions or should not be subscribed by this subtask
				if (newDiscoveredPartitions == null || newDiscoveredPartitions.isEmpty()) {
					throw new RuntimeException("Unable to retrieve any partitions with KafkaTopicsDescriptor: " + topicsDescriptor);
				} else {
					Iterator<KafkaTopicPartition> iter = newDiscoveredPartitions.iterator();
					KafkaTopicPartition nextPartition;
					while (iter.hasNext()) {
						nextPartition = iter.next();
						if (!setAndCheckDiscoveredPartition(nextPartition)) {
							iter.remove();
						}
					}
				}

				return newDiscoveredPartitions;
			} catch (WakeupException e) {
				// the actual topic / partition metadata fetching methods
				// may be woken up midway; reset the wakeup flag and rethrow
				wakeup = false;
				throw e;
			}
		} else if (!closed && wakeup) {
			// may have been woken up before the method call
			wakeup = false;
			throw new WakeupException();
		} else {
			throw new ClosedException();
		}
	}

⑤ In getAllPartitionsForTopics, every partition of every topic is wrapped as a KafkaTopicPartition object and appended to a single list, which is then returned. In our example of two topics with 12 partitions each, the list ends up with 24 elements.

@Override
	protected List<KafkaTopicPartition> getAllPartitionsForTopics(List<String> topics) throws WakeupException, RuntimeException {
		List<KafkaTopicPartition> partitions = new LinkedList<>();

		try {
			for (String topic : topics) {
				final List<PartitionInfo> kafkaPartitions = kafkaConsumer.partitionsFor(topic);

				if (kafkaPartitions == null) {
					throw new RuntimeException("Could not fetch partitions for %s. Make sure that the topic exists.".format(topic));
				}

				for (PartitionInfo partitionInfo : kafkaPartitions) {
					partitions.add(new KafkaTopicPartition(partitionInfo.topic(), partitionInfo.partition()));
				}
			}
		} catch (org.apache.kafka.common.errors.WakeupException e) {
			// rethrow our own wakeup exception
			throw new WakeupException();
		}

		return partitions;
	}
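If you want to predict the size of that list before running the job, you can ask Kafka directly with a plain consumer, using the same partitionsFor call the discoverer uses. A small standalone sketch — the broker address and topic names are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PartitionCountCheck {
	public static void main(String[] args) {
		Properties props = new Properties();
		props.setProperty("bootstrap.servers", "broker1:9092"); // placeholder broker address
		props.setProperty("key.deserializer", StringDeserializer.class.getName());
		props.setProperty("value.deserializer", StringDeserializer.class.getName());

		try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
			int total = 0;
			for (String topic : new String[]{"topic-a", "topic-b"}) { // placeholder topics
				int n = consumer.partitionsFor(topic).size();
				System.out.println(topic + " has " + n + " partitions");
				total += n;
			}
			System.out.println("the KafkaTopicPartition list would have " + total + " elements");
		}
	}
}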

⑥ setAndCheckDiscoveredPartition is where a topic partition gets "claimed" by a subtask.
The method first checks whether the partition has already been discovered; if not, it calls KafkaTopicPartitionAssigner.assign and keeps the partition only when the returned index matches this subtask's own index:

/**
	 * Sets a partition as discovered. Partitions are considered as new
	 * if its partition id is larger than all partition ids previously
	 * seen for the topic it belongs to. Therefore, for a set of
	 * discovered partitions, the order that this method is invoked with
	 * each partition is important.
	 *
	 * <p>If the partition is indeed newly discovered, this method also returns
	 * whether the new partition should be subscribed by this subtask.
	 *
	 * @param partition the partition to set and check
	 *
	 * @return {@code true}, if the partition wasn't seen before and should
	 *         be subscribed by this subtask; {@code false} otherwise
	 */
	public boolean setAndCheckDiscoveredPartition(KafkaTopicPartition partition) {
		if (isUndiscoveredPartition(partition)) {
			discoveredPartitions.add(partition);

			return KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks) == indexOfThisSubtask;
		}

		return false;
	}

⑦ KafkaTopicPartitionAssigner.assign returns the index of the subtask that a given partition is assigned to.
This makes the assignment rule clear: the topic name's hash (times 31, masked non-negative) modulo the parallelism gives a start index, and (start index + partition id) modulo the parallelism gives the target subtask.

/**
	 * Returns the index of the target subtask that a specific Kafka partition should be
	 * assigned to.
	 *
	 * <p>The resulting distribution of partitions of a single topic has the following contract:
	 * <ul>
	 *     <li>1. Uniformly distributed across subtasks</li>
	 *     <li>2. Partitions are round-robin distributed (strictly clockwise w.r.t. ascending
	 *     subtask indices) by using the partition id as the offset from a starting index
	 *     (i.e., the index of the subtask which partition 0 of the topic will be assigned to,
	 *     determined using the topic name).</li>
	 * </ul>
	 *
	 * <p>The above contract is crucial and cannot be broken. Consumer subtasks rely on this
	 * contract to locally filter out partitions that it should not subscribe to, guaranteeing
	 * that all partitions of a single topic will always be assigned to some subtask in a
	 * uniformly distributed manner.
	 *
	 * @param partition the Kafka partition
	 * @param numParallelSubtasks total number of parallel subtasks
	 *
	 * @return index of the target subtask that the Kafka partition should be assigned to.
	 */
	public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
		int startIndex = ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;

		// here, the assumption is that the id of Kafka partitions are always ascending
		// starting from 0, and therefore can be used directly as the offset clockwise from the start index
		return (startIndex + partition.getPartition()) % numParallelSubtasks;
	}
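To see concretely why two 12-partition topics do not spread evenly over 24 subtasks, we can replay this formula outside Flink. A minimal sketch with made-up topic names (substitute your own), reproducing the same startIndex + partition arithmetic:

import java.util.Map;
import java.util.TreeMap;

public class AssignDemo {
	// same arithmetic as KafkaTopicPartitionAssigner.assign
	static int assign(String topic, int partition, int numParallelSubtasks) {
		int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
		return (startIndex + partition) % numParallelSubtasks;
	}

	public static void main(String[] args) {
		int parallelism = 24;
		Map<Integer, Integer> partitionsPerSubtask = new TreeMap<>();
		for (String topic : new String[]{"topic-a", "topic-b"}) { // placeholder topic names
			for (int p = 0; p < 12; p++) {
				partitionsPerSubtask.merge(assign(topic, p, parallelism), 1, Integer::sum);
			}
		}
		// subtask indices missing from the map receive no partitions at all;
		// indices with a count of 2 read one partition from each topic
		System.out.println(partitionsPerSubtask);
	}
}

Each topic occupies 12 consecutive subtask indices starting at its own name-derived start index; unless the two start indices happen to be exactly 12 apart, the blocks overlap, so some subtasks receive two partitions while others receive none — exactly the skew observed in the introduction.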

Conclusion

The source code therefore explains both observations: for a single topic, setting the parallelism equal to the partition count gives a perfectly even assignment, but when consuming multiple topics the same setting is generally uneven, because each topic lays out its partitions round-robin from its own topic-name-derived start index.
