Subclasses of FlinkKafkaProducerBase can use the default KafkaPartitioner, FixedPartitioner (each parallel sink instance writes all of its records to one fixed partition, so with a parallelism of 1 everything lands in partition 0), or a user-defined partitioner that extends KafkaPartitioner; I find the latter somewhat involved to implement (a minimal sketch follows the two constructors below).
The two ways to construct a FlinkKafkaProducerBase subclass
// Overload 1: no partitioner supplied, so the default FixedPartitioner is used
public FlinkKafkaProducer09(String topicId, SerializationSchema serializationSchema,
        Properties producerConfig) {
    this(topicId, new KeyedSerializationSchemaWrapper<>(serializationSchema),
            producerConfig, new FixedPartitioner());
}

// Overload 2: the caller supplies a custom KafkaPartitioner
public FlinkKafkaProducer09(String topicId, SerializationSchema serializationSchema,
        Properties producerConfig, KafkaPartitioner customPartitioner) {
    this(topicId, new KeyedSerializationSchemaWrapper<>(serializationSchema),
            producerConfig, customPartitioner);
}
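For the second overload, a custom partitioner can be fairly small. The following sketch (the RoundRobinKafkaPartitioner name and its logic are illustrative, not from the Flink codebase) cycles through all partitions of the topic:

// Illustrative sketch: a KafkaPartitioner that round-robins records across partitions
public class RoundRobinKafkaPartitioner<T> extends KafkaPartitioner<T> {
    private int next = -1;

    @Override
    public int partition(T record, byte[] serializedKey, byte[] serializedValue,
            int numPartitions) {
        // advance to the next partition on every record
        next = (next + 1) % numPartitions;
        return next;
    }
}

It would then be passed to the second constructor: new FlinkKafkaProducer09<>(topic, schema, props, new RoundRobinKafkaPartitioner<>()).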
The default FixedPartitioner
public class FixedPartitioner<T> extends KafkaPartitioner<T> implements Serializable {
    private static final long serialVersionUID = 1627268846962918126L;

    private int targetPartition = -1;

    @Override
    public void open(int parallelInstanceId, int parallelInstances, int[] partitions) {
        if (parallelInstanceId < 0 || parallelInstances <= 0 ||
                partitions.length == 0) {
            throw new IllegalArgumentException();
        }
        // pin this sink subtask to a single partition for its whole lifetime
        this.targetPartition = partitions[parallelInstanceId % partitions.length];
    }

    @Override
    public int partition(T next, byte[] serializedKey, byte[] serializedValue,
            int numPartitions) {
        if (targetPartition >= 0) {
            return targetPartition;
        } else {
            throw new RuntimeException("The partitioner has not been initialized properly");
        }
    }
}
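The modulo in open is easy to trace by hand (a standalone sketch, not Flink code): with a two-partition topic and a sink parallelism of 3, subtasks 0 and 2 share partition 0 while subtask 1 writes to partition 1.

int[] partitions = {0, 1};          // partitions of the target topic
int parallelInstances = 3;          // sink parallelism
for (int id = 0; id < parallelInstances; id++) {
    // same computation as FixedPartitioner.open above
    System.out.println("subtask " + id + " -> partition "
            + partitions[id % partitions.length]);
}
// prints: subtask 0 -> partition 0, subtask 1 -> partition 1, subtask 2 -> partition 0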
When constructing a FlinkKafkaProducerBase subclass you can also pass null for the KafkaPartitioner. The sink then leaves the partition unset on every record, so partitioning falls through to the Kafka client's default Partitioner, which spreads keyless records evenly across partitions (and hashes the key otherwise). The helper below builds such a sink reflectively:
protected FlinkKafkaProducerBase createSink(String topic,
        KeyedSerializationSchema serializationSchema, Properties properties) {
    String classFullName;
    if (kafkaVersion.startsWith("0.8")) {
        classFullName =
            "org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer08";
    } else if (kafkaVersion.startsWith("0.9")) {
        classFullName =
            "org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09";
    } else if (kafkaVersion.startsWith("0.10")) {
        // FlinkKafkaProducer010 extends FlinkKafkaProducer09, so the 0.9 class works here too
        classFullName =
            "org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09";
    } else {
        throw new RuntimeException("Unsupported Kafka version: " + kafkaVersion);
    }
    try {
        Class clazz = Class.forName(classFullName);
        Constructor constructor = clazz.getConstructor(String.class,
            KeyedSerializationSchema.class, Properties.class, KafkaPartitioner.class);
        // the null KafkaPartitioner is what hands partitioning over to the Kafka client
        return (FlinkKafkaProducerBase) constructor.newInstance(topic,
            serializationSchema, properties, (KafkaPartitioner) null);
    } catch (Exception e) {
        throw new RuntimeException("Failed to create Kafka sink " + classFullName, e);
    }
}
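A call site might then look like this (SimpleStringSchema and the stream variable are assumed here purely for illustration):

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// wrap a plain SerializationSchema so it matches the KeyedSerializationSchema parameter
FlinkKafkaProducerBase sink = createSink("my-topic",
        new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()), props);
stream.addSink(sink);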
The default Partitioner in the Kafka client
public class DefaultPartitioner implements Partitioner {
    private final ConcurrentMap<String, AtomicInteger> topicCounterMap =
        new ConcurrentHashMap<>();

    public void configure(Map<String, ?> configs) {}

    public int partition(String topic, Object key, byte[] keyBytes, Object value,
            byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            // no key: round-robin over the available partitions
            int nextValue = nextValue(topic);
            List<PartitionInfo> availablePartitions =
                cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    private int nextValue(String topic) {
        AtomicInteger counter = topicCounterMap.get(topic);
        if (null == counter) {
            counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
            AtomicInteger currentCounter = topicCounterMap.putIfAbsent(topic, counter);
            if (currentCounter != null) {
                counter = currentCounter;
            }
        }
        return counter.getAndIncrement();
    }

    public void close() {}
}
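The keyed branch is deterministic: the same key always maps to the same partition, as long as the partition count does not change. A quick standalone sketch of that computation, using Kafka's own org.apache.kafka.common.utils.Utils:

byte[] keyBytes = "user-42".getBytes(java.nio.charset.StandardCharsets.UTF_8);
int numPartitions = 6;
// same computation as the keyed branch above; stable for a fixed partition count
int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;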
The invocation path
When FlinkKafkaProducerBase.invoke runs, it checks whether the partitioner is null. If it is, it builds a ProducerRecord whose partition field is left unset; otherwise it asks the partitioner for a partition and builds the ProducerRecord with that value.
public void invoke(IN next) throws Exception {
// propagate asynchronous errors
checkErroneous();
byte[] serializedKey = schema.serializeKey(next);
byte[] serializedValue = schema.serializeValue(next);
String targetTopic = schema.getTargetTopic(next);
if (targetTopic == null) {
targetTopic = defaultTopicId;
}
ProducerRecord<byte[], byte[]> record;
if (partitioner == null) {
record = new ProducerRecord<>(targetTopic, serializedKey,
serializedValue);
} else {
record = new ProducerRecord<>(targetTopic,
partitioner.partition(next, serializedKey, serializedValue,
partitions.length), serializedKey, serializedValue);
}
if (flushOnCheckpoint) {
synchronized (pendingRecordsLock) {
pendingRecords++;
}
}
producer.send(record, callback);
}
When KafkaProducer.send is called, it runs an internal partition method to decide which partition the record goes to: if the ProducerRecord's partition field is set and valid, that value is used directly; otherwise the KafkaProducer's own partitioner does the partitioning.
private int partition(ProducerRecord<K, V> record, byte[] serializedKey,
        byte[] serializedValue, Cluster cluster) {
    Integer partition = record.partition();
    if (partition != null) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(record.topic());
        int numPartitions = partitions.size();
        // they have given us a partition, use it
        if (partition < 0 || partition >= numPartitions)
            throw new IllegalArgumentException("Invalid partition given with record: "
                + partition + " is not in the range [0..." + numPartitions + "].");
        return partition;
    }
    return this.partitioner.partition(record.topic(), record.key(), serializedKey,
        record.value(), serializedValue, cluster);
}
The KafkaProducer reads its partitioner from its configuration. It defaults to DefaultPartitioner; putting a partitioner.class entry into the producer Properties swaps in a different one.
this.partitioner = config.getConfiguredInstance(
ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
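For example (com.example.MyPartitioner is a placeholder for a class implementing org.apache.kafka.clients.producer.Partitioner):

Properties props = new Properties();
// "partitioner.class" is the key behind ProducerConfig.PARTITIONER_CLASS_CONFIG
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.example.MyPartitioner");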