Table of Contents
Message producers, consumers, and the different models of message publishing
Kafka Producer
Kafka Producer message-sending architecture diagram
Kafka Consumer
Kafka Consumer Group
Kafka High Level Consumer Rebalance (reassignment of consumption)
Low Level Consumer
public final class RecordAccumulator {
    private final ConcurrentMap<TopicPartition, Deque<RecordBatch>> batches;
    ...
}
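Before tracing the internals, here is a minimal usage sketch of the path we are about to follow (my own illustration, not from the article; the broker address, topic name, and config values are placeholders): the application calls KafkaProducer.send(), which only appends the record to the RecordAccumulator above, while a background Sender thread does the actual network I/O.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("linger.ms", "5");      // give the accumulator a chance to batch
        props.put("batch.size", "16384"); // upper bound on one RecordBatch, in bytes

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // send() returns as soon as the record is sitting in a RecordBatch;
            // the callback fires when the Sender thread gets the broker's response.
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null)
                            exception.printStackTrace();
                        else
                            System.out.printf("acked: %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                    });
        } // close() flushes whatever is still queued
    }
}

With that picture in mind, here is the send() entry point itself: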
// KafkaProducer
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    try {
        // first make sure the metadata for the topic is available
        long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
        ...
        // the core call: append the message to the per-partition queue
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, serializedKey, serializedValue, callback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
        ...
}
As the code above shows, all of the batching logic lives in accumulator.append:
public RecordAppendResult append(TopicPartition tp, byte[] key, byte[] value, Callback callback, long maxTimeToBlock) throws InterruptedException {
    appendsInProgress.incrementAndGet();
    try {
        if (closed)
            throw new IllegalStateException("Cannot send after the producer is closed.");
        Deque<RecordBatch> dq = dequeFor(tp); // find the message queue for this TopicPartition
        synchronized (dq) {
            RecordBatch last = dq.peekLast(); // peek at the last element of the queue
            if (last != null) {
                // the last RecordBatch exists, so try to append this Record to it
                FutureRecordMetadata future = last.tryAppend(key, value, callback, time.milliseconds());
                if (future != null)
                    return new RecordAppendResult(future, dq.size() > 1 || last.records.isFull(), false);
            }
        }
        // allocate a new buffer outside the deque lock, since free.allocate() may block waiting for memory
        int size = Math.max(this.batchSize, Records.LOG_OVERHEAD + Record.recordSize(key, value));
        log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
        ByteBuffer buffer = free.allocate(size, maxTimeToBlock);
        synchronized (dq) {
            // Need to check if producer is closed again after grabbing the dequeue lock.
            if (closed)
                throw new IllegalStateException("Cannot send after the producer is closed.");
            RecordBatch last = dq.peekLast();
            if (last != null) {
                FutureRecordMetadata future = last.tryAppend(key, value, callback, time.milliseconds());
                if (future != null) {
                    // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
                    free.deallocate(buffer);
                    return new RecordAppendResult(future, dq.size() > 1 || last.records.isFull(), false);
                }
            }
            // no usable RecordBatch in the queue: create a new one and put the Record into it
            MemoryRecords records = MemoryRecords.emptyRecords(buffer, compression, this.batchSize);
            RecordBatch batch = new RecordBatch(tp, records, time.milliseconds());
            FutureRecordMetadata future = Utils.notNull(batch.tryAppend(key, value, callback, time.milliseconds()));
            dq.addLast(batch);
            incomplete.add(batch);
            return new RecordAppendResult(future, dq.size() > 1 || batch.records.isFull(), true);
        }
    } finally {
        appendsInProgress.decrementAndGet();
    }
}
private Deque<RecordBatch> dequeFor(TopicPartition tp) {
    Deque<RecordBatch> d = this.batches.get(tp);
    if (d != null)
        return d;
    d = new ArrayDeque<>();
    Deque<RecordBatch> previous = this.batches.putIfAbsent(tp, d);
    if (previous == null)
        return d;
    else
        return previous;
}
From the code above we can read off the batching strategy:
1. With synchronous sending, the queue is already drained each time we look at it, so the last RecordBatch is always absent: messages are not batched, and each Record becomes its own RecordBatch.
2. When the Producer's enqueue rate < the Sender's dequeue rate and lingerMs = 0, messages are likewise not batched.
3. When the Producer's enqueue rate > the Sender's dequeue rate, messages get batched.
4. When lingerMs > 0, the Sender waits until the batch has lingered for lingerMs, or the queue holds more than one batch, or a RecordBatch reaches its maximum size, and then sends. This logic lives in RecordAccumulator's ready function:
ReadyCheckResult ready(Cluster cluster, long nowMs) {
    Set<Node> readyNodes = new HashSet<>();
    long nextReadyCheckDelayMs = Long.MAX_VALUE;
    boolean unknownLeadersExist = false;
    boolean exhausted = this.free.queued() > 0;
    for (Map.Entry<TopicPartition, Deque<RecordBatch>> entry : this.batches.entrySet()) {
        TopicPartition part = entry.getKey();
        Deque<RecordBatch> deque = entry.getValue();
        Node leader = cluster.leaderFor(part);
        if (leader == null) {
            unknownLeadersExist = true;
        } else if (!readyNodes.contains(leader)) {
            synchronized (deque) {
                RecordBatch batch = deque.peekFirst();
                if (batch != null) {
                    boolean backingOff = batch.attempts > 0 && batch.lastAttemptMs + retryBackoffMs > nowMs;
                    long waitedTimeMs = nowMs - batch.lastAttemptMs;
                    long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
                    long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
                    boolean full = deque.size() > 1 || batch.records.isFull();
                    boolean expired = waitedTimeMs >= timeToWaitMs;
                    boolean sendable = full || expired || exhausted || closed || flushInProgress(); // the key line
                    if (sendable && !backingOff) {
                        readyNodes.add(leader);
                    } else {
                        nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);
                    }
                }
            }
        }
    }
    return new ReadyCheckResult(readyNodes, nextReadyCheckDelayMs, unknownLeadersExist);
}
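To make the sendable predicate concrete, here is a toy restatement of the timing arithmetic above (illustrative numbers, not Kafka code): with lingerMs = 5 and a never-retried batch last appended at t = 100 ms, the batch expires at t = 105 ms unless it fills up first.

long lingerMs = 5, retryBackoffMs = 100;
long lastAttemptMs = 100, nowMs = 103;
int attempts = 0;

boolean backingOff = attempts > 0 && lastAttemptMs + retryBackoffMs > nowMs; // false: no retries yet
long waitedTimeMs = nowMs - lastAttemptMs;                  // 3 ms
long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs; // 5 ms
boolean expired = waitedTimeMs >= timeToWaitMs;             // false: not sendable on time alone
long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0); // 2 ms: the next ready-check delay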
(I have not fully digested this part yet, so I am just recording the conclusions.)
API | How it works | Pros | Cons
--- | --- | --- | ---
High Level Consumer API (entry class: ConsumerConnector) | Hides the low-level work of fetching data, updating offsets, setting the starting offset, etc., and hands the programmer a ready-made data stream to process | Simple to use | Inflexible; you cannot tailor the processing to your own business scenario
Lower Level Consumer API (entry class: SimpleConsumer) | Fetches data from Kafka by driving the low-level API directly; you must supply the partition, offset, and other parameters yourself | Full control | More complex code
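For reference, a bare-bones High Level Consumer sketch against the pre-0.9 API (from memory, so treat the details as approximate; the ZooKeeper address, group id, and topic are placeholders). Note there is no offset handling anywhere: ConsumerConnector does it for you.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class HighLevelSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "demo-group");              // placeholder
        props.put("auto.commit.interval.ms", "1000");
        ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // ask for one stream (thread) for the topic
        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put("demo-topic", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams = connector.createMessageStreams(topicCountMap);

        // iterating the stream blocks until messages arrive; offsets are committed automatically
        for (MessageAndMetadata<byte[], byte[]> msg : streams.get("demo-topic").get(0)) {
            System.out.println(msg.partition() + "@" + msg.offset() + ": " + new String(msg.message()));
        }
    }
}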
A rebalance is triggered when a Consumer joins or leaves the group, when the coordinator (the role that has managed Consumer Groups since 0.9) dies, or when the set of partitions changes (e.g. a broker joins or leaves). Rebalancing is how a Consumer Group provides its HA property.
Example:
A topic has 4 partitions [p0, p1, p2, p3] and there are 2 consumers [c0, c1].
Sort all partitions into a set P, and likewise sort the consumers into a set C.
N = size(P) / size(C) = 4 / 2 = 2
By this formula the assignment is:
C[0] -> p0, p1
C[1] -> p2, p3
When two more consumers join, giving [c0, c1, c2, c3]:
N = 4 / 4 = 1
After the rebalance:
C[0] -> p0
C[1] -> p1
C[2] -> p2
C[3] -> p3
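The scheme above is range assignment. A toy implementation of the formula (my illustration, not Kafka's actual RangeAssignor) that also covers the case where size(P) is not divisible by size(C), in which the first size(P) % size(C) consumers each take one extra partition:

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignSketch {
    static Map<String, List<String>> assign(List<String> partitions, List<String> consumers) {
        int n = partitions.size() / consumers.size();      // base share per consumer
        int extra = partitions.size() % consumers.size();  // leftover partitions
        Map<String, List<String>> out = new LinkedHashMap<>();
        int cursor = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = n + (i < extra ? 1 : 0);
            out.put(consumers.get(i), partitions.subList(cursor, cursor + count));
            cursor += count;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("p0", "p1", "p2", "p3");
        System.out.println(assign(partitions, Arrays.asList("c0", "c1")));
        // {c0=[p0, p1], c1=[p2, p3]}  -- matches the example above
        System.out.println(assign(partitions, Arrays.asList("c0", "c1", "c2", "c3")));
        // {c0=[p0], c1=[p1], c2=[p2], c3=[p3]}
    }
}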
The main reason to use the Low Level Consumer (SimpleConsumer) is wanting tighter control over consumption than a Consumer Group gives, e.g.:
- reading the same message multiple times
- consuming only some of a topic's partitions within a process
- using transactions to guarantee a message is processed exactly once
In exchange, compared with the High Level Consumer, the Low Level Consumer demands a lot of extra work from the user (see the sketch below):
- offsets must be tracked in the application to know where consumption left off
- the leader broker of each topic partition must be located
- leader changes must be handled by the application
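To show that extra work concretely, here is a stripped-down sketch of the pre-0.9 SimpleConsumer fetch loop (from memory and heavily simplified: real code must first discover the partition leader, check error codes, and chase leader changes; the host, topic, partition, and offsets are placeholders):

import java.nio.ByteBuffer;
import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class LowLevelSketch {
    public static void main(String[] args) {
        // you must already know (or look up) the leader broker for the partition
        SimpleConsumer consumer = new SimpleConsumer("localhost", 9092, 100000, 64 * 1024, "demo-client");
        long offset = 0L; // you track offsets yourself
        FetchRequest req = new FetchRequestBuilder()
                .clientId("demo-client")
                .addFetch("demo-topic", 0, offset, 100000) // topic, partition, offset, maxBytes
                .build();
        FetchResponse resp = consumer.fetch(req);
        for (MessageAndOffset msg : resp.messageSet("demo-topic", 0)) {
            ByteBuffer payload = msg.message().payload();
            byte[] bytes = new byte[payload.limit()];
            payload.get(bytes);
            System.out.println(msg.offset() + ": " + new String(bytes));
            offset = msg.nextOffset(); // advance the offset manually
        }
        consumer.close();
    }
}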