Why Use SimpleConsumer?
The main reason to use a SimpleConsumer implementation is that you want greater control over partition consumption than Consumer Groups give you.
For example, you may want to:
- Read a message multiple times
- Consume only a subset of the partitions in a topic in a single process
- Manage transactions to make sure a message is processed once and only once
Downsides of Using SimpleConsumer
The SimpleConsumer requires a significant amount of work that is not needed with Consumer Groups:
- You must keep track of the offsets in your application to know where you left off consuming (a minimal checkpointing sketch follows this list).
- You must figure out which Broker is the lead Broker for a topic and partition.
- You must handle Broker leader changes.
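To make the first point concrete, here is one possible sketch of checkpointing the last processed offset to a local file; the OffsetCheckpoint class, the file name, and the format are illustrative assumptions, not part of the Kafka API. Any durable store (file, database, ZooKeeper) works; the full example below simply keeps the offset in a local variable and would lose its place on restart.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: persist the last processed offset per topic/partition
// so a restarted consumer can resume where it left off.
public class OffsetCheckpoint {
    private final Path file;

    public OffsetCheckpoint(String topic, int partition) {
        // Illustrative file name; use whatever durable store suits your application.
        this.file = Paths.get("offset-" + topic + "-" + partition + ".txt");
    }

    public long load(long defaultOffset) throws IOException {
        if (!Files.exists(file)) return defaultOffset; // first run: no checkpoint yet
        return Long.parseLong(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim());
    }

    public void save(long offset) throws IOException {
        Files.write(file, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }
}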
Steps for Using SimpleConsumer
- Find an active Broker and figure out which Broker is the leader for your topic and partition
- Determine who the replica Brokers are for your topic and partition
- Build the request defining what data you are interested in
- Fetch the data
- Identify and recover from leader changes
Finding the Lead Broker for a Topic and Partition
The easiest way to do this is to pass a set of known Brokers to your logic, either via a properties file or the command line. These don't have to be all the Brokers in the cluster, just a set you can use to start looking for a live Broker to query for Leader information.
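As one possible sketch of that wiring (the consumer.properties file name and the seed.brokers key are assumptions for illustration; Kafka does not define them):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class SeedBrokers {
    // Read a comma-separated Broker list from a properties file,
    // e.g. a line such as: seed.brokers=broker1.example.com,broker2.example.com
    public static List<String> fromProperties(String fileName) throws IOException {
        Properties props = new Properties();
        FileInputStream in = new FileInputStream(fileName);
        try {
            props.load(in);
        } finally {
            in.close();
        }
        return Arrays.asList(props.getProperty("seed.brokers", "localhost").split(","));
    }
}

The lookup itself then walks whatever seed list you hand it: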
private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
    PartitionMetadata returnMetaData = null;
    loop:
    for (String seed : a_seedBrokers) {
        SimpleConsumer consumer = null;
        try {
            consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
            List<String> topics = Collections.singletonList(a_topic);
            TopicMetadataRequest req = new TopicMetadataRequest(topics);
            kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);

            List<TopicMetadata> metaData = resp.topicsMetadata();
            for (TopicMetadata item : metaData) {
                for (PartitionMetadata part : item.partitionsMetadata()) {
                    if (part.partitionId() == a_partition) {
                        returnMetaData = part;
                        break loop;
                    }
                }
            }
        } catch (Exception e) {
            System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                    + ", " + a_partition + "] Reason: " + e);
        } finally {
            if (consumer != null) consumer.close();
        }
    }
    if (returnMetaData != null) {
        // remember the replicas so findNewLeader() can fall back to them later
        m_replicaBrokers.clear();
        for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
            m_replicaBrokers.add(replica.host());
        }
    }
    return returnMetaData;
}
The call to topicsMetadata() asks the Broker you are connected to for all the details about the topic we are interested in.
The loop over partitionsMetadata() iterates through all the partitions until we find the one we want. Once we find it, we can break out of all the loops.
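Inside the example class, a call might look like this (the seed host, port, topic, and partition are placeholders):

List<String> seeds = new ArrayList<String>();
seeds.add("broker1.example.com");                        // placeholder seed Broker
PartitionMetadata metadata = findLeader(seeds, 9092, "myTopic", 0);
String leadBroker = metadata.leader().host();            // null checks omitted for brevity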
Finding the Starting Offset for Reads
Next, define where to start reading data. Kafka includes two constants to help: kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the logs and starts streaming from there, while kafka.api.OffsetRequest.LatestTime() only streams new messages. Don't assume that offset 0 is the beginning offset, since messages age out of the log over time.
public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                 long whichTime, String clientName) {
    TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
            new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
    kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
            requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
    OffsetResponse response = consumer.getOffsetsBefore(request);

    if (response.hasError()) {
        System.out.println("Error fetching offset data from the Broker. Reason: "
                + response.errorCode(topic, partition));
        return 0;
    }
    long[] offsets = response.offsets(topic, partition);
    return offsets[0];
}
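In the full example below, this helper is called once with EarliestTime() to start reading from the beginning of the log:

long readOffset = getLastOffset(consumer, a_topic, a_partition,
        kafka.api.OffsetRequest.EarliestTime(), clientName);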
Error Handling
Since the SimpleConsumer doesn't handle lead Broker failures, you have to write a bit of code to handle them.
if (fetchResponse.hasError()) {
    numErrors++;
    // Something went wrong!
    short code = fetchResponse.errorCode(a_topic, a_partition);
    System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
    if (numErrors > 5) break;
    if (code == ErrorMapping.OffsetOutOfRangeCode()) {
        // We asked for an invalid offset. For simple case ask for the last element to reset
        readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
        continue;
    }
    consumer.close();
    consumer = null;
    leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
    continue;
}
Here, once the fetch returns an error, we log the reason, close the consumer, and then try to figure out who the new leader is.
private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
    for (int i = 0; i < 3; i++) {
        boolean goToSleep = false;
        PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
        if (metadata == null) {
            goToSleep = true;
        } else if (metadata.leader() == null) {
            goToSleep = true;
        } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
            // first time through if the leader hasn't changed give ZooKeeper a second to recover
            // second time, assume the broker did recover before failover, or it was a non-Broker issue
            goToSleep = true;
        } else {
            return metadata.leader().host();
        }
        if (goToSleep) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
            }
        }
    }
    System.out.println("Unable to find new leader after Broker failure. Exiting");
    throw new Exception("Unable to find new leader after Broker failure. Exiting");
}
This method uses the findLeader() logic we defined earlier to find the new leader, except here we only try to connect to one of the replicas of the topic/partition. That way, if we can't reach any of the Brokers with the data we are interested in, we give up and exit hard.
Since it may take a short time for ZooKeeper to detect the leader loss and assign a new leader, we sleep if we don't get an answer. In reality ZooKeeper usually performs the failover very quickly, so you never sleep.
Reading the Data
Finally, we read the data being streamed back and write it out.
// When calling FetchRequestBuilder, it's important NOT to call .replicaId(), which is meant for internal use only.
// Setting the replicaId incorrectly will cause the brokers to behave incorrectly.
FetchRequest req = new FetchRequestBuilder()
        .clientId(clientName)
        .addFetch(a_topic, a_partition, readOffset, 100000)
        .build();
FetchResponse fetchResponse = consumer.fetch(req);

if (fetchResponse.hasError()) {
    // See code in previous section
}
numErrors = 0;

long numRead = 0;
for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
    long currentOffset = messageAndOffset.offset();
    if (currentOffset < readOffset) {
        System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
        continue;
    }
    readOffset = messageAndOffset.nextOffset();
    ByteBuffer payload = messageAndOffset.message().payload();

    byte[] bytes = new byte[payload.limit()];
    payload.get(bytes);
    System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
    numRead++;
    a_maxReads--;
}

if (numRead == 0) {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException ie) {
    }
}
Note that 'readOffset' asks the last read message what the next offset would be. That way, when the block of messages has been processed, we know where to tell Kafka to start the next fetch.
Also note that we explicitly check that the offset being read is not less than the offset we requested. This is needed because if Kafka is compressing messages, the fetch request returns the entire compressed block even when the requested offset isn't the beginning of that block, so a message we saw previously may be returned again. Note also that we ask for a fetchSize of 100000 bytes. If the Kafka producers are writing large batches, this might not be enough and the fetch might return an empty message set; in that case, the fetchSize should be increased until a non-empty set is returned.
Finally, we keep track of the number of messages read. If we didn't read anything on the last request, we sleep for a second so we aren't hammering Kafka when there is no data.
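As a sketch of growing the fetch size on an empty, error-free response (the doubling strategy and the 1 MB cap are illustrative choices, not anything the API prescribes):

// An empty message set with no error usually means the next batch is larger
// than our fetch size, so retry the same offset with a bigger buffer.
int fetchSize = 100000; // the starting value used throughout this example
FetchRequest req = new FetchRequestBuilder()
        .clientId(clientName)
        .addFetch(a_topic, a_partition, readOffset, fetchSize)
        .build();
FetchResponse fetchResponse = consumer.fetch(req);
boolean empty = !fetchResponse.messageSet(a_topic, a_partition).iterator().hasNext();
if (empty && !fetchResponse.hasError() && fetchSize < 1024 * 1024) {
    fetchSize *= 2; // grow, up to an arbitrary 1 MB cap, and fetch again
}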
Running the Example
The example expects the following parameters:
- Maximum number of messages to read (so we don’t loop forever)
- Topic to read from
- Partition to read from
- One broker to use for Metadata lookup
- Port the brokers listen on
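For example, to read up to 10 messages from partition 0 of myTopic (the Broker host is a placeholder):

java com.test.simple.SimpleExample 10 myTopic 0 broker1.example.com 9092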
Full Source Code
package com.test.simple;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.javaapi.*;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimpleExample {

    public static void main(String args[]) {
        SimpleExample example = new SimpleExample();
        long maxReads = Long.parseLong(args[0]);     // maximum number of messages to read
        String topic = args[1];                      // topic to read from
        int partition = Integer.parseInt(args[2]);   // partition to read from
        List<String> seeds = new ArrayList<String>();
        seeds.add(args[3]);                          // one Broker to use for metadata lookup
        int port = Integer.parseInt(args[4]);        // port the Brokers listen on
        try {
            example.run(maxReads, topic, partition, seeds, port);
        } catch (Exception e) {
            System.out.println("Oops:" + e);
            e.printStackTrace();
        }
    }

    private List<String> m_replicaBrokers = new ArrayList<String>();

    public SimpleExample() {
        m_replicaBrokers = new ArrayList<String>();
    }

    public void run(long a_maxReads, String a_topic, int a_partition, List<String> a_seedBrokers, int a_port) throws Exception {
        // find the metadata about the topic and partition we are interested in
        PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition);
        if (metadata == null) {
            System.out.println("Can't find metadata for Topic and Partition. Exiting");
            return;
        }
        if (metadata.leader() == null) {
            System.out.println("Can't find Leader for Topic and Partition. Exiting");
            return;
        }
        String leadBroker = metadata.leader().host();
        String clientName = "Client_" + a_topic + "_" + a_partition;

        SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
        long readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.EarliestTime(), clientName);

        int numErrors = 0;
        while (a_maxReads > 0) {
            if (consumer == null) {
                consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
            }
            FetchRequest req = new FetchRequestBuilder()
                    .clientId(clientName)
                    .addFetch(a_topic, a_partition, readOffset, 100000)
                    // Note: this fetchSize of 100000 might need to be increased if large batches are written to Kafka
                    .build();
            FetchResponse fetchResponse = consumer.fetch(req);

            if (fetchResponse.hasError()) {
                numErrors++;
                // Something went wrong!
                short code = fetchResponse.errorCode(a_topic, a_partition);
                System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
                if (numErrors > 5) break;
                if (code == ErrorMapping.OffsetOutOfRangeCode()) {
                    // We asked for an invalid offset. For simple case ask for the last element to reset
                    readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
                    continue;
                }
                consumer.close();
                consumer = null;
                leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
                continue;
            }
            numErrors = 0;

            long numRead = 0;
            for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
                long currentOffset = messageAndOffset.offset();
                if (currentOffset < readOffset) {
                    // compressed blocks can replay messages we have already seen
                    System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
                    continue;
                }
                readOffset = messageAndOffset.nextOffset();
                ByteBuffer payload = messageAndOffset.message().payload();

                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
                numRead++;
                a_maxReads--;
            }

            if (numRead == 0) {
                // no data this round; back off for a second so we don't hammer Kafka
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        if (consumer != null) consumer.close();
    }

    public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                     long whichTime, String clientName) {
        TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
                new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
        OffsetResponse response = consumer.getOffsetsBefore(request);

        if (response.hasError()) {
            System.out.println("Error fetching offset data from the Broker. Reason: "
                    + response.errorCode(topic, partition));
            return 0;
        }
        long[] offsets = response.offsets(topic, partition);
        return offsets[0];
    }

    private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
        for (int i = 0; i < 3; i++) {
            boolean goToSleep = false;
            PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
            if (metadata == null) {
                goToSleep = true;
            } else if (metadata.leader() == null) {
                goToSleep = true;
            } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
                // first time through if the leader hasn't changed give ZooKeeper a second to recover
                // second time, assume the broker did recover before failover, or it was a non-Broker issue
                goToSleep = true;
            } else {
                return metadata.leader().host();
            }
            if (goToSleep) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        System.out.println("Unable to find new leader after Broker failure. Exiting");
        throw new Exception("Unable to find new leader after Broker failure. Exiting");
    }

    private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
        PartitionMetadata returnMetaData = null;
        loop:
        for (String seed : a_seedBrokers) {
            SimpleConsumer consumer = null;
            try {
                consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
                List<String> topics = Collections.singletonList(a_topic);
                TopicMetadataRequest req = new TopicMetadataRequest(topics);
                kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);

                List<TopicMetadata> metaData = resp.topicsMetadata();
                for (TopicMetadata item : metaData) {
                    for (PartitionMetadata part : item.partitionsMetadata()) {
                        if (part.partitionId() == a_partition) {
                            returnMetaData = part;
                            break loop;
                        }
                    }
                }
            } catch (Exception e) {
                System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                        + ", " + a_partition + "] Reason: " + e);
            } finally {
                if (consumer != null) consumer.close();
            }
        }
        if (returnMetaData != null) {
            // remember the replicas so findNewLeader() can fall back to them later
            m_replicaBrokers.clear();
            for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
                m_replicaBrokers.add(replica.host());
            }
        }
        return returnMetaData;
    }
}