Kafka-Spark Streaming 异常: dead for group td_topic_advert_impress_blacklist

 

最近在编写Spark Streaming 作业的时候,遇到了一个比较奇怪的问题,表现如下:

 

在本地连接Kafka 集群执行作业:

18/10/31 17:42:58 INFO AbstractCoordinator: Discovered coordinator kafka1:9092 (id: 2147483574 rack: null) for group td_topic_advert_impress_blacklist.
18/10/31 17:43:00 INFO AbstractCoordinator: Marking the coordinator kafka1:9092 (id: 2147483574 rack: null) dead for group td_topic_advert_impress_blacklist

18/10/31 17:42:58 INFO AbstractCoordinator: Discovered coordinator kafka1:9092 (id: 2147483574 rack: null) for group td_topic_advert_impress_blacklist.
18/10/31 17:43:00 INFO AbstractCoordinator: Marking the coordinator kafka1:9092 (id: 2147483574 rack: null) dead for group td_topic_advert_impress_blacklist

 

仅从日志我们极难定位问题的原因。经过了一大堆查找,baidu, github..

我将spark-streaming 的日志级别 从 INFO 换成了 DEBUG 终于找到了问题的原因 ,

 

设置日志级别的代码如下

        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
        javaSparkContext.setLogLevel("DEBUG");

 

出现的问题:

18/10/31 17:40:08 DEBUG KafkaConsumer: Starting the Kafka consumer
18/10/31 17:40:08 DEBUG Metadata: Updated cluster metadata version 1 to Cluster(id = null, nodes = [10.170.0.8:9092 (id: -3 rack: null), 10.170.0.7:9092 (id: -2 rack: null), 10.170.0.6:9092 (id: -1 rack: null)], partitions = [])
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name fetch-throttle-time
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name connections-closed:
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name connections-created:
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name bytes-sent-received:
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name bytes-sent:
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name bytes-received:
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name select-time:
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name io-time:
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name heartbeat-latency
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name join-latency
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name sync-latency
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name commit-latency
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name bytes-fetched
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name records-fetched
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name fetch-latency
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name records-lag
18/10/31 17:40:08 INFO AppInfoParser: Kafka version : 0.11.0.0
18/10/31 17:40:08 INFO AppInfoParser: Kafka commitId : cb8625948210849f
18/10/31 17:40:08 DEBUG KafkaConsumer: Kafka consumer created
18/10/31 17:40:08 DEBUG KafkaConsumer: Subscribed to topic(s): topic_advert_impress
18/10/31 17:40:08 DEBUG AbstractCoordinator: Sending GroupCoordinator request for group td_topic_advert_impress_blacklist to broker 10.170.0.6:9092 (id: -1 rack: null)
18/10/31 17:40:08 DEBUG NetworkClient: Initiating connection to node -1 at 10.170.0.6:9092.
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name node--1.bytes-sent
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name node--1.bytes-received
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name node--1.latency
18/10/31 17:40:08 DEBUG Selector: Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -1
18/10/31 17:40:08 DEBUG NetworkClient: Completed connection to node -1.  Fetching API versions.
18/10/31 17:40:08 DEBUG NetworkClient: Initiating API versions fetch from node -1.
18/10/31 17:40:08 DEBUG NetworkClient: Initialize connection to node -3 for sending metadata request
18/10/31 17:40:08 DEBUG NetworkClient: Initiating connection to node -3 at 10.170.0.8:9092.
18/10/31 17:40:08 DEBUG NetworkClient: Recorded API versions for node -1: (Produce(0): 0 to 3 [usable: 3], Fetch(1): 0 to 5 [usable: 5], Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 4 [usable: 4], LeaderAndIsr(4): 0 [usable: 0], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 3 [usable: 3], ControlledShutdown(7): 1 [usable: 1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 [usable: 0], ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0])
18/10/31 17:40:08 DEBUG NetworkClient: Sending metadata request (type=MetadataRequest, topics=topic_advert_impress) to node -1
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name node--3.bytes-sent
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name node--3.bytes-received
18/10/31 17:40:08 DEBUG Metrics: Added sensor with name node--3.latency
18/10/31 17:40:08 DEBUG Selector: Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -3
18/10/31 17:40:08 DEBUG NetworkClient: Completed connection to node -3.  Fetching API versions.
18/10/31 17:40:08 DEBUG NetworkClient: Initiating API versions fetch from node -3.
18/10/31 17:40:08 DEBUG Metadata: Updated cluster metadata version 2 to Cluster(id = -dsyi_nCTMGfFcaeYx-9Wg, nodes = [kafka3:9092 (id: 72 rack: null), kafka2:9092 (id: 71 rack: null), kafka1:9092 (id: 73 rack: null)], partitions = [Partition(topic = topic_advert_impress, partition = 3, leader = 72, replicas = [72,71,73], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 2, leader = 71, replicas = [71,72,73], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 5, leader = 71, replicas = [71,73,72], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 4, leader = 73, replicas = [73,72,71], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 1, leader = 73, replicas = [73,71,72], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 0, leader = 72, replicas = [72,73,71], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 7, leader = 73, replicas = [73,71,72], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 6, leader = 72, replicas = [72,73,71], isr = [71,72,73]), Partition(topic = topic_advert_impress, partition = 8, leader = 71, replicas = [71,72,73], isr = [71,72,73])])
18/10/31 17:40:08 DEBUG NetworkClient: Recorded API versions for node -3: (Produce(0): 0 to 3 [usable: 3], Fetch(1): 0 to 5 [usable: 5], Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 4 [usable: 4], LeaderAndIsr(4): 0 [usable: 0], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 3 [usable: 3], ControlledShutdown(7): 1 [usable: 1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 [usable: 0], ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0])
18/10/31 17:40:08 DEBUG AbstractCoordinator: Received GroupCoordinator response ClientResponse(receivedTimeMs=1540978808909, latencyMs=125, disconnected=false, requestHeader={api_key=10,api_version=1,correlation_id=0,client_id=consumer-1}, responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=NONE, node=kafka1:9092 (id: 73 rack: null))) for group td_topic_advert_impress_blacklist
18/10/31 17:40:08 INFO AbstractCoordinator: Discovered coordinator kafka1:9092 (id: 2147483574 rack: null) for group td_topic_advert_impress_blacklist.
18/10/31 17:40:08 DEBUG NetworkClient: Initiating connection to node 2147483574 at kafka1:9092.
18/10/31 17:40:20 DEBUG NetworkClient: Error connecting to node 2147483574 at kafka1:9092:
java.io.IOException: Can't resolve address: kafka1:9092
	at org.apache.kafka.common.network.Selector.connect(Selector.java:195)
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:762)
	at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:224)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.tryConnect(ConsumerNetworkClient.java:462)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$GroupCoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:598)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$GroupCoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:579)
	at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:214)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:200)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
	at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.paranoidPoll(DirectKafkaInputDStream.scala:165)
	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.start(DirectKafkaInputDStream.scala:243)
	at org.apache.spark.streaming.DStreamGraph$$anonfun$start$7.apply(DStreamGraph.scala:54)
	at org.apache.spark.streaming.DStreamGraph$$anonfun$start$7.apply(DStreamGraph.scala:54)
	at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach_quick(ParArray.scala:143)
	at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:136)
	at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
	at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:123)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
	at org.apache.kafka.common.network.Selector.connect(Selector.java:192)
	... 37 more
18/10/31 17:40:20 INFO AbstractCoordinator: Marking the coordinator kafka1:9092 (id: 2147483574 rack: null) dead for group td_topic_advert_impress_blacklist
18/10/31 17:40:20 DEBUG AbstractCoordinator: Sending GroupCoordinator request for group td_topic_advert_impress_blacklist to broker kafka3:9092 (id: 72 rack: null)

 

这样我们就定位到了最终的问题:

java.io.IOException: Can't resolve address: kafka1:9092
	at org.apache.kafka.common.network.Selector.connect(Selector.java:195)
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:762)
	at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:224)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.tryConnect(Co

 

可以看到就是域名不解析的问题:

我们只需要修改 windows 下的  host 文件即可

 

你可能感兴趣的:(Spark,Kafka)