1. Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at: org.apache.spark.SparkContext.
Solution: In my case the error was caused by my own code creating more than one SparkContext. After changing the code so the SparkContext is created only once, the program ran fine.
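A minimal sketch of the fix, assuming a plain (non-streaming) Spark job; SparkContext.getOrCreate reuses the context that is already running instead of constructing a second one:

import org.apache.spark.{SparkConf, SparkContext}

object SingleContextExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SingleContextExample")
    // getOrCreate returns the SparkContext already running in this JVM,
    // so a second "new SparkContext(conf)" can never trigger SPARK-2243
    val sc = SparkContext.getOrCreate(conf)
    // ... job logic: always reuse this single sc
    sc.stop()
  }
}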
2. Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
Solution: add the following properties to the core-default.xml inside the jar:
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
  <description>The FileSystem for file: uris.</description>
</property>
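The same mapping can also be set programmatically on the driver before any hdfs:// path is accessed; a minimal sketch (the app name is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("HdfsSchemeExample"))
// Register the FileSystem implementations explicitly, so it does not matter
// whether the shaded jar kept the service entries that map these schemes
sc.hadoopConfiguration.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
sc.hadoopConfiguration.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")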
3. After packaging the jar and running it on the server, a jar signature verification error appears.
Solution: delete the signature files from the jar:
sudo zip -d xxx.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF
4. After submitting the job with spark2-submit, the job stays blocked.
Solution 1: limit the number of records read per second:
val sparkConf = new SparkConf()
  .setAppName("KafkaReceiver")
  // enable backpressure
  .set("spark.streaming.backpressure.enabled", "true")
  // maximum number of records read per partition per second
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
Solution 2: use multithreading (recommended):
message.foreachRDD(rdd => {
  if (!rdd.isEmpty()) {
    rdd.foreachPartition(iter => {
      // java.util.concurrent.Executors: one pool per partition
      val pool = Executors.newCachedThreadPool()
      pool.execute(new Runnable {
        override def run(): Unit = {
          // business logic for this partition (consume iter here)
        }
      })
      // stop accepting new tasks; already submitted work still runs to completion
      pool.shutdown()
    })
  }
})
5. When submitting the jar to YARN, the following appears:
Exception in thread "main" java.lang.IllegalStateException: No current assignment for partition canal_topic-0
Exception in thread "main" java.lang.IllegalStateException: No current assignment for partition canal_topic-0
at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:259)
at org.apache.kafka.clients.consumer.internals.SubscriptionState.seek(SubscriptionState.java:264)
at org.apache.kafka.clients.consumer.KafkaConsumer.seek(KafkaConsumer.java:1501)
at org.apache.spark.streaming.kafka010.Subscribe.$anonfun$onStart$2(ConsumerStrategy.scala:107)
at org.apache.spark.streaming.kafka010.Subscribe.$anonfun$onStart$2$adapted(ConsumerStrategy.scala:106)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.streaming.kafka010.Subscribe.onStart(ConsumerStrategy.scala:106)
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.consumer(DirectKafkaInputDStream.scala:73)
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.start(DirectKafkaInputDStream.scala:259)
at org.apache.spark.streaming.DStreamGraph.$anonfun$start$7(DStreamGraph.scala:54)
at org.apache.spark.streaming.DStreamGraph.$anonfun$start$7$adapted(DStreamGraph.scala:54)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:145)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:974)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:971)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
at ... run in separate thread using org.apache.spark.util.ThreadUtils ... ()
at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:578)
at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:572)
at com.gumi.KafkaDirect$.main(KafkaDirect.scala:101)
at com.gumi.KafkaDirect.main(KafkaDirect.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Cause: a consumer job with this group ID was already running on the cluster, and I started another one with the same group ID. The same group ID was then consuming the same topic from two jobs at the same time, which caused problems with the recorded offsets and partition assignment.
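A minimal sketch of the fix, assuming the spark-streaming-kafka-0-10 direct stream API: either stop the job that is already running, or give the new job its own group ID so the two jobs do not share offsets. The broker address and group ID below are hypothetical, and ssc is assumed to be the existing StreamingContext:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",        // hypothetical broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "canal_topic_consumer_v2",      // a group ID not used by any other running job
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val message = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Array("canal_topic"), kafkaParams)
)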
6. When running the submit script, the following appears:
org.apache.spark.sql.AnalysisException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.conf.HiveConf.getTimeVar(Lorg/apache/hadoop/hive/conf/HiveConf$ConfVars;Ljava/util/concurrent/TimeUnit;)J;
Cause: version conflict between the Hive classes bundled with spark-hive and the explicitly pinned Hive dependency.
Solution: comment out the conflicting Maven dependency in the pom. The two dependencies below were involved (the old hive-jdbc 0.13.1 clashes with the Hive version that spark-hive pulls in):
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_${scala.v}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>0.13.1</version>
</dependency>