Common Big Data Errors (continuously updated)

1. Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:org.apache.spark.SparkContext.

Solution: In my case the code was creating more than one SparkContext. After changing it to create a single SparkContext, the program ran correctly.
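A minimal sketch of the fix (the holder object, app name, and master here are placeholders, not from my original code): route every access through SparkContext.getOrCreate, which returns the already-running context instead of constructing a second one.

import org.apache.spark.{SparkConf, SparkContext}

object ContextHolder {
    // Build the configuration once; app name and master are placeholders.
    private val conf = new SparkConf()
        .setAppName("MyApp")
        .setMaster("local[*]")

    // getOrCreate returns the existing SparkContext if one is already
    // running in this JVM, so repeated calls never create a second one.
    def sc: SparkContext = SparkContext.getOrCreate(conf)
}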

2. Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Solution: add the following properties to core-default.xml inside the jar:

<property>
        <name>fs.hdfs.impl</name>
        <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
        <description>The FileSystem for hdfs: uris.</description>
</property>
<property>
        <name>fs.file.impl</name>
        <value>org.apache.hadoop.fs.LocalFileSystem</value>
        <description>The FileSystem for file: uris.</description>
</property>
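If editing the jar is inconvenient, the same mappings can be set in code instead; a sketch, assuming you build the Hadoop Configuration yourself (the NameNode URI is a placeholder):

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val conf = new Configuration()
// Register the scheme-to-implementation mappings explicitly; this has the
// same effect as the core-default.xml entries above.
conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)

val fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf)

The error itself usually appears after building a fat jar: several Hadoop artifacts each ship filesystem registrations, and the merge keeps only one copy, losing the hdfs mapping.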

3. After packaging the jar and running it on the server, the following error appears:

(screenshot of the error)

Solution: delete the stale signature files (left over from signed dependencies) from the jar:

sudo zip -d xxx.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF
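To keep those files out of the jar at build time, the maven-shade-plugin supports filters; a sketch, assuming the project is packaged with shade:

<filters>
    <filter>
        <!-- Strip signature files inherited from signed dependencies;
             the JVM rejects a merged jar whose signatures no longer match. -->
        <artifact>*:*</artifact>
        <excludes>
            <exclude>META-INF/*.SF</exclude>
            <exclude>META-INF/*.DSA</exclude>
            <exclude>META-INF/*.RSA</exclude>
        </excludes>
    </filter>
</filters>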

4. After submitting the job with spark2-submit, the job stays blocked:

(screenshot of the stalled job)

Solution 1: enable backpressure and cap how many records are read per second:

val sparkConf = new SparkConf()
                .setAppName("KafkaReceiver")
                // enable backpressure
                .set("spark.streaming.backpressure.enabled", "true")
                // maximum records read per partition per second
                .set("spark.streaming.kafka.maxRatePerPartition", "1000")

Solution 2: hand the per-partition work to a thread pool (this is the method I recommend):

import java.util.concurrent.Executors

message.foreachRDD(rdd => {
    if (!rdd.isEmpty()) {
        rdd.foreachPartition(iter => {
            val pool = Executors.newCachedThreadPool()
            pool.execute(new Runnable {
                override def run(): Unit = {
                    // processing logic for this partition goes here
                }
            })
            // allow the submitted task to finish, then release the threads
            pool.shutdown()
        })
    }
})
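One caveat: the pool runs the work asynchronously, so the batch can be marked finished (and offsets committed) before the records are actually processed. If at-least-once processing matters, keep the logic synchronous inside foreachPartition.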

5. Submitting the jar to YARN fails with:

Exception in thread "main" java.lang.IllegalStateException: No current assignment for partition canal_topic-0
        at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:259)
        at org.apache.kafka.clients.consumer.internals.SubscriptionState.seek(SubscriptionState.java:264)
        at org.apache.kafka.clients.consumer.KafkaConsumer.seek(KafkaConsumer.java:1501)
        at org.apache.spark.streaming.kafka010.Subscribe.$anonfun$onStart$2(ConsumerStrategy.scala:107)
        at org.apache.spark.streaming.kafka010.Subscribe.$anonfun$onStart$2$adapted(ConsumerStrategy.scala:106)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.streaming.kafka010.Subscribe.onStart(ConsumerStrategy.scala:106)
        at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.consumer(DirectKafkaInputDStream.scala:73)
        at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.start(DirectKafkaInputDStream.scala:259)
        at org.apache.spark.streaming.DStreamGraph.$anonfun$start$7(DStreamGraph.scala:54)
        at org.apache.spark.streaming.DStreamGraph.$anonfun$start$7$adapted(DStreamGraph.scala:54)
        at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:145)
        at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:974)
        at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
        at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
        at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
        at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:971)
        at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
        at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
        at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
        at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
        at ... run in separate thread using org.apache.spark.util.ThreadUtils ... ()
        at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:578)
        at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:572)
        at com.gumi.KafkaDirect$.main(KafkaDirect.scala:101)
        at com.gumi.KafkaDirect.main(KafkaDirect.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Cause: I already had a consumer job with the same group ID running on the cluster and then started another one, so the same group ID was consuming the same topic from two places at once, which broke the partition/offset assignment. Stopping the duplicate job, or giving the new job its own group ID, clears the error.
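A sketch of the second option, assuming typical direct-stream parameters (the broker address and group name are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker1:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    // give each independent job its own consumer group, so that two jobs
    // never compete for the same partition assignment
    "group.id" -> "canal-consumer-reprocess",
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
)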

6. Running the job script fails with:

org.apache.spark.sql.AnalysisException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.conf.HiveConf.getTimeVar(Lorg/apache/hadoop/hive/conf/HiveConf$ConfVars;Ljava/util/concurrent/TimeUnit;)J;

Cause: a version conflict between Hive dependencies.

Solution: comment out the conflicting Maven dependencies in my pom:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.v}</artifactId>
    <version>${spark.version}</version>
</dependency>

<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>0.13.1</version>
</dependency>
 
