Solving java.lang.ClassNotFoundException when developing a Spark application in IDEA

This post is about configuring IDEA for Spark application development. After finishing the cluster configuration, I wrote the following demo to test it.

        If I change "spark://master:7077" to local[2], the program runs fine, but as soon as I point it at the Spark cluster it fails with an error.

The demo code is as follows:

package com.keduox

import org.apache.spark.{SparkConf, SparkContext}

object Spark1 {

  def main(args: Array[String]): Unit = {
    // Set the application name and the master URL
    val conf = new SparkConf().setAppName("helloscala").setMaster("spark://master:7077")
    val sc = new SparkContext(conf)
    sc.textFile("hdfs://master:9000/order.txt")
      .flatMap(_.split("\t"))
      // output records look like (zs,1)
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs://master:9000/order")
    // Alternative: print each pair instead of saving
    //   .map((_, 1)).reduceByKey(_ + _).foreach(println(_))
    // Alternative: print in the form "zs 3"
    //   .map((_, 1)).reduceByKey(_ + _).foreach(x => println(x._1 + " " + x._2))
    sc.stop()
  }
}

        The error output looks like this:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/05/06 01:59:33 INFO SparkContext: Running Spark version 1.6.3
18/05/06 01:59:35 INFO SecurityManager: Changing view acls to: Administrator
18/05/06 01:59:35 INFO SecurityManager: Changing modify acls to: Administrator
18/05/06 01:59:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
18/05/06 01:59:37 INFO Utils: Successfully started service 'sparkDriver' on port 55343.
18/05/06 01:59:38 INFO Slf4jLogger: Slf4jLogger started
18/05/06 01:59:38 INFO Remoting: Starting remoting
18/05/06 01:59:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:55356]
18/05/06 01:59:38 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 55356.
18/05/06 01:59:38 INFO SparkEnv: Registering MapOutputTracker
18/05/06 01:59:38 INFO SparkEnv: Registering BlockManagerMaster
18/05/06 01:59:38 INFO DiskBlockManager: Created local directory at C:\Users\Administrator\AppData\Local\Temp\blockmgr-f8318643-18b8-4412-a128-bf5741bbf6f7
18/05/06 01:59:38 INFO MemoryStore: MemoryStore started with capacity 1773.8 MB
18/05/06 01:59:39 INFO SparkEnv: Registering OutputCommitCoordinator
18/05/06 01:59:39 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/05/06 01:59:39 INFO SparkUI: Started SparkUI at http://192.168.200.1:4040
18/05/06 01:59:39 INFO HttpFileServer: HTTP File server directory is C:\Users\Administrator\AppData\Local\Temp\spark-c16321b9-22c4-4c82-ab82-b5ff12cea204\httpd-89e0b194-3699-491f-94e8-a72ec70df061
18/05/06 01:59:39 INFO HttpServer: Starting HTTP Server
18/05/06 01:59:39 INFO Utils: Successfully started service 'HTTP file server' on port 55359.
18/05/06 01:59:39 ERROR SparkContext: Jar not found at hhhh
18/05/06 01:59:40 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077...
18/05/06 01:59:40 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20180506015940-0012
18/05/06 01:59:40 INFO AppClient$ClientEndpoint: Executor added: app-20180506015940-0012/0 on worker-20180506011231-192.168.200.200-37034 (192.168.200.200:37034) with 1 cores
18/05/06 01:59:40 INFO SparkDeploySchedulerBackend: Granted executor ID app-20180506015940-0012/0 on hostPort 192.168.200.200:37034 with 1 cores, 1024.0 MB RAM
18/05/06 01:59:40 INFO AppClient$ClientEndpoint: Executor added: app-20180506015940-0012/1 on worker-20180506011223-192.168.200.201-36449 (192.168.200.201:36449) with 1 cores
18/05/06 01:59:40 INFO SparkDeploySchedulerBackend: Granted executor ID app-20180506015940-0012/1 on hostPort 192.168.200.201:36449 with 1 cores, 1024.0 MB RAM
18/05/06 01:59:40 INFO AppClient$ClientEndpoint: Executor added: app-20180506015940-0012/2 on worker-20180506011224-192.168.200.202-34934 (192.168.200.202:34934) with 1 cores
18/05/06 01:59:40 INFO SparkDeploySchedulerBackend: Granted executor ID app-20180506015940-0012/2 on hostPort 192.168.200.202:34934 with 1 cores, 1024.0 MB RAM
18/05/06 01:59:40 INFO AppClient$ClientEndpoint: Executor updated: app-20180506015940-0012/1 is now RUNNING
18/05/06 01:59:40 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55377.
18/05/06 01:59:40 INFO NettyBlockTransferService: Server created on 55377
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1922)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1209)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1154)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1154)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1154)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1060)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1457)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1436)
	at com.keduox.Spark1$.main(Spark1.scala:13)
	at com.keduox.Spark1.main(Spark1.scala)
Caused by: java.lang.ClassNotFoundException: com.keduox.Spark1$$anonfun$main$2
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:278)
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
18/05/06 01:59:53 INFO SparkContext: Invoking stop() from shutdown hook
18/05/06 01:59:53 INFO SparkUI: Stopped Spark web UI at http://192.168.200.1:4040
18/05/06 01:59:53 INFO SparkDeploySchedulerBackend: Shutting down all executors
18/05/06 01:59:53 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
18/05/06 01:59:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/05/06 01:59:53 INFO MemoryStore: MemoryStore cleared
18/05/06 01:59:53 INFO BlockManager: BlockManager stopped
18/05/06 01:59:53 INFO BlockManagerMaster: BlockManagerMaster stopped
18/05/06 01:59:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/05/06 01:59:53 INFO SparkContext: Successfully stopped SparkContext
18/05/06 01:59:53 INFO ShutdownHookManager: Shutdown hook called
18/05/06 01:59:53 INFO ShutdownHookManager: Deleting directory C:\Users\Administrator\AppData\Local\Temp\spark-c16321b9-22c4-4c82-ab82-b5ff12cea204
18/05/06 01:59:53 INFO ShutdownHookManager: Deleting directory C:\Users\Administrator\AppData\Local\Temp\spark-c16321b9-22c4-4c82-ab82-b5ff12cea204\httpd-89e0b194-3699-491f-94e8-a72ec70df061
18/05/06 01:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/05/06 01:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/05/06 01:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.


            Solution: add the full path of the project's jar to the conf via setJars. The input and output files are also best kept on HDFS, because in a distributed Spark cluster any node may be assigned a task, so writing to a local disk will cause problems.

 val conf = new SparkConf()
   .setAppName("helloscala")
   .setMaster("spark://master:7077")
   .setJars(Array("F:\\java\\workspace\\bigdata\\sparkcore02\\target\\sparkcore02-1.0-SNAPSHOT.jar"))
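
For reference, here is a minimal sketch of the corrected driver, combining the setJars fix with HDFS paths for both input and output. The jar path and HDFS locations are the ones from the example above; the jar must already exist, so re-package the project (for example with mvn package) after every code change, otherwise the executors will load stale classes.

package com.keduox

import org.apache.spark.{SparkConf, SparkContext}

object Spark1 {

  def main(args: Array[String]): Unit = {
    // Ship the project's jar to the executors so they can load the anonymous
    // function classes (Spark1$$anonfun$...) that the compiler generates for the lambdas.
    val conf = new SparkConf()
      .setAppName("helloscala")
      .setMaster("spark://master:7077")
      .setJars(Array("F:\\java\\workspace\\bigdata\\sparkcore02\\target\\sparkcore02-1.0-SNAPSHOT.jar"))
    val sc = new SparkContext(conf)

    // Read from HDFS and write back to HDFS so every worker can reach the data.
    sc.textFile("hdfs://master:9000/order.txt")
      .flatMap(_.split("\t"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs://master:9000/order")

    sc.stop()
  }
}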

