1. Error when running locally and how to fix it
When running the following command:
./bin/spark-submit \
--class org.apache.spark.examples.mllib.JavaALS \
--master local[*] \
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
/user/data/netflix_rating 10 10 /user/data/result
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:657)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:389)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:145)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:168)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:167)
at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:599)
at org.apache.spark.mllib.recommendation.ALS.train(ALS.scala)
at org.apache.spark.examples.mllib.JavaALS.main(JavaALS.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
... 34 more
The root cause is that the HDFS FileSystem implementation (org.apache.hadoop.hdfs.DistributedFileSystem, provided by the hadoop-hdfs jar) is not on the driver classpath, so Hadoop cannot resolve the hdfs:// scheme. The correct way to run the job is:
./bin/spark-submit \
--class org.apache.spark.examples.mllib.JavaALS \
--driver-class-path /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
--master local[*] \
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
/user/data/netflix_rating 10 10 /user/data/result
or, equivalently, pass the jar via --jars:
./bin/spark-submit \
--class org.apache.spark.examples.mllib.JavaALS \
--jars /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
--master local[*] \
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
/user/data/netflix_rating 10 10 /user/data/result
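Both variants do the same thing: they put the hadoop-hdfs jar on the driver's classpath so the hdfs:// scheme can be resolved. If you are not sure which jar to point at, a quick sanity check (assuming the CDH parcel path used above) is to list the jar's contents and look for DistributedFileSystem:
jar tf /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar | grep DistributedFileSystem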
2. Error when running on YARN (yarn-cluster) and how to fix it
When running the following command:
./bin/spark-submit \
--class org.apache.spark.examples.mllib.JavaALS \
--master yarn-cluster \
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
/user/data/netflix_rating 10 10 /user/data/result
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/client/api/impl/YarnClientImpl
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.spark.util.Utils$$anonfun$classIsLoadable$1.apply(Utils.scala:143)
at org.apache.spark.util.Utils$$anonfun$classIsLoadable$1.apply(Utils.scala:143)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.util.Utils$.classIsLoadable(Utils.scala:143)
at org.apache.spark.deploy.SparkSubmit$.createLaunchEnv(SparkSubmit.scala:158)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:54)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 21 more
Here the YARN client classes (YarnClientImpl lives in the hadoop-yarn-client jar) are missing from the classpath that SparkSubmit runs with. The correct way to run the job is to put the CDH hadoop-yarn jars, plus the hadoop-hdfs jar, on --driver-class-path:
./bin/spark-submit \
--class org.apache.spark.examples.mllib.JavaALS \
--master yarn-cluster \
--driver-class-path $(echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar |sed 's/ /:/g'):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
/user/data/netflix_rating 10 10 /user/data/result
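The $(echo ... | sed 's/ /:/g') fragment simply turns the space-separated glob expansion of the hadoop-yarn jars into a colon-separated classpath string. You can preview what it expands to before submitting (the exact output depends on your CDH layout):
echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar | sed 's/ /:/g'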
The resulting output is shown in the figure below.
The data format of /user/data/result/productFeatures/part-00000 is:
3. For problems that occur when running Spark on Hadoop/YARN, check the corresponding job logs on the YARN side.
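For example, once the application has finished, the aggregated container logs can be pulled with the yarn CLI (the application ID below is a placeholder; use the one printed by spark-submit or shown in the ResourceManager UI):
yarn logs -applicationId application_1410000000000_0001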