1, Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/pangying/.sparkStaging/application_1522735609126_0001/__spark_libs__4275647205298765018.zip could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation.
2, An RDD cannot be used as a Broadcast variable
3, Spark serialization problem
4, Application application_1525314251630_0005 failed 2 times due to AM Container for appattempt_1525314251630_0005_000002 exited with exitCode: 13
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/pangying/.sparkStaging/application_1522735609126_0001/__spark_libs__4275647205298765018.zip could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1728)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2515)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1455)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1251)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
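Solution:
1, My guess is that files uploaded by an earlier Spark run are incompatible with the version of the current Hadoop environment (this is only what I read online and still needs to be verified). The fix is to clear the data under the HDFS name directory and format the NameNode. Note that formatting the NameNode wipes the HDFS metadata, so everything stored in HDFS is lost. The steps are:
a, Stop all running Hadoop processes: stop-all.sh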
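b, Delete all files in dfs/name. This directory is on the NameNode's local filesystem and Hadoop is stopped at this point, so use the local rm rather than hdfs dfs: rm -r xx/dfs/name/*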
c, Format the NameNode: hdfs namenode -format
d, Restart Hadoop.
Reference: https://stackoverflow.com/questions/15571584/writing-to-hdfs-could-only-be-replicated-to-0-nodes-instead-of-minreplication
18/04/18 19:57:39 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, slave2, executor 1): org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
at org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$sc(RDD.scala:89)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.toLocalIterator(RDD.scala:948)
at org.apache.spark.api.java.JavaRDDLike$class.toLocalIterator(JavaRDDLike.scala:369)
at org.apache.spark.api.java.AbstractJavaRDDLike.toLocalIterator(JavaRDDLike.scala:45)
at com.py.sparklearn.example.ROSWEKA$3.call(ROSWEKA.java:123)
at com.py.sparklearn.example.ROSWEKA$3.call(ROSWEKA.java:1)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
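The cause is that an RDD operator cannot operate on another RDD, because RDD transformations and actions can only be invoked from the driver. If you really need data from another RDD inside an operator, convert that RDD to another type and broadcast it. For example, the code below collects a JavaRDD into a List on the driver and then broadcasts the List: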
// Raw List is used because the element type of trainPositiveRDD is not shown here.
List trainPositiveList = trainPositiveRDD.collect();
final Broadcast<List> trainPosititive_broadcast = sc.broadcast(trainPositiveList);
18/04/19 10:18:39 ERROR yarn.ApplicationMaster: User class threw exception: com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
m_Info (weka.classifiers.trees.RandomTree)
m_Classifiers (weka.classifiers.trees.RandomForest)
underlying (scala.collection.convert.Wrappers$SeqWrapper)
com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
m_Info (weka.classifiers.trees.RandomTree)
m_Classifiers (weka.classifiers.trees.RandomForest)
underlying (scala.collection.convert.Wrappers$SeqWrapper)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.twitter.chill.WrappedArraySerializer.write(WrappedArraySerializer.scala:29)
at com.twitter.chill.WrappedArraySerializer.write(WrappedArraySerializer.scala:23)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$2.apply(TorrentBroadcast.scala:268)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$2.apply(TorrentBroadcast.scala:268)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:269)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:126)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1411)
at org.apache.spark.api.java.JavaSparkContext.broadcast(JavaSparkContext.scala:650)
at com.py.sparklearn.example.ROSWEKACombine.main(ROSWEKACombine.java:294)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: java.lang.NullPointerException
at weka.core.Instances.size(Instances.java:1016)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:83)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
... 34 more
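This error is caused by serialization: the trace shows Kryo's CollectionSerializer calling weka.core.Instances.size(), which throws the NullPointerException. Kryo serialization is faster than JavaSerializer, but it fails with this error on the Spark 2.1.0 version I use. There are two solutions:
1. Change the serializer in spark-defaults.conf
2. Change the serializer in the program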
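For the first approach, the entry in spark-defaults.conf would look roughly like the line below. This is a sketch that assumes the file sits in the conf directory of the Spark installation; the property name and class are the same ones used in the program-level setting.

```properties
spark.serializer    org.apache.spark.serializer.JavaSerializer
```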
The code for the second approach is as follows:
SparkConf conf = new SparkConf().setAppName("ROSWEKA").set("spark.serializer", "org.apache.spark.serializer.JavaSerializer");
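One consequence of switching to JavaSerializer is that everything passed to sc.broadcast must implement java.io.Serializable, since JavaSerializer is built on standard Java object serialization. The sketch below is a hypothetical stand-alone helper (the class name JavaSerializationCheck and the method roundTrip are made up for illustration and are not part of Spark). It round-trips an object through ObjectOutputStream, which may help catch a NotSerializableException locally before the job is submitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JavaSerializationCheck {

    /** Serializes obj with standard Java serialization and reads it back. */
    public static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // Throws NotSerializableException if obj (or any field it reaches)
            // does not implement java.io.Serializable.
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the collected list that would be broadcast.
        List<double[]> rows = new ArrayList<>(
            Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0}));

        @SuppressWarnings("unchecked")
        List<double[]> copy = (List<double[]>) roundTrip(rows);
        System.out.println(copy.size() + " rows survived the round trip");
    }
}
```

Passing the real object that is about to be broadcast to roundTrip in a local unit test gives a quick signal; it does not reproduce the Kryo failure above, it only checks the Java serialization path.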
Reference: On the com.esotericsoftware.kryo.KryoException error when running the FP-growth algorithm on Spark
Problem description:
18/05/03 10:47:03 INFO yarn.Client: Application report for application_1525314251630_0005 (state: FAILED)
18/05/03 10:47:03 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1525314251630_0005 failed 2 times due to AM Container for appattempt_1525314251630_0005_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1525314251630_0005_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://master:8088/cluster/app/application_1525314251630_0005 Then click on links to logs of each attempt.
. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1525315336531
final status: FAILED
tracking URL: http://master:8088/cluster/app/application_1525314251630_0005
user: pangying
Exception in thread "main" org.apache.spark.SparkException: Application application_1525314251630_0005 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/05/03 10:47:04 INFO util.ShutdownHookManager: Shutdown hook called
18/05/03 10:47:04 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e82e4e6e-35d4-4dd8-b9c5-0db0337005a8
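Solution:
Although the error is reported, the program actually does run to completion. The cause is that I forgot to remove .setMaster("local[*]") before packaging. A master set in code takes precedence over the --master option passed to spark-submit, so the hard-coded local master conflicts with the YARN submission. To fix it, delete that call and repackage.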