For more code, see: https://github.com/xubo245/SparkLearning
Learning Alluxio in the Spark ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0
When Scala calls into C through JNI and the job is submitted with spark-submit, the following errors can occur:
no JNIparasail in java.library.path
or
ERROR TaskSchedulerImpl: Lost executor 6 on Mcnode5: remote Rpc client disassociated
Script:
hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ cat testDSW2timequery.sh
#!/bin/bash
#sbt clean
#sbt package
#/home/zgg/lib/spark-1.0.1-bin-hadoop2/bin/spark-submit \
spark-submit \
--class "org.dsa.time.DSW2QueryTime" \
--conf "spark.executor.extraJavaOptions=-Djava.library.path=/home/hadoop/disk2/xubo/lib" \
--master spark://Master:7077 \
--executor-memory 8G \
DSA.jar
hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ ./testDSW2timequery.sh
alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1 5
Exception in thread "main" java.lang.UnsatisfiedLinkError: no JNIparasail in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at parasail.Matrix.<clinit>(Matrix.java:9)
at org.dsa.core.DSW2.align(DSW2.scala:30)
at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:33)
at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:32)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.dsa.core.SequenceAlignment.run(SequenceAlignment.scala:32)
at org.dsa.core.DSW2$.main(DSW2.scala:130)
at org.dsa.time.DSW2QueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply$mcVI$sp(DSW2QueryTime.scala:19)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.dsa.time.DSW2QueryTime$$anonfun$main$1.apply$mcVI$sp(DSW2QueryTime.scala:14)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.dsa.time.DSW2QueryTime$.main(DSW2QueryTime.scala:13)
at org.dsa.time.DSW2QueryTime.main(DSW2QueryTime.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ ./testDSW2timequery.sh
alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1 5
16/12/25 13:47:42 ERROR TaskSchedulerImpl: Lost executor 6 on Mcnode5: remote Rpc client disassociated
16/12/25 13:47:42 ERROR TaskSchedulerImpl: Lost executor 1 on Mcnode6: remote Rpc client disassociated
[Stage 1:> (0 + 16) / 128]16/12/25 13:47:46 ERROR TaskSchedulerImpl: Lost executor 8 on Mcnode5: remote Rpc client disassociated
[Stage 1:> (0 + 14) / 128]16/12/25 13:47:47 ERROR TaskSchedulerImpl: Lost executor 9 on Mcnode6: remote Rpc client disassociated
[Stage 1:> (0 + 16) / 128]16/12/25 13:47:51 ERROR TaskSchedulerImpl: Lost executor 10 on Mcnode5: remote Rpc client disassociated
[Stage 1:> (0 + 14) / 128]16/12/25 13:47:51 ERROR TaskSchedulerImpl: Lost executor 11 on Mcnode6: remote Rpc client disassociated
[Stage 1:> (0 + 16) / 128]16/12/25 13:47:55 ERROR TaskSchedulerImpl: Lost executor 12 on Mcnode5: remote Rpc client disassociated
16/12/25 13:47:55 ERROR TaskSetManager: Task 4 in stage 1.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 1.0 failed 4 times, most recent failure: Lost task 4.3 in stage 1.0 (TID 26, Mcnode5): ExecutorLostFailure (executor 12 lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1007)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:989)
at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1370)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1357)
at org.apache.spark.rdd.RDD$$anonfun$top$1.apply(RDD.scala:1338)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.top(RDD.scala:1337)
at org.dsa.core.DSW2.align(DSW2.scala:39)
at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:33)
at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:32)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.dsa.core.SequenceAlignment.run(SequenceAlignment.scala:32)
at org.dsa.core.DSW2$.main(DSW2.scala:130)
at org.dsa.time.DSW2QueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply$mcVI$sp(DSW2QueryTime.scala:19)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.dsa.time.DSW2QueryTime$$anonfun$main$1.apply$mcVI$sp(DSW2QueryTime.scala:14)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.dsa.time.DSW2QueryTime$.main(DSW2QueryTime.scala:13)
at org.dsa.time.DSW2QueryTime.main(DSW2QueryTime.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Setting
--conf "spark.executor.extraJavaOptions=-Djava.library.path=/home/hadoop/disk2/xubo/lib" \
only affects code that runs inside the SparkContext's RDD operations, i.e. on the executor JVMs; it has no effect outside the RDD, on the driver side, which is where the first stack trace above (in thread "main") originates. The fix is therefore to move the JNI-calling code inside the RDD operations, and to guard it with a static variable (in Scala, a companion-object field) so that the native library is loaded only once per JVM.
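A minimal sketch of this pattern, assuming the native library name JNIparasail (taken from the error message above); the `align` placeholder, the hard-coded query, and the `args(0)` input path are hypothetical stand-ins, since the real DSW2/parasail signatures are not shown in this post. The point is that the `lazy val` in the object is forced only inside the RDD closure, so `System.loadLibrary` runs once per executor JVM, where `spark.executor.extraJavaOptions` has set `java.library.path`, and never on the driver:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper around the JNI calls (the real code uses parasail.Matrix).
object NativeAligner {
  // "Run once" guard: the first task on an executor that touches this
  // lazy val loads the native library; later tasks in the same JVM reuse it.
  lazy val loaded: Boolean = {
    System.loadLibrary("JNIparasail") // resolved via java.library.path on the executor
    true
  }

  def align(query: String, target: String): Int = {
    loaded // force the one-time library load before any native call
    // Placeholder for the real JNI alignment call.
    query.intersect(target).length
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JNIInsideRDD"))
    val query = "ACDEFGHIKLMNPQRSTVWY"
    val hits = sc.textFile(args(0)) // hypothetical FASTA-line input
      // The JNI call happens here, inside the RDD, i.e. on executor JVMs
      // that were started with -Djava.library.path=... .
      .map(seq => (seq, NativeAligner.align(query, seq)))
      .top(5)(Ordering.by[(String, Int), Int](_._2))
    hits.foreach(println)
    sc.stop()
  }
}

Submitted with the same --conf "spark.executor.extraJavaOptions=-Djava.library.path=..." as in the script above, each worker can then resolve libJNIparasail.so; one plausible reading of the repeated "Lost executor" errors earlier is that tasks were crashing before this once-per-JVM guard was in place. After the change, the runs complete: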
alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1 5
topK:5 Query:P18691
AlignmentRecord(UniRef100_P18691, , 67, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_A0A0E1RXE0, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_C5P0L7, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_P51640, , 58, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_A0A163CSK3, , 57, 0, 0, 0, 0, 0, 0)
alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1 5
topK:5 Query:P18691
AlignmentRecord(UniRef100_P18691, , 67, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_C5P0L7, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_A0A0E1RXE0, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_P51640, , 58, 0, 0, 0, 0, 0, 0)