Spark Problem 9: Fixing errors when Spark calls C through JNI

More code is available at: https://github.com/xubo245/SparkLearning

Learning Alluxio in the Spark ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0

1. Problem description

1.1 Description

When Scala code calls C through JNI and the job is submitted with spark-submit, the following errors appear:

no JNIparasail in java.library.path

or

ERROR TaskSchedulerImpl: Lost executor 6 on Mcnode5: remote Rpc client disassociated

1.2 Error output

The submit script:

hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ cat testDSW2timequery.sh 
#!/bin/bash

#sbt clean
#sbt package
#/home/zgg/lib/spark-1.0.1-bin-hadoop2/bin/spark-submit \
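# NOTE: spark.executor.extraJavaOptions below sets java.library.path for the executor JVMs only; the driver JVM is not affected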
  spark-submit \
  --class "org.dsa.time.DSW2QueryTime" \
  --conf "spark.executor.extraJavaOptions=-Djava.library.path=/home/hadoop/disk2/xubo/lib" \
  --master spark://Master:7077 \
  --executor-memory 8G \
  DSA.jar

1.2.1 UnsatisfiedLinkError on the driver

hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ ./testDSW2timequery.sh 
alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt    alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file  alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1   5   
Exception in thread "main" java.lang.UnsatisfiedLinkError: no JNIparasail in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
    at java.lang.Runtime.loadLibrary0(Runtime.java:849)
    at java.lang.System.loadLibrary(System.java:1088)
    at parasail.Matrix.<clinit>(Matrix.java:9)
    at org.dsa.core.DSW2.align(DSW2.scala:30)
    at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:33)
    at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:32)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.dsa.core.SequenceAlignment.run(SequenceAlignment.scala:32)
    at org.dsa.core.DSW2$.main(DSW2.scala:130)
    at org.dsa.time.DSW2QueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply$mcVI$sp(DSW2QueryTime.scala:19)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.dsa.time.DSW2QueryTime$$anonfun$main$1.apply$mcVI$sp(DSW2QueryTime.scala:14)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.dsa.time.DSW2QueryTime$.main(DSW2QueryTime.scala:13)
    at org.dsa.time.DSW2QueryTime.main(DSW2QueryTime.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

1.2.2 Executors repeatedly lost, job aborted

hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ ./testDSW2timequery.sh 
alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt    alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file  alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1   5   
16/12/25 13:47:42 ERROR TaskSchedulerImpl: Lost executor 6 on Mcnode5: remote Rpc client disassociated
16/12/25 13:47:42 ERROR TaskSchedulerImpl: Lost executor 1 on Mcnode6: remote Rpc client disassociated
[Stage 1:>                                                       (0 + 16) / 128]16/12/25 13:47:46 ERROR TaskSchedulerImpl: Lost executor 8 on Mcnode5: remote Rpc client disassociated
[Stage 1:>                                                       (0 + 14) / 128]16/12/25 13:47:47 ERROR TaskSchedulerImpl: Lost executor 9 on Mcnode6: remote Rpc client disassociated
[Stage 1:>                                                       (0 + 16) / 128]16/12/25 13:47:51 ERROR TaskSchedulerImpl: Lost executor 10 on Mcnode5: remote Rpc client disassociated
[Stage 1:>                                                       (0 + 14) / 128]16/12/25 13:47:51 ERROR TaskSchedulerImpl: Lost executor 11 on Mcnode6: remote Rpc client disassociated
[Stage 1:>                                                       (0 + 16) / 128]16/12/25 13:47:55 ERROR TaskSchedulerImpl: Lost executor 12 on Mcnode5: remote Rpc client disassociated
16/12/25 13:47:55 ERROR TaskSetManager: Task 4 in stage 1.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 1.0 failed 4 times, most recent failure: Lost task 4.3 in stage 1.0 (TID 26, Mcnode5): ExecutorLostFailure (executor 12 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1007)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1370)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1357)
    at org.apache.spark.rdd.RDD$$anonfun$top$1.apply(RDD.scala:1338)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.top(RDD.scala:1337)
    at org.dsa.core.DSW2.align(DSW2.scala:39)
    at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:33)
    at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:32)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.dsa.core.SequenceAlignment.run(SequenceAlignment.scala:32)
    at org.dsa.core.DSW2$.main(DSW2.scala:130)
    at org.dsa.time.DSW2QueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply$mcVI$sp(DSW2QueryTime.scala:19)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.dsa.time.DSW2QueryTime$$anonfun$main$1.apply$mcVI$sp(DSW2QueryTime.scala:14)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.dsa.time.DSW2QueryTime$.main(DSW2QueryTime.scala:13)
    at org.dsa.time.DSW2QueryTime.main(DSW2QueryTime.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2. Solution

2.1 Analysis

Through the setting

--conf "spark.executor.extraJavaOptions=-Djava.library.path=/home/hadoop/disk2/xubo/lib" \

java.library.path is configured only for the executor JVMs, i.e. for code that runs inside RDD operations; it has no effect on driver-side code outside the RDDs (the driver would need spark.driver.extraJavaOptions or --driver-java-options for that). This is why the UnsatisfiedLinkError in 1.2.1 is thrown in the driver's main thread, and why the JNI call has to be moved inside the RDD.
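
To see where the option actually applies, a quick check (a sketch, assuming an existing SparkContext named sc) is to print java.library.path on the driver and inside a task; with only spark.executor.extraJavaOptions set, the two values differ:

    // Compare java.library.path as seen by the driver and by an executor.
    val driverPath = System.getProperty("java.library.path")
    val executorPath = sc.parallelize(Seq(0), 1)
      .map(_ => System.getProperty("java.library.path"))
      .first()
    println("driver:   " + driverPath)
    println("executor: " + executorPath)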

2.2 Fix

Move the code that makes the JNI call inside the RDD operations, and guard it with a static variable (or a field of a companion/singleton object) so that the native library is loaded only once per executor JVM.
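
A minimal sketch of this pattern; JniLoader, Demo, and nativeScore are hypothetical stand-ins for the real parasail wrapper, and only System.loadLibrary and the standard Spark API are taken as given:

    import org.apache.spark.{SparkConf, SparkContext}

    // Guard object: the body of a lazy val runs at most once per JVM (and is
    // thread-safe), so each executor loads the native library exactly once,
    // however many tasks or records it processes.
    object JniLoader {
      lazy val loaded: Boolean = {
        // Resolved against -Djava.library.path from spark.executor.extraJavaOptions.
        System.loadLibrary("JNIparasail")
        true
      }
    }

    object Demo {
      // Hypothetical stand-in for the real JNI-backed alignment call.
      def nativeScore(seq: String): Int = seq.length

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("JniInsideRdd"))
        val top = sc.textFile(args(0))
          .map { line =>
            JniLoader.loaded   // forces the one-time load on the executor, not the driver
            nativeScore(line)
          }
          .top(5)
        println(top.mkString("\n"))
        sc.stop()
      }
    }

Because JniLoader.loaded is only touched inside map, the library is loaded on the executors, where the extraJavaOptions path is visible, and never on the driver, which removes the UnsatisfiedLinkError from 1.2.1.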

3. Run log after applying the fix:

alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt    alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file  alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1   5   
topK:5 Query:P18691                                                             
AlignmentRecord(UniRef100_P18691, , 67, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_A0A0E1RXE0, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_C5P0L7, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_P51640, , 58, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_A0A163CSK3, , 57, 0, 0, 0, 0, 0, 0)

alluxio://Master:19998/xubo/project/SparkSW/BLOSUM50.txt    alluxio://Master:19998/xubo/project/SparkSW/input/query/D0DP18691.file  alluxio://Master:19998/xubo/project/SparkSW/input/Luniref/DL8Line.fasta 128 1   5   
topK:5 Query:P18691                                                             
AlignmentRecord(UniRef100_P18691, , 67, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_C5P0L7, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_A0A0E1RXE0, , 61, 0, 0, 0, 0, 0, 0)
AlignmentRecord(UniRef100_P51640, , 58, 0, 0, 0, 0, 0, 0)

