[A Small Step] First try with Spark: packaging with IntelliJ and spark-submit

A record of commonly used commands:

scp -r ~/opt/hadoop-2.6.3/etc/hadoop spark@slave01:~/opt/hadoop-2.6.3/etc/
scp -r ~/opt/spark-1.6.0-bin-hadoop2.6/conf spark@slave01:~/opt/spark-1.6.0-bin-hadoop2.6/
du -sh ./* | sort -rh   # -h sorts human-readable sizes (K/M/G) correctly; plain -n misorders them
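With more than one slave to sync (slave02 below is hypothetical), a loop saves repetition:

for h in slave01 slave02; do
  scp -r ~/opt/hadoop-2.6.3/etc/hadoop spark@$h:~/opt/hadoop-2.6.3/etc/
  scp -r ~/opt/spark-1.6.0-bin-hadoop2.6/conf spark@$h:~/opt/spark-1.6.0-bin-hadoop2.6/
done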

Packaging with IntelliJ

Create a new Scala project, then a test object and a WordCount object.
File -> Project Structure -> Artifacts -> JAR -> unnamed.jar -> rename it -> include the compile output -> Build Artifacts
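The post doesn't show the project source; judging from the --class test run below, which prints hello world!!!, the test object is presumably no more than:

// Plain Scala entry point; no Spark needed for this smoke test.
object test {
  def main(args: Array[String]): Unit = {
    println("hello world!!!")
  }
}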

Copy the JAR to the master node and spark-submit it:

[spark@localhost jar]$ spark-submit scalaproj.jar 
Error: Cannot load main class from JAR file:/home/spark/jar/scalaproj.jar
Run with --help for usage help or --verbose for debug output

The main class has to be specified with --class:

[spark@localhost jar]$ spark-submit --class test scalaproj.jar 
hello world!!!

An error occurred while running the word-count job (the class here is named ipcount):

[spark@localhost jar]$ spark-submit --class ipcount scalaproj.jar  /tmp/input/data/http_log
16/02/24 10:42:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/24 10:42:05 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 172.31.0.34 instead (on interface em1)
16/02/24 10:42:05 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[Stage 1:====>                                                  (18 + 43) / 216]16/02/24 10:50:41 ERROR TaskSetManager: Total size of serialized results of 23 tasks (1057.0 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

To sum up, the main problems are a managed memory leak warning and serialized task results exceeding spark.driver.maxResultSize (a workaround sketch follows the error lines below):

ERROR Executor: Managed memory leak detected; size = 36768558 bytes, TID = 275
ERROR TaskSetManager: Total size of serialized results of 25 tasks (1149.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
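Two hedged ways past the maxResultSize error. The quick fix is raising the limit at submit time with --conf spark.driver.maxResultSize=2g. The better fix is not funnelling the whole result through the driver at all: the error means the tasks' results shipped back to the driver (typically via collect()) exceed 1 GB, so write the output to HDFS instead. The original ipcount source isn't shown in the post, so the sketch below is a guess at its shape, assuming the IP is the first whitespace-separated field of each log line and a second output-path argument:

import org.apache.spark.{SparkConf, SparkContext}

object ipcount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ipcount"))
    // args(0): input path, e.g. /tmp/input/data/http_log
    val counts = sc.textFile(args(0))
      .map(line => (line.split("\\s+")(0), 1L))
      .reduceByKey(_ + _)
    // saveAsTextFile keeps the result on the cluster; collect() would pull
    // everything through the driver and trip spark.driver.maxResultSize.
    counts.saveAsTextFile(args(1))
    sc.stop()
  }
}

Invocation would then carry an output path too (the path below is a placeholder): spark-submit --class ipcount scalaproj.jar /tmp/input/data/http_log /tmp/output/ip_counts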

Next, a run in yarn-client mode:

[spark@localhost sbin]$ spark-submit --master yarn-client --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar  /tmp/input/data/http_log/http_log.log.1
16/02/24 13:26:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/24 13:26:40 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 172.31.0.34 instead (on interface em1)
16/02/24 13:26:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/24 13:27:03 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FAILED!
16/02/24 13:27:12 ERROR SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
        at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.NullPointerException
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
        at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
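The NullPointerException is only a symptom: SparkContext initialization blows up because the YARN application has already exited with state FAILED (see the YarnClientSchedulerBackend line above the trace). The real diagnostics live on the YARN side; pulling them with yarn logs -applicationId <appId> (fill in the application ID the failed run printed) usually shows why the ApplicationMaster container died, which here points at the memory settings adjusted below.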

After specifying the executor memory and the number of executors, it still failed:

[spark@localhost hadoop]$ spark-submit --master yarn-client --executor-memory 2g --num-executors 3 --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar  /tmp/input/data/http_log/http_log.log.1
16/02/24 13:39:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/24 13:39:22 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 172.31.0.34 instead (on interface em1)
16/02/24 13:39:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/24 13:39:49 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
        at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/02/24 13:39:49 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
        at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Following YARN memory-tuning guidance, adjust yarn-site.xml:


  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>22528</value>
    <description>Memory available to containers on each node, in MB</description>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1500</value>
    <description>Minimum memory a single container can request; default 1024 MB</description>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
    <description>Maximum memory a single container can request; default 8192 MB</description>
  </property>

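A rough check of how the executor request maps onto these limits, assuming Spark 1.6 defaults (spark.yarn.executor.memoryOverhead = max(384 MB, 10% of executor memory)): --executor-memory 2g asks YARN for 2048 + 384 = 2432 MB, which the scheduler typically rounds up to a multiple of yarn.scheduler.minimum-allocation-mb (1500 MB here), i.e. a 3000 MB container per executor. Three executors plus the ApplicationMaster fit comfortably under the 22528 MB per node.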
Where things stand now: yarn-client runs, yarn-cluster does not:

[spark@localhost hadoop-2.6.3]$ spark-submit --master yarn-cluster --executor-memory 2g --num-executors 3 --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar  /tmp/input/README.txt
16/02/24 15:23:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalArgumentException: Required AM memory (92160+9216 MB) is above the max threshold (16384 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.
        at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:290)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:139)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1016)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[spark@localhost hadoop-2.6.3]$ 
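The numbers decode the failure: 92160 MB is exactly 90g of driver memory, presumably a spark.driver.memory=90g lingering in spark-defaults.conf (nothing on the command line asks for it), and 9216 MB is the 10% ApplicationMaster overhead Spark 1.6 adds on top. In yarn-cluster mode the driver runs inside the AM, so the whole ~99 GB request must fit under yarn.scheduler.maximum-allocation-mb (16384 MB) and cannot; in yarn-client mode the driver stays in the local client process and never goes through YARN, which is why the same job runs there. A hedged fix, assuming the stray setting really is driver memory, is to override it at submit time:

spark-submit --master yarn-cluster --driver-memory 4g --executor-memory 2g --num-executors 3 --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar  /tmp/input/README.txt

For reference, the yarn-client invocation that does run: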


spark-submit --master yarn-client --executor-memory 2g --num-executors 3 --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar  /tmp/input/README.txt
