Notes on frequently used scripts
scp -r ~/opt/hadoop-2.6.3/etc/hadoop spark@slave01:~/opt/hadoop-2.6.3/etc/
scp -r ~/opt/spark-1.6.0-bin-hadoop2.6/conf spark@slave01:~/opt/spark-1.6.0-bin-hadoop2.6/
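When there is more than one slave, a loop saves retyping the scp lines. This is a sketch: slave02 is a hypothetical hostname (only slave01 appears above), and the commands are echoed as a dry run so nothing is actually copied until the echo is removed.

```shell
# Dry run: print the scp commands that would sync the conf dirs to each slave.
# slave02 is a hypothetical hostname; replace the list with the real slaves.
for host in slave01 slave02; do
  echo scp -r ~/opt/hadoop-2.6.3/etc/hadoop "spark@${host}:~/opt/hadoop-2.6.3/etc/"
  echo scp -r ~/opt/spark-1.6.0-bin-hadoop2.6/conf "spark@${host}:~/opt/spark-1.6.0-bin-hadoop2.6/"
done
```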
du -sh ./* | sort -rh
(du -h prints sizes with K/M/G suffixes, which sort -rn mis-orders; sort -rh sorts human-readable sizes correctly.)
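A quick demonstration of the size-sorted listing; the /tmp/du_demo tree and file sizes are made up purely for illustration.

```shell
# Build a small throwaway tree, then list its entries largest-first.
mkdir -p /tmp/du_demo/big /tmp/du_demo/small
dd if=/dev/zero of=/tmp/du_demo/big/file bs=1024 count=200 2>/dev/null
dd if=/dev/zero of=/tmp/du_demo/small/file bs=1024 count=10 2>/dev/null
cd /tmp/du_demo
# sort -rh understands the K/M/G suffixes that du -h emits
du -sh ./* | sort -rh
```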
Packaging with IntelliJ
Create a new Scala project, then a test object and a WordCount object.
File -> Project Structure -> Artifacts -> Jar -> unnamed.jar -> rename it -> add the compile output -> Build Artifacts.
Copy the jar to the master node and run it with spark-submit:
[spark@localhost jar]$ spark-submit scalaproj.jar
Error: Cannot load main class from JAR file:/home/spark/jar/scalaproj.jar
Run with --help for usage help or --verbose for debug output
The main class has to be specified with --class:
[spark@localhost jar]$ spark-submit --class test scalaproj.jar
hello world!!!
Errors occurred while running the word count job:
[spark@localhost jar]$ spark-submit --class ipcount scalaproj.jar /tmp/input/data/http_log
16/02/24 10:42:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/24 10:42:05 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 172.31.0.34 instead (on interface em1)
16/02/24 10:42:05 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[Stage 1:====> (18 + 43) / 216]16/02/24 10:50:41 ERROR TaskSetManager: Total size of serialized results of 23 tasks (1057.0 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
To summarize, the main problems are a managed memory leak and serialized results bigger than spark.driver.maxResultSize:
ERROR Executor: Managed memory leak detected; size = 36768558 bytes, TID = 275
ERROR TaskSetManager: Total size of serialized results of 25 tasks (1149.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
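The maxResultSize error means the results collected back to the driver exceeded the 1 GB default. One option (a sketch, assuming the default conf layout) is to raise the limit in conf/spark-defaults.conf; the usually better fix is to avoid pulling large RDDs to the driver at all, e.g. write them out with saveAsTextFile instead of collect.

```
# conf/spark-defaults.conf — 2g is an assumed value; it must fit in the driver's heap
spark.driver.maxResultSize  2g
```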
Below is a run in yarn-client mode:
[spark@localhost sbin]$ spark-submit --master yarn-client --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar /tmp/input/data/http_log/http_log.log.1
16/02/24 13:26:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/24 13:26:40 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 172.31.0.34 instead (on interface em1)
16/02/24 13:26:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/24 13:27:03 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FAILED!
16/02/24 13:27:12 ERROR SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
After setting the executor memory and the number of executors:
[spark@localhost hadoop]$ spark-submit --master yarn-client --executor-memory 2g --num-executors 3 --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar /tmp/input/data/http_log/http_log.log.1
16/02/24 13:39:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/24 13:39:22 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 172.31.0.34 instead (on interface em1)
16/02/24 13:39:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/24 13:39:49 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/02/24 13:39:49 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Following a YARN memory-tuning guide, modify yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>22528</value>
  <description>Memory available for containers on each node, in MB</description>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1500</value>
  <description>Minimum memory a single container can request (default 1024 MB)</description>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value>
  <description>Maximum memory a single container can request (default 8192 MB)</description>
</property>
Current status: yarn-client mode works, but yarn-cluster mode does not:
[spark@localhost hadoop-2.6.3]$ spark-submit --master yarn-cluster --executor-memory 2g --num-executors 3 --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar /tmp/input/README.txt
16/02/24 15:23:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalArgumentException: Required AM memory (92160+9216 MB) is above the max threshold (16384 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.
at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:290)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:139)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1016)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
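The requested AM memory of 92160 MB is 90 GB, which suggests the driver memory is configured far too high somewhere (in yarn-cluster mode the driver runs inside the ApplicationMaster, so its memory must fit under yarn.scheduler.maximum-allocation-mb; in yarn-client mode it does not, which would explain why only yarn-cluster fails). A sketch of a resubmission that caps it explicitly; the 2g value is an assumption, not a verified fix, and any stray 90g setting in spark-defaults.conf or SPARK_DRIVER_MEMORY should also be hunted down:

```
# Sketch: cap the driver (= AM, in yarn-cluster mode) memory explicitly.
# 2g is an assumed value; it must stay under yarn.scheduler.maximum-allocation-mb.
spark-submit --master yarn-cluster \
  --driver-memory 2g \
  --executor-memory 2g --num-executors 3 \
  --class org.apache.spark.examples.JavaWordCount \
  ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar \
  /tmp/input/README.txt
```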
The same job submitted in yarn-client mode runs successfully:
[spark@localhost hadoop-2.6.3]$ spark-submit --master yarn-client --executor-memory 2g --num-executors 3 --class org.apache.spark.examples.JavaWordCount ~/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar /tmp/input/README.txt