In the previous post ("Building a Spark 2.1.0 Docker Image") we covered how to build a Spark 2.1.0 Docker image on top of sequenceiq/hadoop-docker:2.6.0. That base image ships Java 7, and running Spark 2.1.0 on Java 7 produces the following WARN log:
17/01/11 05:00:45 WARN spark.SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
To get rid of this eyesore of a log line, we need to upgrade Java to Java 8.
Download the JRE package from the Oracle website; I grabbed the latest Server JRE:
Server JRE (Java SE Runtime Environment) 8u112
http://www.oracle.com/technetwork/java/javase/downloads/server-jre8-downloads-2133154.html
Note: the downloaded file must be placed in the docker-spark directory.
-rwxrwx--- 1 farawayzheng farawayzheng 59909235 1月 13 12:37 server-jre-8u112-linux-x64.tar.gz
# Update to Java 8
ADD server-jre-8u112-linux-x64.tar.gz /usr/java/
RUN cd /usr/java && rm -f latest && ln -s ./jdk1.8.0_112 latest && rm -rf jdk1.7.0_51
Note 1: adjust the file name and directory names above to match the JRE version you actually downloaded.
Note 2: for the rest of the Dockerfile, see the previous post, "Building a Spark 2.1.0 Docker Image".
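What the RUN line does to the /usr/java layout can be replayed outside the image, in a throwaway directory (the /tmp scratch path is illustrative; the real image operates on /usr/java):

```shell
# Replay the symlink swap from the RUN line in a scratch directory.
# Directory names mirror the image; the scratch location is illustrative only.
scratch=$(mktemp -d)
mkdir -p "$scratch/jdk1.7.0_51" "$scratch/jdk1.8.0_112"
cd "$scratch"
ln -s ./jdk1.7.0_51 latest     # state inherited from the base image
rm -f latest && ln -s ./jdk1.8.0_112 latest && rm -rf jdk1.7.0_51
readlink latest                # now points at ./jdk1.8.0_112
```

Because the base image's JAVA_HOME resolves through the `latest` symlink, swapping the link is all it takes for every tool in the image to pick up Java 8.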
$ docker build --rm -t farawayzheng/spark:2.1.0 .
Once the build finishes, run docker images and check the image's creation date to confirm that the new image was built successfully.
After the upgrade to Java 8, starting Spark in local mode works fine.
bash-4.1# spark-shell --master local --driver-memory 1g --executor-memory 1g --executor-cores 1
Unfortunately, starting Spark in YARN mode throws an exception.
bash-4.1# spark-shell --master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/01/13 04:20:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/13 04:20:36 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
17/01/13 04:20:36 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:614)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:169)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:567)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
......
The cause is an incompatibility between Java 8 and YARN in Hadoop 2.6.0: Java 8 processes reserve more memory, which exceeds the container's memory limits, so YARN kills the container and the application terminates abnormally.
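To see why such small containers trip the memory check, here is the rough arithmetic YARN applies, assuming the default yarn.nodemanager.vmem-pmem-ratio of 2.1 and Spark's default 384 MB minimum executor memory overhead (both are stock defaults, not values taken from this setup):

```shell
# Back-of-the-envelope virtual memory limit for a 512 MB executor
# (defaults assumed; actual requests are also rounded up to
# yarn.scheduler.minimum-allocation-mb, so real numbers differ slightly).
executor_mem=512                               # --executor-memory, in MB
overhead=384                                   # default minimum memoryOverhead, MB
container_mem=$((executor_mem + overhead))     # physical memory YARN grants
vmem_limit=$(awk "BEGIN { printf \"%d\", $container_mem * 2.1 }")
echo "container: ${container_mem} MB, vmem limit: ${vmem_limit} MB"
```

A Java 8 JVM's metaspace and larger reserved address space can push the process's virtual size past a limit in this range, at which point the NodeManager kills the container even though physical memory use is fine.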
The same symptom is reported here:
Spark fail when running pi.py example with yarn-client mode
http://stackoverflow.com/questions/27792839/spark-fail-when-running-pi-py-example-with-yarn-client-mode
Spark Pi Example in Cluster mode with Yarn: Association lost
http://stackoverflow.com/questions/29512565/spark-pi-example-in-cluster-mode-with-yarn-association-lost
The cause is discussed in the threads above, and one of the workarounds they suggest has been verified to work:
bash-4.1# vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add the following properties:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
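Disabling both checks is the bluntest fix. A gentler alternative mentioned in the threads above (not verified in this setup) is to keep the checks on but raise the virtual-to-physical memory ratio instead:

```xml
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```

The value 4 is an illustrative guess; the stock default is 2.1, and anything comfortably above a Java 8 JVM's virtual footprint relative to the container's physical allocation should do.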
bash-4.1# /usr/local/hadoop/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [sandbox]
sandbox: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
bash-4.1# /etc/bootstrap.sh
rm: cannot remove `/tmp/*.pid': No such file or directory
/usr/local/hadoop/sbin
Starting namenodes on [sandbox]
sandbox: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-sandbox.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-sandbox.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-sandbox.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-sandbox.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-sandbox.out
bash-4.1# spark-shell --master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/01/13 04:27:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/13 04:27:34 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar."
17/01/13 04:27:34 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar."
17/01/13 04:27:34 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar."
17/01/13 04:27:38 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/01/13 04:27:38 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/01/13 04:27:39 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.17.0.2:4040
Spark context available as 'sc' (master = yarn, app id = application_1484299580554_0001).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
(End)