Upgrading the Spark 2.1.0 Docker Image to Java 8

Preface

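In the previous post (Creating a Spark 2.1.0 Docker Image), I described how to build a Spark 2.1.0 Docker image on top of the sequenceiq/hadoop-docker:2.6.0 image. Because sequenceiq/hadoop-docker:2.6.0 ships with Java 7, running Spark 2.1.0 on it logs the following WARN message: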

17/01/11 05:00:45 WARN spark.SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0

Java 7 support has been deprecated since Spark 2.0.0, so to get rid of this annoying message the image needs to be upgraded to Java 8.

Upgrade procedure

1) Prepare the Java 8 runtime package

Download the JRE package from Oracle's website. I used the latest Server JRE available at the time:

Server JRE (Java SE Runtime Environment) 8u112
http://www.oracle.com/technetwork/java/javase/downloads/server-jre8-downloads-2133154.html

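Note: put the downloaded file in the docker-spark directory, next to the Dockerfile. Dockerfile ADD can only copy files that are inside the build context.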

-rwxrwx--- 1 farawayzheng farawayzheng  59909235 Jan 13 12:37 server-jre-8u112-linux-x64.tar.gz
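Note 1 in the next step asks you to match the directory names to the JRE you downloaded. Rather than guessing which directory the tarball unpacks into, you can list its top-level entries. A minimal sketch using only Python's standard library (the function name top_level_dirs is my own, not part of the original setup):

```python
import tarfile
from typing import List


def top_level_dirs(archive_path: str) -> List[str]:
    """Return the sorted top-level entries of a .tar.gz archive."""
    tops = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = member.name
            # Some archives store paths as "./jdk1.8.0_112/..."; drop that prefix.
            if name.startswith("./"):
                name = name[2:]
            # "jdk1.8.0_112/bin/java" -> "jdk1.8.0_112"
            top = name.split("/", 1)[0]
            if top and top != ".":
                tops.add(top)
    return sorted(tops)
```

For the 8u112 package, top_level_dirs("server-jre-8u112-linux-x64.tar.gz") should yield the jdk1.8.0_112 directory that the ln -s command below points at; check it against your own download.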


2) Add the Java upgrade commands to the Dockerfile

  • Edit the Dockerfile and add the following lines below the MAINTAINER line:
#update to java 8
ADD server-jre-8u112-linux-x64.tar.gz /usr/java/
RUN cd /usr/java && rm -f latest && ln -s ./jdk1.8.0_112 latest && rm -rf jdk1.7.0_51

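Note 1: Adjust the file name and directory names above to match the JRE version you actually downloaded.
Note 2: For the rest of the Dockerfile, see the previous post, "Creating a Spark 2.1.0 Docker Image".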
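Because of Note 1, three places change whenever the JRE version changes, so the lines can also be generated from the update number. This is a hypothetical Python helper, not part of the original setup; it assumes later Server JRE 8 packages keep the 8u112 naming pattern (server-jre-8uNNN-linux-x64.tar.gz unpacking to jdk1.8.0_NNN). It relies on the fact that Dockerfile ADD automatically extracts a local gzip-compressed tar archive into the destination directory, which is why the Dockerfile needs no explicit tar step.

```python
def java8_upgrade_lines(update: int, old_jdk_dir: str = "jdk1.7.0_51") -> str:
    """Render the Java 8 upgrade lines of the Dockerfile for a given update number."""
    tarball = f"server-jre-8u{update}-linux-x64.tar.gz"
    new_jdk_dir = f"jdk1.8.0_{update}"  # assumed directory name inside the tarball
    return "\n".join([
        "#update to java 8",
        # ADD unpacks a local .tar.gz into /usr/java/ automatically.
        f"ADD {tarball} /usr/java/",
        # Re-point the 'latest' symlink at the new JDK and drop the old Java 7 directory.
        f"RUN cd /usr/java && rm -f latest && ln -s ./{new_jdk_dir} latest && rm -rf {old_jdk_dir}",
    ])


print(java8_upgrade_lines(112))
```

For update 112 this prints exactly the three Dockerfile lines shown above.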

3) Rebuild the Docker image:

$ docker build --rm -t farawayzheng/spark:2.1.0 .

When the build finishes, run docker images and check the image's creation time (the CREATED column) to confirm that the new image was built.
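Besides the creation time, you can check which Java the image actually runs. A minimal Python sketch, assuming the docker CLI is available on the host and that the image defines no ENTRYPOINT that would swallow the command; it calls /usr/java/latest/bin/java, i.e. the symlink the Dockerfile re-pointed. Note that java -version writes its output to stderr.

```python
import re
import subprocess
from typing import Optional

IMAGE = "farawayzheng/spark:2.1.0"
JAVA = "/usr/java/latest/bin/java"  # the symlink re-pointed in the Dockerfile


def parse_java_version(text: str) -> Optional[str]:
    """Extract e.g. '1.8.0_112' from the output of `java -version`."""
    match = re.search(r'version "([^"]+)"', text)
    return match.group(1) if match else None


def image_java_version(image: str = IMAGE) -> Optional[str]:
    """Run `java -version` inside a throwaway container and parse the result."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, JAVA, "-version"],
        capture_output=True, text=True, check=True,
    )
    # `java -version` prints to stderr, not stdout.
    return parse_java_version(result.stderr)
```

After a successful rebuild, image_java_version() should return a string starting with "1.8.0".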


Limitations

Problem

After the upgrade to Java 8, starting Spark in local mode works without problems:

bash-4.1# spark-shell --master local --driver-memory 1g --executor-memory 1g --executor-cores 1

Unfortunately, starting Spark in YARN mode fails with an exception:

bash-4.1# spark-shell --master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/01/13 04:20:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/13 04:20:36 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
17/01/13 04:20:36 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:614)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:169)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:567)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
......


Cause

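Java 8 is not fully compatible with YARN in Hadoop 2.6.0: a Java 8 JVM reserves considerably more virtual memory than a Java 7 one, so the containers exceed YARN's virtual-memory limit, the NodeManager kills them, and the application terminates abnormally.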
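To make this more concrete: the NodeManager's virtual-memory check kills a container once its virtual memory exceeds the container's physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio (2.1 by default). The Python sketch below works through the numbers for the client-mode ApplicationMaster started by the spark-shell command above, assuming stock defaults: spark.yarn.am.memory of 512 MB plus an overhead of max(10%, 384 MB), and the scheduler rounding requests up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB). Your cluster's values may differ.

```python
import math

# Assumed stock defaults (Spark 2.1 on Hadoop 2.6); check your own configuration.
AM_MEMORY_MB = 512        # spark.yarn.am.memory, the client-mode ApplicationMaster heap
OVERHEAD_FACTOR = 0.10    # Spark's YARN memory-overhead factor
OVERHEAD_MIN_MB = 384     # Spark's minimum memory overhead
MIN_ALLOCATION_MB = 1024  # yarn.scheduler.minimum-allocation-mb
VMEM_PMEM_RATIO = 2.1     # yarn.nodemanager.vmem-pmem-ratio


def container_size_mb(heap_mb: int) -> int:
    """Physical memory YARN grants for a JVM with the given heap size."""
    overhead = max(int(heap_mb * OVERHEAD_FACTOR), OVERHEAD_MIN_MB)
    requested = heap_mb + overhead
    # The scheduler rounds each request up to a multiple of the minimum allocation.
    return math.ceil(requested / MIN_ALLOCATION_MB) * MIN_ALLOCATION_MB


def vmem_limit_mb(container_mb: int) -> float:
    """Virtual memory the NodeManager tolerates before killing the container."""
    return container_mb * VMEM_PMEM_RATIO


pmem = container_size_mb(AM_MEMORY_MB)
print(f"AM container: {pmem} MB physical, {vmem_limit_mb(pmem):.1f} MB virtual allowed")
```

Under these assumptions it prints "AM container: 1024 MB physical, 2150.4 MB virtual allowed". A Java 8 JVM can reserve more virtual address space than that while its physical usage stays low, which matches the behavior described in YARN-4714.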

The same symptom is reported here:

  • "Spark fail when running pi.py example with yarn-client mode"
    http://stackoverflow.com/questions/27792839/spark-fail-when-running-pi-py-example-with-yarn-client-mode

  • Spark Pi Example in Cluster mode with Yarn: Association lost
    http://stackoverflow.com/questions/29512565/spark-pi-example-in-cluster-mode-with-yarn-association-lost

Discussion of the root cause:

  • [Java 8] Over usage of virtual memory
    https://issues.apache.org/jira/browse/YARN-4714

Solution

Following the material above, I verified that the following workaround works:

  • Edit the yarn-site.xml file
bash-4.1# vi /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add the following properties inside the <configuration> element:

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
  • Restart the YARN services
bash-4.1# /usr/local/hadoop/sbin/stop-all.sh 
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [sandbox]
sandbox: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop

bash-4.1# /etc/bootstrap.sh 
rm: cannot remove `/tmp/*.pid': No such file or directory
/usr/local/hadoop/sbin
Starting namenodes on [sandbox]
sandbox: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-sandbox.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-sandbox.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-sandbox.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-sandbox.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-sandbox.out
  • Start spark-shell
bash-4.1# spark-shell --master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/01/13 04:27:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/13 04:27:34 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar."
17/01/13 04:27:34 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar."
17/01/13 04:27:34 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar."
17/01/13 04:27:38 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/01/13 04:27:38 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/01/13 04:27:39 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.17.0.2:4040
Spark context available as 'sc' (master = yarn, app id = application_1484299580554_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
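The yarn-site.xml change from the first step above can also be scripted, for example to apply it during the image build instead of editing the file by hand in a running container. A minimal sketch with Python 3's xml.etree.ElementTree, assuming Python 3 is available where the file is edited (set_properties is a name I made up). With both checks disabled, the NodeManager no longer kills containers that exceed their memory allocation, which is acceptable for a single-node sandbox like this one.

```python
import xml.etree.ElementTree as ET
from typing import Dict

YARN_SITE = "/usr/local/hadoop/etc/hadoop/yarn-site.xml"
OVERRIDES = {
    "yarn.nodemanager.pmem-check-enabled": "false",
    "yarn.nodemanager.vmem-check-enabled": "false",
}


def set_properties(path: str, overrides: Dict[str, str]) -> None:
    """Set Hadoop-style <property> entries in a *-site.xml file, adding missing ones."""
    tree = ET.parse(path)
    root = tree.getroot()  # the <configuration> element
    remaining = dict(overrides)
    # Update properties that already exist in the file.
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in remaining:
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = remaining.pop(name)
    # Append the properties that were not present yet.
    for name, value in remaining.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Note: ElementTree drops XML comments and processing instructions on rewrite.
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Calling set_properties(YARN_SITE, OVERRIDES) and then restarting YARN as shown above should have the same effect as the manual edit; keep a backup of the original file, since comments are not preserved.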

(End)
