Installing a custom Spark on YARN on top of an HDP-installed YARN

I. Overview

Spark on YARN installed through HDP + Ambari generally works fine, but the Spark version is pinned to whatever ships in the HDP stack, which is very inflexible. The goal here is to keep the YARN installed by HDP + Ambari, deploy Spark ourselves, and make sure our own Spark build runs on the Ambari-deployed YARN.

II. Deployment steps

1. Copy the Jersey 1.9 jars into Spark (do this after extracting Spark in step 2)

Go into /usr/hdp/2.5.3.0-37/hadoop-yarn/lib (the HDP install directory) and copy jersey-client-1.9.jar and jersey-core-1.9.jar into /opt/spark-2.2.1-bin-hadoop2.7/jars. That directory already contains jersey-client-2.22.2.jar; rename it out of the way so that the older 1.9 jars replace it.
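The jar swap above can be scripted roughly as follows (paths are the ones used in this guide; the `.bak` suffix is just one way to sideline the newer jar):

```shell
#!/bin/sh
# Paths from this guide; adjust to your actual HDP and Spark layout.
HDP_YARN_LIB=/usr/hdp/2.5.3.0-37/hadoop-yarn/lib
SPARK_JARS=/opt/spark-2.2.1-bin-hadoop2.7/jars

# Only act if the Spark jars directory actually exists on this host.
if [ -d "$SPARK_JARS" ]; then
  # Sideline Spark's bundled Jersey 2.x client so the 1.9 jars take effect.
  mv "$SPARK_JARS/jersey-client-2.22.2.jar" \
     "$SPARK_JARS/jersey-client-2.22.2.jar.bak"
  # Bring in the Jersey 1.9 jars that HDP's YARN libraries expect.
  cp "$HDP_YARN_LIB/jersey-client-1.9.jar" \
     "$HDP_YARN_LIB/jersey-core-1.9.jar" "$SPARK_JARS/"
fi
```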

2. Download and extract a Spark release

Download the tarball, e.g. spark-2.2.1-bin-hadoop2.7.tgz, and extract it (this guide assumes it lands under /opt).

3. Set the SPARK_HOME environment variable
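For example, appended to /etc/profile or ~/.bashrc (the path matches this guide's layout):

```shell
# Point SPARK_HOME at the extracted Spark distribution and expose its binaries.
export SPARK_HOME=/opt/spark-2.2.1-bin-hadoop2.7
export PATH="$SPARK_HOME/bin:$PATH"
```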

4. Edit the configuration files

①. cp spark-defaults.conf.template spark-defaults.conf

Add the following:
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18081
spark.yarn.historyServer.address aaa:18081
spark.yarn.queue default
The default queue is used here; pick whichever queue matches your actual YARN setup.
Replace the hostname above (aaa) with your actual history-server host.

②. cp spark-env.sh.template spark-env.sh

Add the following:
export JAVA_HOME=/opt/jdk1.8.0_111
export SCALA_HOME=/opt/scala-2.11.7
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export SPARK_HOME=/opt/spark-2.2.1-bin-hadoop2.7

5. Modify the configuration in the Ambari UI

[Screenshot: Ambari configuration change]
Use the version number reported by hdp-select | grep hadoop-hdfs-datanode.
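To extract just the version string, the hdp-select output can be parsed; the sample line below is hard-coded for illustration (on a real node, pipe the actual `hdp-select | grep hadoop-hdfs-datanode` output instead):

```shell
# Sample output line; on a cluster node use:  hdp-select | grep hadoop-hdfs-datanode
line="hadoop-hdfs-datanode - 2.5.3.0-37"
# The version is the third whitespace-separated field.
hdp_version=$(echo "$line" | awk '{print $3}')
echo "$hdp_version"   # → 2.5.3.0-37
```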

6. Create the history directory on HDFS

Run hadoop fs -mkdir /spark2-history (required if the history/event-log settings above were configured).

7. In the YARN configuration section in Ambari, add one of the following two properties (either one is enough):
yarn.nodemanager.vmem-check-enabled=false
yarn.nodemanager.pmem-check-enabled=false

[Screenshot: YARN configuration in Ambari]
If the corresponding property already exists, no change is needed.
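Once the steps above are done, a quick smoke test is to submit the bundled SparkPi example to YARN (paths follow this guide's layout; this is a sketch, run it on a node with cluster access):

```shell
#!/bin/sh
# Path from this guide's layout; adjust if Spark lives elsewhere.
export SPARK_HOME=/opt/spark-2.2.1-bin-hadoop2.7

# Submit the stock SparkPi example in cluster mode as a smoke test.
if [ -x "$SPARK_HOME/bin/spark-submit" ]; then
  "$SPARK_HOME/bin/spark-submit" \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    "$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.1.jar" 100
fi
```

If the application reaches the FINISHED/SUCCEEDED state in the YARN ResourceManager UI, the custom Spark build is running correctly on the Ambari-managed YARN.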

III. Common problems

1. "bad substitution" when the YARN container launches

Stack trace: ExitCodeException exitCode=1:
/hadoop/yarn/local/usercache/root/appcache/application_1522807066160_0019/container_e26_1522807066160_0019_02_000001/launch_container.sh:
line 21:
$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.5.3.0-37/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
bad substitution

The unresolved ${hdp.version} placeholder near the end of the classpath is what triggers the "bad substitution" error.

Solution:
[Screenshot: Ambari configuration change]
Use the version number reported by hdp-select | grep hadoop-hdfs-datanode.
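The screenshot is not reproduced here; one commonly used form of this fix (an assumption, since the original shows it only as an image) is to pin hdp.version for both the driver and the YARN application master in spark-defaults.conf, using the version reported by hdp-select:

```
spark.driver.extraJavaOptions    -Dhdp.version=2.5.3.0-37
spark.yarn.am.extraJavaOptions   -Dhdp.version=2.5.3.0-37
```

With the placeholder resolved to a concrete version, the hadoop-lzo jar path in the container classpath expands correctly.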
2. java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig

Solution:
Go into /usr/hdp/2.5.3.0-37/hadoop-yarn/lib and copy jersey-client-1.9.jar and jersey-core-1.9.jar into /opt/spark-2.2.1-bin-hadoop2.7/jars. That directory already contains jersey-client-2.22.2.jar; rename it out of the way so that the older 1.9 jars replace it (same as step 1 of the deployment).
