Spark on YARN: eliminating warnings

Spark on YARN

The log-yarn.sh script

export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
$SPARK_HOME/bin/spark-submit \
--master yarn \
--class www.ruozedata.bigdata.SparkCore02.LocalServeApp \
--name LocalServeApp \
/home/hadoop/lib/g5-spark-1.0.jar_log \
hdfs://hadoop001:9000/data/logs/input/secondhomework.txt hdfs://hadoop001:9000/data/logs/output

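Running the script

Running this script produces the two warnings shown below. Warning 1 is terse: to get more detail, raise the log level to DEBUG, or search the source code for the console message to find the class that emits it and see the cause and the fix. Warning 2 clearly says that a parameter is not set: open the YARN page of the official Spark documentation and search for spark.yarn.jars to see how to configure it.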

[hadoop@hadoop001 shell]$ ./log-yarn.sh
15/01/14 15:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/01/14 15:36:51 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME
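
With the log level at DEBUG the two warnings are easy to lose in the output. The following is a minimal sketch for pulling them out; it assumes the log4j line layout shown above (date, time, level, then the class name), and warn_lines is a hypothetical helper name, not part of Spark or Hadoop.

```shell
# Print only the WARN lines from log text read on stdin.
# Assumes the layout "yy/MM/dd HH:mm:ss LEVEL class: message",
# so the level is the third whitespace-separated field.
warn_lines() {
  awk '$3 == "WARN"'
}
```

A possible use is `./log-yarn.sh 2>&1 | warn_lines`. The `2>&1` is there because Spark's default console appender writes to stderr; if your log4j configuration differs, adjust accordingly.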

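Fixing warning 2

[Figure 1: screenshot of the spark.yarn.jars entry in the official Spark documentation]

As the screenshot shows, the official documentation states that Spark on YARN loads the jars under SPARK_HOME. If Spark is not told where to find those jars on HDFS, it packages them and uploads them to HDFS on every submission. To eliminate the warning, upload the jars once to a directory on HDFS and point Spark at that location, as follows: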

[hadoop@hadoop001 shell]$ hdfs dfs -mkdir /spark
[hadoop@hadoop001 shell]$ hdfs dfs -put $SPARK_HOME/jars/ /spark/
[hadoop@hadoop001 spark-2.4.0-bin-2.6.0-cdh5.7.0]$ hdfs dfs -ls /spark
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-01-14 15:20 /spark/jars
[hadoop@hadoop001 conf]$ vi spark-defaults.conf
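spark.yarn.jars hdfs://hadoop001:9000/spark/jars/*
(The trailing * is required so that every jar in the directory is matched. In repeated tests, leaving it out and not pointing at the jars themselves caused an error.)
Run the script again. Spark now finds its jars on HDFS and no longer packages and uploads them:
[hadoop@hadoop001 shell]$ ./log-yarn.sh
15/01/14 16:35:05 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://hadoop001:9000/spark/jars/JavaEWAH-0.3.2.jar
That resolves the second warning.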
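
The edit to spark-defaults.conf can also be scripted so that it is safe to repeat. This is only a sketch under stated assumptions: has_yarn_jars_setting and ensure_yarn_jars are hypothetical helper names, and the check simply looks for an uncommented spark.yarn.jars or spark.yarn.archive key at the start of a line.

```shell
# Return 0 if the given file sets spark.yarn.jars or spark.yarn.archive
# on an uncommented line (key followed by whitespace, '=' or ':').
has_yarn_jars_setting() {
  grep -Eq '^[[:space:]]*spark\.yarn\.(jars|archive)[[:space:]=:]' "$1" 2>/dev/null
}

# Append "spark.yarn.jars <location>" to the file unless one of the two
# keys is already present. $1 = conf file, $2 = jar location (with the *).
ensure_yarn_jars() {
  if has_yarn_jars_setting "$1"; then
    echo "spark.yarn.jars or spark.yarn.archive already set in $1"
  else
    printf 'spark.yarn.jars %s\n' "$2" >> "$1"
    echo "added spark.yarn.jars to $1"
  fi
}
```

For the setup in this post the call would be `ensure_yarn_jars "$SPARK_HOME/conf/spark-defaults.conf" 'hdfs://hadoop001:9000/spark/jars/*'`. The single quotes keep the local shell from expanding the *.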

Fixing warning 1

[hadoop@hadoop001 conf]$ vi ~/.bash_profile
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
[hadoop@hadoop001 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@hadoop001 conf]$ vi spark-env.sh
export LD_LIBRARY_PATH=$JAVA_LIBRARY_PATH
(This first attempt never worked, possibly because the two path settings did not chain together correctly. Replacing it with the absolute path below solved the problem.)
export LD_LIBRARY_PATH=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/lib/native
(Setting the full path in one step is less hassle.)
Run the script again. The output is now:
[hadoop@hadoop001 shell]$ ./log-yarn.sh
15/01/14 17:17:29 INFO spark.SparkContext: Running Spark version 2.4.0
15/01/14 17:17:29 INFO spark.SparkContext: Submitted application: LocalServeApp
15/01/14 17:17:30 INFO spark.SecurityManager: Changing view acls to: hadoop
That eliminates the first warning.
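
Before setting LD_LIBRARY_PATH it can help to confirm that the directory really contains the native library, since the warning stays if the path is wrong. A minimal sketch, assuming the library file is named libhadoop.so (possibly with a version suffix); has_native_hadoop is a hypothetical helper name.

```shell
# Return 0 if the directory $1 contains a libhadoop shared library,
# e.g. libhadoop.so or libhadoop.so.1.0.0.
has_native_hadoop() {
  for f in "$1"/libhadoop.so*; do
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}
```

For the path used in this post: `has_native_hadoop /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/lib/native && echo "native library found"`. Hadoop also provides `hadoop checknative -a`, which reports which native libraries it was able to load.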
