From the official Spark on YARN documentation (Spark 2.1.1):
http://spark.apache.org/docs/2.1.1/running-on-yarn.html
To use a custom log4j configuration for the application master or executors, here are the options:
- Upload a custom log4j.properties using spark-submit, by adding it to the --files list of files to be uploaded with the application.
- Add -Dlog4j.configuration=<location of configuration file> to spark.driver.extraJavaOptions (for the driver) or spark.executor.extraJavaOptions (for executors). Note that if using a file, the file: protocol should be explicitly provided, and the file needs to exist locally on all the nodes.
- Update the $SPARK_CONF_DIR/log4j.properties file and it will be automatically uploaded along with the other configurations. Note that the other two options have higher priority than this one if multiple options are specified.
Note that for the first option, both executors and the application master will share the same log4j configuration, which may cause issues when they run on the same node (e.g. trying to write to the same log file).
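For example, the second option would look like the following two extra flags on spark-submit (a sketch; the log4j.properties path here is hypothetical and must exist locally on every node):
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/log4j.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/log4j.properties" \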
From the passage above, we can see that preset configuration files can be shipped with the application via --files, for example:
--files log4j.config
What if we need to submit multiple files? In that case we separate them with commas ( , ):
--files redis.config,mysql.config
We can wrap the multi-file submission in a script:
ROOT_PATH=$(dirname $(readlink -f $0))
# Build the comma-separated list of job config files
config=""
for file in "${ROOT_PATH}"/config/*
do
    config="${file},${config}"   # note: this leaves a trailing comma
done
# ${class_name}, ${JOB_NAME}, ${classpath}, ${APP_NAME} and ${libVersion}
# are assumed to be defined earlier in the script.
# ${config} is used twice: --files uploads the files to the cluster, and the
# same string is passed again as the second program argument ("online" is the
# first) so that main() receives the full paths.
nohup /usr/bin/spark2-submit \
--class ${class_name} \
--name ${JOB_NAME} \
--files ${config} \
--master yarn \
--driver-memory 2G \
--driver-cores 1 \
--num-executors 3 \
--executor-cores 2 \
--executor-memory 2G \
--jars ${classpath} \
${ROOT_PATH}/libs/${APP_NAME}-${libVersion}-SNAPSHOT.jar online ${config} \
> ${ROOT_PATH}/logs/start.log 2> ${ROOT_PATH}/logs/start.error &
Running this, the real paths passed to --files look like the following:
--- Test ---
config files : /data-hdd/00/project/cloudera-scm/spark-workspace/onlineJob/TD-clickImp-blacklist-redis/0.0.6/config/redis_cluster.conf,/data-hdd/00/project/cloudera-scm/spark-workspace/onlineJob/TD-clickImp-blacklist-redis/0.0.6/config/LAN_ip,/data-hdd/00/project/cloudera-scm/spark-workspace/onlineJob/TD-clickImp-blacklist-redis/0.0.6/config/kafka_cluster.conf,
which splits into these three files (note the trailing comma left by the loop):
/data-hdd/00/project/cloudera-scm/spark-workspace/onlineJob/TD-clickImp-blacklist-redis/0.0.6/config/redis_cluster.conf,
/data-hdd/00/project/cloudera-scm/spark-workspace/onlineJob/TD-clickImp-blacklist-redis/0.0.6/config/LAN_ip,
/data-hdd/00/project/cloudera-scm/spark-workspace/onlineJob/TD-clickImp-blacklist-redis/0.0.6/config/kafka_cluster.conf,
First, note that we pass these paths into the main method as a program argument:
// Retrieve the uploaded file paths from the second program argument
Set<String> fileFullPathContainer = null;
if (args.length < 2 && (GlobalVariable.env != EnvType.IDE)) {
    // Missing the file-list argument outside the IDE: nothing to read, so bail out
    return;
} else {
    fileFullPathContainer = FileUtil.filesSplit(args[1], ",");
}
The filesSplit method:
public static Set<String> filesSplit(String filesString, String separator) {
    Set<String> resultSet = new HashSet<>();
    // String.split drops trailing empty strings, so the trailing comma
    // left by the shell loop produces no empty entry here
    String[] tmpArr = filesString.split(separator);
    resultSet.addAll(Arrays.asList(tmpArr));
    return resultSet;
}
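As a quick check, here is a minimal sketch of how the argument from the test run above gets split (assuming the FileUtil class is on the classpath; the shortened paths are placeholders):
import java.util.Set;

public class ConfigArgsDemo {
    public static void main(String[] args) {
        // Same shape as the --files string in the test output above,
        // including the trailing comma left by the shell loop
        String filesArg = "/path/to/config/redis_cluster.conf,"
                + "/path/to/config/LAN_ip,"
                + "/path/to/config/kafka_cluster.conf,";
        Set<String> paths = FileUtil.filesSplit(filesArg, ",");
        System.out.println(paths.size()); // prints 3: the trailing empty entry is dropped
    }
}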
With that, we can read the configuration files through their absolute paths. Here is an example:
if (GlobalVariable.env == EnvType.IDE) {
    // Local IDE run: let the config center load everything itself
    ConfigCenter.init(GlobalVariable.env);
    configCenter = ConfigCenter.getInstance();
} else {
    ConfigCenter.init(null);
    configCenter = ConfigCenter.getInstance();
    // Load the Kafka config from the uploaded file
    Properties kafkaProps = new Properties();
    kafkaProps.load(new FileInputStream(FileUtil.findFileFullPath(fileFullPathContainer, "kafka_cluster.conf")));
    configCenter.setKafkaConfig(kafkaProps);
    // Load the Redis config from the uploaded file
    Properties redisProps = new Properties();
    redisProps.load(new FileInputStream(FileUtil.findFileFullPath(fileFullPathContainer, "redis_cluster.conf")));
    configCenter.setRedisConfig(redisProps);
}
The findFileFullPath method:
public static String findFileFullPath(Set<String> container, String shortName) {
    // Return the first full path that contains the given short name
    for (String tmpString : container) {
        if (tmpString.contains(shortName)) {
            return tmpString;
        }
    }
    return null;
}
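Continuing the sketch above, a config file can then be looked up by its short name (matching is by substring, so the short names need to be unambiguous among the uploaded files):
String kafkaConf = FileUtil.findFileFullPath(paths, "kafka_cluster.conf");
// kafkaConf now holds the full .../kafka_cluster.conf path, or null if nothing matched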