Spark job submission: pitfalls with JSON arguments

When submitting a Spark job, two arguments need to be passed to the program, one of which is a JSON string.

The JSON argument looks like this:

{
"dest_catalog":"测试文件1",
"site":"tencent",
"song_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true"},
"artist_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true","ignore_order":"true","match_signal":"true"}
}

Pitfall 1: the JSON argument above cannot be passed in directly

/data/app/spark/bin/spark-submit --name 测试 --class com.karakal.lanchao.process.ProcessData --driver-memory 6g --master yarn --deploy-mode cluster --executor-memory 8g --num-executors 2 --executor-cores 2 --files hdfs://hadoop-cluster-ha/lanchao/bigdata_support/songlist/testdata.txt hdfs://hadoop-cluster-ha/lanchao/bigdata_support/sparkjar.jar \
/lanchao/bigdata_support/songlist/testdata.txt {"dest_catalog":"测试文件1","site":"tencent","song_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true"},"artist_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true","ignore_order":"true","match_signal":"true"}}

When the arguments are passed this way, what the Spark program receives is wrong:

args(0)=/lanchao/bigdata_support/songlist/testdata.txt
args(1)={"dest_catalog"
args(2)="测试文件1"
args(3)="site"
....
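The cause is ordinary shell word splitting: the pretty-printed JSON contains spaces and newlines, so the shell breaks it into many separate words before spark-submit ever sees it. A minimal sketch of the effect (show_args is a hypothetical helper for illustration, not part of the real submit script):

```shell
# Hypothetical helper that prints each argument it receives, one per line,
# mimicking how a Spark main method sees args(0), args(1), ...
show_args() {
  i=0
  for arg in "$@"; do
    printf 'args(%d)=%s\n' "$i" "$arg"
    i=$((i+1))
  done
}

# Unquoted: whitespace inside the JSON splits it into several arguments.
show_args /path/data.txt {"dest_catalog": "test", "site": "tencent"}

# Quoted as one word: the whole JSON arrives as a single argument.
show_args /path/data.txt '{"dest_catalog": "test", "site": "tencent"}'
```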

Solution:
1. Wrap the whole JSON string in double quotes.
2. Escape each inner double quote with a backslash (\).

Like this:

/data/app/spark/bin/spark-submit --name 测试 --class com.karakal.lanchao.process.ProcessData --driver-memory 6g --master yarn --deploy-mode cluster --executor-memory 8g --num-executors 2 --executor-cores 2 \
--files hdfs://hadoop-cluster-ha/lanchao/bigdata_support/songlist/testdata.txt hdfs://hadoop-cluster-ha/lanchao/bigdata_support/sparkjar.jar \
/lanchao/bigdata_support/songlist/testdata.txt \
"{\"dest_catalog\":\"测试文件1\",\"site\":\"tencent\",\"song_settings\":{\"lower_case\":\"true\",\"remove_brackets\":\"true\",\"simple_chinese\":\"true\",\"remove_blank\":\"true\",\"remove_special\":\"true\"},\"artist_settings\":{\"lower_case\":\"true\",\"remove_brackets\":\"true\",\"simple_chinese\":\"true\",\"remove_blank\":\"true\",\"remove_special\":\"true\",\"ignore_order\":\"true\",\"match_signal\":\"true\"}}"
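An equivalent, easier-to-maintain variant is also possible (a sketch, assuming the JSON itself contains no single-quote characters): keep the JSON in a single-quoted shell variable, so the inner double quotes need no backslash escaping, then pass the variable as one double-quoted argument.

```shell
# Single quotes pass the JSON through verbatim, so no backslash escaping
# is needed (assumes the JSON contains no single-quote characters).
JSON='{"dest_catalog":"测试文件1","site":"tencent","song_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true"},"artist_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true","ignore_order":"true","match_signal":"true"}}'

# The tail of the submit command then becomes:
#   ... sparkjar.jar /lanchao/bigdata_support/songlist/testdata.txt "$JSON"

# Optional sanity check that the string is valid JSON before submitting
# (requires a python3 interpreter on the submitting host):
printf '%s' "$JSON" | python3 -c 'import json,sys; json.load(sys.stdin)' && echo "valid JSON"
```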

That solves passing the JSON as a single argument, but it leads straight into the next pitfall. Printing the argument inside the program:

Looking at the log contents:

LogType:stdout
Log Upload Time:Fri Dec 08 14:32:01 +0800 2017
LogLength:423
Log Contents:
*********************************json参数输出***********************************
{"dest_catalog":"测试文件1","site":"tencent","song_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true"},"artist_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true","ignore_order":"true"
End of LogType:stdout

What? Where did the last two curly braces go? The "}}" has disappeared!
My first reaction was that spark-submit limits argument length, but testing showed that is not the case.
After more experiments: a single "}" comes through to the backend just fine; the bug appears only when two closing braces are adjacent (possibly because YARN treats {{ and }} as parameter-expansion markers when building the container launch command in cluster mode). Without a better fix, the workaround for now is to keep the two closing braces apart, e.g. by moving "site" to the end of the object so the JSON no longer ends in "}}":

/data/app/spark/bin/spark-submit --name 测试 --class com.karakal.lanchao.process.ProcessData --driver-memory 6g --master yarn --deploy-mode cluster --executor-memory 8g --num-executors 2 --executor-cores 2 \
--files hdfs://hadoop-cluster-ha/lanchao/bigdata_support/songlist/testdata.txt hdfs://hadoop-cluster-ha/lanchao/bigdata_support/sparkjar.jar \
/lanchao/bigdata_support/songlist/testdata.txt \
"{\"dest_catalog\":\"测试文件1\",\"song_settings\":{\"lower_case\":\"true\",\"remove_brackets\":\"true\",\"simple_chinese\":\"true\",\"remove_blank\":\"true\",\"remove_special\":\"true\"},\"artist_settings\":{\"lower_case\":\"true\",\"remove_brackets\":\"true\",\"simple_chinese\":\"true\",\"remove_blank\":\"true\",\"remove_special\":\"true\",\"ignore_order\":\"true\",\"match_signal\":\"true\"},\"site\":\"tencent\"}"

With this, the program receives the complete JSON argument:

LogType:stdout
Log Upload Time:Fri Dec 08 14:50:32 +0800 2017
LogLength:447
Log Contents:
*********************************json参数输出***********************************
{"dest_catalog":"测试文件1","song_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true"},"artist_settings":{"lower_case":"true","remove_brackets":"true","simple_chinese":"true","remove_blank":"true","remove_special":"true","ignore_order":"true","match_signal":"true"},"site":"tencent"}
End of LogType:stdout
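A more robust alternative worth considering (a sketch, not verified on this cluster): Base64-encode the JSON on the submitting side and decode it inside the driver. The encoded string contains no braces, quotes, or whitespace, so neither shell quoting nor any {{ }} handling can mangle it.

```shell
# Base64 round trip: the encoded form is a single token with no quotes,
# braces, or whitespace, so it passes through untouched.
# (JSON abbreviated for the demo.)
JSON='{"dest_catalog":"测试文件1","site":"tencent"}'

# Encode on the submitting side (-w 0 disables line wrapping; GNU coreutils flag,
# on other platforms the option may differ).
ENCODED=$(printf '%s' "$JSON" | base64 -w 0)

# Pass "$ENCODED" as the last spark-submit argument; inside the driver,
# decode it back, e.g. in Scala:
#   val json = new String(java.util.Base64.getDecoder.decode(args(1)), "UTF-8")

# Local round-trip check:
printf '%s' "$ENCODED" | base64 -d
```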

This is not an ideal solution; if you know a better one, please leave a comment!
