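Recently I had a Spark job that needed to manipulate JSON from Scala. The flow is roughly: read records from HBase, parse each record as JSON, remove one redundant key, serialize the result back to a JSON string, and write it to HDFS.
The JSON looks roughly like this:
{"@type":{"version":"1.0.2","name":"application-content","data":[]},"key-to-remove":[{"blah":"more blah"}],"@value":[]}
The logic is not complicated. The HBase reading part is omitted here; the JSON-related code, which uses fastjson to parse the JSON, is below.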
package dev.json

import com.alibaba.fastjson.JSON

object Course1 {
  def main(args: Array[String]): Unit = {
    val key = "key-to-remove"
    val s =
      """
        |{"@type":{"version":"1.0.2","name":"application-content","data":[]},"key-to-remove":[{"blah":"more blah"}],"@value":[]}
        |""".stripMargin
    val obj = JSON.parseObject(s)
    obj.remove(key)
    val out = obj.toJSONString
    println(out)
  }
}
Running it immediately failed with an exception:
Exception in thread "main" com.alibaba.fastjson.JSONException: expect ':' at 2, actual "
at com.alibaba.fastjson.parser.DefaultJSONParser.parseObject(DefaultJSONParser.java:296)
at com.alibaba.fastjson.parser.DefaultJSONParser.parse(DefaultJSONParser.java:1401)
at com.alibaba.fastjson.parser.DefaultJSONParser.parse(DefaultJSONParser.java:1367)
at com.alibaba.fastjson.JSON.parse(JSON.java:183)
at com.alibaba.fastjson.JSON.parse(JSON.java:193)
at com.alibaba.fastjson.JSON.parse(JSON.java:149)
at com.alibaba.fastjson.JSON.parseObject(JSON.java:254)
at dev.json.Course1$.main(Course1.scala:12)
at dev.json.Course1.main(Course1.scala)
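I had never hit this with fastjson before. After a round of searching on Baidu I learned the cause: the JSON contains a key named @type. This is the key behind fastjson's autoType feature (and the "autotype is not supported" errors). Because @type can be abused to inject unsafe operations, Alibaba restricts it for security reasons, and parsing fails with an exception. After reading some more material I finally got it fixed. It takes a couple of options: add the containing package to the whitelist, and disable special-key detection so that @type is no longer treated specially. The code: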
package dev.json

import com.alibaba.fastjson.JSON
import com.alibaba.fastjson.parser.{Feature, ParserConfig}

object Course1 {
  // Add this package to the global autoType whitelist
  ParserConfig.getGlobalInstance.addAccept("dev.json")

  def main(args: Array[String]): Unit = {
    val key = "key-to-remove"
    val s =
      """
        |{"@type":{"version":"1.0.2","name":"application-content","data":[]},"key-to-remove":[{"blah":"more blah"}],"@value":[]}
        |""".stripMargin
    // Disable special-key detection so that "@type" is parsed as an ordinary key
    val obj = JSON.parseObject(s, Feature.DisableSpecialKeyDetect)
    obj.remove(key)
    val out = obj.toJSONString
    println(out)
  }
}
With that change the JSON parses normally, and the output is:
{"@value":[],"@type":{"data":[],"name":"application-content","version":"1.0.2"}}
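Between this error and the fastjson vulnerability incident a while ago, after which the company demanded an emergency fastjson upgrade, my impression of fastjson dropped sharply. It may well be banned from our projects altogether someday. So I also tried json4s, a Scala JSON library that can use Jackson as its parsing backend. json4s is not all that pleasant to use either, but it is good enough. Here is the same functionality rewritten with json4s: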
package dev.json

import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods._

object Course2 {
  implicit val formats: Formats = DefaultFormats

  def main(args: Array[String]): Unit = {
    val key = "key-to-remove"
    val s =
      """
        |{"@type":{"version":"1.0.2","name":"application-content","data":[]},"key-to-remove":[{"blah":"more blah"}],"@value":[]}
        |""".stripMargin
    // parse throws on malformed input instead of returning null, so no null check is needed
    val obj = parse(s)
    // Drop every field whose name matches the key
    val obj2 = obj.removeField { case (name, _) => name == key }
    val out = compact(render(obj2))
    println(out)
  }
}
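One difference from the fastjson version is worth keeping in mind: as far as I can tell, json4s's removeField walks the whole tree, so a nested field with the same name would be removed too, whereas fastjson's obj.remove(key) only touches the top-level object. If only the outermost key should go, a minimal sketch could look like the following. The object name Course2TopLevel and the helper removeTopLevel are hypothetical names made up for this illustration, not part of json4s.

```scala
package dev.json

import org.json4s._
import org.json4s.jackson.JsonMethods._

object Course2TopLevel {
  // Hypothetical helper: drop `key` from the outermost object only.
  // Nested objects are left untouched; non-object values are returned as-is.
  def removeTopLevel(json: JValue, key: String): JValue = json match {
    case JObject(fields) => JObject(fields.filterNot { case (name, _) => name == key })
    case other           => other
  }

  def main(args: Array[String]): Unit = {
    val s =
      """{"@type":{"version":"1.0.2","name":"application-content","data":[]},"key-to-remove":[{"blah":"more blah"}],"@value":[]}"""
    val cleaned = removeTopLevel(parse(s), "key-to-remove")
    println(compact(render(cleaned)))
  }
}
```

For the sample record in this post both approaches should behave the same, since key-to-remove only appears at the top level. The pattern match just makes the scope of the removal explicit.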