spark sql 怎样处理日期类型、时间类型
json 每个对象 不能换行
##问题描述
json File 日期类型 怎样处理?怎样从字符型,转换为Date或DateTime类型?
json文件如下,有字符格式的日期类型
```
{ "name" : "Andy", "age" : 30, "time" :"2015-03-03T08:25:55.769Z"}
{ "name" : "Justin", "age" : 19, "time" : "2015-04-04T08:25:55.769Z" }
{ "name" : "pan", "age" : 49, "time" : "2015-05-05T08:25:55.769Z" }
{ "name" : "penny", "age" : 29, "time" : "2015-05-05T08:25:55.769Z" }
```
默认推测的Schema:
```
root
|-- _corrupt_record: string (nullable = true)
|-- age: long (nullable = true)
|-- name: string (nullable = true)
|-- time200: string (nullable = true)
```
测试代码
```
val fileName = "person.json"
val sc = SparkUtils.getScLocal("json file 测试")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jsonFile = sqlContext.read.json(fileName)
jsonFile.printSchema()
```
##解决方案
### 方案一、json数据 时间为 long 秒或毫秒
### 方案二、自定义schema
```
val fileName = "person.json"
val sc = SparkUtils.getScLocal("json file 测试")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val schema: StructType = StructType(mutable.ArraySeq(
StructField("name", StringType, true),
StructField("age", StringType, true),
StructField("time", TimestampType, true)));
val jsonFile = sqlContext.read.schema(schema).json(fileName)
jsonFile.printSchema()
jsonFile.registerTempTable("person")
val now: Timestamp = new Timestamp(System.currentTimeMillis())
val teenagers = sqlContext.sql("SELECT * FROM person WHERE age >= 20 AND age <= 30 AND time <='" +now+"'")
teenagers.foreach(println)
val dataFrame = sqlContext.sql("SELECT * FROM person WHERE age >= 20 AND age <= 30 AND time <='2015-03-03 16:25:55.769'")
dataFrame.foreach(println)
```
###方案三、sql建表
创建表sql
```
CREATE TEMPORARY TABLE person IF NOT EXISTS
[(age: long ,name:string ,time:Timestamp)]
USING org.apache.spark.sql.json
OPTIONS ( path 'person.json')
语法
CREATE [TEMPORARY] TABLE [IF NOT EXISTS]
[(col-name data-type [, …])]
USING [OPTIONS ...]
[AS ]
```
### 方案四、用textfile convert
来自为知笔记(Wiz)