Scala004-DataFrame: Converting a Whole String Column to timestamp

Intro

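  One column of the DataFrame holds strings in the "yyyyMMdd" format, and it needs to be converted to a timestamp. There are several ways to do this, such as a UDF. Here is a relatively simple one.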
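For comparison, a UDF-based version might look like the sketch below. It assumes Spark 2.x running on Java 8 or later, so that `java.time` is available. The UDF name `parseYmd`, the app name and the sample rows are hypothetical.

```scala
import java.sql.Timestamp
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Reuses the notebook's session if one already exists
val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
import spark.implicits._

// Hypothetical UDF: parse a "yyyyMMdd" string into a timestamp at local midnight.
// BASIC_ISO_DATE is the java.time formatter for the yyyyMMdd layout. It is a static
// field, so the closure captures nothing non-serializable.
val parseYmd = udf { (s: String) =>
  if (s == null) null
  else Timestamp.valueOf(LocalDate.parse(s, DateTimeFormatter.BASIC_ISO_DATE).atStartOfDay())
}

val udfDf = Seq(("A1", "20200101"), ("B1", "20200103")).toDF("id", "dateStr")
  .withColumn("date", parseYmd(col("dateStr")))
udfDf.show()
```

The two approaches handle bad input differently. `LocalDate.parse` throws on a malformed string and fails the job. In Spark 2.4, `unix_timestamp` returns null for input that does not match the pattern. Built-in functions also stay visible to the Catalyst optimizer, which a UDF does not.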

Building the data

import org.apache.spark.sql.functions._  // unix_timestamp, col, ...
import spark.implicits._                  // needed for Seq(...).toDF
import org.apache.spark.sql.expressions.Window
val df = Seq(
  ("A1", 25, 1,0.64,0.36,"20200101"),
  ("A1", 26, 1,0.34,0.66,"20200102"),
  ("B1", 27, 0,0.55,0.45,"20200103"),
  ("C1", 30, 0,0.14,0.86,"20200104")
  ).toDF("id", "age", "label","pro0","pro1","dateStr")
df.printSchema()
df.show()
Intitializing Scala interpreter ...

Spark Web UI available at http://DESKTOP-LAO32FQ:4043
SparkContext available as 'sc' (version = 2.4.4, master = local[*], app id = local-1583251961417)
SparkSession available as 'spark'

root
 |-- id: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- label: integer (nullable = false)
 |-- pro0: double (nullable = false)
 |-- pro1: double (nullable = false)
 |-- dateStr: string (nullable = true)

+---+---+-----+----+----+--------+
| id|age|label|pro0|pro1| dateStr|
+---+---+-----+----+----+--------+
| A1| 25|    1|0.64|0.36|20200101|
| A1| 26|    1|0.34|0.66|20200102|
| B1| 27|    0|0.55|0.45|20200103|
| C1| 30|    0|0.14|0.86|20200104|
+---+---+-----+----+----+--------+

df: org.apache.spark.sql.DataFrame = [id: string, age: int ... 4 more fields]

Column type conversion

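After the conversion, the hours, minutes and seconds are all 0, because the "yyyyMMdd" strings carry no time of day.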

df.withColumn("date",unix_timestamp(col("dateStr"),"yyyyMMdd").cast("timestamp")).show()
+---+---+-----+----+----+--------+-------------------+
| id|age|label|pro0|pro1| dateStr|               date|
+---+---+-----+----+----+--------+-------------------+
| A1| 25|    1|0.64|0.36|20200101|2020-01-01 00:00:00|
| A1| 26|    1|0.34|0.66|20200102|2020-01-02 00:00:00|
| B1| 27|    0|0.55|0.45|20200103|2020-01-03 00:00:00|
| C1| 30|    0|0.14|0.86|20200104|2020-01-04 00:00:00|
+---+---+-----+----+----+--------+-------------------+
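Since Spark 2.2, the same conversion can also be written with `to_timestamp`. If the zeroed time part is not needed at all, `to_date` produces a `DateType` column instead. The sketch below assumes Spark 2.2 or later and uses hypothetical sample rows and app name. It also keeps the intermediate `unix_timestamp` column, to show that this column is a `Long` count of seconds, which `.cast("timestamp")` then turns into a timestamp.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, to_timestamp, unix_timestamp}

// Reuses the notebook's session if one already exists
val spark = SparkSession.builder().master("local[*]").appName("to-timestamp-sketch").getOrCreate()
import spark.implicits._

val raw = Seq(("A1", "20200101"), ("C1", "20200104")).toDF("id", "dateStr")

val converted = raw
  // Seconds since the epoch (LongType), parsed in the session time zone
  .withColumn("epochSec", unix_timestamp(col("dateStr"), "yyyyMMdd"))
  // Same result as unix_timestamp(...).cast("timestamp"), in a single call
  .withColumn("ts", to_timestamp(col("dateStr"), "yyyyMMdd"))
  // DateType: keeps only the calendar date, with no time-of-day part
  .withColumn("d", to_date(col("dateStr"), "yyyyMMdd"))

converted.printSchema()
converted.show()
```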

                                2020-03-04, Qixia District, Nanjing
