Reading CSV files with Spark 2.x

Spark 2.x reads a CSV file and writes the result back out to a file system (for example HDFS, S3, or the local disk).
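The snippets below use a SparkSession named ss and a date string day without showing where they come from. A minimal setup sketch, assuming the job runs against Hive (the app name and the date value here are illustrative, not from the original):

import org.apache.spark.sql.SparkSession

val ss = SparkSession.builder()
  .appName("mcd-user-import")  // illustrative name
  .enableHiveSupport()         // needed for the ALTER TABLE statements below
  .getOrCreate()

val day = "20180101"           // assumed yyyyMMdd format; typically a job argument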

println("day:::"+day)
ss.read.format("csv").option("header", "true").option("delimiter", "\t").option("mode", "DROPMALFORMED").csv(s"D://mcd-user-$day.txt").createOrReplaceTempView("t_tbl"); //读取csv .option("header", "true")csv中包含表头 生成dataframe,并注册临时表

val df2 = ss.sql(
  "select country, deviceId, imsi, ip, " +
  "split(version, '=')[0] as deviceband, " +
  "split(version, '=')[1] as os_version " +
  "from t_tbl")

df2.show(5)
ss.close()
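For comparison, the same projection can be written with the DataFrame API instead of SQL on a temp view. This is an equivalent sketch, not code from the original:

import org.apache.spark.sql.functions.{col, split}

// Split version on '=' into a device brand and an OS version, exactly as the
// SQL above does.
val df2Alt = ss.table("t_tbl").select(
  col("country"), col("deviceId"), col("imsi"), col("ip"),
  split(col("version"), "=").getItem(0).as("deviceband"),
  split(col("version"), "=").getItem(1).as("os_version"))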

Writing the DataFrame as text files to HDFS

// Reference syntax: ALTER TABLE my_partition_test_table DROP IF EXISTS PARTITION (p_loctype='MHA');
ss.sql(s"alter table dmp.dmp_mcd_user drop if exists partition (day=$day)")
S3Oper.removePath(ss, s"D://dmp_mcd_user/day=$day")
// Join the six columns with \u0001 (Hive's default field delimiter) and
// write the partition out as plain text files.
df2.rdd
  .map(row => (0 to 5).map(row.get).mkString("\u0001"))
  .saveAsTextFile(s"hdfs://dmp_mcd_user/day=$day")
ss.sql(s"alter table dmp.dmp_mcd_user add if not exists partition (day=$day)")
println("day:::"+day)
df2.printSchema()
ss.close()

The full Scala job is as shown above.
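As an aside, the commented-out .mode("overwrite") in the original write step hints at a simpler route: the DataFrameWriter can overwrite the target directory itself, making the manual S3Oper.removePath call unnecessary. A sketch under that assumption (same path and \u0001 delimiter as above; note that concat_ws skips null fields, while the hand-rolled RDD version prints the string "null"):

import org.apache.spark.sql.functions.{col, concat_ws}

// Build one \u0001-delimited string column, then let the writer replace the
// target directory instead of deleting it by hand.
df2.select(concat_ws("\u0001", df2.columns.map(col): _*).as("value"))
  .write
  .mode("overwrite")
  .text(s"hdfs://dmp_mcd_user/day=$day")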
