Live class every evening at 20:00 on YY, channel 68917580
/** Instructor: Wang Jialin http://weibo.com/ilovepains */
Source data
val userData = Array(
"2016-3-27,001,http://spark.apache.org/,1000",
"2016-3-27,001,http://hadoop.apache.org/,1001",
"2016-3-27,002,http://fink.apache.org/,1002",
"2016-3-28,003,http://kafka.apache.org/,1020",
"2016-3-28,004,http://spark.apache.org/,1010",
"2016-3-28,002,http://hive.apache.org/,1200",
"2016-3-28,001,http://parquet.apache.org/,1500",
"2016-3-28,001,http://spark.apache.org/,1800"
)
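The snippets that follow call methods on `userDataDF`, so the `userData` array has to be turned into a DataFrame first. A minimal sketch of one way to do that, assuming a Spark 1.x-style `SQLContext` and the column names `time`, `id`, `url`, `amount` (the first, second and fourth are the names the queries use; `url`, the object name `UserDataFrames`, the helper name `buildUserDataDF`, and the choice to keep `id` as a string are assumptions made here for illustration):

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object UserDataFrames {
  // Hypothetical helper: parses "time,id,url,amount" lines into a DataFrame.
  def buildUserDataDF(sqlContext: SQLContext, lines: Array[String]): DataFrame = {
    // Split each CSV-style line into a Row; only amount is converted to Int.
    val rows = sqlContext.sparkContext.parallelize(lines).map { line =>
      val f = line.split(",")
      Row(f(0), f(1), f(2), f(3).toInt)
    }
    // Assumed schema: id stays a String so that values such as "001" keep their leading zeros.
    val schema = StructType(Seq(
      StructField("time", StringType, nullable = true),
      StructField("id", StringType, nullable = true),
      StructField("url", StringType, nullable = true),
      StructField("amount", IntegerType, nullable = true)
    ))
    sqlContext.createDataFrame(rows, schema)
  }
}
```

With this in place, `val userDataDF = UserDataFrames.buildUserDataDF(sqlContext, userData)` gives the DataFrame used below. The queries also need `import sqlContext.implicits._` (so that `'time` converts to a column) and `import org.apache.spark.sql.functions._` (for `countDistinct` and `sum`).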
// Number of distinct user ids per day
userDataDF.groupBy("time").agg('time, countDistinct('id)).show()
Output
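// Same aggregation, collected to the driver and printed row by row.
// Assuming grouping columns are retained (the default), the result columns are
// (time, time, count), so row(1) is the day and row(2) the distinct count.
// Written for Spark 1.x, where DataFrame.map returns an RDD; on Spark 2.x and later, call .rdd before .map.
userDataDF.groupBy("time").agg('time, countDistinct('id))
  .map(row => Row(row(1), row(2))).collect.foreach(println)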
// Total amount per day
userDataDF.groupBy("time").agg('time, sum('amount)).show()
Output
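As a way to sanity-check the two aggregations without starting Spark, the same grouping can be sketched with plain Scala collections. This is only an illustration of what the distinct count and the sum compute, not how Spark executes them; the names `PlainScalaCheck`, `Visit`, `parse`, `distinctIdsPerDay` and `amountPerDay` are hypothetical.

```scala
object PlainScalaCheck {
  // Parsed form of one "time,id,url,amount" line (field names are assumptions).
  case class Visit(time: String, id: String, url: String, amount: Int)

  def parse(line: String): Visit = {
    val f = line.split(",")
    Visit(f(0), f(1), f(2), f(3).toInt)
  }

  // Mirrors groupBy("time") with countDistinct('id): distinct ids per day.
  def distinctIdsPerDay(lines: Seq[String]): Map[String, Int] =
    lines.map(parse).groupBy(_.time).map { case (day, visits) =>
      day -> visits.map(_.id).distinct.size
    }

  // Mirrors groupBy("time") with sum('amount): total amount per day.
  def amountPerDay(lines: Seq[String]): Map[String, Int] =
    lines.map(parse).groupBy(_.time).map { case (day, visits) =>
      day -> visits.map(_.amount).sum
    }
}
```

Applied to the sample data above, `PlainScalaCheck.distinctIdsPerDay(userData)` gives 2 distinct ids for 2016-3-27 and 4 for 2016-3-28, and `PlainScalaCheck.amountPerDay(userData)` gives 3003 and 6530 respectively, which are the values the Spark queries should report for those days.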