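Big Data Spark "Mushroom Cloud" Action, Lesson 72: Project Implementation on Spark 2.0.1, Part 2. Hands-on: fixing assorted small bugs and tuning performance, reducing parallelism from 200 to 2 tasks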

The source data format, and fixes for small bugs in the code.

Rule of thumb: an agg is generally preceded by a groupBy.
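
In Spark's Dataset API, `groupBy` returns a `RelationalGroupedDataset`, and aggregations such as `agg`, `count` or `sum` are called on that object, e.g. `df.groupBy("typed").agg(sum("consumed"))` with `sum` from `org.apache.spark.sql.functions`. The plain-Python sketch below is not Spark code; it only illustrates what the two steps compute on the three complete log records listed further down. `group_by_agg` is a hypothetical helper, and grouping by `typed` is an illustrative choice.

```python
from collections import defaultdict

# The three complete log records from the sample data (fields trimmed to what we use).
logs = [
    {"userID": "userID1234", "typed": "0", "location": "shanghai", "consumed": "100"},
    {"userID": "userID2234", "typed": "0", "location": "beijing", "consumed": "200"},
    {"userID": "userID3234", "typed": "0", "location": "guangzhou", "consumed": "300"},
]


def group_by_agg(rows, key, value):
    """Bucket rows by `key` (the groupBy step), then compute the count and
    sum of `value` for each bucket (the agg step)."""
    groups = defaultdict(list)
    for row in rows:
        # "consumed" is stored as a string in the source JSON, so cast before summing
        groups[row[key]].append(int(row[value]))
    return {k: {"count": len(vs), "sum": sum(vs)} for k, vs in groups.items()}


print(group_by_agg(logs, "typed", "consumed"))
# → {'0': {'count': 3, 'sum': 600}}
```

Without the grouping step the aggregation would run over the whole dataset as a single group, which is why an agg on its own is the exception rather than the rule.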

{"userID":"userID5234","Name":"zhangsan","Gender":"man","Occupation":"student"}
{"userID":"userID2234","Name":"lisi","Gender":"woman","Occupation":"teacher"}
{"userID":"userID4234","Name":"wangwu","Gender":"woman","Occupation":"student"}
{"userID":"userID5234","Name":"wangwu","Gender":"man","Occupation":"student"}
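
Spark's JSON source reads one JSON object per line by default, so a single stray quote such as `"Gender":wo"man"` makes a whole record unparseable (in the default PERMISSIVE mode Spark keeps the raw text in a `_corrupt_record` column instead of failing). A quick pre-check of the source files can catch such lines before they reach Spark. The sketch below is plain Python, not part of the original project; `parse_json_lines` and `duplicate_keys` are hypothetical helpers, and the third line is deliberately malformed for illustration. The duplicate check is included because `userID5234` appears twice in the user data, so a join from logs to users on `userID` would match that user twice.

```python
import json
from collections import Counter

# The same four user records, with a stray quote deliberately placed in the third one.
raw_lines = [
    '{"userID":"userID5234","Name":"zhangsan","Gender":"man","Occupation":"student"}',
    '{"userID":"userID2234","Name":"lisi","Gender":"woman","Occupation":"teacher"}',
    '{"userID":"userID4234","Name":"wangwu","Gender":wo"man","Occupation":"student"}',
    '{"userID":"userID5234","Name":"wangwu","Gender":"man","Occupation":"student"}',
]


def parse_json_lines(lines):
    """Parse one JSON object per line; return (records, bad), where bad holds
    (1-based line number, raw text) for every line that is not valid JSON."""
    records, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append((lineno, line))
    return records, bad


def duplicate_keys(records, key):
    """Return the key values that occur more than once, e.g. a repeated userID."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


records, bad = parse_json_lines(raw_lines)
print(len(records), [n for n, _ in bad])  # → 3 [3]
print(duplicate_keys(records, "userID"))  # → ['userID5234']
```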

{"logID":"logID1111", "userID":"userID1234","time":"20161103","typed":"0","location":"shanghai","consumed":"100"}
{"logID":"logID2222", "userID":"userID2234","time":"20161103","typed":"0","location":"beijing","consumed":"200"}
{"logID":"logID3333", "userID":"userID3234","time":"20161103","typed":"0","location":"guangzhou","consumed":"300"}
{"logID":"logID4444", "userID"
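
The "200 to 2" change in the title presumably refers to `spark.sql.shuffle.partitions`, which defaults to 200: every shuffle caused by a groupBy/agg or a join in Spark SQL then runs 200 reduce tasks, even when, as here, there are only a handful of rows. In Spark 2.x it can be lowered with `spark.conf.set("spark.sql.shuffle.partitions", "2")` on the SparkSession or `--conf spark.sql.shuffle.partitions=2` on spark-submit. The sketch below is plain Python, not Spark: it hash-partitions the three sample userIDs to show how many of the 200 tasks would receive no data. `partition_sizes` is a hypothetical helper, and `zlib.crc32` stands in for the Murmur3 hash Spark actually uses, so the exact bucket assignment differs from Spark's.

```python
import zlib

# Shuffle keys: the userIDs of the complete log records above.
keys = ["userID1234", "userID2234", "userID3234"]


def partition_sizes(keys, num_partitions):
    """Hash-partition keys into num_partitions buckets and return the row count
    per bucket. zlib.crc32 is a deterministic stand-in for Spark's Murmur3 hash."""
    sizes = [0] * num_partitions
    for k in keys:
        sizes[zlib.crc32(k.encode("utf-8")) % num_partitions] += 1
    return sizes


for n in (200, 2):
    sizes = partition_sizes(keys, n)
    empty = sizes.count(0)
    # one reduce task per partition, whether or not it received any rows
    print(f"{n} partitions -> {n} tasks, {empty} of them with no rows")
```

With 3 keys and 200 partitions at least 197 tasks are scheduled only to process nothing; each still pays task launch and scheduling overhead, which dominates the runtime on data this small.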
