Finding Common Friends with Spark

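Data format: user,friend1,friend2,..... (the first field is a user, the remaining fields are that user's friends)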

Example:

      aa,bb,cc,dd,ee
      bb,aa,dd,ee
      cc,aa
      dd,aa,bb
      ee,aa,bb

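Use flatMapToPair to turn each line into (user pair, friend list) records: one record for every friend on the line, with the pair of user and friend as the key and the user's full friend list as the value.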
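As a rough illustration of this step, the sketch below reproduces the pair generation in plain Java, without Spark, so the emitted records can be inspected directly. The class name `PairEmitSketch` and the method `emit` are hypothetical names introduced here; the key construction mirrors the comparison used in the job.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Spark job: mimics what the
// flatMapToPair step emits for a single input line.
final class PairEmitSketch {

    static List<Map.Entry<String, String>> emit(String line) {
        String[] fields = line.split(",");
        // Everything after the first field is the user's friend list.
        String friendList = String.join(",", Arrays.asList(fields).subList(1, fields.length));
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            // Put the lexicographically smaller name first.
            String key = fields[0].compareTo(fields[i]) < 0
                    ? fields[0] + fields[i]
                    : fields[i] + fields[0];
            pairs.add(new SimpleEntry<>(key, friendList));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> p : emit("bb,aa,dd,ee")) {
            System.out.println(p.getKey() + " -> " + p.getValue());
        }
    }
}
```

For the sample line bb,aa,dd,ee this prints the keys aabb, bbdd and bbee, each paired with the value aa,dd,ee.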

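With this layout, records that share a key carry the friend lists of the two users in the pair, and the intersection of those lists is their set of common friends.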
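To make the intersection concrete, here is a small plain-Java sketch. It uses `Set.retainAll`, which is a different technique from the occurrence counting in the job's code; the two should agree when a key receives exactly two friend lists and no list contains duplicates. `IntersectSketch` and `common` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: intersects two comma-separated friend lists.
final class IntersectSketch {

    static Set<String> common(String listA, String listB) {
        // Start from the first list, then keep only names also in the second.
        Set<String> result = new TreeSet<>(Arrays.asList(listA.split(",")));
        result.retainAll(Arrays.asList(listB.split(",")));
        return result;
    }

    public static void main(String[] args) {
        // Friend lists of aa and bb from the sample data.
        System.out.println(common("bb,cc,dd,ee", "aa,dd,ee")); // prints [dd, ee]
    }
}
```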

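Note: the pair (a, b) and the pair (b, a) are the same pair, so the two names in a key must always be put in a fixed order.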
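A minimal sketch of such an order-independent key follows. The name `pairKey` and the `-` delimiter are assumptions added here: the job itself concatenates the two names directly, which works for the fixed-length sample names but could be ambiguous for names of different lengths (for example a + bc and ab + c).

```java
// Hypothetical helper: builds the same key regardless of argument order.
final class PairKeySketch {

    static String pairKey(String a, String b) {
        // The lexicographically smaller name always comes first.
        return a.compareTo(b) < 0 ? a + "-" + b : b + "-" + a;
    }

    public static void main(String[] args) {
        System.out.println(pairKey("aa", "bb")); // prints aa-bb
        System.out.println(pairKey("bb", "aa")); // prints aa-bb
    }
}
```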

The code is as follows:

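import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

import scala.Tuple2;

public class CommonFrends {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("commonFrends").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> javaRDD = sc.textFile("/opt/hadoop/frends.txt");
        // Emit one (user pair, friend list) record per friend on each line.
        JavaPairRDD<String, String> pairRDD = javaRDD.flatMapToPair(x -> {
            List<Tuple2<String, String>> l = new ArrayList<>();
            String[] frends = x.split(",");
            // The friend list is everything after the first field (the user).
            String friendList = String.join(",", Arrays.asList(frends).subList(1, frends.length));
            for (int i = 1; i < frends.length; i++) {
                // Smaller name first, so both users of a pair produce the same key.
                if (frends[0].compareTo(frends[i]) < 0) {
                    l.add(new Tuple2<>(frends[0] + frends[i], friendList));
                } else {
                    l.add(new Tuple2<>(frends[i] + frends[0], friendList));
                }
            }
            return l.iterator();
        }).persist(StorageLevel.MEMORY_AND_DISK());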
        // Group the friend lists by user pair; a name seen in more than one list is a common friend.
        JavaPairRDD<String, List<String>> rdd = pairRDD.groupByKey().mapValues(x -> {
            Map<String, Integer> map = new HashMap<>();
            for (String s : x) {
                // Skip empty friend lists before splitting.
                if (s == null || s.isEmpty()) {
                    continue;
                }
                for (String f : s.split(",")) {
                    map.merge(f, 1, Integer::sum);
                }
            }
            List<String> commonFrends = new ArrayList<>();
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                if (e.getValue() > 1) {
                    commonFrends.add(e.getKey());
                }
            }
            return commonFrends;
        });
        // Save the intermediate pairs and the final result, then release the context.
        pairRDD.saveAsTextFile("/opt/spark/commonFrend");
        rdd.saveAsTextFile("/opt/spark/commonFrends");
        sc.close();
    }
}
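
To check the logic against the sample data without a Spark cluster, the two stages can be approximated with ordinary collections. This is only a local sketch under that assumption, not the Spark job itself; `CommonFriendsLocalCheck` and `run` are hypothetical names, and sorted maps are used purely to make the output order stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local check: same two stages as the job, on in-memory data.
final class CommonFriendsLocalCheck {

    static Map<String, List<String>> run(List<String> lines) {
        // Stage 1: collect each user's friend list under every sorted pair key.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            String friendList = String.join(",", Arrays.asList(f).subList(1, f.length));
            for (int i = 1; i < f.length; i++) {
                String key = f[0].compareTo(f[i]) < 0 ? f[0] + f[i] : f[i] + f[0];
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(friendList);
            }
        }
        // Stage 2: a name that appears in more than one list is a common friend.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String list : e.getValue()) {
                for (String name : list.split(",")) {
                    counts.merge(name, 1, Integer::sum);
                }
            }
            List<String> common = new ArrayList<>();
            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                if (c.getValue() > 1) {
                    common.add(c.getKey());
                }
            }
            result.put(e.getKey(), common);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "aa,bb,cc,dd,ee", "bb,aa,dd,ee", "cc,aa", "dd,aa,bb", "ee,aa,bb");
        run(sample).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

On the sample data this reports dd and ee for the pair aabb, bb for aadd and aaee, aa for bbdd and bbee, and an empty list for aacc.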
