Printing a JavaPairRDD in Spark (Java)

First load the file at fileName, then collect() the resulting RDD back to the driver, which gives you a List that can be printed.

// Register the serializers needed to deserialize HBase Result values from the sequence file.
Configuration configuration = new Configuration();
configuration.set("io.serializations", "org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.hbase.mapreduce.ResultSerialization");

// Load the sequence file as (row key, Result) pairs.
JavaPairRDD<ImmutableBytesWritable, Result> input = sc.newAPIHadoopFile(fileName, SequenceFileInputFormat.class, ImmutableBytesWritable.class, Result.class, configuration);

// Collect the pairs to the driver as a List.
List<Tuple2<ImmutableBytesWritable, Result>> output = input.collect();
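For completeness, here is a minimal self-contained sketch of the same idea, including the imports the snippet above relies on and the loop that actually prints each collected Tuple2. The class name PrintPairRddExample, the local[*] master, and the /path/to/sequence/file path are illustrative assumptions, not values from the original post.

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class PrintPairRddExample {  // hypothetical class name
    public static void main(String[] args) {
        // Assumption: a local run, for illustration only.
        SparkConf conf = new SparkConf().setAppName("PrintPairRddExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        String fileName = "/path/to/sequence/file";  // hypothetical path

        // Register the serializers needed to deserialize HBase Result values.
        Configuration configuration = new Configuration();
        configuration.set("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization,"
                        + "org.apache.hadoop.hbase.mapreduce.ResultSerialization");

        // Load the sequence file as (row key, Result) pairs.
        JavaPairRDD<ImmutableBytesWritable, Result> input = sc.newAPIHadoopFile(
                fileName, SequenceFileInputFormat.class,
                ImmutableBytesWritable.class, Result.class, configuration);

        // Pull the pairs back to the driver and print them.
        List<Tuple2<ImmutableBytesWritable, Result>> output = input.collect();
        for (Tuple2<ImmutableBytesWritable, Result> tuple : output) {
            System.out.println(Bytes.toStringBinary(tuple._1().get()) + " -> " + tuple._2());
        }

        sc.stop();
    }
}

Note that collect() pulls the entire RDD into driver memory, so for a quick look at a large dataset, input.take(n) is the safer choice.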
