Source data (each line is a person, a colon, then that person's friend list):
A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J
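1. First, find whose friend A, B, C, ... each is. In other words, find which people have A in their friend list, which have B, which have C, and so on.
Take the first two records as an example:
My approach: check whether A is in A's own friend list (it never is). Then check B's friend list: A is in it, so A is B's friend. Continue the same comparison against C, D, E, F, and so on.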
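To make the idea concrete outside Hadoop, here is a minimal in-memory sketch of step 1 in plain Java, assuming the whole friend list fits in a Map. The class and method names (InvertFriends, parse, invert) are hypothetical. Instead of scanning every list once per person, as the Mapper below does, it inverts the lists in a single pass: for every entry person:friend, it records person under friend.

```java
import java.util.*;

public class InvertFriends {
    // Hypothetical helper: parses lines like "A:B,C,D" into person -> friends.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> friends = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split(":");
            friends.put(fields[0], Arrays.asList(fields[1].split(",")));
        }
        return friends;
    }

    // Step 1: for each friend, collect the people whose list contains that friend.
    static Map<String, List<String>> invert(Map<String, List<String>> friends) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : friends.entrySet()) {
            for (String friend : e.getValue()) {
                result.computeIfAbsent(friend, f -> new ArrayList<>()).add(e.getKey());
            }
        }
        for (List<String> owners : result.values()) {
            Collections.sort(owners);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A:B,C,D,F,E,O", "B:A,C,E,K");
        // prints {A=[B], B=[A], C=[A, B], D=[A], E=[A, B], F=[A], K=[B], O=[A]}
        System.out.println(invert(parse(lines)));
    }
}
```

With only the first two records, A is listed under key A's entry as [B] (A is B's friend), and C is listed as a friend of both A and B.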
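The Mapper code is below. Each map task first loads the complete source file from the distributed cache in setup(); then, for every input line (one person), it scans all cached friend lists to find who has that person as a friend: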
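import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;

import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FriendFirstMapper extends Mapper<LongWritable, Text, Text, Text> {
    // person -> that person's comma-separated friend list, loaded from the cache file
    HashMap<String, String> map = new HashMap<>();

    @Override
    protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {
        // 1 Read the whole source file that the Driver registered with addCacheFile()
        URI[] uris = context.getCacheFiles();
        String path = uris[0].getPath();
        BufferedReader bReader = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8"));
        String line;
        while ((line = bReader.readLine()) != null) {
            line = line.trim();
            // 65279 (U+FEFF) is the byte-order mark some editors write at the start of a UTF-8 file
            if (!line.isEmpty() && line.charAt(0) == 65279) {
                line = line.substring(1);
            }
            if (line.isEmpty()) {
                continue;
            }
            // 2 Split "A:B,C,D" into the person and the friend list
            String[] fields = line.split(":");
            // 3 Cache the data in the map
            map.put(fields[0], fields[1]);
        }
        IOUtils.closeStream(bReader);
    }

    Text k = new Text();
    Text v = new Text();

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {
        // Input line, e.g. A:B,C,D,F,E,O
        String line = value.toString().trim();
        // The BOM can also reach map() on the first input line, so strip it here too
        if (!line.isEmpty() && line.charAt(0) == 65279) {
            line = line.substring(1);
        }
        if (line.isEmpty()) {
            return;
        }
        String[] fields = line.split(":");
        // Collect every other person whose friend list contains fields[0]
        StringBuilder valueString = new StringBuilder();
        for (String kString : map.keySet()) {
            // Compare whole names rather than substrings
            if (!kString.equals(fields[0])
                    && Arrays.asList(map.get(kString).split(",")).contains(fields[0])) {
                valueString.append(kString).append(",");
            }
        }
        // Output: person -> everyone who has that person as a friend
        k.set(fields[0]);
        v.set(valueString.toString());
        context.write(k, v);
    }
}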
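The Mapper has to decide whether a name occurs in a comma-separated friend list. With the single-letter names in this data, a plain String.contains on the whole list gives the right answer, but with longer names it also matches substrings. A minimal illustration, using a hypothetical class name and hypothetical multi-character names:

```java
import java.util.Arrays;

public class MembershipCheck {
    // Exact membership: split the comma-separated list and compare whole names.
    static boolean hasFriend(String friendList, String name) {
        return Arrays.asList(friendList.split(",")).contains(name);
    }

    public static void main(String[] args) {
        String list = "Alice,Bob"; // hypothetical multi-character names
        System.out.println(list.contains("Al"));    // true: substring match, a false positive
        System.out.println(hasFriend(list, "Al"));  // false: nobody is named "Al"
        System.out.println(hasFriend(list, "Bob")); // true
    }
}
```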
The Driver is as follows:
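import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.log4j.BasicConfigurator;

public class FriendFirstDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException, URISyntaxException {
        BasicConfigurator.configure();
        // Local test paths: args[0] is the source file, args[1] the output directory (it must not exist yet)
        args = new String[] { "f:/hadoopinput/friends.txt", "f:/hadoopoutput" };
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(FriendFirstDriver.class);
        job.setMapperClass(FriendFirstMapper.class);
        // No Reducer is set, so the default Reducer passes the map output through, sorted by key
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.addCacheFile(new URI("file:///f:/hadoopinput/friends.txt"));
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}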
Output of step 1:
A B,C,D,F,G,H,I,K,O,
B A,E,F,J,
C A,B,E,F,G,H,K,
D A,C,E,F,G,H,K,L,
E A,B,D,F,G,H,L,M,
F A,C,D,G,L,M,
G M,
H O,
I C,O,
J O,
K B,
L D,E,
M E,F,
O A,F,H,I,J,
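How to read this output: the person in the first column is a common friend of everyone listed in the second column. For example, the first line says that A is a common friend of B, C, D, F, G, H, I, K and O.
2. Since we want the common friends of every pair of people, B and C must have at least A in common; so must B and D, B and F, B and G, B and H, B and I, and so on. In other words, every pair formed from the second column (B,C,D,F,G,H,I,K,O) becomes a key with the common friend A as its value; these are sent to the Reducer, which collects all the common friends for each pair.
Note: when pairs are used as keys, the key for B and C must be identical to the key for C and B. Sorting B,C,D,F,G,H,I,K,O before forming pairs guarantees that CB and BC never both appear as keys.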
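A minimal sketch of the pairing rule in plain Java (the class name PairKeys is hypothetical): sort the people, then emit each pair i < j joined with "&", the same separator the step 2 Mapper uses. Because the array is sorted first, B and C always become B&C, never C&B. A line listing n people produces n(n-1)/2 pairs, so the first line of step 1's output (9 people) produces 36 key/value pairs.

```java
import java.util.*;

public class PairKeys {
    // Returns every unordered pair from people as "X&Y" with X before Y in sort order.
    static List<String> pairs(String[] people) {
        String[] sorted = people.clone();
        Arrays.sort(sorted);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            for (int j = i + 1; j < sorted.length; j++) {
                keys.add(sorted[i] + "&" + sorted[j]);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Second column of step 1's line "I  C,O," given out of order on purpose
        System.out.println(pairs(new String[] {"O", "C"})); // prints [C&O]
    }
}
```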
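Mapper for step 2 (its input is the output of step 1):
Text k = new Text();
Text v = new Text();

@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    // Input line from step 1, e.g. A<TAB>B,C,D,F,G,H,I,K,O,
    String line = value.toString();
    String[] fields = line.split("\t");
    // Skip a person who appears in nobody's list (empty second column)
    if (fields.length < 2) {
        return;
    }
    String[] friends = fields[1].split(",");
    // Sort first so that B and C always form "B&C", never "C&B"
    Arrays.sort(friends);
    for (int i = 0; i < friends.length; i++) {
        for (int j = i + 1; j < friends.length; j++) {
            // key: a pair of people; value: a friend they have in common
            k.set(friends[i] + "&" + friends[j]);
            v.set(fields[0]);
            context.write(k, v);
        }
    }
}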
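Reducer for step 2:
Text v = new Text();

@Override
protected void reduce(Text key, Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    // Concatenate every common friend received for this pair
    StringBuilder builder = new StringBuilder();
    for (Text value : values) {
        builder.append(value.toString()).append(",");
    }
    v.set(builder.toString());
    context.write(key, v);
}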
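Final output (a pair of people, then their common friends; within each list the order is whatever order the values reached the Reducer):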
A&B E,C,
A&C D,F,
A&D E,F,
A&E D,B,C,
A&F O,B,C,D,E,
A&G F,E,C,D,
A&H E,C,D,O,
A&I O,
A&J O,B,
A&K D,C,
A&L F,E,D,
A&M E,F,
B&C A,
B&D A,E,
B&E C,
B&F E,A,C,
B&G C,E,A,
B&H A,E,C,
B&I A,
B&K C,A,
B&L E,
B&M E,
B&O A,
C&D A,F,
C&E D,
C&F D,A,
C&G D,F,A,
C&H D,A,
C&I A,
C&K A,D,
C&L D,F,
C&M F,
C&O I,A,
D&E L,
D&F A,E,
D&G E,A,F,
D&H A,E,
D&I A,
D&K A,
D&L E,F,
D&M F,E,
D&O A,
E&F D,M,C,B,
E&G C,D,
E&H C,D,
E&J B,
E&K C,D,
E&L D,
F&G D,C,A,E,
F&H A,D,O,E,C,
F&I O,A,
F&J B,O,
F&K D,C,A,
F&L E,D,
F&M E,
F&O A,
G&H D,C,E,A,
G&I A,
G&K D,A,C,
G&L D,F,E,
G&M E,F,
G&O A,
H&I O,A,
H&J O,
H&K A,C,D,
H&L D,E,
H&M E,
H&O A,
I&J O,
I&K A,
I&O A,
K&L D,
K&O A,
L&M E,F,
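As a sanity check on the two-job pipeline, the following plain-Java sketch runs both steps in memory, assuming the data fits in memory; the class name CommonFriends is hypothetical. Step 1 inverts the friend lists, and step 2 emits (pair, friend) for every sorted pair and groups by pair, which is what the shuffle does before reduce() is called. Friends are sorted here for readability, whereas the Hadoop output above lists them in arrival order.

```java
import java.util.*;

public class CommonFriends {
    // Runs both steps in memory; returns "X&Y" -> sorted common friends of X and Y.
    static Map<String, List<String>> run(List<String> lines) {
        // Step 1: friend -> people whose list contains that friend
        Map<String, List<String>> owners = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split(":");
            for (String friend : fields[1].split(",")) {
                owners.computeIfAbsent(friend, f -> new ArrayList<>()).add(fields[0]);
            }
        }
        // Step 2: emit (pair, friend) for every sorted pair, grouped by pair
        Map<String, List<String>> common = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : owners.entrySet()) {
            List<String> people = new ArrayList<>(e.getValue());
            Collections.sort(people);
            for (int i = 0; i < people.size(); i++) {
                for (int j = i + 1; j < people.size(); j++) {
                    String key = people.get(i) + "&" + people.get(j);
                    common.computeIfAbsent(key, k -> new ArrayList<>()).add(e.getKey());
                }
            }
        }
        for (List<String> friends : common.values()) {
            Collections.sort(friends);
        }
        return common;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A:B,C,D,F,E,O", "B:A,C,E,K", "C:F,A,D,I", "D:A,E,F,L",
            "E:B,C,D,M,L", "F:A,B,C,D,E,O,M", "G:A,C,D,E,F", "H:A,C,D,E,O",
            "I:A,O", "J:B,O", "K:A,C,D", "L:D,E,F", "M:E,F,G", "O:A,H,I,J");
        Map<String, List<String>> common = run(lines);
        System.out.println(common.get("A&B")); // prints [C, E]
        System.out.println(common.get("F&H")); // prints [A, C, D, E, O]
    }
}
```

Both results agree with the lines A&B E,C, and F&H A,D,O,E,C, above once the order of the friends is ignored.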