class Mapper
    method Map(docid did, doc d)
        for all word w ∈ d
            for all word u ∈ Window(w)
                // emit an occurrence count of 1
                Emit(pair (w, u), 1)
class Reducer
    method Reduce(pair p; countlist [c1, c2, ...])
        s = 0
        for all count c in countlist [c1, c2, ...]
            s = s + c
        Emit(pair p, count s)
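The pseudocode above can be simulated in memory without Hadoop. The following is only a minimal sketch: the class name `PairsSketch` and its methods are hypothetical, `Window(w)` is assumed to mean the next `window - 1` words after `w`, and pairs are kept as ordered strings for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the pairs algorithm; no Hadoop involved.
final class PairsSketch {

    // Map phase: for each word w, emit "w,u" for every u in Window(w).
    // Assumption: Window(w) is the next (window - 1) words after w.
    static List<String> map(String[] words, int window) {
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j < words.length && j < i + window; j++) {
                emitted.add(words[i] + "," + words[j]);
            }
        }
        return emitted;
    }

    // Shuffle + reduce phase: group identical keys and sum their counts of 1.
    static Map<String, Integer> reduce(List<String> emitted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String pair : emitted) {
            counts.merge(pair, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] doc = "a b a b".split(" ");
        System.out.println(reduce(map(doc, 2))); // prints {a,b=2, b,a=1}
    }
}
```

In a real job the grouping is done by the framework's shuffle rather than by a `TreeMap`, and each reducer only sees the keys of its own partition.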
For example, in the Map phase a Map node receives the content of the document shown in the figure, "MapReduce is a new technique to process big data", and the window size is 7. The window first covers the first seven words, "MapReduce is a new technique to process", and the node emits the key-value pairs ((MapReduce, is), 1), ((MapReduce, a), 1), ((MapReduce, new), 1), ((MapReduce, technique), 1), ((MapReduce, to), 1) and ((MapReduce, process), 1). The window then slides forward by one word and, in the same way, emits ((is, a), 1), ((is, new), 1), ((is, technique), 1), ((is, to), 1), ((is, process), 1) and ((is, big), 1). Finally, the window slides forward by one more word, which brings it to the end of the document, and the corresponding key-value pairs are emitted as before. Once the tail of the window has reached the end of the document, the window keeps sliding by moving its head forward, that is, by shrinking. This continues until the window size is 2.
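The walkthrough above can be traced with a small queue-based sketch in plain Java. The class `WindowTrace` and its methods are hypothetical names, words are split on whitespace (an assumption), and the window size is assumed to be at least 2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Traces which pairs a sliding window emits for one document.
final class WindowTrace {

    // Pairs the head of the window with every later word in the window.
    private static void emitHeadPairs(Deque<String> window, List<String> out) {
        Iterator<String> it = window.iterator();
        String head = it.next();
        while (it.hasNext()) {
            out.add(head + "," + it.next());
        }
    }

    static List<String> pairs(String text, int windowSize) {
        Deque<String> window = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            window.addLast(word);
            if (window.size() == windowSize) {
                emitHeadPairs(window, out); // full window: emit, then slide by one word
                window.removeFirst();
            }
        }
        // Tail of the document: shrink the window from its head
        // until fewer than two words remain.
        while (window.size() > 1) {
            emitHeadPairs(window, out);
            window.removeFirst();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> out = pairs("MapReduce is a new technique to process big data", 7);
        System.out.println(out.subList(0, 6)); // pairs emitted by the first window
        System.out.println(out.size());        // total number of emitted pairs
    }
}
```

For the nine-word example document and a window of 7, the three full windows emit 6 pairs each and the shrinking phase emits 5 + 4 + 3 + 2 + 1 more, for 33 pairs in total.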
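import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// An unordered pair of words used as the MapReduce key: (a, b) and (b, a) are the same key.
public class WordPair implements WritableComparable<WordPair> {
    private String wordA;
    private String wordB;

    // No-argument constructor, needed so Hadoop can create instances before readFields().
    public WordPair() {
    }

    public WordPair(String wordA, String wordB) {
        this.wordA = wordA;
        this.wordB = wordB;
    }

    public String getWordA() {
        return this.wordA;
    }

    public String getWordB() {
        return this.wordB;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize both words in a fixed order.
        out.writeUTF(wordA);
        out.writeUTF(wordB);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize in the same order used by write().
        wordA = in.readUTF();
        wordB = in.readUTF();
    }

    @Override
    public String toString() {
        return wordA + "," + wordB;
    }

    @Override
    public int compareTo(WordPair o) {
        // Compare the smaller words first, then the larger ones, so that the
        // ordering ignores word order and stays consistent with equals().
        // Comparing concatenations would confuse ("ab", "c") with ("a", "bc").
        int c = min(wordA, wordB).compareTo(min(o.wordA, o.wordB));
        return c != 0 ? c : max(wordA, wordB).compareTo(max(o.wordA, o.wordB));
    }

    private static String min(String a, String b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    private static String max(String a, String b) {
        return a.compareTo(b) <= 0 ? b : a;
    }

    @Override
    public boolean equals(Object o) {
        // Unordered pair: the order of the two words does not matter.
        if (!(o instanceof WordPair))
            return false;
        WordPair w = (WordPair) o;
        return (this.wordA.equals(w.wordA) && this.wordB.equals(w.wordB))
                || (this.wordB.equals(w.wordA) && this.wordA.equals(w.wordB));
    }

    @Override
    public int hashCode() {
        // Addition is symmetric, so (a, b) and (b, a) get the same hash code.
        return (wordA.hashCode() + wordB.hashCode()) * 17;
    }
}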
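Because reduce-side grouping relies on key comparison, an unordered pair needs an ordering that ignores word order. The sketch below shows one way to get that in plain Java. `UnorderedPairSketch.Pair` is a hypothetical stand-in for `WordPair` without the Hadoop interfaces, and it normalizes the word order in the constructor, which is a different design from the class above. A `TreeMap` plays the role of the shuffle here, since it also groups keys by `compareTo`.

```java
import java.util.Map;
import java.util.TreeMap;

final class UnorderedPairSketch {

    // Hypothetical stand-in for WordPair: stores the two words in sorted order.
    static final class Pair implements Comparable<Pair> {
        final String lo; // lexicographically smaller word
        final String hi; // lexicographically larger word

        Pair(String a, String b) {
            boolean ordered = a.compareTo(b) <= 0;
            lo = ordered ? a : b;
            hi = ordered ? b : a;
        }

        @Override
        public int compareTo(Pair o) {
            // Compare field by field instead of comparing concatenated strings.
            int c = lo.compareTo(o.lo);
            return c != 0 ? c : hi.compareTo(o.hi);
        }

        @Override
        public String toString() {
            return lo + "," + hi;
        }
    }

    // Counts occurrences of each unordered pair; keys are grouped by compareTo.
    static Map<Pair, Integer> count(String[][] emitted) {
        Map<Pair, Integer> counts = new TreeMap<>();
        for (String[] p : emitted) {
            counts.merge(new Pair(p[0], p[1]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] emitted = { {"big", "data"}, {"data", "big"}, {"ab", "c"}, {"a", "bc"} };
        System.out.println(count(emitted)); // prints {a,bc=1, ab,c=1, big,data=2}
    }
}
```

Here (big, data) and (data, big) fall into one group, while ("ab", "c") and ("a", "bc") stay separate, which a comparison of the concatenated strings would not guarantee.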
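// Fragment of the Mapper class. wordPattern, MAX_WINDOW and one are assumed
// to be fields defined elsewhere in the class.
private int windowSize;
private Queue<String> windowQueue = new LinkedList<String>();

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // Read the window size from the job configuration (default 2), capped at MAX_WINDOW.
    windowSize = Math.min(context.getConfiguration().getInt("window", 2), MAX_WINDOW);
}

/**
 * The input key is the file name of the document; the value is the
 * document's content as bytes.
 */
@Override
public void map(Text docName, BytesWritable docContent, Context context)
        throws IOException, InterruptedException {
    // getBytes() returns the backing array, which may be longer than the
    // valid data, so only the first getLength() bytes are decoded.
    String text = new String(docContent.getBytes(), 0, docContent.getLength(), "UTF-8");
    Matcher matcher = wordPattern.matcher(text);
    while (matcher.find()) {
        windowQueue.add(matcher.group());
        if (windowQueue.size() >= windowSize) {
            emitHeadPairs(context);
            windowQueue.remove();
        }
    }
    // End of the document: shrink the window from its head until fewer than
    // two words remain, then reset the queue for the next document.
    while (windowQueue.size() > 1) {
        emitHeadPairs(context);
        windowQueue.remove();
    }
    windowQueue.clear();
}

// For the queue [q1, q2, q3, ..., qn], emits ((q1, q2), 1), ((q1, q3), 1), ..., ((q1, qn), 1).
private void emitHeadPairs(Context context) throws IOException, InterruptedException {
    Iterator<String> it = windowQueue.iterator();
    String w1 = it.next();
    while (it.hasNext()) {
        context.write(new WordPair(w1, it.next()), one);
    }
}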