/**
 * Perform the actual join recursively.
 *
 * @param tags
 *          a list of input tags
 * @param values
 *          a list of value lists, each corresponding to one input source
 * @param pos
 *          indicating the next value list to be joined
 * @param partialList
 *          a list of values, each from one value list considered so far.
 * @param key
 *          the join key shared by all the values
 * @param output
 *          the collector that receives the joined records
 * @param reporter
 *          used to report progress and status
 * @throws IOException
 */
private void joinAndCollect(Object[] tags, ResetableIterator[] values,
    int pos, Object[] partialList, Object key,
    OutputCollector output, Reporter reporter) throws IOException {
  if (values.length == pos) {
    // get a value from each source. Combine them
    TaggedMapOutput combined = combine(tags, partialList);
    collect(key, combined, output, reporter);
    return;
  }
  ResetableIterator nextValues = values[pos];
  nextValues.reset();
  while (nextValues.hasNext()) {
    Object v = nextValues.next();
    partialList[pos] = v;
    joinAndCollect(tags, values, pos + 1, partialList, key, output, reporter);
  }
}
tags holds one tag per data source taking part in the join, so its length is the number of sources. For example:
Customer data:
Customer ID   Name   PhoneNumber
1             赵一   025-5455-566
2             钱二   025-4587-565
3             孙三   021-5845-5875
Customer orders:
Customer ID   Order ID   Price   Date
2             1          93      2008-01-08
3             2          43      2012-01-21
1             3          43      2012-05-12
2             4          32      2012-05-14
Here tags has 2 entries, and partialList[] holds the records from the 2 data sources that the join matched on the same key, for example:
partialList[0] is 2 钱二 025-4587-565
partialList[1] is 2 1 93 2008-01-08
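To make the recursion concrete, here is a minimal stand-alone sketch of the same idea. It is not the Hadoop code: plain Lists stand in for ResetableIterator, plain Strings stand in for TaggedMapOutput, and the class name JoinSketch and its join helper are hypothetical names introduced only for illustration. The customer name is romanized as Qian Er to keep the source ASCII-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch: Lists replace ResetableIterator, Strings replace TaggedMapOutput.
class JoinSketch {

    // Mirrors joinAndCollect: fix one record for source `pos`, then recurse to the next source.
    static void joinAndCollect(List<List<String>> values, int pos,
                               String[] partialList, List<String> output) {
        if (pos == values.size()) {
            // One record has been chosen from every source: emit the combination.
            output.add(String.join(" | ", partialList));
            return;
        }
        for (String v : values.get(pos)) {
            partialList[pos] = v;
            joinAndCollect(values, pos + 1, partialList, output);
        }
    }

    // Hypothetical helper that starts the recursion at the first source.
    static List<String> join(List<List<String>> values) {
        List<String> output = new ArrayList<>();
        joinAndCollect(values, 0, new String[values.size()], output);
        return output;
    }

    public static void main(String[] args) {
        // Records grouped under join key 2 (customer ID 2) in the example above.
        List<String> customers = Arrays.asList("2 Qian Er 025-4587-565");
        List<String> orders = Arrays.asList("2 1 93 2008-01-08", "2 4 32 2012-05-14");
        for (String row : join(Arrays.asList(customers, orders))) {
            System.out.println(row);
        }
    }
}
```

For customer ID 2 there is one customer record and two order records, so the sketch emits two combinations, one per order. In general the number of combinations for a key is the product of the number of records each source holds for that key, and a source with no records for the key yields no output at all.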
The method you have to implement yourself:
/**
 *
 * @param tags
 *          a list of source tags
 * @param values
 *          a value per source
 * @return combined value derived from values of the sources
 */
protected abstract TaggedMapOutput combine(Object[] tags, Object[] values);
This is the method called by the line TaggedMapOutput combined = combine(tags, partialList); in joinAndCollect. It is where the joined records are processed.
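As an illustration of what such a combine step might do, here is a simplified sketch that does not depend on Hadoop. It assumes each record is a comma-separated String whose first field is the join key (the tables above use spaces; commas are used here only because the name contains a space), and it returns a plain String instead of a TaggedMapOutput. The class name CombineSketch, the tag names, and the use of null to mean "emit nothing" are all conventions of this sketch, not the library's API.

```java
// Simplified stand-in for a combine() override; Strings replace TaggedMapOutput.
class CombineSketch {

    // tags: one tag per source that has a record for the current key.
    // values: the record chosen from each of those sources (the partialList).
    static String combine(Object[] tags, Object[] values) {
        if (tags.length < 2) {
            // Sketch convention: null means "no output", which gives inner-join behavior.
            return null;
        }
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            String[] fields = values[i].toString().split(",");
            // Keep the join key (field 0) only from the first source.
            int start = (i == 0) ? 0 : 1;
            for (int f = start; f < fields.length; f++) {
                if (joined.length() > 0) {
                    joined.append(',');
                }
                joined.append(fields[f]);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Object[] tags = {"customers", "orders"};
        Object[] values = {"2,Qian Er,025-4587-565", "2,1,93,2008-01-08"};
        // prints 2,Qian Er,025-4587-565,1,93,2008-01-08
        System.out.println(combine(tags, values));
    }
}
```

A real implementation would wrap the joined text in a concrete TaggedMapOutput subclass before returning it; that part is left out here because it depends on the Writable type chosen for the job.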