Hadoop MapReduce Multi-Table Join

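Suppose we have the following two files. One table maps each company to an address ID; the other maps each address ID to an address name. Every line carries a prefix ("A:" or "B:") that identifies its table, and the fields are separated by a tab.

Table 1: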

 

A:Beijing Red Star	1

A:Shenzhen Thunder	3

A:Guangzhou Honda	2

A:Beijing Rising	1

A:Guangzhou Development Bank	2

A:Tencent	3

A:Back of Beijing	1


Table 2:

 

 

B:1	Beijing

B:2	Guangzhou

B:3	Shenzhen

B:4	Xian


The MapReduce code is as follows:

 

 

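import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MTJoin {

    // Tags that mark which table a record came from.
    private static final Text typeA = new Text("A:");
    private static final Text typeB = new Text("B:");

    private static Log log = LogFactory.getLog(MTJoin.class);

    public static class Map extends Mapper<Object, Text, Text, MapWritable> {

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueStr = value.toString();
            // Skip lines too short to carry a type prefix.
            if (valueStr.length() < 2) {
                return;
            }
            String type = valueStr.substring(0, 2);
            String content = valueStr.substring(2);
            log.info(content);
            String[] contentArray = content.split("\t");
            // Skip malformed lines that lack the second field.
            if (contentArray.length < 2) {
                return;
            }
            if (type.equals("A:")) {
                // Table 1 record: company name, then address ID.
                String city = contentArray[0];     // holds the company name
                String address = contentArray[1];  // holds the address ID
                MapWritable map = new MapWritable();
                map.put(typeA, new Text(city));
                // Emit the address ID as the join key.
                context.write(new Text(address), map);
            } else if (type.equals("B:")) {
                // Table 2 record: address ID, then address name.
                String adrNum = contentArray[0];
                String adrName = contentArray[1];
                MapWritable map = new MapWritable();
                map.put(typeB, new Text(adrName));
                context.write(new Text(adrNum), map);
            }
        }
    }

    public static class Reduce extends Reducer<Text, MapWritable, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<MapWritable> values, Context context)
                throws IOException, InterruptedException {
            List<Text> cityList = new ArrayList<Text>();  // company names for this key
            List<Text> adrList = new ArrayList<Text>();   // address names for this key
            for (MapWritable map : values) {
                // Copy each value defensively: Hadoop may reuse Writable
                // objects between iterations.
                if (map.containsKey(typeA)) {
                    cityList.add(new Text((Text) map.get(typeA)));
                } else if (map.containsKey(typeB)) {
                    adrList.add(new Text((Text) map.get(typeB)));
                }
            }
            // Cartesian product of the two lists gives the joined rows.
            for (int i = 0; i < cityList.size(); i++) {
                for (int j = 0; j < adrList.size(); j++) {
                    context.write(cityList.get(i), adrList.get(j));
                }
            }
        }
    }
}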

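The principle is simple: the map output uses the address ID as the key. In the reducer, all records that share an address ID arrive together; the company names go into one list and the address names into another, and the Cartesian product of the two lists gives the result. An address ID with no matching company (such as 4, Xian) leaves the company list empty, so it produces no output rows.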
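The same idea can be tried without a cluster. The following is a minimal sketch in plain Java, not the Hadoop job itself: the class name `ReduceSideJoinSketch` and the method `join` are hypothetical, and a pair of `TreeMap`s stands in for the shuffle phase that groups values by key. It assumes the same tagged, tab-separated line format as the tables above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone illustration of the reduce-side join; no Hadoop needed.
class ReduceSideJoinSketch {

    // Joins lines of the form "A:<company>\t<id>" and "B:<id>\t<name>" on <id>.
    static List<String> join(List<String> lines) {
        // "Map" phase plus a stand-in for the shuffle: group values by address ID,
        // with keys kept in sorted order.
        Map<String, List<String>> companiesById = new TreeMap<>();
        Map<String, List<String>> namesById = new TreeMap<>();
        for (String line : lines) {
            if (line.length() < 2) {
                continue; // no room for a type prefix
            }
            String type = line.substring(0, 2);
            String[] fields = line.substring(2).split("\t");
            if (fields.length < 2) {
                continue; // malformed record
            }
            if (type.equals("A:")) {
                companiesById.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(fields[0]);
            } else if (type.equals("B:")) {
                namesById.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[1]);
            }
        }

        // "Reduce" phase: for each key, emit the Cartesian product of the two lists.
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : companiesById.entrySet()) {
            List<String> names = namesById.getOrDefault(entry.getKey(), new ArrayList<>());
            for (String company : entry.getValue()) {
                for (String name : names) {
                    joined.add(company + "\t" + name);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "A:Beijing Red Star\t1", "A:Shenzhen Thunder\t3", "A:Guangzhou Honda\t2",
                "B:1\tBeijing", "B:2\tGuangzhou", "B:3\tShenzhen", "B:4\tXian");
        for (String row : join(lines)) {
            System.out.println(row);
        }
    }
}
```

Keys that appear only in table 2 never enter the outer loop, which is why the result behaves like an inner join. In a real job the order of values within one key is not guaranteed, so the row order inside a group may differ from this sketch.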

 

The output is as follows:

 

Beijing Red Star	Beijing

Beijing Rising	Beijing

Back of Beijing	Beijing

Guangzhou Honda	Guangzhou

Guangzhou Development Bank	Guangzhou

Shenzhen Thunder	Shenzhen

Tencent	Shenzhen



 

 
