join

INNER JOIN: in an inner join, a record is kept only when the join criteria find a matching record in every table being joined. Records that appear on only one side of the join are discarded.
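As a small illustration of the inner join behaviour, here is a HiveQL sketch. The tables `orders` and `customers` and their columns are hypothetical names made up for this example.

```sql
-- Hypothetical tables (assumed for illustration only):
--   orders(order_id, customer_id, amount)
--   customers(customer_id, name)
SELECT o.order_id, c.name, o.amount
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id);
-- An order whose customer_id has no row in customers is dropped,
-- and a customer with no orders does not appear in the result.
```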

Most of the time, Hive uses a separate MapReduce job for each pair of tables to join.

Join optimization: when joining three or more tables, if every ON clause uses the same join key, Hive uses a single MapReduce job for the whole join.
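The difference can be sketched with two HiveQL queries. The tables `a`, `b`, `c` and their columns are placeholders assumed for this example.

```sql
-- Placeholder tables a, b, c (assumed for illustration only).

-- Every ON clause joins on b.key1, so Hive can run this
-- three-way join as a single MapReduce job.
SELECT a.val, b.val, c.val
FROM a
JOIN b ON (a.key = b.key1)
JOIN c ON (c.key = b.key1);

-- The second ON clause uses a different column of b (key2),
-- so this join takes two MapReduce jobs: a with b first,
-- then that result with c.
SELECT a.val, b.val, c.val
FROM a
JOIN b ON (a.key = b.key1)
JOIN c ON (c.key = b.key2);
```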

Hive assumes the last table in the query is the largest. While performing the join on individual records, it attempts to buffer the other tables and then stream the last one through. You should therefore structure your joins so that the largest table comes last.
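A rough sketch of that ordering in HiveQL follows. The table names and relative sizes are assumptions for illustration: `countries` and `users` are taken to be small, and `clicks` is taken to be the largest table.

```sql
-- Hypothetical tables (assumed for illustration only):
--   countries(country_id, country_name)   -- small
--   users(user_id, country_id)            -- medium
--   clicks(click_id, user_id, url)        -- largest
SELECT k.click_id, u.user_id, c.country_name
FROM countries c
JOIN users u  ON (u.country_id = c.country_id)
JOIN clicks k ON (k.user_id = u.user_id);
-- clicks is listed last, so it is the table Hive streams,
-- while the smaller tables are the ones it tries to buffer.
```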
