hive join 查询

beeline 方式连接hive

$ beeline -u jdbc:hive2://localhost:10000/test_db -n harvey -p

数据准备

  • customers 表
0: jdbc:hive2://localhost:10000/test_db> create table if not exists `customers`(
. . . . . . . . . . . . . . . . . . . .> `id` int,
. . . . . . . . . . . . . . . . . . . .> `name` string,
. . . . . . . . . . . . . . . . . . . .> `age` int,
. . . . . . . . . . . . . . . . . . . .> `address` string,
. . . . . . . . . . . . . . . . . . . .> `salary` double)
. . . . . . . . . . . . . . . . . . . .> row format delimited fields terminated by '\t';

load data local inpath '/Users/harvey/data/customers.txt' overwrite into table test_db.customers;

select * from test_db.customers;
+---------------+-----------------+----------------+--------------------+-------------------+
| customers.id  | customers.name  | customers.age  | customers.address  | customers.salary  |
+---------------+-----------------+----------------+--------------------+-------------------+
| 1             | ross            | 32             | ahmedabad          | 2000.0            |
| 2             | rachel          | 25             | delhi              | 1500.0            |
| 3             | chandler        | 23             | kota               | 2000.0            |
| 4             | monika          | 25             | mumbai             | 6500.0            |
| 5             | mike            | 27             | bhopal             | 8500.0            |
| 6             | phoebe          | 22             | mp                 | 4500.0            |
| 7             | joey            | 24             | indore             | 10000.0           |
+---------------+-----------------+----------------+--------------------+-------------------
  • orders 表
0: jdbc:hive2://localhost:10000/test_db> create table if not exists `orders`(
. . . . . . . . . . . . . . . . . . . .> `oid` int,
. . . . . . . . . . . . . . . . . . . .> `date` string,
. . . . . . . . . . . . . . . . . . . .> `customer_id` int,
. . . . . . . . . . . . . . . . . . . .> `amount` double)
. . . . . . . . . . . . . . . . . . . .> row format delimited fields terminated by '\t';

load data local inpath '/Users/harvey/data/orders.txt' overwrite into table test_db.orders;

select * from test_db.orders;
+-------------+----------------------+---------------------+----------------+
| orders.oid  |     orders.date      | orders.customer_id  | orders.amount  |
+-------------+----------------------+---------------------+----------------+
| 102         | 2016-10-08 00:00:00  | 3                   | 3000.0         |
| 100         | 2016-10-08 00:00:00  | 3                   | 1500.0         |
| 101         | 2016-11-20 00:00:00  | 2                   | 1560.0         |
| 103         | 2015-05-20 00:00:00  | 4                   | 2060.0         |
+-------------+----------------------+---------------------+----------------+

1.inner join

说明:内连接,获取两表中相同的数据

select c.id, c.name, c.age, o.amount from customers c inner join orders o on c.id = o.customer_id;
+-------+-----------+--------+-----------+
| c.id  |  c.name   | c.age  | o.amount  |
+-------+-----------+--------+-----------+
| 2     | rachel    | 25     | 1560.0    |
| 3     | chandler  | 23     | 3000.0    |
| 3     | chandler  | 23     | 1500.0    |
| 4     | monika    | 25     | 2060.0    |
+-------+-----------+--------+-----------+

2.left join

说明:左连接,获取左表全部数据,右表有则读取,无则使用 NULL 填充

select c.id, c.name, c.age, o.amount from customers c left join orders o on c.id = o.customer_id;
+-------+-----------+--------+-----------+
| c.id  |  c.name   | c.age  | o.amount  |
+-------+-----------+--------+-----------+
| 1     | ross      | 32     | NULL      |
| 2     | rachel    | 25     | 1560.0    |
| 3     | chandler  | 23     | 3000.0    |
| 3     | chandler  | 23     | 1500.0    |
| 4     | monika    | 25     | 2060.0    |
| 5     | mike      | 27     | NULL      |
| 6     | phoebe    | 22     | NULL      |
| 7     | joey      | 24     | NULL      |
+-------+-----------+--------+-----------+

查询左表独有数据

select c.id, c.name, o.customer_id from customers c left join orders o on c.id = o.customer_id where o.customer_id is null;

+-------+---------+----------------+
| c.id  | c.name  | o.customer_id  |
+-------+---------+----------------+
| 1     | ross    | NULL           |
| 5     | mike    | NULL           |
| 6     | phoebe  | NULL           |
| 7     | joey    | NULL           |
+-------+---------+----------------+

3.right join

说明:右连接,获取右表全部数据,左表有则读取,无则使用 NULL 填充

select c.id, c.name, o.customer_id, o.amount from customers c right join orders o on c.id = o.customer_id;
+-------+-----------+----------------+-----------+
| c.id  |  c.name   | o.customer_id  | o.amount  |
+-------+-----------+----------------+-----------+
| 3     | chandler  | 3              | 3000.0    |
| 3     | chandler  | 3              | 1500.0    |
| 2     | rachel    | 2              | 1560.0    |
| 4     | monika    | 4              | 2060.0    |
+-------+-----------+----------------+-----------+

4.full join

说明:外连接,读取两表全部数据,没有匹配到的数据,使用NULL填充

select * from customers c full join orders o on c.id = o.customer_id;

hive join 查询_第1张图片

5.left semi join

左半连接,内关联只显示左表的数据

并不拼接两张表,两个表对 on 的条件字段做交集,返回前面表的记录,相较于其他的方法,这样子 hive 处理速度比较快

select * from customers c left semi join orders o on c.id = o.customer_id;

+-------+-----------+--------+------------+-----------+
| c.id  |  c.name   | c.age  | c.address  | c.salary  |
+-------+-----------+--------+------------+-----------+
| 2     | rachel    | 25     | delhi      | 1500.0    |
| 3     | chandler  | 23     | kota       | 2000.0    |
| 4     | monika    | 25     | mumbai     | 6500.0    |
+-------+-----------+--------+------------+-----------+

如下两条sql运行结果一致

select c.id, c.name from customers c where c.id in (select o.customer_id from orders o);

select c.id, c.name from customers c left semi join orders o on c.id = o.customer_id;

6.cross join

交叉关联 笛卡尔积 慎用
返回被连接的两个表的笛卡尔积,返回结果的行数等于两个表行数的乘积

# 设置为非严格模式
set hive.mapred.mode = unstrict;
select c.id, c.name, o.amount from customers c cross join orders o;

你可能感兴趣的:(Hive)