beeline 方式连接hive
$ beeline -u jdbc:hive2://localhost:10000/test_db -n harvey -p
数据准备
0: jdbc:hive2://localhost:10000/test_db> create table if not exists `customers`(
. . . . . . . . . . . . . . . . . . . .> `id` int,
. . . . . . . . . . . . . . . . . . . .> `name` string,
. . . . . . . . . . . . . . . . . . . .> `age` int,
. . . . . . . . . . . . . . . . . . . .> `address` string,
. . . . . . . . . . . . . . . . . . . .> `salary` double)
. . . . . . . . . . . . . . . . . . . .> row format delimited fields terminated by '\t';
load data local inpath '/Users/harvey/data/customers.txt' overwrite into table test_db.customers;
select * from test_db.customers;
+---------------+-----------------+----------------+--------------------+-------------------+
| customers.id | customers.name | customers.age | customers.address | customers.salary |
+---------------+-----------------+----------------+--------------------+-------------------+
| 1 | ross | 32 | ahmedabad | 2000.0 |
| 2 | rachel | 25 | delhi | 1500.0 |
| 3 | chandler | 23 | kota | 2000.0 |
| 4 | monika | 25 | mumbai | 6500.0 |
| 5 | mike | 27 | bhopal | 8500.0 |
| 6 | phoebe | 22 | mp | 4500.0 |
| 7 | joey | 24 | indore | 10000.0 |
+---------------+-----------------+----------------+--------------------+-------------------
0: jdbc:hive2://localhost:10000/test_db> create table if not exists `orders`(
. . . . . . . . . . . . . . . . . . . .> `oid` int,
. . . . . . . . . . . . . . . . . . . .> `date` string,
. . . . . . . . . . . . . . . . . . . .> `customer_id` int,
. . . . . . . . . . . . . . . . . . . .> `amount` double)
. . . . . . . . . . . . . . . . . . . .> row format delimited fields terminated by '\t';
load data local inpath '/Users/harvey/data/orders.txt' overwrite into table test_db.orders;
select * from test_db.orders;
+-------------+----------------------+---------------------+----------------+
| orders.oid | orders.date | orders.customer_id | orders.amount |
+-------------+----------------------+---------------------+----------------+
| 102 | 2016-10-08 00:00:00 | 3 | 3000.0 |
| 100 | 2016-10-08 00:00:00 | 3 | 1500.0 |
| 101 | 2016-11-20 00:00:00 | 2 | 1560.0 |
| 103 | 2015-05-20 00:00:00 | 4 | 2060.0 |
+-------------+----------------------+---------------------+----------------+
说明:内连接,获取两表中相同的数据
select c.id, c.name, c.age, o.amount from customers c inner join orders o on c.id = o.customer_id;
+-------+-----------+--------+-----------+
| c.id | c.name | c.age | o.amount |
+-------+-----------+--------+-----------+
| 2 | rachel | 25 | 1560.0 |
| 3 | chandler | 23 | 3000.0 |
| 3 | chandler | 23 | 1500.0 |
| 4 | monika | 25 | 2060.0 |
+-------+-----------+--------+-----------+
说明:左连接,获取左表全部数据,右表有则读取,无则使用 NULL 填充
select c.id, c.name, c.age, o.amount from customers c left join orders o on c.id = o.customer_id;
+-------+-----------+--------+-----------+
| c.id | c.name | c.age | o.amount |
+-------+-----------+--------+-----------+
| 1 | ross | 32 | NULL |
| 2 | rachel | 25 | 1560.0 |
| 3 | chandler | 23 | 3000.0 |
| 3 | chandler | 23 | 1500.0 |
| 4 | monika | 25 | 2060.0 |
| 5 | mike | 27 | NULL |
| 6 | phoebe | 22 | NULL |
| 7 | joey | 24 | NULL |
+-------+-----------+--------+-----------+
查询左表独有数据
select c.id, c.name, o.customer_id from customers c left join orders o on c.id = o.customer_id where o.customer_id is null;
+-------+---------+----------------+
| c.id | c.name | o.customer_id |
+-------+---------+----------------+
| 1 | ross | NULL |
| 5 | mike | NULL |
| 6 | phoebe | NULL |
| 7 | joey | NULL |
+-------+---------+----------------+
说明:右连接,获取右表全部数据,左表有则读取,无则使用 NULL 填充
select c.id, c.name, o.customer_id, o.amount from customers c right join orders o on c.id = o.customer_id;
+-------+-----------+----------------+-----------+
| c.id | c.name | o.customer_id | o.amount |
+-------+-----------+----------------+-----------+
| 3 | chandler | 3 | 3000.0 |
| 3 | chandler | 3 | 1500.0 |
| 2 | rachel | 2 | 1560.0 |
| 4 | monika | 4 | 2060.0 |
+-------+-----------+----------------+-----------+
说明:外连接,读取两表全部数据,没有匹配到的数据,使用NULL填充
select * from customers c full join orders o on c.id = o.customer_id;
左半连接,内关联只显示左表的数据
并不拼接两张表,两个表对 on 的条件字段做交集,返回前面表的记录,相较于其他的方法,这样子 hive 处理速度比较快
select * from customers c left semi join orders o on c.id = o.customer_id;
+-------+-----------+--------+------------+-----------+
| c.id | c.name | c.age | c.address | c.salary |
+-------+-----------+--------+------------+-----------+
| 2 | rachel | 25 | delhi | 1500.0 |
| 3 | chandler | 23 | kota | 2000.0 |
| 4 | monika | 25 | mumbai | 6500.0 |
+-------+-----------+--------+------------+-----------+
如下两条sql运行结果一致
select c.id, c.name from customers c where c.id in (select o.customer_id from orders o);
select c.id, c.name from customers c left semi join orders o on c.id = o.customer_id;
交叉关联 笛卡尔积 慎用
返回被连接的两个表的笛卡尔积,返回结果的行数等于两个表行数的乘积
# 设置为非严格模式
set hive.mapred.mode = unstrict;
select c.id, c.name, o.amount from customers c cross join orders o;