HIVE基础函数

1、聚合函数

聚合函数max/min/count/sum/avg 特点多进一出

hive> select count(1) from ruozedata_emp where deptno = 10;                 聚合函数会走MapReduce

OK

3

Time taken: 23.402 seconds, Fetched: 1 row(s)

hive> select max(salary),min(salary),avg(salary),sum(salary) from ruozedata_emp;

OK

10300.0 800.0  2621.6666666666665      39325.0

Time taken: 23.232 seconds, Fetched: 1 row(s)

2、分组函数    group by

注意:select 中出现的字段,如果没有出现在组函数/聚合函数中,必须出现在group by里面,否则会报错

hive> select deptno,avg(salary) from ruozedata_emp group by deptno;

OK

NULL    10300.0

10      2916.6666666666665

20      2175.0

30      1566.6666666666667

Time taken: 22.19 seconds, Fetched: 4 row(s)

       1. 求每个部门(deptno)、工作岗位(job)的最高工资(salary)

    hive> select max(salary) from ruozedata_emp group by deptno order by job;

        2.求每个部门的平均工资大于2000的部门

    hive> select deptno,avg(salary) from ruozedata_emp group by deptno having avg(salary)>2000;

where 需要在group by 之前

3、cash    when     then

        hive> select ename,salary,

    > case

    > when salary>1 and salary<=1000 then 'lower'

    > when salary>1000 and salary<=2000 then 'middle'

    > when salary>2000 and salary<=4000 then 'high'

    > else 'highest'

    > end

    > from ruozedata_emp;

OK

SMITH  800.0  lower

ALLEN  1600.0  middle

WARD    1250.0  middle

JONES  2975.0  high

MARTIN  1250.0  middle

BLAKE  2850.0  high

CLARK  2450.0  high

SCOTT  3000.0  high

KING    5000.0  highest

TURNER  1500.0  middle

ADAMS  1100.0  middle

JAMES  950.0  lower

FORD    3000.0  high

MILLER  1300.0  middle

HIVE    10300.0 highest

Time taken: 0.117 seconds, Fetched: 15 row(s)

4、join

内连接:inner   join =  join

结果为完全匹配的内容

hive> select * from a join b on a.id=b.id;

1          ruoze          1          30

2          j                   2          29

Time taken: 174.328 seconds, Fetched: 2 row(s)

外链接:left    right    all

left:左表全

hive> select * from a left join b on a.id= b.id;

1         ruoze          1          30

2          j                  2          29

3          k              NULL      NULL

Time taken: 31.09 seconds, Fetched: 3 row(s)

right:右表全

hive> select * from a right join b on a.id= b.id;

1          ruoze          1          30

2          j                  2          29

NULL NULL            4          21

Time taken: 23.674 seconds, Fetched: 3 row(s)

full:两个表数据都在,没有的字段用null 补充

hive> select * from a full join b on a.id= b.id;

1          ruoze          1          30

2          j                  2          29

3          k              NULL      NULL

NULL NULL          4          21

Time taken: 33.676 seconds, Fetched: 4 row(s)

笛卡尔积(不会报错)

hive> select * from a join b ;(未加条件)

HIVE基础函数_第1张图片
笛卡尔积

5、静态分区

创建:hive> create table order_partition(  ordernumber string,     eventtime string  )    partitioned by (event_month string) row format delimited fields terminated by '\t';

查看表结构:hive> desc formatted order_partition

# Partition Information

# col_name              data_type              comment           

event_month            string

你可能感兴趣的:(HIVE基础函数)