The difference between count and sum in Hive

First, create a test table and load three rows:

>create table tmp.guanwm_test (a string, b int);

>insert into table tmp.guanwm_test values ("a", 1);
>insert into table tmp.guanwm_test values ("b", 2);
>insert into table tmp.guanwm_test values ("c", 3);
>select * from tmp.guanwm_test;
a	1
b	2
c	3

Basic queries

>select count(*) from tmp.guanwm_test;
3

>select sum(1) from tmp.guanwm_test;
3

>select sum(2) from tmp.guanwm_test;
6

Next, insert some rows that contain NULL values

>insert into table tmp.guanwm_test values (NULL, 4);
>insert into table tmp.guanwm_test values ('d', NULL);

>select * from tmp.guanwm_test;
a	1
b	2
c	3
NULL	4
d	NULL

Run the queries again on the data with NULLs

>select count(*) from tmp.guanwm_test;
5

>select count(1), count(2), sum(1), sum(2) from tmp.guanwm_test; 
5	5	5	10

>select count(a), count(b), sum(a), sum(b) from tmp.guanwm_test;
4	4	0.0	10

Conclusions:

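  • count(constant), such as count(1) or count(2), behaves the same as count(*): every row is counted, including rows that contain NULL values.
  • sum(constant) adds the constant once for every row, NULL rows included, so sum(1) equals count(*) and sum(2) is twice count(*) (10 for the 5 rows here).
  • count(column) skips rows where that column is NULL, so count(a) and count(b) each return 4.
  • sum(column) on the int column b is an ordinary sum that ignores NULL (1 + 2 + 3 + 4 = 10); on the string column a, whose values are not numeric, it returned 0.0 in this test.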
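
One practical use of these rules is counting the NULLs in a column. The following is a minimal sketch, assuming the five-row tmp.guanwm_test table built earlier in this post. The aliases null_a and null_b are made up for illustration, and the values in the comments are worked out from the outputs shown above rather than taken from a separate run.

```sql
-- count(*) counts every row, while count(col) skips rows where col is NULL,
-- so the difference is the number of NULLs in that column.
select count(*) - count(a) as null_a,
       count(*) - count(b) as null_b
from tmp.guanwm_test;
-- with the five rows above: 1	1  (5 - 4 for each column)

-- The same count written as a conditional sum: add 1 for each NULL row, 0 otherwise.
select sum(if(a is null, 1, 0)) as null_a
from tmp.guanwm_test;
-- with the five rows above: 1
```

The first form is usually the shorter one to write. The second extends naturally to other conditions, for example counting the rows where b is greater than some threshold.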

 
