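Hive pitfalls: the difference between count(*), count(1), and count(column)

Conclusion

count(*) and count(1): count every row in the table, including rows that contain NULL values.
count(column): counts only the rows where that column is not NULL. An empty string is not NULL, so rows holding an empty string are still counted.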
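
To see the three forms side by side, a single query along the following lines can be run against the `test_01` table built in the data preparation section. This is a minimal sketch; the column aliases are illustrative names and do not come from the original post.

```sql
-- Compare the three COUNT forms in one pass over test_01.
-- The aliases (total_rows, ...) are illustrative only.
SELECT
  count(*)       AS total_rows,        -- every row, NULLs included
  count(1)       AS total_rows_const,  -- same result as count(*)
  count(address) AS non_null_address,  -- skips rows where address IS NULL
  count(gender)  AS non_null_gender    -- '' is not NULL, so it is counted
FROM test_01;
```

With the five sample rows, this should return 5, 5, 3, and 5, matching the individual queries in the verification section.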

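Data preparation

We insert some test data. NULL values and empty strings are not normalized here; Hive's default handling applies.

  • Create the table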
CREATE  TABLE IF NOT EXISTS `test_01`(
 name STRING,address STRING,gender STRING
 )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
  • Insert the data
insert overwrite table test_01
select 'Lucy','shanghai','female'
union all
select 'LiLei',null,'male'
union all
select 'Rose','shenzhen','female'
union all
select 'Marry',null,''
union all
select 'Curry','beijing','';
  • View the data
hive> select * from test_01;
OK
Marry	NULL	
Lucy	shanghai	female
LiLei	NULL	male
Rose	shenzhen	female
Curry	beijing	
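
Because the table relies on the default NULL handling, empty strings and NULLs stay distinct in the stored text file. If empty strings should instead be read back as NULL, one option is to declare the NULL representation when creating the table. The sketch below uses a hypothetical table `test_02` (not part of the original post) and assumes a Hive version that supports the `NULL DEFINED AS` clause.

```sql
-- Hypothetical variant of test_01: empty fields in the text file
-- are read back as NULL instead of as empty strings.
CREATE TABLE IF NOT EXISTS `test_02` (
  name STRING, address STRING, gender STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' NULL DEFINED AS ''
STORED AS TEXTFILE;

-- Copy the sample rows so both tables can be compared.
INSERT OVERWRITE TABLE test_02
SELECT name, address, gender FROM test_01;
```

On such a table, count(gender) would be expected to skip the rows whose gender was an empty string, since they now read as NULL. It is worth confirming this on your own cluster before relying on it.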

Verification

Counting with count(*)

select count(*) from test_01;
OK
5
Time taken: 1.145 seconds, Fetched: 1 row(s)

Counting with count(1)

select count(1) from test_01;
OK
5
Time taken: 1.113 seconds, Fetched: 1 row(s)

Counting with count(column) when the column contains NULL values

select count(address) from test_01;
OK
3
Time taken: 1.113 seconds, Fetched: 1 row(s)
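
Since count(address) skips NULLs, the number of rows with a NULL address can be derived either by subtraction or with an explicit conditional sum. The following is a small sketch against the same `test_01` table; the aliases are illustrative.

```sql
-- Two equivalent ways to count the rows where address IS NULL.
SELECT
  count(*) - count(address)                        AS null_address_by_diff,
  sum(CASE WHEN address IS NULL THEN 1 ELSE 0 END) AS null_address_by_case
FROM test_01;
```

For the sample data, both columns should be 2 (the rows for LiLei and Marry).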

Counting with count(column) when the column contains empty strings

select count(gender) from test_01;
OK
5
Time taken: 1.139 seconds, Fetched: 1 row(s)
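
If empty strings should be treated as missing values, one approach is to turn them into NULL inside the aggregate so that count skips them. This is a sketch, not the only way to do it; the aliases are illustrative.

```sql
-- CASE without ELSE yields NULL, so rows where gender is '' (or NULL)
-- are excluded from the second count.
SELECT
  count(gender)                                 AS non_null_gender,
  count(CASE WHEN gender <> '' THEN gender END) AS non_empty_gender
FROM test_01;
```

For the sample data, non_null_gender should be 5 and non_empty_gender should be 3.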
