Hive表数据去重

一、数据全部重复

例如:

name score
Computer 1600
Phone 12
Phone 12

操作步骤:

1.复制表结构
CREATE TABLE LIKE ;

2.插入去重后的数据
insert overwrite table
select distinct * from ;

ps:有时执行这个语句会报以下错误:
FAILED: SemanticException TOK_ALLCOLREF is not supported in current context

写上所有列名就好了:
insert overwrite table
select distinct name, score from ;

二、部分数据重复

例如:

name score type
Computer 1600 2
Phone 12 1
Phone 15 1

操作步骤:

1.复制表结构
CREATE TABLE LIKE ;

2.插入去重后的数据
insert overwrite table (
select t.name, t.score, t.type
from (
select
name, score, ,type, row_number() over(distribute by name sort by score ) as rn
from
) t where t.rn=1
);

3.总结一下就是:

insert overwrite table (
select <字段>
from (
select <字段>, row_number() over(distribute by <有重复的字段> sort by <重复字段的排列根据字段>) as rn
from
) t where t.rn=1
);

你可能感兴趣的:(Hive)