Hive Ops Notes: Backfilling Data into a Partitioned Table


1. First, load the data into a temporary partition of the table

LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00000' OVERWRITE  into TABLE log.user_log PARTITION (p_day='2016-07-32');   

LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00001'  into TABLE log.user_log PARTITION (p_day='2016-07-32');  
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00002'  into TABLE log.user_log PARTITION (p_day='2016-07-32'); 
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00003'  into TABLE log.user_log PARTITION (p_day='2016-07-32'); 
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00004'  into TABLE log.user_log PARTITION (p_day='2016-07-32');  
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00005'  into TABLE log.user_log PARTITION (p_day='2016-07-32'); 



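The first statement uses OVERWRITE to clear the temporary partition, and the following statements append the remaining files. Note that LOAD DATA INPATH moves the files on HDFS into the target partition; it does not copy them.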



Of course, 2016-07-32 is not a real date, so this partition cannot collide with any real daily partition.
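Before moving on, it can help to see which calendar days the loaded files actually cover. The query below is a minimal sketch. It assumes the `day` column begins with a `yyyy-MM-dd` date, which is what the filter in step 2 relies on. The aliases `log_day` and `cnt` are illustrative names only.

```sql
-- Count rows per calendar day inside the temporary partition.
-- Assumption: the first 10 characters of `day` are a yyyy-MM-dd date.
select substring(day,0,10) as log_day, count(*) as cnt
from log.user_log
where p_day='2016-07-32'
group by substring(day,0,10);
```

If the day you need to backfill does not show up in the result, the source files are probably the wrong ones.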


2. Insert the data into the partition that is missing it

insert overwrite table log.user_log PARTITION (p_day='2016-07-19')
select day        ,
ip         ,
uuid       ,
t          ,
uid        ,
cururl     ,
refer      ,
browser    ,
browserver ,
os         ,
osver      ,
time       ,
av         ,
device     ,
cid        ,
sr         ,
bp         ,
cookie     ,
pagetype   ,
buttonid   ,
amount     ,
src        ,
operator   ,
nettype    ,
transtype  ,
province   ,
city       ,
longitude  ,
latitude   ,
ext         from log.user_log where p_day='2016-07-32' and substring(day,0,10)='2016-07-19';

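Note that you cannot use * here: * would also return the partition column p_day, which is already fixed in the PARTITION clause and must not appear in the select list.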
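If typing out every column is too tedious, one possible alternative is Hive's regex column specification, which selects every column except the ones matched by the pattern. This is a sketch, not what the original script used. It assumes Hive 0.13 or later, where `hive.support.quoted.identifiers` must be set to `none` for backquoted names to be treated as regular expressions.

```sql
-- Assumption: Hive 0.13+; with the default setting ("column"),
-- backquoted names are treated literally rather than as regexes.
set hive.support.quoted.identifiers=none;

-- `(p_day)?+.+` matches every column except p_day, in table order.
insert overwrite table log.user_log partition (p_day='2016-07-19')
select `(p_day)?+.+`
from log.user_log
where p_day='2016-07-32' and substring(day,0,10)='2016-07-19';
```

The explicit column list is easier to audit in a script you keep for reuse. The regex form keeps working if columns are added to the table later.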


3. Verify the data

select count(*) from log.user_log where p_day='2016-07-19';


select * from log.user_log where p_day='2016-07-19' limit 1;
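A row count on its own does not show whether anything was lost. A simple cross-check, sketched below, compares the matching rows in the temporary partition with the rows in the backfilled partition. It has to run before the temporary partition is dropped.

```sql
-- Rows in the temporary partition that belong to 2016-07-19
-- (same filter as the insert in step 2) ...
select count(*) from log.user_log
where p_day='2016-07-32' and substring(day,0,10)='2016-07-19';

-- ... should equal the number of rows now in the backfilled partition.
select count(*) from log.user_log where p_day='2016-07-19';
```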

Drop the temporary partition

ALTER TABLE log.user_log  DROP PARTITION(p_day='2016-07-32'); 
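To confirm the cleanup, you can list the remaining partitions and check that the temporary one no longer returns rows. This sketch assumes `log.user_log` is a managed table, so dropping the partition also deletes its files. For an external table the files would stay on HDFS and would have to be removed separately.

```sql
-- List the remaining partitions; p_day=2016-07-32 should no longer appear.
use log;
show partitions user_log;

-- Querying the dropped partition should now count zero rows.
select count(*) from log.user_log where p_day='2016-07-32';
```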


Backfilling data in production is routine, so keep this script on hand for next time!

