Hive Operations Notes: Backfilling Data into a Partitioned Table
1. First, load the data into a temporary partition of the table
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00000' OVERWRITE into TABLE log.user_log PARTITION (p_day='2016-07-32');
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00001' into TABLE log.user_log PARTITION (p_day='2016-07-32');
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00002' into TABLE log.user_log PARTITION (p_day='2016-07-32');
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00003' into TABLE log.user_log PARTITION (p_day='2016-07-32');
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00004' into TABLE log.user_log PARTITION (p_day='2016-07-32');
LOAD DATA INPATH 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000/part-00005' into TABLE log.user_log PARTITION (p_day='2016-07-32');
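The first statement uses OVERWRITE to clear the temporary partition, and the remaining statements append to it. p_day='2016-07-32' is not a real date, so this staging partition cannot collide with a genuine daily partition. Note that LOAD DATA INPATH moves the files out of their HDFS source directory into the partition's directory; it does not copy them.
2. Populate the partition that is missing data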
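The six LOAD statements above differ only in the part-file number and in whether OVERWRITE is present, so they can be generated rather than typed by hand. The following is a minimal sketch in POSIX shell, assuming the part files are numbered consecutively from part-00000; the function name gen_load_statements and its argument order are hypothetical, not something from the original script.

```shell
# Hypothetical helper: print one LOAD DATA statement per part file.
# Usage: gen_load_statements SRC_DIR TABLE P_DAY FILE_COUNT
# Assumes the files are named part-00000, part-00001, ... under SRC_DIR.
gen_load_statements() {
  src_dir=$1; table=$2; p_day=$3; count=$4
  i=0
  while [ "$i" -lt "$count" ]; do
    # Only the first statement clears the staging partition;
    # every later statement appends to it.
    if [ "$i" -eq 0 ]; then mode="OVERWRITE INTO"; else mode="INTO"; fi
    printf "LOAD DATA INPATH '%s/part-%05d' %s TABLE %s PARTITION (p_day='%s');\n" \
      "$src_dir" "$i" "$mode" "$table" "$p_day"
    i=$((i + 1))
  done
}
```

For example, gen_load_statements 'hdfs://nwdservice/user/datamart/logdata/log-1472464800000' log.user_log 2016-07-32 6 > load.hql writes the six statements to a file, which can then be reviewed and run with hive -f load.hql.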
-- Rebuild the 2016-07-19 partition from the rows staged in p_day='2016-07-32'.
-- Hive string positions are 1-based; a start of 0 is treated the same as 1.
INSERT OVERWRITE TABLE log.user_log PARTITION (p_day='2016-07-19')
SELECT day,
       ip,
       uuid,
       t,
       uid,
       cururl,
       refer,
       browser,
       browserver,
       os,
       osver,
       time,
       av,
       device,
       cid,
       sr,
       bp,
       cookie,
       pagetype,
       buttonid,
       amount,
       src,
       operator,
       nettype,
       transtype,
       province,
       city,
       longitude,
       latitude,
       ext
FROM log.user_log
WHERE p_day='2016-07-32' AND substring(day, 1, 10)='2016-07-19';
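Since the same backfill tends to be needed for other days, the statement in step 2 can be parameterized by the target day. The sketch below is one way to do that in POSIX shell; the names COLS and gen_backfill are hypothetical, and the column list is simply copied from the statement above, so it would need updating if the table schema changes.

```shell
# Columns of log.user_log, copied from the INSERT statement in step 2.
COLS="day, ip, uuid, t, uid, cururl, refer, browser, browserver, os, osver, time, av, device, cid, sr, bp, cookie, pagetype, buttonid, amount, src, operator, nettype, transtype, province, city, longitude, latitude, ext"

# Hypothetical helper: print the backfill statement for one target day.
# Usage: gen_backfill TABLE STAGING_P_DAY TARGET_DAY
gen_backfill() {
  table=$1; staging=$2; target=$3
  # Unquoted heredoc delimiter, so $table, $staging, $target and $COLS expand.
  cat <<EOF
INSERT OVERWRITE TABLE $table PARTITION (p_day='$target')
SELECT $COLS
FROM $table
WHERE p_day='$staging' AND substr(day, 1, 10)='$target';
EOF
}
```

Running gen_backfill log.user_log 2016-07-32 2016-07-19 > backfill.hql produces a statement equivalent to the one in step 2, which can be inspected before it is executed with hive -f backfill.hql.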
3. Verify the data
select count(*) from log.user_log where p_day='2016-07-19';
select * from log.user_log where p_day='2016-07-19' limit 1;
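Once the backfilled partition checks out, drop the temporary partition:
ALTER TABLE log.user_log DROP PARTITION(p_day='2016-07-32');
Backfilling data in production is routine, so keep this script around for next time!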