Creating a Hive table over HDFS data and attaching partitions: a problem encountered and how it was solved

1. Create a temporary table:

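-- External table over text files read through hadoop-lzo's DeprecatedLzoTextInputFormat;
-- each row keeps one raw JSON string in content, and the table is partitioned by day.
CREATE EXTERNAL TABLE IF NOT EXISTS tmp.tmp_tb_jinritoutiao_log
(
content string COMMENT 'content in JSON format'
)
COMMENT 'Jinri Toutiao video content'
PARTITIONED BY (`day` string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/datastream/portal/jinritoutiao/video/';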

2. Load the HDFS data

alter table tmp.tmp_tb_jinritoutiao_log add partition(day='20180810') location '/data/jinritoutiao/video/2018-08-10';

Problem: the first attempt failed with:

ValidationFailureSemanticException table is not partitioned but partition spec exists

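The message says the table is not partitioned, even though the DDL clearly declares a day partition, and the cause was not obvious. After many attempts, the problem was finally solved by wrapping day in backticks:

PARTITIONED BY (`day` string)

If you hit the same error, it is also worth checking that the table really has a partition column, for example with DESCRIBE FORMATTED tmp.tmp_tb_jinritoutiao_log: because the DDL uses IF NOT EXISTS, an existing non-partitioned table with the same name would make the CREATE statement silently do nothing.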

3. Attach the existing data to its partition

alter table tmp.tmp_tb_jinritoutiao_log add partition(day='20180810') location '/datastream/portal/jinritoutiao/video/2018-08-10';
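Step 3 attaches a single day. Because the directories here are named yyyy-MM-dd rather than following Hive's day=yyyyMMdd layout, MSCK REPAIR TABLE would not pick them up as partitions, so every day needs its own ADD PARTITION ... LOCATION. The minimal Python sketch below generates those statements for a date range. The helper names and the assumption that every day sits under /datastream/portal/jinritoutiao/video/yyyy-MM-dd are hypothetical, inferred from the path in step 3. It adds IF NOT EXISTS so that rerunning the generated statements for an already attached day does not fail.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical defaults taken from step 3; adjust for your own table and layout.
TABLE = "tmp.tmp_tb_jinritoutiao_log"
BASE = "/datastream/portal/jinritoutiao/video"


def add_partition_sql(day: date, table: str = TABLE, base: str = BASE) -> str:
    """Return one ADD PARTITION statement mapping day=yyyyMMdd to base/yyyy-MM-dd."""
    part_value = day.strftime("%Y%m%d")  # partition value, e.g. 20180810
    dir_name = day.strftime("%Y-%m-%d")  # directory name, e.g. 2018-08-10
    return (
        f"alter table {table} add if not exists "
        f"partition(day='{part_value}') location '{base}/{dir_name}';"
    )


def add_partitions_sql(start: date, end: date) -> List[str]:
    """Return one statement per day from start to end, both inclusive."""
    n_days = (end - start).days
    return [add_partition_sql(start + timedelta(days=i)) for i in range(n_days + 1)]


if __name__ == "__main__":
    # Prints three statements, for 2018-08-10, 2018-08-11 and 2018-08-12.
    for stmt in add_partitions_sql(date(2018, 8, 10), date(2018, 8, 12)):
        print(stmt)
```

Hive's ALTER TABLE ... ADD also accepts several PARTITION clauses in a single statement; one statement per day is used here only so each line can be rerun on its own.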

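4. Create a new table for the requirement, parse the JSON column of the log table into separate fields, and insert them into the new table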

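CREATE EXTERNAL TABLE IF NOT EXISTS tmp.tmp_jinritoutiao_video
(
id string comment '',
class string comment '',
userId string comment '')
partitioned by (day string comment 'partition column')
STORED AS ORC
location '/user/portal/tmp_jinritoutiao_video';

-- Parse id, class and userId out of the JSON in content. The log table only has the
-- content column, so userId must also be read from it (the JSON key '$.userId' is assumed).
-- limit 10 writes only a 10-row sample; remove it to load the whole day.
insert overwrite table tmp.tmp_jinritoutiao_video partition (day='20180810')
select
get_json_object(content,'$.id') as id,
get_json_object(content,'$.class') as class,
get_json_object(content,'$.userId') as user_id
from tmp.tmp_tb_jinritoutiao_log where day='20180810' limit 10;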
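Before running the INSERT on a full day, it can help to check locally what step 4 will extract from a few sample content values. The Python sketch below is only a rough stand-in for Hive's get_json_object on simple top-level paths such as '$.id'; the real UDF supports a limited JSONPath syntax ($, ., [], *) and has its own edge cases. The function names, the sample record and the userId key are hypothetical.

```python
import json
from typing import Optional, Tuple


def get_top_level(content: str, key: str) -> Optional[str]:
    """Rough stand-in for get_json_object(content, '$.<key>') on a top-level key.

    Returns None (Hive NULL) when content is not a JSON object or the key is absent.
    """
    try:
        obj = json.loads(content)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or obj.get(key) is None:
        return None
    value = obj[key]
    if isinstance(value, str):
        return value
    # Numbers, booleans and nested objects come back as compact JSON text.
    return json.dumps(value, separators=(",", ":"))


def parse_log_line(content: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Mirror the SELECT list of step 4: (id, class, user_id) from one content value."""
    return (
        get_top_level(content, "id"),
        get_top_level(content, "class"),
        get_top_level(content, "userId"),  # hypothetical JSON key for the user id
    )


if __name__ == "__main__":
    sample = '{"id": "v1", "class": "sports", "userId": 42}'  # hypothetical record
    print(parse_log_line(sample))      # ('v1', 'sports', '42')
    print(parse_log_line("not json"))  # (None, None, None)
```

Rows that come back as all None point to lines that are not valid JSON objects, which would likewise turn into NULL columns in the ORC table.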

5. Done.
