4.4.1 External Partitioned Tables
Create the partitioned table:
create external table if not exists chapter4.log_messages (
hms int,
severity string,
server string,
process_id int,
message string )
partitioned by (year int,month int,day int)
row format delimited fields terminated by '\t' ;
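Tip: an external partitioned table is created without a LOCATION clause, because each partition gets its own location when it is added. A nonpartitioned external table such as stocks normally specifies LOCATION at creation time; a managed table needs none and defaults to the warehouse directory.
alter table ... add partition adds a partition; a value must be given for every partition key (here year, month, and day).
For other uses of alter table, see Section 4.6.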
hive> alter table log_messages add partition(year=2012,month=1,day=3)
> location "hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/2012/01/03";
OK
Time taken: 0.222 seconds
HDFS directory: hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/2012/01/03
Without a LOCATION clause, the partition directory follows Hive's default layout:
hive> alter table log_messages add partition(year=2012,month=1,day=2);
OK
Time taken: 0.441 seconds
HDFS directory: hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/year=2012/month=1/day=2
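Because year, month, and day are partition keys, a query that filters on them only needs to read the matching partition directory. A minimal sketch, assuming the current database is chapter4 and the partition added above holds tab-delimited log files (the limit value is arbitrary):

```sql
-- Reads only the directory registered for year=2012/month=1/day=3.
select hms, severity, message
from log_messages
where year = 2012 and month = 1 and day = 3
limit 10;
```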
View the table's partitions and schema:
hive> desc formatted log_messages;
...
# Partition Information
# col_name data_type comment
year int
month int
day int
This output does not show where a partition's data is stored; use the following statement for that:
hive> desc formatted log_messages partition(year=2012,month=1,day=3);
...
Location: hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/2012/01/03
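To see which partitions are registered at all, show partitions can be used alongside desc formatted. A short sketch, assuming the current database is chapter4:

```sql
-- List every partition registered for the table.
show partitions log_messages;

-- Restrict the listing to partitions matching a partial specification.
show partitions log_messages partition(year=2012);
```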
4.4.2 Customizing Table Storage Formats
4.5 Dropping Managed Tables
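With the Hadoop trash feature enabled, deleted data is moved to the .Trash directory under the user's home directory in the distributed filesystem, that is, /user/$USER/.Trash in HDFS.
Enable the trash: fs.trash.interval=1440 (the value is a retention time in minutes, so 1440 keeps deleted data for 24 hours).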
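Steps to recover a dropped table:
1. Re-create the table.
2. Re-create the required partitions.
3. Move the mistakenly deleted files from the .Trash directory back to the correct directories.
Dropping an external table by mistake removes only the metadata; the data is not deleted.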
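The recovery steps above can be sketched from the Hive CLI. Everything specific here is an assumption for illustration: the managed table managed_logs, the user name hadoop, and the expectation that the trash stores files under .Trash/Current followed by the original absolute path (the usual Hadoop layout, worth confirming with dfs -ls before moving anything). The current database is assumed to be chapter4.

```sql
-- Step 1: re-create the dropped table (hypothetical managed table).
create table if not exists managed_logs (
  hms int,
  severity string,
  server string,
  process_id int,
  message string )
partitioned by (year int, month int, day int)
row format delimited fields terminated by '\t';

-- Step 2: re-create the partition that is needed.
alter table managed_logs add if not exists partition(year=2012, month=1, day=2);

-- Inspect the trash first; "hadoop" is a placeholder user name.
dfs -ls /user/hadoop/.Trash/Current/user/hive/warehouse/chapter4.db/managed_logs/year=2012/month=1/day=2;

-- Step 3: move the deleted files back into the partition directory.
dfs -mv /user/hadoop/.Trash/Current/user/hive/warehouse/chapter4.db/managed_logs/year=2012/month=1/day=2/* /user/hive/warehouse/chapter4.db/managed_logs/year=2012/month=1/day=2/;
```

This only works while the files are still within the fs.trash.interval retention window.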
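4.6 Altering Tables
alter table changes metadata only, not the data itself (useful for fixing mistakes in the table schema, changing partition locations, and so on).
4.6.1 Renaming a Table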
hive> alter table log_messages rename to logmsgs;
OK
Time taken: 0.23 seconds
4.6.2 Adding, Modifying, and Dropping Table Partitions
Add partitions:
hive> alter table logmsgs add if not exists
> partition (year=2019,month=5,day=29) location '/2019/05/29'
> partition (year=2019,month=5,day=30) location '/2019/05/30'
> partition (year=2019,month=5,day=31) location '/2019/05/31';
OK
Time taken: 0.211 seconds
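Changing a partition's location does not move the data out of the old location, nor does it delete the old data.
Change the location (this did not work for the author):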
hive> alter table logmsgs partition(year=2019,month=5,day=29)
> set location 's3n://hive/2019/05/29';
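The reason the statement above failed was not recorded. One possibility (an assumption, not a diagnosis) is that the cluster has no S3 filesystem configured. A sketch that targets the HDFS NameNode used elsewhere in these notes instead; the directory /2019/05/29_new is hypothetical, and the URI is written in full, including its scheme:

```sql
-- Point the partition at a different HDFS directory (hypothetical path).
alter table logmsgs partition(year=2019, month=5, day=29)
set location 'hdfs://master:9000/2019/05/29_new';

-- Check the Location field to confirm the change.
desc formatted logmsgs partition(year=2019, month=5, day=29);
```

Since only metadata changes, any existing files would still have to be moved to the new directory separately.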
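Dropping a partition
For an external table, only the partition metadata is removed and the data is kept; for a managed table, both the partition and its data are deleted.
Drop the partition: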
hive> alter table logmsgs drop if exists partition(year=2019,month=05,day=29);
Dropped the partition year=2019/month=5/day=29
OK
Time taken: 0.367 seconds
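Because logmsgs is an external table, dropping the partition leaves its files in place, so the partition can be registered again later. A sketch, assuming the directory /2019/05/29 given when the partition was first added is still its location:

```sql
-- The directory of the dropped partition should still be present.
dfs -ls /2019/05/29;

-- Re-attach the existing directory as a partition.
alter table logmsgs add if not exists
partition (year=2019, month=5, day=29) location '/2019/05/29';
```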
4.6.3 Changing Columns
Rename hms and move it after severity:
hive> desc logmsgs;
OK
hms int
severity string
hive> alter table logmsgs
> change column hms hours_minutes_seconds int
> comment "The hours,minutes,and seconds part of the timestamp"
> after severity;
OK
Time taken: 0.222 seconds
hive> desc logmsgs;
OK
severity string
hours_minutes_seconds int The hours,minutes,and seconds part of the timestamp
4.6.4 Adding Columns
hive> alter table logmsgs add columns (
> app_name string comment 'Application name',
> session_id int comment 'The current session id');
Note: the session_id type was originally long, which kept raising an error; changing it to int fixed it.
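Hive's integer types are tinyint, smallint, int, and bigint; there is no long type, which would explain the error with long. If session ids need 64 bits, a variant of the statement above could use bigint. This is a sketch of an alternative, not something to run in addition to the int version:

```sql
-- Alternative to the statement above: bigint is Hive's 64-bit integer type.
alter table logmsgs add columns (
  app_name string comment 'Application name',
  session_id bigint comment 'The current session id');
```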
4.6.5 Deleting or Replacing Columns
hive> alter table logmsgs replace columns(
> hours_minutes_seconds int comment 'hour,minute,seconds from timestamp',
> severity string comment 'The message severity',
> message string comment 'the rest of the message');
OK
Time taken: 0.191 seconds
hive> desc logmsgs;
OK
hours_minutes_seconds int hour,minute,seconds from timestamp
severity string The message severity
message string the rest of the message
See p. 68 (SerDe).
4.6.6 Altering Table Properties (properties can be added or modified, but not removed)
hive> alter table logmsgs set tblproperties (
> "notes" = "The process id is no longer captured,this column is always null");
This statement raised an error.
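The error message was not recorded, so its cause is unknown. When retrying, it can help to look at the properties the table currently carries. A sketch using show tblproperties, assuming a Hive release that supports it:

```sql
-- List all properties currently set on the table.
show tblproperties logmsgs;

-- Show a single property; "notes" is the key used in the statement above.
show tblproperties logmsgs("notes");
```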
4.6.7 Altering Storage Properties: SequenceFile
hive> alter table logmsgs
> partition(year = 2019,month=5,day=30)
> set fileformat sequencefile;
OK
Time taken: 0.212 seconds
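Like the other alter table statements, this changes metadata only. Existing files in the partition are not converted, so they must already be SequenceFiles to be readable afterwards. The recorded format can be checked in the partition description:

```sql
-- InputFormat and OutputFormat should now name the SequenceFile classes.
desc formatted logmsgs partition(year=2019, month=5, day=30);
```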
Protect a partition from being dropped or queried:
hive> alter table logmsgs
> partition(year = 2019,month=5,day=30) enable no_drop;
OK
Time taken: 0.272 seconds
hive> alter table logmsgs drop partition(year = 2019,month=5,day=30);
FAILED: SemanticException [Error 30011]: Partition protected from being dropped chapter4@logmsgs@year=2019/month=5/day=30
Changing enable to disable makes the partition droppable again.
// Prevent the partition from being queried
hive> alter table logmsgs
> partition(year = 2019,month=5,day=30) enable offline;
OK
Time taken: 0.272 seconds
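Both protections can be lifted again with disable. A sketch for the partition used above; note that the no_drop and offline clauses were removed in Hive 2.0, so this applies to earlier releases only:

```sql
-- Allow the partition to be queried again.
alter table logmsgs
partition(year=2019, month=5, day=30) disable offline;

-- Allow the partition to be dropped again.
alter table logmsgs
partition(year=2019, month=5, day=30) disable no_drop;
```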