Programming Hive (Hive编程指南), Part 2

4.4.1   External partitioned tables

Create the partitioned table:

create external table if not exists chapter4.log_messages (
    hms    int,
    severity    string,
    server      string,
    process_id  int,
    message     string )
partitioned by (year int,month int,day int)
row format delimited fields terminated by '\t' ;

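Tips: an external partitioned table does not need a location clause when it is created, because each partition's location is supplied later with alter table ... add partition. A non-partitioned external table normally specifies its location at creation time (compare the stocks table).

alter table can add partitions; a value must be given for every partition key (here the partition keys are year, month, and day).

       For other uses of alter, see section 4.6.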

hive> alter table log_messages add partition(year=2012,month=1,day=3)
    > location "hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/2012/01/03";
OK
Time taken: 0.222 seconds

HDFS directory: hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/2012/01/03

Without a location clause, the partition directory follows Hive's default layout:

hive> alter table log_messages add partition(year=2012,month=1,day=2);
OK
Time taken: 0.441 seconds

HDFS directory: hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/year=2012/month=1/day=2

View the table's partition keys and its schema:

hive> desc formatted log_messages;
...
# Partition Information          
# col_name              data_type               comment             
                 
year                    int                                         
month                   int                                         
day                     int                                         

This output does not show where a partition's data is stored. Use the following statement to see it:

hive> desc formatted log_messages partition(year=2012,month=1,day=3);
...               
Location:               hdfs://master:9000/user/hive/warehouse/chapter4.db/log_messages/2012/01/03   
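
To find out which partitions exist in the first place, Hive's show partitions statement can be used. A minimal sketch, assuming the log_messages table and the two partitions added above:

```sql
-- list every partition registered in the metastore for the table
show partitions log_messages;

-- narrow the listing with a partial partition spec (here: one year only)
show partitions log_messages partition(year=2012);
```

Each result row names a partition in key=value form, which can then be passed to desc formatted ... partition(...) to look up its Location.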

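4.4.2  Customizing table storage formats

4.5 Dropping managed tables

With the Hadoop trash feature enabled, deleted data is moved to the .Trash directory under the user's home directory in the distributed filesystem, that is, /user/$USER/.Trash in HDFS.

Enable the trash: fs.trash.interval=1440 (the value is a retention time in minutes, so 1440 keeps deleted files for 24 hours)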
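
The property is normally set in Hadoop's core-site.xml rather than in Hive. As a quick check from the Hive CLI, here is a small sketch that assumes the session picks up the cluster's Hadoop configuration:

```sql
-- print the value of the property as seen by this Hive session
set fs.trash.interval;
```

A value of 0 means the trash is disabled, so dropped data would not be recoverable this way.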

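Steps to recover a dropped table:

1. Recreate the table

2. Recreate the partitions that are needed

3. Move the mistakenly deleted files from the .Trash directory back to the correct directory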
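
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the managed table log_messages_m, the HDFS user hive, and both paths in step 3 are hypothetical, and the real trash path should be checked with dfs -ls before anything is moved.

```sql
use chapter4;

-- 1. Recreate the table with the same schema it had before the drop
--    (log_messages_m is a hypothetical managed table).
create table if not exists log_messages_m (
    hms         int,
    severity    string,
    server      string,
    process_id  int,
    message     string )
partitioned by (year int, month int, day int)
row format delimited fields terminated by '\t';

-- 2. Recreate the partition that is needed.
alter table log_messages_m add if not exists partition(year=2012, month=1, day=2);

-- 3. Move the files out of the trash into the partition directory.
--    Deleted files are kept under .Trash/Current/<original absolute path>.
dfs -mv /user/hive/.Trash/Current/user/hive/warehouse/chapter4.db/log_messages_m/year=2012/month=1/day=2/* /user/hive/warehouse/chapter4.db/log_messages_m/year=2012/month=1/day=2/;
```

The dfs command runs a Hadoop filesystem command from inside the Hive CLI, so the whole recovery can stay in one session.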

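Dropping an external table by mistake: only the metadata is deleted; the data itself is left in place.

4.6 Altering tables

alter table changes the metadata only, not the data (it is used to fix mistakes in the table schema, change partition locations, and so on).

4.6.1 Renaming a table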

hive> alter table log_messages rename to logmsgs;
OK
Time taken: 0.23 seconds

4.6.2 Adding, modifying, and dropping table partitions

Add partitions:
hive> alter table logmsgs add if not exists
    > partition (year=2019,month=5,day=29) location '/2019/05/29'
    > partition (year=2019,month=5,day=30) location '/2019/05/30'
    > partition (year=2019,month=5,day=31) location '/2019/05/31';
OK
Time taken: 0.211 seconds

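Changing a partition's location does not move the data out of the old location, and it does not delete the old data either.

Change the location (this did not succeed for the author):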
hive> alter table logmsgs partition(year=2019,month=5,day=29)
    > set location 's3n://hive/2019/05/29';
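
The source does not say why the statement failed. One possible cause, stated here only as a guess, is that the cluster has no S3 filesystem configured. A variant that may be easier to try, assuming the HDFS namenode hdfs://master:9000 used elsewhere in these notes and a hypothetical target directory:

```sql
-- point the partition at a different directory, given as a full URI
-- (the target path is an assumption for illustration)
alter table logmsgs partition(year=2019, month=5, day=29)
set location 'hdfs://master:9000/2019/05/29';

-- confirm the change: the Location field should show the new path
desc formatted logmsgs partition(year=2019, month=5, day=29);
```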

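Drop a partition

External table: only the partition metadata is dropped; the data is kept. Managed table: both the partition and its data are deleted.

Drop the partition: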
hive> alter table logmsgs drop if exists partition(year=2019,month=05,day=29);
Dropped the partition year=2019/month=5/day=29
OK
Time taken: 0.367 seconds

4.6.3 Changing column information

Rename the hms column and move it after severity:

hive> desc logmsgs;
OK
hms                     int                                         
severity                string                                      
                                        
hive> alter table logmsgs
    > change column hms hours_minutes_seconds int
    > comment "The hours,minutes,and seconds part of the timestamp"
    > after severity;
OK
Time taken: 0.222 seconds

hive> desc logmsgs;
OK
severity                string                                      
hours_minutes_seconds   int                     The hours,minutes,and seconds part of the timestamp

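4.6.4 Adding columns

hive> alter table logmsgs add columns (
    > app_name string comment 'Application name',
    > session_id int comment 'The current session id');

Note: the book declares session_id as long, but that kept raising an error; changing it to int worked. Hive has no long type; its 64-bit integer type is bigint.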
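
If a 64-bit session id is wanted, as the book's long suggests, Hive's type for that is bigint. A sketch of the same statement with that type, meant as an alternative to the int version rather than something to run after it:

```sql
-- same columns as above, with session_id as a 64-bit integer
alter table logmsgs add columns (
    app_name   string comment 'Application name',
    session_id bigint comment 'The current session id');
```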

4.6.5 Deleting or replacing columns

hive> alter table logmsgs replace columns(
    > hours_minutes_seconds int comment 'hour,minute,seconds from timestamp',
    > severity string comment 'The message severity',
    > message string comment 'the rest of the message');
OK
Time taken: 0.191 seconds
hive> desc logmsgs;
OK
hours_minutes_seconds   int                     hour,minute,seconds from timestamp
severity                string                  The message severity
message                 string                  the rest of the message

SerDe: see p. 68 of the book.

4.6.6 Altering table properties (properties can be added or modified, but not removed)

hive> alter table logmsgs set tblproperties (
    > "notes" = "The process id is no longer captured,this column is always null");
This statement raised an error for the author.
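
Whether a property was actually stored can be checked with show tblproperties. A small sketch, assuming a Hive version that supports this statement:

```sql
-- list all properties of the table
show tblproperties logmsgs;

-- show only the value of the "notes" property
show tblproperties logmsgs("notes");
```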

4.6.7 Altering storage properties: SequenceFile

hive> alter table logmsgs
    > partition(year = 2019,month=5,day=30)
    > set fileformat sequencefile;
OK
Time taken: 0.212 seconds

Protecting a partition from being dropped or queried

hive> alter table logmsgs
    > partition(year = 2019,month=5,day=30) enable no_drop;
OK
Time taken: 0.272 seconds

hive> alter table logmsgs drop partition(year = 2019,month=5,day=30);
FAILED: SemanticException [Error 30011]: Partition protected from being dropped chapter4@logmsgs@year=2019/month=5/day=30

Replacing enable with disable makes the partition droppable again.

    // Take the partition offline so that it cannot be queried
hive> alter table logmsgs
    > partition(year = 2019,month=5,day=30) enable offline;
OK
Time taken: 0.272 seconds
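
Following the enable/disable pattern described above, both protections can be lifted again. A sketch, assuming a Hive version that still accepts these clauses (as far as I know, no_drop and offline were removed in Hive 2.0):

```sql
-- allow the partition to be dropped again
alter table logmsgs
partition(year = 2019, month=5, day=30) disable no_drop;

-- bring the partition back online so that it can be queried again
alter table logmsgs
partition(year = 2019, month=5, day=30) disable offline;
```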

 
