Switching Hive to utf8 encoding to support the Chinese character set

Reposted from 葛大力:

1. Database creation statement (this is MySQL syntax, to be run in MySQL):

create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

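2. After Hive has started, modify Hive's metadata as follows; the changes take effect without restarting MySQL or Hive.

To fix garbled Chinese comments in the output of the desc command, modify the relevant Hive metadata stored in MySQL:
1). Change the character set of column comments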
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;

2). Change the character set of table comments
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;

3). Change the partition table parameters so that partition keys can be expressed in Chinese

alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;

4). Change the character set of index comments
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
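
The effect of the ALTER statements above can be checked from MySQL itself. The query below is a minimal sketch, assuming the metastore schema is named `hive` (the name shown in the `MariaDB [hive]>` prompt later in this post); adjust it if your schema is named differently. Each listed column should report utf8, which newer MySQL/MariaDB versions may display as utf8mb3.

```sql
-- Run in MySQL/MariaDB.
-- Assumption: the Hive metastore schema is named `hive`.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'hive'
  AND (TABLE_NAME, COLUMN_NAME) IN (
        ('COLUMNS_V2',       'COMMENT'),
        ('TABLE_PARAMS',     'PARAM_VALUE'),
        ('PARTITION_PARAMS', 'PARAM_VALUE'),
        ('PARTITION_KEYS',   'PKEY_COMMENT'),
        ('INDEX_PARAMS',     'PARAM_VALUE')
      );
```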

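3. Only Chinese comments added after the encoding change display correctly; Chinese comments that already existed before the change will still not display correctly.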
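
One way to repair comments that were written before the change is probably to write them again from Hive, so that they are stored afresh in the utf8 columns. The sketch below uses the page_view table from the test section of this post as a stand-in for a pre-existing table; the original post does not show this step, so treat it as an assumption and try it on a test table first.

```sql
-- Run in Hive after the metastore columns have been switched to utf8.
-- Re-apply a column comment; the column name and type stay unchanged.
ALTER TABLE page_view CHANGE COLUMN page_id page_id bigint COMMENT '页面ID';

-- Re-apply the table comment, which Hive keeps as the 'comment' table property.
ALTER TABLE page_view SET TBLPROPERTIES ('comment' = '页面视图');
```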

========================Table creation test========================

Partitioned table
create table page_view 
( 
page_id bigint comment '页面ID', 
page_name string comment '页面名称', 
page_url string comment '页面URL' 
 ) 
comment '页面视图' 
partitioned by (ds string comment '当前时间,用于分区字段') ;

Index on a table column
create index zxz_5_index
on table page_view (page_url)
as 'bitmap' 
with deferred rebuild
COMMENT "页面URL";

Add a partition
alter table page_view add partition(ds='20190113');

Insert data
insert into page_view  partition(ds='20190113') values (1,"王克洲","田慧杰") ;

View the index
SHOW FORMATTED INDEX ON page_view;

View the table structure
desc page_view;
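
The plain desc output covers column comments only. As a small addition that is not part of the original test, the following HiveQL statements could be used to look at the table-level comment, the partition list, and the inserted Chinese values as well.

```sql
-- Run in Hive.
-- Detailed table information, including the table comment.
DESCRIBE FORMATTED page_view;

-- Partitions registered for the table.
SHOW PARTITIONS page_view;

-- The row inserted above, to check that Chinese data values display correctly.
SELECT * FROM page_view WHERE ds = '20190113';
```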

Test result

[Image 1: screenshot of the test result]

 

4. Later I found that Hive could not create a partition with a Chinese name; the error was:

hive> alter table page_view add partition(ds='20190113开心');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Exception thrown when executing query)


The fix is as follows:

MariaDB [hive]> show create table PARTITIONS;

| PARTITIONS | CREATE TABLE `PARTITIONS` (
  `PART_ID` bigint(20) NOT NULL,
  `CREATE_TIME` int(11) NOT NULL,
  `LAST_ACCESS_TIME` int(11) NOT NULL,
  `PART_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `SD_ID` bigint(20) DEFAULT NULL,
  `TBL_ID` bigint(20) DEFAULT NULL,
  `LINK_TARGET_ID` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`PART_ID`),
  UNIQUE KEY `UNIQUEPARTITION` (`PART_NAME`,`TBL_ID`),
  KEY `PARTITIONS_N49` (`TBL_ID`),
  KEY `PARTITIONS_N50` (`SD_ID`),
  KEY `PARTITIONS_N51` (`LINK_TARGET_ID`),
  CONSTRAINT `PARTITIONS_FK1` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`),
  CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`),
  CONSTRAINT `PARTITIONS_FK3` FOREIGN KEY (`LINK_TARGET_ID`) REFERENCES `PARTITIONS` (`PART_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

MariaDB [hive]> alter table PARTITIONS  modify column `PART_NAME` varchar(767) character set utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
MariaDB [hive]> alter table PARTITIONS  modify column `PART_NAME` varchar(100) character set utf8;   
Query OK, 0 rows affected (0.01 sec)               
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [hive]> alter table PARTITIONS  modify column `PART_NAME` varchar(300) character set utf8;   
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
MariaDB [hive]> alter table PARTITIONS  modify column `PART_NAME` varchar(200) character set utf8;   
Query OK, 0 rows affected (0.00 sec)               
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [hive]> alter table PARTITIONS  modify column `PART_NAME` varchar(250) character set utf8;  
Query OK, 0 rows affected (0.00 sec)               
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [hive]> alter table PARTITIONS  modify column `PART_NAME` varchar(260) character set utf8;  
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

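This also confirms a small point: utf8 takes up to three bytes per character, and the column's previous default length was 767 (one byte per character under latin1, so exactly the 767-byte key limit). Specifying 250 works because 250 * 3 = 750, but 260 does not because 260 * 3 = 780.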
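
The same arithmetic can be run in MySQL. This is a minimal sketch, assuming the 767-byte limit applies to the indexed PART_NAME column alone and that utf8 needs at most three bytes per character. It suggests 255 as the largest length that would fit, a value the session above did not test.

```sql
-- Run in MySQL/MariaDB. Comparisons return 1 (true) or 0 (false).
SELECT 767 DIV 3      AS max_utf8_chars,    -- 255
       250 * 3 <= 767 AS varchar_250_fits,  -- 1
       260 * 3 <= 767 AS varchar_260_fits;  -- 0
```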

Creating the partition with a Chinese name again now succeeds.

 
