Hive metastore: "Waiting for table metadata lock"

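Recently, while Hive metadata statistics jobs were running, we kept hitting "Waiting for table metadata lock" in MySQL, which caused Hive query and statistics SQL to fail. The symptom, as seen in the MySQL process list: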

33692473 hiveadmin 10.5.18.226:5176 dataplatform_hive Query 13316 Waiting for table metadata lock SELECT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,`THIS`.`CREATE_TIME`,`THIS`.`
33810095 hiveadmin 10.5.18.226:50978 dataplatform_hive Sleep 14357 NULL
33810096 hiveadmin 10.5.18.226:50985 dataplatform_hive Query 14207 Waiting for table metadata lock SELECT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,`THIS`.`CREATE_TIME`,`THIS`.`
33810124 hiveadmin 10.5.18.226:51027 dataplatform_hive Sleep 14334 NULL
33810127 hiveadmin 10.5.18.226:51028 dataplatform_hive Query 14093 Waiting for table metadata lock SELECT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,`THIS`.`CREATE_TIME`,`THIS`.`
33811831 hiveadmin 10.5.18.226:51839 dataplatform_hive Query 13351 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
33811832 hiveadmin 10.5.18.226:51843 dataplatform_hive Sleep 13352 NULL
33812847 hiveadmin 10.5.18.226:52371 dataplatform_hive Query 12753 Waiting for table metadata lock SELECT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,`THIS`.`CREATE_TIME`,`THIS`.`
33813031 hiveadmin 10.5.18.226:52404 dataplatform_hive Query 12642 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
33813034 hiveadmin 10.5.18.226:52405 dataplatform_hive Sleep 12643 NULL
33814259 hiveadmin 10.5.18.226:52891 dataplatform_hive Query 11918 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
33814262 hiveadmin 10.5.18.226:52892 dataplatform_hive Sleep 11919 NULL
33814302 hiveadmin 10.5.18.226:52907 dataplatform_hive Query 11904 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
33814306 hiveadmin 10.5.18.226:52910 dataplatform_hive Sleep 11904 NULL
33814867 hiveadmin 10.5.18.226:53478 dataplatform_hive Query 11561 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
33814872 hiveadmin 10.5.18.226:53479 dataplatform_hive Sleep 11562 NULL
33815135 hiveadmin 10.5.18.226:53589 dataplatform_hive Query 11414 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
33815136 hiveadmin 10.5.18.226:53590 dataplatform_hive Sleep 11414 NULL
33816324 hiveadmin 10.5.18.226:54193 dataplatform_hive Query 10701 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
33816325 hiveadmin 10.5.18.226:54195 dataplatform_hive Query 10701 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
33816333 hiveadmin 10.5.18.226:54206 dataplatform_hive Sleep 10701 NULL
33816334 hiveadmin 10.5.18.226:54207 dataplatform_hive Sleep 10701 NULL
33816349 hiveadmin 10.5.18.226:54218 dataplatform_hive Query 10696 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
33816350 hiveadmin 10.5.18.226:54223 dataplatform_hive Sleep 10696 NULL
33816776 hiveadmin 10.5.18.226:54507 dataplatform_hive Query 10437 Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
33816783 hiveadmin 10.5.18.226:54508 dataplatform_hive Sleep 10437 NULL


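This problem bothered us for a long time. Clearly, clients were continuously submitting ALTER TABLE statements to MySQL. These contended for the table metadata lock, and the waiting sessions piled up until the metastore effectively stalled (strictly a lock-wait pile-up rather than a true deadlock).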
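To see how much of the backlog is ALTER TABLE versus SELECT, the process list rows can be tallied. The following is a minimal sketch, assuming rows are whitespace-separated in the same column order as the excerpt above (id, user, host, db, command, time, state, statement). The function name `tally_waiters` and the shortened sample rows are illustrative only.

```python
import re
from collections import Counter

# Matches rows shaped like the process list excerpt:
# id, user, host:port, db, command, time, then state plus statement text.
ROW = re.compile(
    r"^(?P<id>\d+)\s+(?P<user>\S+)\s+(?P<host>\S+)\s+(?P<db>\S+)\s+"
    r"(?P<command>\S+)\s+(?P<time>\d+)\s+(?P<rest>.*)$"
)
WAIT_STATE = "Waiting for table metadata lock"


def statement_kind(sql):
    """Return 'ALTER TABLE' for that DDL, otherwise the first SQL keyword."""
    words = sql.split()
    if not words:
        return "UNKNOWN"
    if words[0].upper() == "ALTER" and len(words) > 1:
        return " ".join(words[:2]).upper()
    return words[0].upper()


def tally_waiters(lines):
    """Count threads waiting on the metadata lock, grouped by statement kind."""
    counts = Counter()
    for line in lines:
        match = ROW.match(line.strip())
        if not match or not match.group("rest").startswith(WAIT_STATE):
            continue  # sleeping or otherwise unblocked threads are skipped
        sql = match.group("rest")[len(WAIT_STATE):].strip()
        counts[statement_kind(sql)] += 1
    return counts


# Shortened copies of rows from the excerpt above.
sample = [
    "33692473 hiveadmin 10.5.18.226:5176 dataplatform_hive Query 13316 "
    "Waiting for table metadata lock SELECT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE",
    "33810095 hiveadmin 10.5.18.226:50978 dataplatform_hive Sleep 14357 NULL",
    "33811831 hiveadmin 10.5.18.226:51839 dataplatform_hive Query 13351 "
    "Waiting for table metadata lock ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` "
    "FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)",
]
print(dict(tally_waiters(sample)))
```

A large ALTER TABLE count next to blocked SELECTs would point at repeated DDL as the source of the waits, which is what the capture below confirms.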

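Analysis steps:

1. Use mysqlpcap to capture the SQL statements submitted to the MySQL database. We had previously tried `show full processlist;` from the MySQL client, but it rarely caught the statements while they were executing.

mysqlpcap source: https://github.com/hoterran/tcpcollect Download the source and run make to build it.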

15:24:51:666434 10.9.18.47 330 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'TBLS'
15:24:51:668320 10.9.18.47 193 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'SDS'
15:24:51:670040 10.9.18.47 197 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'CDS'
15:24:51:671705 10.9.18.47 191 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'COLUMNS_V2'
15:24:51:673394 10.9.18.47 300 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'SERDE_PARAMS'
15:24:51:675201 10.9.18.47 188 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'SD_PARAMS'
15:24:51:676894 10.9.18.47 188 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'PARTITION_KEYS'
15:24:51:678575 10.9.18.47 197 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'SORT_COLS'
15:24:51:680264 10.9.18.47 185 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'BUCKETING_COLS'
15:24:51:682017 10.9.18.47 195 1 hiveadmin hive_metasto SHOW FULL TABLES FROM `hive_metastore` LIKE 'TABLE_PARAMS'
15:24:51:683751 10.9.18.47 132 1 hiveadmin hive_metasto SHOW INDEX FROM `SERDES` FROM `hive_metastore`
15:24:51:685707 10.9.18.47 69 1 hiveadmin hive_metasto SHOW CREATE TABLE `hive_metastore`.`SERDES`
15:24:51:687342 10.9.18.47 174 1 hiveadmin hive_metasto SHOW INDEX FROM `SERDES` FROM `hive_metastore`
15:24:51:689307 10.9.18.47 198 5 hiveadmin hive_metasto SHOW INDEX FROM `TBLS` FROM `hive_metastore`
15:24:51:691891 10.9.18.47 92 1 hiveadmin hive_metasto SHOW CREATE TABLE `hive_metastore`.`TBLS`
15:24:51:693666 10.9.18.47 11405 700 hiveadmin hive_metasto ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
15:24:51:706311 10.9.18.47 44 0 hiveadmin hive_metasto SHOW WARNINGS
15:24:51:707666 10.9.18.47 11713 700 hiveadmin hive_metasto ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK1` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
15:24:51:720687 10.9.18.47 45 0 hiveadmin hive_metasto SHOW WARNINGS
15:24:51:722087 10.9.18.47 239 5 hiveadmin hive_metasto SHOW INDEX FROM `TBLS` FROM `hive_metastore`
15:24:51:724495 10.9.18.47 848 3 hiveadmin hive_metasto SHOW INDEX FROM `SDS` FROM `hive_metastore`
15:24:51:727512 10.9.18.47 129 1 hiveadmin hive_metasto SHOW CREATE TABLE `hive_metastore`.`SDS`
15:24:52:729261 10.9.18.47 384348 55250 hiveadmin hive_metasto ALTER TABLE `SDS` ADD CONSTRAINT `SDS_FK1` FOREIGN KEY (`SERDE_ID`) REFERENCES `SERDES` (`SERDE_ID`)
15:24:52:114860 10.9.18.47 56 0 hiveadmin hive_metasto SHOW WARNINGS

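The order of the statements shows what happens: when Hive initializes, it checks the entire metastore schema and fills in any constraints, tables, or columns that are missing. Our MySQL tables used the MyISAM storage engine, which does not support foreign keys (MySQL parses foreign key definitions on MyISAM tables and then ignores them). As a result, every check found the constraints missing, so every initialization re-ran the metastore schema setup and submitted the same ALTER TABLE statements again and again, which locked the tables.

2. What we did (pinpointing the problem and testing the fix):

1) Dump the production metastore database and import it into a separate database.

2) Delete the dirty data, following the foreign key relationships between the tables. The SQL: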

delete from BUCKETING_COLS where SD_ID not in (SELECT SD_ID from SDS);
select "1 OK";
delete from DATABASE_PARAMS where DB_ID not in (select DB_ID from DBS);
select "2 OK";
delete from PARTITION_KEY_VALS where PART_ID not in (SELECT PART_ID FROM PARTITIONS);
delete from PARTITION_KEY_VALS where PART_ID in (select PART_ID from PARTITIONS where SD_ID not in (select SD_ID from SDS));
DELETE FROM PARTITION_KEY_VALS WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE TBL_ID NOT IN (SELECT TBL_ID FROM TBLS));
select "3 OK";
DELETE FROM PARTITION_PARAMS WHERE PART_ID NOT IN (SELECT PART_ID FROM PARTITIONS);
DELETE FROM PARTITION_PARAMS WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE SD_ID NOT IN (SELECT SD_ID FROM SDS));
DELETE FROM PARTITION_PARAMS WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE TBL_ID NOT IN (SELECT TBL_ID FROM TBLS));
select "4 OK";
delete from PARTITION_KEYS where TBL_ID not in (select TBL_ID from TBLS);
select "5 OK";
DELETE FROM SD_PARAMS WHERE SD_ID NOT IN(SELECT SD_ID FROM SDS);
select "6 OK";
DELETE FROM SERDE_PARAMS WHERE SERDE_ID NOT IN(SELECT SERDE_ID FROM SERDES);
select "7 OK";
delete from COLUMNS_V2 where CD_ID not in (SELECT CD_ID from CDS);
select "8 OK";
delete from PARTITIONS where SD_ID not in (select SD_ID from SDS);
select "9 OK";
delete from PARTITIONS where TBL_ID not in (select TBL_ID from TBLS);
select "10 OK";
DELETE FROM SDS WHERE SERDE_ID NOT IN(SELECT SERDE_ID FROM SERDES);
select "11 OK";
DELETE FROM SDS WHERE CD_ID NOT IN(SELECT CD_ID FROM CDS);
select "12 OK";
DELETE FROM SORT_COLS WHERE SD_ID NOT IN(SELECT SD_ID FROM SDS);
select "13 OK";
DELETE FROM TABLE_PARAMS WHERE TBL_ID NOT IN(SELECT TBL_ID FROM TBLS);
select "14 OK";
DELETE FROM TBLS WHERE SD_ID NOT IN(SELECT SD_ID FROM SDS);
select "15 OK";
DELETE FROM TBLS WHERE DB_ID NOT IN(SELECT DB_ID FROM DBS);
select "16 OK";
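Before running deletes like these, it can help to count how many orphaned rows each relationship actually has, and to re-check afterwards, since deleting rows from SDS or TBLS late in the script may orphan rows in tables that were cleaned earlier. The following is a minimal sketch of that check. It uses an in-memory SQLite database with a simplified toy schema purely so that it is self-contained; the real metastore is MySQL, and `count_orphans`, `RELATIONS`, and the sample rows are hypothetical names and data.

```python
import sqlite3

# (child table, key column, parent table) pairs taken from the cleanup script.
RELATIONS = [
    ("BUCKETING_COLS", "SD_ID", "SDS"),
    ("TBLS", "SD_ID", "SDS"),
]


def count_orphans(conn, child, column, parent):
    """Return how many child rows reference a parent key that does not exist."""
    sql = (
        f"SELECT COUNT(*) FROM {child} "
        f"WHERE {column} NOT IN (SELECT {column} FROM {parent})"
    )
    return conn.execute(sql).fetchone()[0]


def count_all(conn):
    """Orphan count for every relationship in RELATIONS."""
    return {child: count_orphans(conn, child, col, parent)
            for child, col, parent in RELATIONS}


# Toy data: SD_ID 99, 77 and 78 have no matching row in SDS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY);
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER);
    CREATE TABLE BUCKETING_COLS (SD_ID INTEGER, BUCKET_COL_NAME TEXT);
    INSERT INTO SDS VALUES (1), (2);
    INSERT INTO TBLS VALUES (10, 1), (11, 99);
    INSERT INTO BUCKETING_COLS VALUES (2, 'a'), (77, 'b'), (78, 'c');
""")

before = count_all(conn)

# Same anti-join pattern as the DELETE statements in the script.
for child, col, parent in RELATIONS:
    conn.execute(
        f"DELETE FROM {child} WHERE {col} NOT IN (SELECT {col} FROM {parent})"
    )

after = count_all(conn)
print(before, after)
```

If any count is still non-zero after a pass, running the cleanup again until every count reaches zero is one way to be sure the foreign keys can be created.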

3) Convert all metastore tables in the database to the InnoDB engine so that foreign keys are supported.

SQL statement: alter table table_name engine=innodb;
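Since the statement has to be repeated for every metastore table, generating the statements from a table list avoids missing one. The following is a minimal sketch, assuming the (table name, engine) pairs have already been fetched, for example from `information_schema.TABLES`. The function name `engine_conversion_sql` and the sample table list are illustrative.

```python
def engine_conversion_sql(tables, target="InnoDB"):
    """Build one ALTER TABLE ... ENGINE statement per table not yet on target."""
    statements = []
    for name, engine in tables:
        if (engine or "").lower() == target.lower():
            continue  # already converted, nothing to do
        # Backtick-quote the identifier, doubling any embedded backticks.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"ALTER TABLE {quoted} ENGINE={target};")
    return statements


# The pairs could come from a query such as (schema name is an assumption):
#   SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES
#   WHERE TABLE_SCHEMA = 'hive_metastore' AND TABLE_TYPE = 'BASE TABLE';
tables = [("TBLS", "MyISAM"), ("SDS", "MyISAM"), ("DBS", "InnoDB")]

for statement in engine_conversion_sql(tables):
    print(statement)
```

Reviewing the generated statements before running them also gives a quick list of which tables were still on MyISAM.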

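4) Test and monitor whether Hive in CLI mode still issues ALTER TABLE statements.

Monitoring with mysqlpcap showed that no ALTER TABLE statements were being issued any more.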
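This check can be automated by scanning the capture for ALTER TABLE statements. The following is a minimal sketch, assuming each capture line has six whitespace-separated fields before the SQL text, as in the mysqlpcap excerpt earlier. The meaning of the two numeric fields is not assumed here, and `find_alter_statements` is a hypothetical helper name.

```python
def find_alter_statements(capture_lines):
    """Return (timestamp, sql) for every ALTER TABLE statement in the capture."""
    hits = []
    for line in capture_lines:
        # Six leading fields (timestamp, client IP, two numbers, user, db),
        # then the statement, which may itself contain spaces.
        parts = line.split(None, 6)
        if len(parts) < 7:
            continue  # blank or malformed line
        timestamp, sql = parts[0], parts[6].strip()
        if sql.upper().startswith("ALTER TABLE"):
            hits.append((timestamp, sql))
    return hits


# Three lines copied from the capture shown earlier.
capture = [
    "15:24:51:691891 10.9.18.47 92 1 hiveadmin hive_metasto SHOW CREATE TABLE `hive_metastore`.`TBLS`",
    "15:24:51:693666 10.9.18.47 11405 700 hiveadmin hive_metasto ALTER TABLE `TBLS` ADD CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)",
    "15:24:51:706311 10.9.18.47 44 0 hiveadmin hive_metasto SHOW WARNINGS",
]

for timestamp, sql in find_alter_statements(capture):
    print(timestamp, sql)
```

An empty result over a capture that spans several Hive CLI startups corresponds to the "no more ALTER TABLE" observation above.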

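3. Applying the change to the production database:

1) Dump the production database as a backup, into a separate database.

2) Run the SQL that cleans up the dirty data.

3) Convert all metastore tables in the database to InnoDB.

4) Monitor for ALTER TABLE statements.

Monitoring showed that no ALTER TABLE statements were being issued any more.

With that, the long-standing "Waiting for table metadata lock" problem was solved.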

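Summary:

1) The metastore tables used the MyISAM engine, which does not support foreign keys. Every time Hive starts (in either CLI or JDBC mode), it checks the integrity of the metastore schema, including the unique and foreign key constraints on each table. If any are missing, it automatically runs the statements needed to complete the schema.

2) As the workload grew, CLI mode was used heavily, so ALTER statements against the metastore tables were submitted frequently. Under that concurrency the sessions ended up in "Waiting for table metadata lock", and SELECT statements against the database stopped responding.

Root cause: for historical reasons (database migrations and a mix of Hive versions in use), the metastore data had become inconsistent. Foreign key constraints were lost and dirty data crept in. On top of that, the tables used the MyISAM engine, which does not enforce foreign keys. Hive therefore kept submitting schema repair statements, which triggered the table locking.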
