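Practical Data Lake with Iceberg, Lesson 1: Getting started
Practical Data Lake with Iceberg, Lesson 2: Iceberg's underlying data format on Hadoop
Practical Data Lake with Iceberg, Lesson 3: Reading from Kafka into Iceberg with SQL in the SQL client
Practical Data Lake with Iceberg, Lesson 4: Reading from Kafka into Iceberg with SQL in the SQL client (upgraded to Flink 1.12.7)
Practical Data Lake with Iceberg, Lesson 5: Characteristics of the Hive catalog
Practical Data Lake with Iceberg, Lesson 6: Fixing failed writes from Kafka to Iceberg
Practical Data Lake with Iceberg, Lesson 7: Writing to Iceberg in real time
Practical Data Lake with Iceberg, Lesson 8: Integrating Hive with Iceberg
Practical Data Lake with Iceberg, Lesson 9: Compacting small files
Practical Data Lake with Iceberg, Lesson 10: Deleting snapshots
Practical Data Lake with Iceberg, Lesson 11: End-to-end test of a partitioned table (generating data, creating the table, compacting, deleting snapshots)
Practical Data Lake with Iceberg, Lesson 12: What a catalog is
Practical Data Lake with Iceberg, Lesson 13: Metadata many times larger than the data files
Practical Data Lake with Iceberg, Lesson 14: Metadata cleanup (fixing metadata that keeps growing over time)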
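The problem: metadata keeps growing over time. Even after Iceberg compaction and snapshot deletion, the old metadata files are not removed. A table holding 6 MB of data ended up with tens of GB of metadata.
Solving this problem is the main goal of this lesson.
Overall approach: create two partitioned tables, table_a5 with automatic metadata cleanup and table_a without it (the default), insert rows into them repeatedly, and record how the metadata directory changes before and after each insert.
Why partitioned tables? Because most tables in production are partitioned.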
Add these two properties when creating the table:
'write.metadata.delete-after-commit.enabled'='true',
'write.metadata.previous-versions-max'='5'
How to set them:
CREATE TABLE iceberg_db6.table_a5 (
id bigint, name string
) PARTITIONED BY (
dt string
) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES (
'write.distribution-mode'='hash',
'write.metadata.delete-after-commit.enabled'='true',
'write.metadata.previous-versions-max'='5'
);
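This statement must be run in the Hive client.
Testing the approach
table_a: no automatic metadata cleanup.
table_a5: keeps the 5 previous metadata.json files plus the one just written, 6 in total.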
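The retention rule can be pictured with a small simulation. This is only a sketch of the behaviour described above, not Iceberg's actual code: the function name `retained_metadata_versions` is made up for illustration, and it assumes that CREATE TABLE writes version 0 and that every commit writes exactly one new metadata.json.

```python
from collections import deque


def retained_metadata_versions(commits, previous_versions_max=5):
    """Return the metadata.json version numbers left after `commits` commits.

    Assumptions (illustrative model only): version 0 is written by
    CREATE TABLE, each commit writes one new version, and
    delete-after-commit is enabled, so at most `previous_versions_max`
    old versions plus the current one are kept.
    """
    kept = deque(maxlen=previous_versions_max + 1)
    for version in range(commits + 1):
        kept.append(version)  # once the deque is full, the oldest version drops out
    return list(kept)


print(retained_metadata_versions(2))  # [0, 1, 2]
print(retained_metadata_versions(6))  # [1, 2, 3, 4, 5, 6]
```

With the setting of 5 used for table_a5, version 0 is the first file to be dropped, and that happens on the sixth commit.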
[root@hadoop101 software]# hive
hive (default)> add jar /opt/module/hive/lib/iceberg-hive-runtime-0.12.1.jar;
Added [/opt/module/hive/lib/iceberg-hive-runtime-0.12.1.jar] to class path
Added resources: [/opt/module/hive/lib/iceberg-hive-runtime-0.12.1.jar]
hive (default)> use iceberg_db6;
OK
hive (iceberg_db6)> CREATE TABLE iceberg_db6.table_a (
> id bigint, name string
> ) PARTITIONED BY (
> dt string
> ) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
OK
Time taken: 0.546 seconds
hive (iceberg_db6)> CREATE TABLE iceberg_db6.table_a5 (
> id bigint, name string
> ) PARTITIONED BY (
> dt string
> ) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> TBLPROPERTIES (
> 'write.distribution-mode'='hash',
> 'write.metadata.delete-after-commit.enabled'='true',
> 'write.metadata.previous-versions-max'='5'
> );
OK
Insert data from Hive
hive (iceberg_db6)> insert into table_a values(1,'apple','20220101')
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220211115034_c75f4d78-073c-4643-a49a-0d659ca5dcb2
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1642579431487_0020, Tracking URL = http://hadoop101:8088/proxy/application_1642579431487_0020/
Kill Command = /opt/module/hadoop/bin/hadoop job -kill job_1642579431487_0020
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-02-11 11:50:43,643 Stage-3 map = 0%, reduce = 0%
2022-02-11 11:50:50,911 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 4.16 sec
MapReduce Total cumulative CPU time: 4 seconds 160 msec
Ended Job = job_1642579431487_0020
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 4.16 sec HDFS Read: 115157 HDFS Write: 3616 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 160 msec
OK
_col0 _col1 _col2
Time taken: 18.719 seconds
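For the table without metadata cleanup, every insert adds 3 files to the metadata directory: one metadata.json, one snap-*.avro file (manifest list), and one *-m0.avro file (manifest).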
insert into table_a values(2,'apple2','20220101');
insert into table_a values(2,'apple2','20220101');
insert into table_a values(3,'apple3','20220101');
insert into table_a values(4,'apple4','20220101');
insert into table_a values(5,'apple5','20220101');
insert into table_a values(6,'apple6','20220101');
The table with metadata cleanup:
insert into table_a5 values(1,'apple1','20220101');
insert into table_a5 values(2,'apple2','20220101'); -- 7 metadata files in total
insert into table_a5 values(2,'apple2','20220101'); -- 10 metadata files in total
insert into table_a5 values(3,'apple3','20220101'); -- 13 metadata files in total
insert into table_a5 values(4,'apple4','20220101'); -- 16 metadata files in total
insert into table_a5 values(5,'apple5','20220101'); -- 18 metadata files in total
insert into table_a5 values(6,'apple6','20220101'); -- 20 metadata files in total
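The counts noted next to these inserts follow a simple pattern, sketched below. The model is an assumption fitted to this experiment, and `expected_metadata_files` is a made-up helper name: CREATE TABLE writes one metadata.json, each insert is one commit that adds one metadata.json, one snap-*.avro file and one *-m0.avro file, and no compaction or snapshot expiry runs in between.

```python
def expected_metadata_files(inserts, previous_versions_max=5, cleanup=True):
    """Estimate the number of files in the metadata directory after `inserts` inserts.

    Illustrative model only, fitted to the experiment in this lesson.
    """
    # One metadata.json from CREATE TABLE plus one per insert.
    json_files = inserts + 1
    if cleanup:
        # delete-after-commit keeps the current file plus N previous ones.
        json_files = min(json_files, previous_versions_max + 1)
    manifest_lists = inserts  # snap-*.avro, one per insert
    manifests = inserts       # *-m0.avro, one per insert
    return json_files + manifest_lists + manifests


for n in range(2, 8):
    print(n, expected_metadata_files(n))
# 2 7
# 3 10
# 4 13
# 5 16
# 6 18
# 7 20
```

Under the same assumptions, a table without cleanup (`cleanup=False`) holds 3n + 1 files after n inserts, so the metadata.json cap saves only one file per commit. The avro files keep growing either way.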
After the second insert into table_a5:
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata
Found 7 items
-rw-r--r-- 2 root supergroup 1846 2022-02-11 11:22 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00000-af035f2a-6178-4b06-8c55-3182ecfec103.metadata.json
-rw-r--r-- 2 root supergroup 2812 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00001-a058dcf4-4394-4691-a508-212290920ef2.metadata.json
-rw-r--r-- 2 root supergroup 3813 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00002-de593422-a794-4acf-a0b8-3edaa444a1eb.metadata.json
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/a2adc8c6-779d-4634-9840-ace594f56ac0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/bb1c1064-fc53-4e82-ab7a-16b3d306ccd6-m0.avro
-rw-r--r-- 2 root supergroup 3865 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-8601063452676878887-1-bb1c1064-fc53-4e82-ab7a-16b3d306ccd6.avro
-rw-r--r-- 2 root supergroup 3796 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-9064233795022707236-1-a2adc8c6-779d-4634-9840-ace594f56ac0.avro
After the third insert there are 10 metadata files, 3 more than before: one more of each file type.
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata
Found 10 items
-rw-r--r-- 2 root supergroup 1846 2022-02-11 11:22 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00000-af035f2a-6178-4b06-8c55-3182ecfec103.metadata.json
-rw-r--r-- 2 root supergroup 2812 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00001-a058dcf4-4394-4691-a508-212290920ef2.metadata.json
-rw-r--r-- 2 root supergroup 3813 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00002-de593422-a794-4acf-a0b8-3edaa444a1eb.metadata.json
-rw-r--r-- 2 root supergroup 4814 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00003-0c1ecbb8-d0b3-4252-9e99-cfe83bf59988.metadata.json
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/3fcfbd0c-072c-4923-83a5-4d3c5e07c297-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/a2adc8c6-779d-4634-9840-ace594f56ac0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/bb1c1064-fc53-4e82-ab7a-16b3d306ccd6-m0.avro
-rw-r--r-- 2 root supergroup 3906 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-3791422441458827238-1-3fcfbd0c-072c-4923-83a5-4d3c5e07c297.avro
-rw-r--r-- 2 root supergroup 3865 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-8601063452676878887-1-bb1c1064-fc53-4e82-ab7a-16b3d306ccd6.avro
-rw-r--r-- 2 root supergroup 3796 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-9064233795022707236-1-a2adc8c6-779d-4634-9840-ace594f56ac0.avro
After the fourth insert there are 13 metadata files, 3 more than before: one more of each file type.
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata
Found 13 items
-rw-r--r-- 2 root supergroup 1846 2022-02-11 11:22 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00000-af035f2a-6178-4b06-8c55-3182ecfec103.metadata.json
-rw-r--r-- 2 root supergroup 2812 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00001-a058dcf4-4394-4691-a508-212290920ef2.metadata.json
-rw-r--r-- 2 root supergroup 3813 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00002-de593422-a794-4acf-a0b8-3edaa444a1eb.metadata.json
-rw-r--r-- 2 root supergroup 4814 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00003-0c1ecbb8-d0b3-4252-9e99-cfe83bf59988.metadata.json
-rw-r--r-- 2 root supergroup 5811 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00004-c82ed9e4-9b35-4ba1-86de-c45d29a03aea.metadata.json
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/3fcfbd0c-072c-4923-83a5-4d3c5e07c297-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/a2adc8c6-779d-4634-9840-ace594f56ac0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/bb1c1064-fc53-4e82-ab7a-16b3d306ccd6-m0.avro
-rw-r--r-- 2 root supergroup 6104 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/f36b8532-a2ce-44f6-8fe7-085e82021845-m0.avro
-rw-r--r-- 2 root supergroup 3947 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-233411818671532638-1-f36b8532-a2ce-44f6-8fe7-085e82021845.avro
-rw-r--r-- 2 root supergroup 3906 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-3791422441458827238-1-3fcfbd0c-072c-4923-83a5-4d3c5e07c297.avro
-rw-r--r-- 2 root supergroup 3865 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-8601063452676878887-1-bb1c1064-fc53-4e82-ab7a-16b3d306ccd6.avro
-rw-r--r-- 2 root supergroup 3796 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-9064233795022707236-1-a2adc8c6-779d-4634-9840-ace594f56ac0.avro
After the fifth insert there are 16 files, 3 more than before: one more of each file type.
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata
Found 16 items
-rw-r--r-- 2 root supergroup 1846 2022-02-11 11:22 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00000-af035f2a-6178-4b06-8c55-3182ecfec103.metadata.json
-rw-r--r-- 2 root supergroup 2812 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00001-a058dcf4-4394-4691-a508-212290920ef2.metadata.json
-rw-r--r-- 2 root supergroup 3813 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00002-de593422-a794-4acf-a0b8-3edaa444a1eb.metadata.json
-rw-r--r-- 2 root supergroup 4814 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00003-0c1ecbb8-d0b3-4252-9e99-cfe83bf59988.metadata.json
-rw-r--r-- 2 root supergroup 5811 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00004-c82ed9e4-9b35-4ba1-86de-c45d29a03aea.metadata.json
-rw-r--r-- 2 root supergroup 6808 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00005-70f67022-2bcd-4c4b-b956-aaf0e419085a.metadata.json
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/3fcfbd0c-072c-4923-83a5-4d3c5e07c297-m0.avro
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/4a55c041-b0e6-43c1-9c24-6216fa9ba44e-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/a2adc8c6-779d-4634-9840-ace594f56ac0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/bb1c1064-fc53-4e82-ab7a-16b3d306ccd6-m0.avro
-rw-r--r-- 2 root supergroup 6104 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/f36b8532-a2ce-44f6-8fe7-085e82021845-m0.avro
-rw-r--r-- 2 root supergroup 3982 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-217916175263432478-1-4a55c041-b0e6-43c1-9c24-6216fa9ba44e.avro
-rw-r--r-- 2 root supergroup 3947 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-233411818671532638-1-f36b8532-a2ce-44f6-8fe7-085e82021845.avro
-rw-r--r-- 2 root supergroup 3906 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-3791422441458827238-1-3fcfbd0c-072c-4923-83a5-4d3c5e07c297.avro
-rw-r--r-- 2 root supergroup 3865 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-8601063452676878887-1-bb1c1064-fc53-4e82-ab7a-16b3d306ccd6.avro
-rw-r--r-- 2 root supergroup 3796 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-9064233795022707236-1-a2adc8c6-779d-4634-9840-ace594f56ac0.avro
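After the sixth insert there are 18 files, only 2 more than before: a new metadata.json is written and the oldest one (00000) is deleted, while one snap file and one -m0.avro file are added.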
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata
Found 18 items
-rw-r--r-- 2 root supergroup 2812 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00001-a058dcf4-4394-4691-a508-212290920ef2.metadata.json
-rw-r--r-- 2 root supergroup 3813 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00002-de593422-a794-4acf-a0b8-3edaa444a1eb.metadata.json
-rw-r--r-- 2 root supergroup 4814 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00003-0c1ecbb8-d0b3-4252-9e99-cfe83bf59988.metadata.json
-rw-r--r-- 2 root supergroup 5811 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00004-c82ed9e4-9b35-4ba1-86de-c45d29a03aea.metadata.json
-rw-r--r-- 2 root supergroup 6808 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00005-70f67022-2bcd-4c4b-b956-aaf0e419085a.metadata.json
-rw-r--r-- 2 root supergroup 7608 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00006-fecf1467-2a10-4428-be41-0ef3e4325efb.metadata.json
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/3fcfbd0c-072c-4923-83a5-4d3c5e07c297-m0.avro
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/4a55c041-b0e6-43c1-9c24-6216fa9ba44e-m0.avro
-rw-r--r-- 2 root supergroup 6107 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/6658d381-3ea8-4983-bf94-b718196b19b0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/a2adc8c6-779d-4634-9840-ace594f56ac0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/bb1c1064-fc53-4e82-ab7a-16b3d306ccd6-m0.avro
-rw-r--r-- 2 root supergroup 6104 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/f36b8532-a2ce-44f6-8fe7-085e82021845-m0.avro
-rw-r--r-- 2 root supergroup 3982 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-217916175263432478-1-4a55c041-b0e6-43c1-9c24-6216fa9ba44e.avro
-rw-r--r-- 2 root supergroup 3947 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-233411818671532638-1-f36b8532-a2ce-44f6-8fe7-085e82021845.avro
-rw-r--r-- 2 root supergroup 3906 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-3791422441458827238-1-3fcfbd0c-072c-4923-83a5-4d3c5e07c297.avro
-rw-r--r-- 2 root supergroup 4032 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-4596811458566565273-1-6658d381-3ea8-4983-bf94-b718196b19b0.avro
-rw-r--r-- 2 root supergroup 3865 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-8601063452676878887-1-bb1c1064-fc53-4e82-ab7a-16b3d306ccd6.avro
-rw-r--r-- 2 root supergroup 3796 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-9064233795022707236-1-a2adc8c6-779d-4634-9840-ace594f56ac0.avro
…
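The two listings below were taken before and after a single insert, and the metadata directory gains 2 files. Of the 3 file types, the oldest metadata.json is deleted and a new one is written, so the metadata.json count stays at 6, while one snap file and one -m0.avro file are added.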
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata
Found 22 items
-rw-r--r-- 2 root supergroup 4814 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00003-0c1ecbb8-d0b3-4252-9e99-cfe83bf59988.metadata.json
-rw-r--r-- 2 root supergroup 5811 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00004-c82ed9e4-9b35-4ba1-86de-c45d29a03aea.metadata.json
-rw-r--r-- 2 root supergroup 6808 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00005-70f67022-2bcd-4c4b-b956-aaf0e419085a.metadata.json
-rw-r--r-- 2 root supergroup 7608 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00006-fecf1467-2a10-4428-be41-0ef3e4325efb.metadata.json
-rw-r--r-- 2 root supergroup 8408 2022-02-11 14:09 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00007-af261b67-bcce-4cfd-93d5-b0675ecaa68b.metadata.json
-rw-r--r-- 2 root supergroup 9208 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00008-7d22c9fe-4be9-4a08-aba3-4fdc7e64843c.metadata.json
-rw-r--r-- 2 root supergroup 6109 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/0e858ef8-8242-4f3c-9351-20eb78973628-m0.avro
-rw-r--r-- 2 root supergroup 6107 2022-02-11 14:09 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/347052a4-047a-4903-a1b9-4cebb9d4474d-m0.avro
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/3fcfbd0c-072c-4923-83a5-4d3c5e07c297-m0.avro
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/4a55c041-b0e6-43c1-9c24-6216fa9ba44e-m0.avro
-rw-r--r-- 2 root supergroup 6107 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/6658d381-3ea8-4983-bf94-b718196b19b0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/a2adc8c6-779d-4634-9840-ace594f56ac0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/bb1c1064-fc53-4e82-ab7a-16b3d306ccd6-m0.avro
-rw-r--r-- 2 root supergroup 6104 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/f36b8532-a2ce-44f6-8fe7-085e82021845-m0.avro
-rw-r--r-- 2 root supergroup 4106 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-1792093041816645047-1-0e858ef8-8242-4f3c-9351-20eb78973628.avro
-rw-r--r-- 2 root supergroup 3982 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-217916175263432478-1-4a55c041-b0e6-43c1-9c24-6216fa9ba44e.avro
-rw-r--r-- 2 root supergroup 3947 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-233411818671532638-1-f36b8532-a2ce-44f6-8fe7-085e82021845.avro
-rw-r--r-- 2 root supergroup 3906 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-3791422441458827238-1-3fcfbd0c-072c-4923-83a5-4d3c5e07c297.avro
-rw-r--r-- 2 root supergroup 4032 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-4596811458566565273-1-6658d381-3ea8-4983-bf94-b718196b19b0.avro
-rw-r--r-- 2 root supergroup 4069 2022-02-11 14:09 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-5266483990222305982-1-347052a4-047a-4903-a1b9-4cebb9d4474d.avro
-rw-r--r-- 2 root supergroup 3865 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-8601063452676878887-1-bb1c1064-fc53-4e82-ab7a-16b3d306ccd6.avro
-rw-r--r-- 2 root supergroup 3796 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-9064233795022707236-1-a2adc8c6-779d-4634-9840-ace594f56ac0.avro
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata
Found 24 items
-rw-r--r-- 2 root supergroup 5811 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00004-c82ed9e4-9b35-4ba1-86de-c45d29a03aea.metadata.json
-rw-r--r-- 2 root supergroup 6808 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00005-70f67022-2bcd-4c4b-b956-aaf0e419085a.metadata.json
-rw-r--r-- 2 root supergroup 7608 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00006-fecf1467-2a10-4428-be41-0ef3e4325efb.metadata.json
-rw-r--r-- 2 root supergroup 8408 2022-02-11 14:09 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00007-af261b67-bcce-4cfd-93d5-b0675ecaa68b.metadata.json
-rw-r--r-- 2 root supergroup 9208 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00008-7d22c9fe-4be9-4a08-aba3-4fdc7e64843c.metadata.json
-rw-r--r-- 2 root supergroup 10008 2022-02-11 14:12 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00009-a4546c6d-6579-4347-a7d7-2cc9404d9fdb.metadata.json
-rw-r--r-- 2 root supergroup 6109 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/0e858ef8-8242-4f3c-9351-20eb78973628-m0.avro
-rw-r--r-- 2 root supergroup 6107 2022-02-11 14:09 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/347052a4-047a-4903-a1b9-4cebb9d4474d-m0.avro
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/3fcfbd0c-072c-4923-83a5-4d3c5e07c297-m0.avro
-rw-r--r-- 2 root supergroup 6105 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/4a55c041-b0e6-43c1-9c24-6216fa9ba44e-m0.avro
-rw-r--r-- 2 root supergroup 6107 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/6658d381-3ea8-4983-bf94-b718196b19b0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/a2adc8c6-779d-4634-9840-ace594f56ac0-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/bb1c1064-fc53-4e82-ab7a-16b3d306ccd6-m0.avro
-rw-r--r-- 2 root supergroup 6108 2022-02-11 14:12 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/c9477d55-233e-44e2-8ae1-c263faa31100-m0.avro
-rw-r--r-- 2 root supergroup 6104 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/f36b8532-a2ce-44f6-8fe7-085e82021845-m0.avro
-rw-r--r-- 2 root supergroup 4106 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-1792093041816645047-1-0e858ef8-8242-4f3c-9351-20eb78973628.avro
-rw-r--r-- 2 root supergroup 3982 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-217916175263432478-1-4a55c041-b0e6-43c1-9c24-6216fa9ba44e.avro
-rw-r--r-- 2 root supergroup 3947 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-233411818671532638-1-f36b8532-a2ce-44f6-8fe7-085e82021845.avro
-rw-r--r-- 2 root supergroup 3906 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-3791422441458827238-1-3fcfbd0c-072c-4923-83a5-4d3c5e07c297.avro
-rw-r--r-- 2 root supergroup 4032 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-4596811458566565273-1-6658d381-3ea8-4983-bf94-b718196b19b0.avro
-rw-r--r-- 2 root supergroup 4069 2022-02-11 14:09 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-5266483990222305982-1-347052a4-047a-4903-a1b9-4cebb9d4474d.avro
-rw-r--r-- 2 root supergroup 4147 2022-02-11 14:12 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-5340462524091360888-1-c9477d55-233e-44e2-8ae1-c263faa31100.avro
-rw-r--r-- 2 root supergroup 3865 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-8601063452676878887-1-bb1c1064-fc53-4e82-ab7a-16b3d306ccd6.avro
-rw-r--r-- 2 root supergroup 3796 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-9064233795022707236-1-a2adc8c6-779d-4634-9840-ace594f56ac0.avro
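Counting the files by hand in listings like the ones above is error-prone. The sketch below groups the output of `hadoop fs -ls` by file type. It assumes the file naming seen in this lesson (`*.metadata.json`, `snap-*.avro`, `*-m0.avro`); the function name `count_metadata_files` and the result keys are chosen only for illustration.

```python
from collections import Counter


def count_metadata_files(ls_output):
    """Count files per type in the text printed by `hadoop fs -ls <metadata dir>`."""
    counts = Counter()
    for line in ls_output.splitlines():
        parts = line.split()
        # Skip blank lines and the "Found N items" header: only real entries end with a path.
        if not parts or not parts[-1].startswith("/"):
            continue
        name = parts[-1].rsplit("/", 1)[-1]
        if name.endswith(".metadata.json"):
            counts["metadata.json"] += 1
        elif name.startswith("snap-") and name.endswith(".avro"):
            counts["snap"] += 1      # manifest list
        elif name.endswith(".avro"):
            counts["manifest"] += 1  # e.g. *-m0.avro
        else:
            counts["other"] += 1
    return counts


# Three lines copied from the listings in this lesson.
sample = """Found 3 items
-rw-r--r-- 2 root supergroup 5811 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/00004-c82ed9e4-9b35-4ba1-86de-c45d29a03aea.metadata.json
-rw-r--r-- 2 root supergroup 6109 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/0e858ef8-8242-4f3c-9351-20eb78973628-m0.avro
-rw-r--r-- 2 root supergroup 4106 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/metadata/snap-1792093041816645047-1-0e858ef8-8242-4f3c-9351-20eb78973628.avro
"""
print(dict(count_metadata_files(sample)))
```

Applied to the 24-item listing, this grouping gives 6 metadata.json files, 9 snap files and 9 -m0.avro files.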
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/*
Found 9 items
-rw-r--r-- 2 root supergroup 960 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211115727_ec43b5e2-9ca0-4e5c-be20-fd36c10826c4-job_1642579431487_0027-00001.parquet
-rw-r--r-- 2 root supergroup 959 2022-02-11 11:57 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211115743_26023d43-60c1-40cb-a7d8-3b0cb2543afd-job_1642579431487_0028-00001.parquet
-rw-r--r-- 2 root supergroup 959 2022-02-11 12:00 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211115949_c295562a-a9ca-4610-8191-15b3b49a3f1c-job_1642579431487_0029-00001.parquet
-rw-r--r-- 2 root supergroup 959 2022-02-11 12:01 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211120117_7f8ac48c-8390-4fe8-a2c7-863596d6705e-job_1642579431487_0030-00001.parquet
-rw-r--r-- 2 root supergroup 960 2022-02-11 12:02 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211120231_ed92b42f-e534-4f86-9cdb-2eed37bb52fe-job_1642579431487_0031-00001.parquet
-rw-r--r-- 2 root supergroup 960 2022-02-11 12:03 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211120339_e4b268aa-7c13-4e19-a88d-bb442a197a56-job_1642579431487_0032-00001.parquet
-rw-r--r-- 2 root supergroup 960 2022-02-11 14:09 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211140850_13c3a794-0735-44e6-958d-719ddb48f62a-job_1642579431487_0034-00001.parquet
-rw-r--r-- 2 root supergroup 960 2022-02-11 14:10 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211141030_5d1ef8ba-5c97-475b-be90-b8534a42f7f9-job_1642579431487_0035-00001.parquet
-rw-r--r-- 2 root supergroup 960 2022-02-11 14:12 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/table_a5/data/dt=20220101/00000-0-root_20220211141154_01168473-ec47-4e20-ace1-2ff6d5fc5eec-job_1642579431487_0036-00001.parquet
[root@hadoop103 ~]#
hive (iceberg_db6)> select * from table_a5;
OK
table_a5.id table_a5.name table_a5.dt
2 apple2 20220101
2 apple2 20220101
7 apple7 20220101
4 apple4 20220101
5 apple5 20220101
8 apple8 20220101
6 apple6 20220101
1 apple1 20220101
3 apple3 20220101
Time taken: 0.34 seconds, Fetched: 9 row(s)
Setting the following properties caps the number of metadata.json files:
'write.metadata.delete-after-commit.enabled'='true',
'write.metadata.previous-versions-max'='5'
The snap and -m0.avro files still keep growing. To keep them under control, compact small files and expire old snapshots.