Iceberg Data Lake in Practice, Lesson 13: metadata many times larger than the data files

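Series index

Iceberg Data Lake in Practice, Lesson 1: Getting started
Iceberg Data Lake in Practice, Lesson 2: Iceberg's underlying data format on Hadoop
Iceberg Data Lake in Practice, Lesson 3: Reading data from Kafka into Iceberg with SQL in the SQL client
Iceberg Data Lake in Practice, Lesson 4: Reading data from Kafka into Iceberg with SQL in the SQL client (upgraded to Flink 1.12.7)
Iceberg Data Lake in Practice, Lesson 5: Characteristics of the Hive catalog
Iceberg Data Lake in Practice, Lesson 6: Fixing failed writes from Kafka to Iceberg
Iceberg Data Lake in Practice, Lesson 7: Real-time writes to Iceberg
Iceberg Data Lake in Practice, Lesson 8: Integrating Hive with Iceberg
Iceberg Data Lake in Practice, Lesson 9: Compacting small files
Iceberg Data Lake in Practice, Lesson 10: Deleting snapshots
Iceberg Data Lake in Practice, Lesson 11: End-to-end test of a partitioned table (generating data, creating the table, compaction, deleting snapshots)
Iceberg Data Lake in Practice, Lesson 12: What is a catalog
Iceberg Data Lake in Practice, Lesson 13: metadata many times larger than the data files
Iceberg Data Lake in Practice, Lesson 14: Metadata merging (fixing metadata bloat that grows over time)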


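Table of contents

  • Series index
  • The problem
  • How the affected table was created
  • State of the table after small-file compaction
  • Code to expire all snapshots older than 5 minutes before the latest snapshot
  • Summary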


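The problem

Data is written to Iceberg continuously, and small-file compaction and snapshot expiration both run on the table. Expired snapshots and manifest files are removed as expected, but the metadata.json files in the metadata directory show no sign of ever being cleaned up.

The data directory holds only 6.3 MB in about 20 files (the listing below shows 21), while the metadata directory has grown to 33.1 GB in 8,715 files. Expiring every snapshot older than 5 minutes before the latest snapshot had no visible effect.

How to fix it? That is left for a later update; stay tuned.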

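How the affected table was created

The table was created from the SQL client on top of a Hive catalog; see Lesson 11 for the CREATE TABLE statement.
The same problem already showed up at the end of Lesson 11. It gets its own article here because it is important.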

State of the table after small-file compaction

Directory sizes

[root@hadoop103 ~]# hadoop fs -du -h   /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/
6.3 M   /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data
33.1 G  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata

File counts

[root@hadoop101 ~]# hadoop fs -du -h /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data|wc
     21      61    2940
[root@hadoop101 ~]# hadoop fs -du -h /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata|wc
   8715   26144 1246221

metadata directory (sizes in bytes)

-rw-r--r--   2 root supergroup    8118751 2022-01-26 11:19 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08690-b9a3c862-443e-4f6b-a1fc-c17fe3e517dc.metadata.json
-rw-r--r--   2 root supergroup    8119685 2022-01-26 11:20 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08691-34894f4a-d881-4b8f-b228-7adba992a08f.metadata.json
-rw-r--r--   2 root supergroup    8120615 2022-01-26 11:21 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08692-1ce25766-4ca5-473e-945f-3fd848cae5e3.metadata.json
-rw-r--r--   2 root supergroup    8121549 2022-01-26 11:22 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08693-4bd481a5-f32b-4f15-aad7-4cd3a5af6b39.metadata.json
-rw-r--r--   2 root supergroup    8122483 2022-01-26 11:23 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08694-4f3554aa-4db7-443d-bbb9-ac0871ec02da.metadata.json
-rw-r--r--   2 root supergroup    8123417 2022-01-26 11:24 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08695-e8bf9bda-44e7-4624-83a2-d64db09f5660.metadata.json
-rw-r--r--   2 root supergroup    8124351 2022-01-26 11:25 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08696-2b95f1d4-6843-41e6-9e16-77bbe1875b7f.metadata.json
-rw-r--r--   2 root supergroup    8125285 2022-01-26 11:26 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08697-f11c1b8f-f987-4589-8159-521c65328163.metadata.json
-rw-r--r--   2 root supergroup    8126219 2022-01-26 11:27 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08698-fb8b744a-db03-4b80-8612-15de1d6278cc.metadata.json
-rw-r--r--   2 root supergroup    8127153 2022-01-26 11:28 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08699-a6b6683d-d9f1-45a1-a09b-b242a8284b96.metadata.json
-rw-r--r--   2 root supergroup    8128087 2022-01-26 11:29 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08700-cad78b24-8cd7-464f-95fe-296e96bfd648.metadata.json
-rw-r--r--   2 root supergroup    8129021 2022-01-26 11:30 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08701-0f702902-b2ae-4029-b8cd-97b5df0474ff.metadata.json
-rw-r--r--   2 root supergroup    8129955 2022-01-26 11:31 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08702-91dbcc1f-9d40-4662-874e-8f1091c0a52f.metadata.json
-rw-r--r--   2 root supergroup    8130889 2022-01-26 11:32 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08703-2c78ad8f-69ff-408f-afec-8d707ff944e8.metadata.json
-rw-r--r--   2 root supergroup    8131823 2022-01-26 11:33 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08704-84085a27-b185-468f-9c23-2984a9330762.metadata.json
-rw-r--r--   2 root supergroup    8132757 2022-01-26 11:34 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08705-edc7f661-0ed2-4e46-82a0-a2006dd01ad5.metadata.json
-rw-r--r--   2 root supergroup    8133691 2022-01-26 11:35 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08706-9c3378aa-21cb-48bf-be52-70b25ea59308.metadata.json
-rw-r--r--   2 root supergroup    8343948 2022-01-27 11:52 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08707-afd79c3c-e280-45c4-9797-2fa9a4fa27f4.metadata.json
-rw-r--r--   2 root supergroup    8344913 2022-01-27 14:16 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08708-75efd8f6-ba3f-47dc-8b89-b3177c477a62.metadata.json
-rw-r--r--   2 root supergroup    8345875 2022-01-27 14:38 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08709-78209251-777c-4a4f-9292-64cf3f2190ae.metadata.json
-rw-r--r--   2 root supergroup      23219 2022-01-27 15:17 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08710-d69a0a2b-959e-488d-8443-471986f49e32.metadata.json
-rw-r--r--   2 root supergroup       5777 2022-01-27 14:38 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/6c6d7719-74a9-4817-914a-b0df5eb8f6ba-m0.avro
-rw-r--r--   2 root supergroup       6441 2022-01-27 14:38 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/6c6d7719-74a9-4817-914a-b0df5eb8f6ba-m1.avro
-rw-r--r--   2 root supergroup       5771 2022-01-27 14:38 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/6c6d7719-74a9-4817-914a-b0df5eb8f6ba-m2.avro
-rw-r--r--   2 root supergroup       3844 2022-01-27 14:38 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/snap-7762404597294868190-1-6c6d7719-74a9-4817-914a-b0df5eb8f6ba.avro
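
The byte sizes above suggest why the directory is so large: each new metadata.json is slightly bigger than the previous one, and every older version is still on disk. The following is a rough sketch in plain Scala, not a measurement. It assumes version i is about i times the per-commit growth, and that about 8,706 versions were written at this cadence (the highest version number in the one-minute run above). The object and method names are made up for illustration.

```scala
object MetadataGrowthEstimate {
  // Sizes in bytes of versions 08693..08697, copied from the listing above.
  val observed: Vector[Long] =
    Vector(8121549L, 8122483L, 8123417L, 8124351L, 8125285L)

  // Growth between consecutive metadata.json versions.
  val deltas: Vector[Long] =
    observed.zip(observed.tail).map { case (prev, next) => next - prev }

  // Simplifying assumption: version i is about i * perCommit bytes, so the
  // total over n versions is perCommit * n * (n + 1) / 2.
  def estimatedTotalBytes(perCommit: Long, versions: Long): Long =
    perCommit * versions * (versions + 1) / 2

  def main(args: Array[String]): Unit = {
    val perCommit = deltas.sum / deltas.size
    val total = estimatedTotalBytes(perCommit, 8706L)
    println(s"bytes added per commit: $perCommit")
    println(f"estimated total: ${total / math.pow(1024, 3)}%.1f GiB")
  }
}
```

Under these assumptions the estimate comes out at roughly 33 GiB, close to the 33.1 G that hadoop fs -du reports. Linear growth per file means quadratic growth for the directory as long as no old version is deleted.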

Human-readable sizes

7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08684-d4af58ae-4967-48a6-ac40-9308a075fe00.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08685-89f09f2f-6cdf-43d8-acc2-79496dcaf18d.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08686-9be5033f-2592-4696-9c2f-5d1d408910c6.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08687-f111331a-599f-4068-9590-e57c76e46c31.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08688-18779a1c-fd2d-43c2-9c62-4d1efb4caed2.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08689-a1bfd5ea-23a1-431b-8208-a82f2561952e.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08690-b9a3c862-443e-4f6b-a1fc-c17fe3e517dc.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08691-34894f4a-d881-4b8f-b228-7adba992a08f.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08692-1ce25766-4ca5-473e-945f-3fd848cae5e3.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08693-4bd481a5-f32b-4f15-aad7-4cd3a5af6b39.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08694-4f3554aa-4db7-443d-bbb9-ac0871ec02da.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08695-e8bf9bda-44e7-4624-83a2-d64db09f5660.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08696-2b95f1d4-6843-41e6-9e16-77bbe1875b7f.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08697-f11c1b8f-f987-4589-8159-521c65328163.metadata.json
7.7 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08698-fb8b744a-db03-4b80-8612-15de1d6278cc.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08699-a6b6683d-d9f1-45a1-a09b-b242a8284b96.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08700-cad78b24-8cd7-464f-95fe-296e96bfd648.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08701-0f702902-b2ae-4029-b8cd-97b5df0474ff.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08702-91dbcc1f-9d40-4662-874e-8f1091c0a52f.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08703-2c78ad8f-69ff-408f-afec-8d707ff944e8.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08704-84085a27-b185-468f-9c23-2984a9330762.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08705-edc7f661-0ed2-4e46-82a0-a2006dd01ad5.metadata.json
7.8 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08706-9c3378aa-21cb-48bf-be52-70b25ea59308.metadata.json
8.0 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08707-afd79c3c-e280-45c4-9797-2fa9a4fa27f4.metadata.json
8.0 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08708-75efd8f6-ba3f-47dc-8b89-b3177c477a62.metadata.json
8.0 M     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08709-78209251-777c-4a4f-9292-64cf3f2190ae.metadata.json
22.7 K    /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08710-d69a0a2b-959e-488d-8443-471986f49e32.metadata.json
5.6 K     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/6c6d7719-74a9-4817-914a-b0df5eb8f6ba-m0.avro
6.3 K     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/6c6d7719-74a9-4817-914a-b0df5eb8f6ba-m1.avro
5.6 K     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/6c6d7719-74a9-4817-914a-b0df5eb8f6ba-m2.avro
3.8 K     /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/snap-7762404597294868190-1-6c6d7719-74a9-4817-914a-b0df5eb8f6ba.avro

data directory:

[root@hadoop101 ~]# hadoop fs -du -h /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data
169.1 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00000-0-3c21e5b1-54e8-42b1-8bdc-a0b8f1514ee1-00001.parquet
169.0 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00000-0-3c21e5b1-54e8-42b1-8bdc-a0b8f1514ee1-00002.parquet
169.1 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00000-0-3c21e5b1-54e8-42b1-8bdc-a0b8f1514ee1-00003.parquet
3.1 M    /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00000-0-cdcc5019-0c59-41e4-80c6-1d4185455065-00001.parquet
508      /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00000-0-dd8bc29f-831a-4904-830e-2ef56e4a4743-08707.parquet
169.0 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-139af0f5-d3ee-4f35-bd2e-73ce2aaf4792-00001.parquet
169.1 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-139af0f5-d3ee-4f35-bd2e-73ce2aaf4792-00002.parquet
169.1 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-139af0f5-d3ee-4f35-bd2e-73ce2aaf4792-00003.parquet
552      /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00001-0-e9e8a782-fa82-4c4d-9786-c05b8aab251a-08707.parquet
5.9 K    /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00002-0-a0f46641-b14d-4f8b-a16e-4c768bcba775-00109.parquet
169.1 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00002-0-fe001b68-3753-44a7-adb4-63d43c8b3226-00001.parquet
164.7 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00002-0-fe001b68-3753-44a7-adb4-63d43c8b3226-00002.parquet
169.2 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00002-0-fe001b68-3753-44a7-adb4-63d43c8b3226-00003.parquet
169.0 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00002-0-fe001b68-3753-44a7-adb4-63d43c8b3226-00004.parquet
169.2 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00003-0-1d71db79-abf1-4088-9282-bc907e45e262-00001.parquet
169.0 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00003-0-1d71db79-abf1-4088-9282-bc907e45e262-00002.parquet
168.9 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00003-0-1d71db79-abf1-4088-9282-bc907e45e262-00003.parquet
168.9 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00003-0-1d71db79-abf1-4088-9282-bc907e45e262-00004.parquet
527.5 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-fea6f5d5-759f-4769-9ced-b3ecca214e36-00001.parquet
169.0 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-fea6f5d5-759f-4769-9ced-b3ecca214e36-00002.parquet
168.8 K  /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/data/00004-0-fea6f5d5-759f-4769-9ced-b3ecca214e36-00003.parquet

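Code to expire all snapshots older than 5 minutes before the latest snapshot

The program below compacts small files and then expires every snapshot older than 5 minutes before the latest snapshot.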


import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.catalog.{Namespace, TableIdentifier}
import org.apache.iceberg.flink.actions.Actions
import org.apache.iceberg.flink.{CatalogLoader, TableLoader}
import org.apache.log4j.{Level, Logger}
import org.slf4j.LoggerFactory

import java.util
import java.util.concurrent.TimeUnit

object FlinkDataStreamSmallFileCompactTest {
  private var logger: org.slf4j.Logger = _

  def main(args: Array[String]): Unit = {
    logger = LoggerFactory.getLogger(this.getClass.getSimpleName)
    Logger.getLogger("org.apache").setLevel(Level.INFO)
    Logger.getLogger("hive.metastore").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)

  
    // hive catalog
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    System.setProperty("HADOOP_USER_NAME", "root")
    // Note: this map is never passed to CatalogLoader.hive below; the catalog
    // settings presumably come from the hive-site.xml found on the classpath.
    val map = new util.HashMap[String, String]()
    map.put("type", "iceberg")
    map.put("catalog-type", "hive")
    map.put("property-version", "2")
    map.put("warehouse", "/user/hive/warehouse")
    //    map.put("datanucleus.schema.autoCreateTables", "true")
    //    compact small files
    //    expire snapshots
    map.put("uri", "thrift://hadoop101:9083")
    val iceberg_catalog = CatalogLoader.hive(
      "hive_catalog6", // catalog name
      new Configuration(),
      new util.HashMap()
    )
//    val identifier = TableIdentifier.of(Namespace.of("iceberg_db6"), // database name
//      "behavior_with_date_log_ib") // table name: behavior_with_date_log_ib or behavior_log_ib6
    val identifier = TableIdentifier.of(Namespace.of("iceberg_db6"), // database name
      "behavior_log_ib6") // table name: behavior_with_date_log_ib or behavior_log_ib6
    val loader = TableLoader.fromCatalog(iceberg_catalog, identifier)
    loader.open()
    val table = loader.loadTable()
    Actions.forTable(env, table)
      .rewriteDataFiles
      .maxParallelism(5)
      .targetSizeInBytes(128 * 1024 * 1024)
      .execute
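    // Expire every snapshot older than 5 minutes before the latest snapshot
    val snapshot = table.currentSnapshot
    if (snapshot != null) {
      // Compute the cutoff only after the null check: an empty table has no current snapshot
      val old = snapshot.timestampMillis - TimeUnit.MINUTES.toMillis(5)
      table.expireSnapshots
        .expireOlderThan(old)
        .commit()
      // Prints "<table name> 表 清理完成!!!" ("table cleanup finished")
      println(s" ${identifier.name} 表 清理完成!!!")
    }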
  }
}

Cleanup log:
Finding: nothing was cleaned up.

22/02/10 19:48:51 INFO conf.HiveConf: Found configuration file file:/E:/workspace/jt_workspace/iceberg-learning/flink-iceberg-learning/target/classes/hive-site.xml
22/02/10 19:48:51 WARN conf.HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
22/02/10 19:48:51 INFO security.JniBasedUnixGroupsMapping: Error getting groups for root: Unknown error.
22/02/10 19:48:51 WARN security.UserGroupInformation: No groups available for user root
22/02/10 19:48:51 INFO iceberg.BaseMetastoreTableOperations: Refreshing table metadata from new version: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_log_ib6/metadata/08710-d69a0a2b-959e-488d-8443-471986f49e32.metadata.json
22/02/10 19:48:56 INFO iceberg.BaseMetastoreCatalog: Table loaded by catalog: hive_catalog6.iceberg_db6.behavior_log_ib6
22/02/10 19:48:56 INFO iceberg.BaseTableScan: Scanning table hive_catalog6.iceberg_db6.behavior_log_ib6 snapshot 7762404597294868190 created at 2022-01-27 14:38:10.105 with filter true
22/02/10 19:48:56 INFO iceberg.RemoveSnapshots: Expiring snapshots older than: Thu Jan 27 14:33:10 CST 2022 (1643265190105)
22/02/10 19:48:56 INFO iceberg.BaseMetastoreTableOperations: Nothing to commit.
22/02/10 19:48:56 INFO iceberg.RemoveSnapshots: Committed snapshot changes
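
A plausible reading of "Nothing to commit" is that the table had only one snapshot left at this point (the metadata listing shows a single snap-*.avro file), and that snapshot is newer than the cutoff. The sketch below replays the arithmetic with the timestamps from the log. It is plain Scala with hypothetical helper names, and the strict "older than" comparison is an assumption about how expiry selects snapshots.

```scala
import java.util.concurrent.TimeUnit

object ExpiryCutoffCheck {
  // Cutoff as computed in the program above: latest snapshot time minus the window.
  def cutoffMillis(latestSnapshotMillis: Long, keepMinutes: Long): Long =
    latestSnapshotMillis - TimeUnit.MINUTES.toMillis(keepMinutes)

  // Assumption: only snapshots strictly older than the cutoff are candidates.
  def expirable(snapshotMillis: Seq[Long], cutoff: Long): Seq[Long] =
    snapshotMillis.filter(_ < cutoff)

  def main(args: Array[String]): Unit = {
    // 2022-01-27 14:38:10.105 CST, the snapshot named in the scan log line.
    val latest = 1643265490105L
    val cutoff = cutoffMillis(latest, 5)
    println(s"cutoff: $cutoff")
    println(s"expirable: ${expirable(Seq(latest), cutoff)}")
  }
}
```

The computed cutoff matches the value 1643265190105 in the log, and with a single snapshot the candidate list is empty, so there is nothing to expire. The old metadata.json files are untouched either way.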

Log from expiring snapshots on another table:

22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=6848801094803890889, timestamp_ms=1644485336293, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18788, added-data-files=3, added-records=5961, added-files-size=51810, changed-partition-count=2, total-records=4985317, total-files-size=43416360, total-data-files=105, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-6848801094803890889-1-d96ba7dc-7ff2-40ad-a582-f33c987a6740.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=5895976650901516425, timestamp_ms=1644485396286, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18789, added-data-files=2, added-records=5960, added-files-size=50611, changed-partition-count=1, total-records=4991277, total-files-size=43466971, total-data-files=107, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5895976650901516425-1-a9f423cc-0133-4118-9292-016d5227f57a.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=3903341502082098658, timestamp_ms=1644485457083, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18790, added-data-files=2, added-records=5960, added-files-size=50631, changed-partition-count=1, total-records=4997237, total-files-size=43517602, total-data-files=109, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3903341502082098658-1-7a86b5d3-8c5e-4a9c-96c3-85a0c5fa3df0.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=1095796975631658317, timestamp_ms=1644485516288, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18791, added-data-files=2, added-records=5959, added-files-size=51052, changed-partition-count=1, total-records=5003196, total-files-size=43568654, total-data-files=111, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1095796975631658317-1-b071bfb7-3109-4a92-972d-c620138f7220.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=451594432613548689, timestamp_ms=1644485576287, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18792, added-data-files=2, added-records=5959, added-files-size=50810, changed-partition-count=1, total-records=5009155, total-files-size=43619464, total-data-files=113, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-451594432613548689-1-4de192bc-1b21-445b-903f-a88137b930c5.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=22739922463920002, timestamp_ms=1644485636293, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18793, added-data-files=2, added-records=5962, added-files-size=50713, changed-partition-count=1, total-records=5015117, total-files-size=43670177, total-data-files=115, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-22739922463920002-1-1c513718-42d8-41a0-82ea-486d2a4a3bbb.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=5013785705895265232, timestamp_ms=1644485696292, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18794, added-data-files=2, added-records=5961, added-files-size=50652, changed-partition-count=1, total-records=5021078, total-files-size=43720829, total-data-files=117, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5013785705895265232-1-03d4f1b3-c4ee-4217-b0f2-19168a8ed28e.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=2526947968329093048, timestamp_ms=1644485756306, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18795, added-data-files=4, added-records=5961, added-files-size=52941, changed-partition-count=2, total-records=5027039, total-files-size=43773770, total-data-files=121, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-2526947968329093048-1-8336f6a1-7039-41a7-b736-229ce5bcf10a.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=2484166318625325659, timestamp_ms=1644485816296, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18796, added-data-files=2, added-records=5959, added-files-size=50849, changed-partition-count=1, total-records=5032998, total-files-size=43824619, total-data-files=123, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-2484166318625325659-1-02b973a7-2012-4661-9147-145ea82b5126.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=1992367331685787804, timestamp_ms=1644485876293, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18797, added-data-files=2, added-records=5464, added-files-size=46683, changed-partition-count=1, total-records=5038462, total-files-size=43871302, total-data-files=125, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1992367331685787804-1-c0b03758-41c3-46bc-b157-8e846674b1e2.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Expired snapshot: BaseSnapshot{id=3398467964620293154, timestamp_ms=1644485936300, operation=append, summary={flink.job-id=78930f941991e19112d3917fd4dd4cb2, flink.max-committed-checkpoint-id=18798, added-data-files=3, added-records=5960, added-files-size=52223, changed-partition-count=2, total-records=5044422, total-files-size=43923525, total-data-files=128, total-delete-files=0, total-position-deletes=0, total-equality-deletes=0}, manifest-list=hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3398467964620293154-1-d10ea8c5-3986-45b1-bde6-6ed75148dce2.avro, schema-id=0}
22/02/10 17:44:27 INFO iceberg.RemoveSnapshots: Committed snapshot changes; cleaning up expired manifests and data files.
22/02/10 17:44:31 WARN iceberg.RemoveSnapshots: Manifests to delete: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/278e6825-3381-47aa-a08b-4d86a1a0f0e6-m0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/c00444d4-86e4-4df9-b7b1-29bc15e203a5-m0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/9380e713-bd4a-41b4-9140-704a7624d2bf-m5.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/9380e713-bd4a-41b4-9140-704a7624d2bf-m1.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/9380e713-bd4a-41b4-9140-704a7624d2bf-m4.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/c7fb0523-a144-4bcd-89f3-56c0984561d1-m21.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/92f9c63e-bc85-4965-9a75-b346fe797ad9-m0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/9380e713-bd4a-41b4-9140-704a7624d2bf-m3.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/9380e713-bd4a-41b4-9140-704a7624d2bf-m0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/b98bd620-63f7-4cc5-8b77-1c6b4ba1cf95-m0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/9380e713-bd4a-41b4-9140-704a7624d2bf-m6.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/836c917c-2207-400a-a74c-edc562a9603a-m0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/9305f3e1-9e54-4499-ae7e-8bacc7816c31-m0.avro
22/02/10 17:44:31 WARN iceberg.RemoveSnapshots: Manifests Lists to delete: hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-541878440800103826-1-1b8107b6-6f58-41b3-bca3-21bf624c4719.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-7791706873901858756-1-b98bd620-63f7-4cc5-8b77-1c6b4ba1cf95.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5657396929463700436-1-c00444d4-86e4-4df9-b7b1-29bc15e203a5.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3398467964620293154-1-d10ea8c5-3986-45b1-bde6-6ed75148dce2.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-6570416090976553560-1-e329179f-e202-41f6-852e-2585b46eee2e.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-4278660516617569111-1-d6a355d7-f7d4-4be6-b640-674aedea38d0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-7075499429808392849-1-9ade184e-f771-4413-81a8-a968785638f9.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-2342072444431983976-1-8a281c33-2d42-4828-adb9-0fcbc49cbacd.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1095796975631658317-1-b071bfb7-3109-4a92-972d-c620138f7220.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-6005578883465127048-1-7ccf0fb1-9472-4ec4-8198-0dc6b911bdf7.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5532078125138954836-1-6bb16325-09df-4638-8a61-e02a6f5e53f6.avro, 
hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-4262237804586276768-1-e7c26525-29d1-4a1a-867e-cd9790a55068.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1722651238361119409-1-df33539d-c13f-4d07-aa7e-657b42df1f78.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-22739922463920002-1-1c513718-42d8-41a0-82ea-486d2a4a3bbb.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-468106048969373971-1-5d4446d8-d779-426a-8243-8b857383fd3e.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-8018294029736388458-1-c208faf8-7d30-474a-880c-8191db9cd448.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1470299392035948712-1-d9dd390e-ee51-4ffd-ae30-e91a6d019757.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1992367331685787804-1-c0b03758-41c3-46bc-b157-8e846674b1e2.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-7058941970938557666-1-9602f8d7-4638-4d23-af5b-64d3382e1644.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5361802753278781380-1-9b81fdec-ecee-4330-8541-aea40c878268.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-2183998972431095493-1-1b41dec9-c9e6-441e-bdfd-a5cbd52b11fc.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1078972720570425309-1-be9504a7-fb98-48df-9999-c2857d856af7.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3495751966676651473-1-5dfc4f7b-c16d-4429-8182-a375db8ec903.avro, 
hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-2843486521572234923-1-754d9e82-2175-4762-8737-d95ae98200d4.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-8309783644936857381-1-9305f3e1-9e54-4499-ae7e-8bacc7816c31.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-2484166318625325659-1-02b973a7-2012-4661-9147-145ea82b5126.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-6848801094803890889-1-d96ba7dc-7ff2-40ad-a582-f33c987a6740.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-1559852676159610002-1-615b94f7-5a3a-42a3-9181-1ad9a2425427.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5487640863335657501-1-cf9145ca-9184-4095-9af5-625307270cde.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3606588897957810627-1-92f9c63e-bc85-4965-9a75-b346fe797ad9.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-6539673233134379517-1-345c9b77-be31-449c-a67e-970b80078069.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5895976650901516425-1-a9f423cc-0133-4118-9292-016d5227f57a.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3903341502082098658-1-7a86b5d3-8c5e-4a9c-96c3-85a0c5fa3df0.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-9158469320395181971-1-7674e415-c2f5-4566-b251-20c2636dfc1f.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-2526947968329093048-1-8336f6a1-7039-41a7-b736-229ce5bcf10a.avro, 
hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-7955617778669899471-1-1049aed0-3215-4267-82fe-e37df441957f.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-7923280809105826466-1-40b70bd0-e8fb-4186-8c1f-97a427649160.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-878441999283792062-1-68955262-5444-4898-8d98-f93736abcd9b.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-4364834558723325257-1-00fb7f19-4224-4da8-b1f0-d85ed241d7eb.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-451594432613548689-1-4de192bc-1b21-445b-903f-a88137b930c5.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5974895447555666685-1-36baa0b7-0f1f-4e1c-9595-69179fb09aa9.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-6532101506813450600-1-5898b192-1821-48bd-9c6c-98cd496ba37a.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-8489936993001945197-1-353c552c-1595-495f-8e44-641f47ebf250.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-8127473867318873076-1-f0905056-841f-496b-a6fd-133ca6f121d2.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3511541622291330360-1-9380e713-bd4a-41b4-9140-704a7624d2bf.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-4031948148957742647-1-278e6825-3381-47aa-a08b-4d86a1a0f0e6.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-3212606031402422010-1-a70f8e62-8c34-432e-b752-9063ed2c902f.avro, 
hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-5013785705895265232-1-03d4f1b3-c4ee-4217-b0f2-19168a8ed28e.avro, hdfs://ns/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/snap-6026165152411827559-1-5ca27510-3e7c-446e-8f69-eddd80bb2b66.avro
 behavior_with_date_log_ib 表 清理完成!!!

Process finished with exit code 0


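Summary

Characteristics of Iceberg file compaction and snapshot expiration:

Compaction: writes new data files.
Snapshot expiration: deletes the snap-*.avro manifest lists and the manifest files, but it neither merges the metadata.json files nor cleans up the old ones.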
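
One direction worth trying for the leftover metadata.json files is Iceberg's table-level retention settings. The sketch below only collects the two property names, write.metadata.delete-after-commit.enabled and write.metadata.previous-versions-max. The value 10 is an arbitrary example, the object name is made up, and whether these settings remove files that already exist on this table has not been verified here.

```scala
object MetadataRetention {
  // Iceberg table properties that control retention of old metadata.json files.
  // The retention count of 10 is an assumed example value, not a recommendation.
  val props: Map[String, String] = Map(
    "write.metadata.delete-after-commit.enabled" -> "true",
    "write.metadata.previous-versions-max" -> "10"
  )

  // With the `table` loaded in the program above, they could be applied as:
  //   val update = table.updateProperties()
  //   props.foreach { case (k, v) => update.set(k, v) }
  //   update.commit()

  def main(args: Array[String]): Unit =
    props.foreach { case (k, v) => println(s"$k=$v") }
}
```

Versions that are no longer tracked in the table's metadata log may still need a separate cleanup.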
