Apache Hudi 0.12.2 Release

Long Term Support Release

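We aim to maintain 0.12 for a longer period and to provide stable releases through the latest 0.12.x versions for users to migrate to. This release (0.12.2) is the latest 0.12 release.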

Migration Guide

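This release (0.12.2) does not introduce any new table version, so no migration is needed if you are on 0.12.0.
If you are migrating from an older release, please check the migration guides in the previous release notes, specifically the upgrade instructions for 0.6.0, 0.9.0, 0.10.0, 0.11.0, and 0.12.0.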

Bug Fixes

The 0.12.2 release mainly focuses on bug fixes and stability. The fixes span many components, including:

  • DeltaStreamer
  • Data type / schema related bug fixes
  • Table services
  • Metadata table
  • Spark SQL
  • Presto stability/performance fixes
  • Trino stability/performance fixes
  • Meta sync
  • Flink engine
  • Unit, functional, and integration tests, and CI

Release Notes

Sub-task

  • [HUDI-5244] - Fix bugs in schema evolution client with lost operation field and not found schema

Bug

  • [HUDI-3453] - Metadata table throws NPE when scheduling compaction plan
  • [HUDI-3661] - Flink async compaction is not thread safe when use watermark
  • [HUDI-4281] - Using hudi to build a large number of tables in spark on hive causes OOM
  • [HUDI-4588] - Ingestion failing if source column is dropped
  • [HUDI-4855] - Bootstrap table from Deltastreamer cannot be read in Spark
  • [HUDI-4893] - More than 1 splits are created for a single log file for MOR table
  • [HUDI-4898] - for mor table, presto/hive should respect payload class during merge parquet file and log file
  • [HUDI-4901] - Add avro version to Flink profiles
  • [HUDI-4946] - merge into with no preCombineField has dup row in only insert
  • [HUDI-4952] - Reading from metadata table could fail when there are no completed commits
  • [HUDI-4966] - Meta sync throws exception if TimestampBasedKeyGenerator is used to generate partition path containing slashes
  • [HUDI-4971] - aws bundle causes class loading issue
  • [HUDI-4975] - datahub sync bundle causes class loading issue
  • [HUDI-4998] - Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work
  • [HUDI-5003] - InLineFileSystem will throw NumberFormatException, cause the type of startOffset is int and out of bounds
  • [HUDI-5007] - Prevent Hudi from reading the entire timeline's when performing a LATEST streaming read
  • [HUDI-5008] - Avoid unset HoodieROTablePathFilter in IncrementalRelation
  • [HUDI-5025] - Rollback failed with log file not found when rollOver in rollback process
  • [HUDI-5041] - lock metric register conflict error
  • [HUDI-5057] - Fix msck repair hudi table
  • [HUDI-5058] - The primary key cannot be empty when Flink reads an error from the hudi table
  • [HUDI-5061] - bulk insert operation don't throw other exception except IOE Exception
  • [HUDI-5063] - totalScantime and other run time stats missing from commit metadata
  • [HUDI-5070] - Fix Flaky TestCleaner test : testInsertAndCleanByCommits
  • [HUDI-5076] - Non serializable path used with engineContext with metadata table initialization
  • [HUDI-5087] - Max value read from metatable incorrect
  • [HUDI-5088] - Failed to synchronize the hive metadata of the Flink table
  • [HUDI-5092] - Querying Hudi table throws NoSuchMethodError in Databricks runtime
  • [HUDI-5096] - boolean param is broken in HiveSyncTool
  • [HUDI-5097] - Read 0 records from partitioned table without partition fields in table configs
  • [HUDI-5151] - Flink data skipping doesn't work with ClassNotFoundException of InLineFileSystem
  • [HUDI-5157] - Duplicate partition path for chained hudi tables.
  • [HUDI-5163] - Failure handling w/ spark ds write failures
  • [HUDI-5176] - Incremental source may miss commits if there are inflight commits before completed commits
  • [HUDI-5185] - Compaction run fails with --hoodieConfigs
  • [HUDI-5203] - Debezium payload does not handle null-field cases
  • [HUDI-5228] - Flink table service job fs view conf overwrites the one of writing job
  • [HUDI-5242] - Do not fail Meta sync in Deltastreamer when inline table service fails
  • [HUDI-5251] - Unexpected avro dependency in flink 1.15 bundle
  • [HUDI-5253] - HoodieMergeOnReadTableInputFormat could have duplicate records issue if it contains delta files while still splittable
  • [HUDI-5260] - Insert into sql with strict insert mode and no preCombineField should not overwrite existing records
  • [HUDI-5277] - RunClusteringProcedure can't exit correctly
  • [HUDI-5286] - UnsupportedOperationException throws when enabling filesystem retry
  • [HUDI-5291] - NPE in column stats for null values
  • [HUDI-5320] - Spark SQL CTAS does not propagate Table properties to actual SparkSqlWriter
  • [HUDI-5325] - Fix Create Table to propagate properly Metadata Table enabling config
  • [HUDI-5336] - Fix log file parsing to consider "." at the beginning
  • [HUDI-5346] - Fixing performance traps in CTAS
  • [HUDI-5347] - Fix Merge Into performance traps
  • [HUDI-5350] - oom cause compaction event lost
  • [HUDI-5351] - Handle meta fields being disabled in Bulk Insert Partitioners
  • [HUDI-5373] - Different fileids are assigned to the same bucket
  • [HUDI-5375] - Fix re-using of file readers w/ metadata table in FileIndex
  • [HUDI-5393] - Remove the reuse of metadata table writer for flink write client
  • [HUDI-5403] - Input Format class has metadata table enabled for file listing unexpectedly by default
  • [HUDI-5409] - Avoid file index and use fs view cache in COW input format
  • [HUDI-5412] - Send the bootstrap event if the JM also rebooted

Improvement

  • [HUDI-4526] - improve spillableMapBasePath disk directory is full
  • [HUDI-4799] - improve analyzer exception tip when can not resolve expression
  • [HUDI-4960] - Upgrade Jetty version for Timeline server
  • [HUDI-4980] - Make avg record size calculated based on commit instant only
  • [HUDI-4995] - Dependency conflicts on apache http with other projects
  • [HUDI-4997] - use jackson-v2 replace jackson-v1 import
  • [HUDI-5002] - Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement
  • [HUDI-5027] - Replace hardcoded hbase config keys with HbaseConstants
  • [HUDI-5045] - Add tests to integ test to test bulk_insert followed by upsert
  • [HUDI-5066] - Support hoodie source metaclient cache for flink planner
  • [HUDI-5102] - source operator(monitor and reader) support user uid
  • [HUDI-5104] - Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter
  • [HUDI-5111] - Add metadata on read support to integ tests
  • [HUDI-5184] - Remove export PYSPARK_SUBMIT_ARGS="--master local[*]" from HoodiePySparkQuickstart.py
  • [HUDI-5247] - Clean up java client tests
  • [HUDI-5296] - Support disabling schema on read if not required
  • [HUDI-5338] - Adjust coalesce behavior within "NONE" sort mode for bulk insert
  • [HUDI-5344] - Upgrade com.google.protobuf:protobuf-java
  • [HUDI-5345] - Avoid fs.exists calls for metadata table in HFileBootstrapIndex
  • [HUDI-5348] - Cache file slices within MDT reader
  • [HUDI-5357] - Optimize release artifacts' deployment
  • [HUDI-5370] - Properly close file handles for Metadata writer

Test

  • [HUDI-5383] - Test 0.12.2 release branch

Task

  • [HUDI-3287] - Remove unnecessary deps in hudi-kafka-connect
  • [HUDI-5081] - Resources clean-up in hudi-utilities tests
  • [HUDI-5221] - Make the decision for flink sql bucket index case-insensitive
  • [HUDI-5223] - Partial failover for flink
  • [HUDI-5227] - Upgrade Jetty to 9.4.48
