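Dear community members, Apache Doris 2.0.3 was officially released on December 14, 2023. This release improves many features, including complex data types, statistics collection, inverted indexes, data lake analytics, and distributed replica management. You are welcome to download and try it.
We also sincerely thank the 104 contributors who submitted more than 1,000 improvements and bug fixes to Apache Doris 2.0.3, contributing significantly to its performance and stability.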
GitHub download: Releases · apache/doris · GitHub
Official download page: Download - Apache Doris
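Statistics are what the cost-based optimizer (CBO) relies on for cost estimation. Collecting statistics helps the optimizer understand data distribution, estimate the cost of execution plans, and choose among them, which improves query efficiency.
Starting with version 2.0.3, Apache Doris supports automatic statistics collection, and it is enabled by default. After each load transaction commits, Apache Doris records the tables that the transaction updated and estimates the health of each table's statistics. When the health falls below a configured threshold, Doris automatically triggers a statistics collection job. To reduce the resource cost of these jobs, Apache Doris collects statistics by sampling; users can adjust the parameters as needed to obtain more accurate data distribution information.
For more information, see: Statistics - Apache Doris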
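The trigger logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TableStats` class, the health formula, and the threshold value of 80 are hypothetical and are not taken from the Doris source or documentation.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table bookkeeping; not Doris's internal structure."""
    row_count: int          # rows in the table when statistics were last collected
    updated_rows: int = 0   # rows changed by load transactions since then


def record_load(stats: TableStats, rows_loaded: int) -> None:
    """Called after a load transaction commits: remember how much changed."""
    stats.updated_rows += rows_loaded


def health(stats: TableStats) -> float:
    """Assumed formula: 100 means fresh statistics, 0 means fully stale."""
    if stats.row_count == 0:
        return 0.0
    changed = min(stats.updated_rows, stats.row_count)
    return 100.0 * (1 - changed / stats.row_count)


def needs_collection(stats: TableStats, threshold: float = 80.0) -> bool:
    """Trigger a (sampled) collection job once health drops below the threshold."""
    return health(stats) < threshold


stats = TableStats(row_count=1_000)
record_load(stats, 100)
print(health(stats), needs_collection(stats))
record_load(stats, 200)
print(health(stats), needs_collection(stats))
```

In this sketch a small load leaves the statistics healthy, while accumulated changes eventually push the health under the threshold and trigger collection.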
Java UDF, JDBC catalog, Hudi MOR tables, and other features now support complex data types
[feature](jni) support complex types in jni framework by AshinGau · Pull Request #24810 · apache/doris · GitHub
[feature](tvf)(jni-avro)jni-avro scanner add complex data types by DongLiang-0 · Pull Request #26236 · apache/doris · GitHub
Paimon catalog supports complex data types
[feature](paimon)paimon catalog supports complex types by DongLiang-0 · Pull Request #25364 · apache/doris · GitHub
Paimon catalog supports Paimon 0.5
[improvement](catalog)compatible with paimon 0.5 by zddr · Pull Request #24985 · apache/doris · GitHub
The new optimizer supports the BitmapAgg function
[feature](fe) add function 'BitmapAgg' in nereids by mrhhsg · Pull Request #25508 · apache/doris · GitHub
Added support for the SHA family of digest functions
[feature](function) Support SHA family functions by zclllyybb · Pull Request #24342 · apache/doris · GitHub
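For readers who want to cross-check digests outside the database, the standard SHA-1 and SHA-2 values can be computed with Python's `hashlib`. The helper names `sha1_hex` and `sha2_hex` are hypothetical, and this note does not specify the names or arguments of the SQL functions in Doris, so treat the sketch as a reference for the algorithms only.

```python
import hashlib

# Map a SHA-2 digest length in bits to the matching hashlib constructor.
_SHA2 = {
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha1_hex(text: str) -> str:
    """Hex-encoded SHA-1 digest of the UTF-8 bytes of `text`."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def sha2_hex(text: str, digest_bits: int) -> str:
    """Hex-encoded SHA-2 digest with the requested length in bits."""
    try:
        algorithm = _SHA2[digest_bits]
    except KeyError:
        raise ValueError(f"unsupported SHA-2 digest length: {digest_bits}")
    return algorithm(text.encode("utf-8")).hexdigest()


print(sha1_hex("abc"))
print(sha2_hex("abc", 256))
```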
The min_by and max_by aggregate functions support the bitmap data type
[feature](function) support bitmap type in min/max_by agg function by zhangstar333 · Pull Request #25430 · apache/doris · GitHub
Added the milliseconds/microseconds_add/sub/diff functions
[feature](datetime-func)support milliseconds_add/sub/diff and microseconds_diff by LemonLiTree · Pull Request #24114 · apache/doris · GitHub
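The arithmetic behind these sub-second functions can be illustrated with Python's `datetime`. The Python helpers below merely borrow the function names for readability. The argument order of the diff helpers (end first, then start) and the rounding behavior are assumptions, not something this note specifies for Doris.

```python
from datetime import datetime, timedelta


def milliseconds_add(ts: datetime, n: int) -> datetime:
    """Shift a timestamp forward by n milliseconds."""
    return ts + timedelta(milliseconds=n)


def milliseconds_sub(ts: datetime, n: int) -> datetime:
    """Shift a timestamp backward by n milliseconds."""
    return ts - timedelta(milliseconds=n)


def milliseconds_diff(end: datetime, start: datetime) -> int:
    """Whole milliseconds between two timestamps (floor division in this sketch)."""
    return (end - start) // timedelta(milliseconds=1)


def microseconds_diff(end: datetime, start: datetime) -> int:
    """Whole microseconds between two timestamps."""
    return (end - start) // timedelta(microseconds=1)


base = datetime(2023, 12, 14, 0, 0, 0)
later = milliseconds_add(base, 1500)
print(later, milliseconds_diff(later, base), microseconds_diff(later, base))
```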
Added the json_insert, json_replace, and json_set JSON functions
[feature](json-function) add json_insert, json_replace, json_set functions by xuefengze · Pull Request #24384 · apache/doris · GitHub
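The difference between the three functions is easiest to see side by side. The Python sketch below assumes they follow the semantics of the same-named MySQL functions (insert only adds missing paths, replace only overwrites existing paths, set does both). That is an assumption rather than something stated in this note, and the sketch handles only top-level `$.key` paths.

```python
import json


def _key(path: str) -> str:
    """Extract the key from a top-level '$.key' path; anything else is rejected."""
    if not path.startswith("$.") or "." in path[2:] or "[" in path:
        raise ValueError("this sketch only handles top-level '$.key' paths")
    return path[2:]


def json_insert(doc: str, path: str, value) -> str:
    """Add the value only if the path does not exist yet."""
    obj, key = json.loads(doc), _key(path)
    if key not in obj:
        obj[key] = value
    return json.dumps(obj)


def json_replace(doc: str, path: str, value) -> str:
    """Overwrite the value only if the path already exists."""
    obj, key = json.loads(doc), _key(path)
    if key in obj:
        obj[key] = value
    return json.dumps(obj)


def json_set(doc: str, path: str, value) -> str:
    """Insert or overwrite, whichever applies."""
    obj, key = json.loads(doc), _key(path)
    obj[key] = value
    return json.dumps(obj)


doc = '{"a": 1}'
print(json_insert(doc, "$.a", 9), json_insert(doc, "$.b", 2))
print(json_replace(doc, "$.a", 9), json_replace(doc, "$.b", 2))
print(json_set(doc, "$.a", 9), json_set(doc, "$.b", 2))
```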
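When a highly selective inverted-index MATCH condition is combined with a less selective ordinary WHERE condition, I/O on the index column is greatly reduced
Improved the efficiency of random reads after WHERE filtering
Improved the performance of the legacy get_json_xx functions on the JSON data type by 2-4x
Added a configuration option to lower the priority of data-reading threads, preserving CPU resources and timeliness for writes
Added a uuid-numeric function that returns largeint; it is 20x faster than the uuid function that returns a string
CASE WHEN performance improved 3x
Pruned unnecessary predicate evaluation during storage engine execution
Support pushing the count operator down to the storage layer
Improved the computation performance of AND/OR expressions that contain nullable types
Support rewriting the limit operator to run before join in more scenarios, improving execution efficiency
Eliminate useless order by operators in inline views, improving execution efficiency
Improved the accuracy of cardinality estimation and the cost model in some cases, improving execution efficiency
Improved the predicate pushdown logic and case-sensitivity handling of JDBC catalog
Improved the read efficiency of the file cache after it is first enabled
Improved the SQL cache policy for Hive tables: the partition update time stored in HMS is used to decide whether the cache is invalid, which raises the cache hit rate
Improved Merge-on-Write compaction efficiency
Improved the thread allocation logic for external table queries, reducing memory usage
Reduced the memory usage of the column reader
Improved replica balancing, including skipping deleted partitions, colocate group handling, balancing failures under continuous writes, and tables with hot/cold tiering that could not be balanced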
The audit log plugin configuration uses a token instead of a cleartext password to enhance security
[Feature](auditloader) Plugin auditloader use auth token to avoid using cleartext passwords in config by zhiqiang-hhhh · Pull Request #26278 · apache/doris · GitHub
Enhanced the security of the log4j configuration
[Enhancement](log) Improve Safety and Robustness of Log4j Configuration by zy-kkk · Pull Request #24861 · apache/doris · GitHub
Sensitive user information is no longer displayed in logs
[improvement](log) log desensitization without displaying user info by HHoflittlefish777 · Pull Request #26912 · apache/doris · GitHub
Fixed fixed-length CHAR(n) values not being truncated correctly in map/struct
[FIX](collectiontype) fix shrink char column in map/struct by amorynan · Pull Request #25725 · apache/doris · GitHub
Fixed write failures for struct nested with map/array
[FIX](complextype)fix struct nested complex collection type and and regresstest by amorynan · Pull Request #26973 · apache/doris · GitHub
Fixed count distinct not supporting array/map/struct
[FIX](func) fix count distinct do not support arr/map/struct by amorynan · Pull Request #25483 · apache/doris · GitHub
Fixed a BE crash during upgrade after a query contained a delete on complex types
[FIX](upgrade)fix upgrade for predict column delete collection type will make core by amorynan · Pull Request #26006 · apache/doris · GitHub
Fixed a BE crash when jsonb is used in a WHERE condition
[FIX](jsonb)fix jsonb is not in predict column by amorynan · Pull Request #27325 · apache/doris · GitHub
Fixed a BE crash when an outer join involves the array type
[FIX](resize) fix array and map offsets resize with default value by amorynan · Pull Request #25669 · apache/doris · GitHub
Fixed incorrect reads of the decimal type in ORC format
[Fix](orc-reader) Fix orc complex types when late materialization was turned on by disabling late materialization in this case. by kaka11chen · Pull Request #26548 · apache/doris · GitHub
[Fix](orc-reader) Fix orc decimal128 scale issue. by kaka11chen · Pull Request #25977 · apache/doris · GitHub
[Fix](orc-reader) Add missing `break` introduced by #26548. by kaka11chen · Pull Request #26633 · apache/doris · GitHub
Fixed incorrect results for WHERE conditions that combine OR and NOT when inverted index query is disabled
[Fix](inverted index) fix compound query result error when disable inverted_index_query session variable by airborne12 · Pull Request #26327 · apache/doris · GitHub
Fixed a BE crash when writing an inverted index for an empty array
[Fix](inverted index) fix empty array index writer bug by airborne12 · Pull Request #25984 · apache/doris · GitHub
Fixed a BE crash in index compaction when the output is empty
[opt](index compaction) optimize checks before index compaction by qidaye · Pull Request #25486 · apache/doris · GitHub
Fixed a BE crash when adding an inverted index to a newly added column that has no data written
[fix](build index) fix core when build index for a new column which without data by Tanya-W · Pull Request #27276 · apache/doris · GitHub
Fixed missing and leaked inverted index hard links in cases such as upgrading to 2.0 after an inverted index was mistakenly created in 1.2
[fix](build index) Fix inverted index hardlink leak and missing problem by xiaokang · Pull Request #26903 · apache/doris · GitHub
Fixed a BE crash caused by duplicate expressions in a group by clause
[Bug](materialized-view) add limitation for duplicate expr on materialized view by BiteTheDDDDt · Pull Request #27523 · apache/doris · GitHub
Disallow float/double types in the group by clause when creating a materialized view
[Bug](materialized-view) add limit for group by with float/double on create mv by BiteTheDDDDt · Pull Request #25823 · apache/doris · GitHub
Improved support for select queries hitting materialized views
[Bug](materialized-view) enable rewrite on select materialized index with aggregate mode by BiteTheDDDDt · Pull Request #24691 · apache/doris · GitHub
Fixed materialized views not being hit when a table alias is used
[Bug](materialized-view) fix not match mv when some alias on agg by BiteTheDDDDt · Pull Request #25321 · apache/doris · GitHub
Fixed issues with using percentile_approx when creating a materialized view
[Bug](materialized-view) fix some bugs on create mv with percentile_approx by BiteTheDDDDt · Pull Request #26528 · apache/doris · GitHub
Fixed table sample not working properly on partitioned tables
[fix](planner) Fix sample partition table by xinyiZzz · Pull Request #25912 · apache/doris · GitHub
Fixed table sample not working when a tablet is specified
[fix](planner) Fix `select table tablet` not effective by xinyiZzz · Pull Request #25378 · apache/doris · GitHub
Fixed a null pointer exception in updates based on a primary key condition
[fix](partial update) Fix NPE when the query statement of an update statement is a point query in `OriginPlanner` by bobhan1 · Pull Request #26881 · apache/doris · GitHub
Fixed a column name case-sensitivity issue in partial column updates
[fix](partial update) keep case insensitivity and use the columns' origin names in `partialUpdateCols` in origin planner by bobhan1 · Pull Request #27223 · apache/doris · GitHub
Fixed duplicate keys appearing in Merge-on-Write (MoW) tables during schema change
[fix](merge-on-write) fix duplicate key in schema change by liaoxin01 · Pull Request #25705 · apache/doris · GitHub
Fixed the unknown slot descriptor error when routine load writes one stream to multiple tables
[fix](multi-table) fix unknown source slot descriptor when load multi table by HHoflittlefish777 · Pull Request #25762 · apache/doris · GitHub
Fixed a BE crash caused by concurrent access to memory statistics
[fix](load) add lock in active_memtable_mem_consumption by kaijchen · Pull Request #27101 · apache/doris · GitHub
Fixed a BE crash caused by cancelling a load repeatedly
[fix](load) skip cancel already cancelled channels by kaijchen · Pull Request #27111 · apache/doris · GitHub
Fixed broker connection errors during broker load
[fix](broker-read) refactor broker reading process to avoid null broker connection by TangSiyang2001 · Pull Request #26050 · apache/doris · GitHub
Fixed potentially incorrect query results caused by delete predicates when compaction and scan run concurrently
[Bug](ScanNode) Fix potential incorrect query result caused by concurrent NewOlapScanNode initialization and Compaction by platoneko · Pull Request #24638 · apache/doris · GitHub
Fixed excessive stack trace logging when a compaction task already exists
[chore](compaction) Do not print the stack trace when the compaction task already exists by Xiaoccer · Pull Request #25597 · apache/doris · GitHub
Fixed query failures on Iceberg tables that contain special characters
[fix](iceberg) iceberg use custom method to encode special characters in column name by AshinGau · Pull Request #27108 · apache/doris · GitHub
Fixed compatibility issues across different Hive metastore versions
[fix](hms) fix compatibility issue of hive metastore client by morningman · Pull Request #27327 · apache/doris · GitHub
Fixed errors when reading MaxCompute partitioned tables
[fix](multi-catalog)fix maxcompute partition filter and session creation by wsjz · Pull Request #24911 · apache/doris · GitHub
Fixed backup to object storage failing
[fix](backup) fix backup fail on s3 by morningman · Pull Request #25496 · apache/doris · GitHub
[fix](backup) missing use_path_style properties for minio by morningman · Pull Request #25803 · apache/doris · GitHub
Fixed JDBC catalog handling Oracle date type formats incorrectly
[fix](jdbc catalog) fix handle oracle date format by zy-kkk · Pull Request #25487 · apache/doris · GitHub
Fixed an exception when JDBC catalog reads the MySQL date 0000-00-00
[fix](jdbc catalog) fix mysql zero date by zy-kkk · Pull Request #26569 · apache/doris · GitHub
Fixed a null pointer exception when reading from MariaDB a time-type column whose default value is current_timestamp
[fix](multicatalog)fix jdbc catalog current_timestamp default by vinlee19 · Pull Request #25016 · apache/doris · GitHub
Fixed a BE crash when JDBC catalog handles the bitmap type
[fix](jdbc catalog) fix jdbc catalog read bitmap data crash by zy-kkk · Pull Request #25034 · apache/doris · GitHub
[BugFix](JDBC Catalog) fix jdbc catalog query bitmap may cause be core sometimes by GoGoWen · Pull Request #26933 · apache/doris · GitHub
Fixed incorrect partition pruning in some scenarios
[fix](nereids) partition prune fails in case of NOT expression by englefly · Pull Request #27047 · apache/doris · GitHub
[fix](nereids) patition prune is affected by non-paritition-key condition by englefly · Pull Request #26873 · apache/doris · GitHub
[fix](nereids) prune partition bug in pattern ColA <> ColB by englefly · Pull Request #25769 · apache/doris · GitHub
[fix](nereids) temp partition is always pruned by englefly · Pull Request #27636 · apache/doris · GitHub
Fixed incorrect subquery handling in some scenarios
[fix](nereids)only push down subquery in non-window agg functions by starocean999 · Pull Request #26034 · apache/doris · GitHub
[fix](be)fix bug of converting outer join probe block to nullable by starocean999 · Pull Request #25492 · apache/doris · GitHub
[fix](nereids)push down subquery exprs in non-distinct agg functions by starocean999 · Pull Request #25955 · apache/doris · GitHub
[fix](Nereids) fill miss slot in having subquery by XieJiann · Pull Request #27177 · apache/doris · GitHub
Fixed some semantic analysis errors
[fix](Nereids) should not replace slot by Alias when do NormalizeSlot by morrySnow · Pull Request #24928 · apache/doris · GitHub
[fix](nereids)fix bug of duplicate name of inline view by starocean999 · Pull Request #25627 · apache/doris · GitHub
Fixed possible data loss in right outer/anti join
[fix](Nereids) ban right outer, right anti, full outer with bucket shuffle by morrySnow · Pull Request #26529 · apache/doris · GitHub
Fixed predicates being incorrectly pushed down through aggregate operators
[fix](Nereids) non-slot filter should not be push through aggregate by morrySnow · Pull Request #25525 · apache/doris · GitHub
Fixed incorrect result headers being returned in some cases
[opt](Nereids) use correct column label when execute query in FE by morrySnow · Pull Request #25372 · apache/doris · GitHub
A hash join is now planned correctly when a nullsafeEquals expression (<=>) is used as a join condition
[fix](Nereids): NullSafeEqual should be in HashJoinCondition by jackwener · Pull Request #27127 · apache/doris · GitHub
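To see why a null-safe equality can serve as a hash join key, consider the toy build/probe join below. It is a plain Python illustration of the `=` versus `<=>` matching rules, using a hypothetical `hash_join` helper and `None` to stand for SQL NULL; it does not reflect how the Doris planner or execution engine is implemented.

```python
from collections import defaultdict


def hash_join(left, right, null_safe):
    """Inner-join two lists of (key, payload) rows with a build/probe hash table.

    With null_safe=False the key comparison behaves like '=': NULL matches nothing.
    With null_safe=True it behaves like '<=>': NULL matches NULL.
    """
    table = defaultdict(list)
    for key, payload in right:          # build phase
        if key is None and not null_safe:
            continue                    # NULL keys can never match under '='
        table[key].append(payload)

    joined = []
    for key, payload in left:           # probe phase
        if key is None and not null_safe:
            continue
        for other in table.get(key, []):
            joined.append((key, payload, other))
    return joined


left_rows = [(1, "a"), (None, "b")]
right_rows = [(1, "x"), (None, "y")]
print(hash_join(left_rows, right_rows, null_safe=False))
print(hash_join(left_rows, right_rows, null_safe=True))
```

Because both sides can be hashed on the key, including the NULL case, the comparison does not have to fall back to a nested-loop style evaluation.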
Fixed column pruning not working correctly in set operation operators
[fix](Nereids) column pruning under union broken unexpectedly by morrySnow · Pull Request #26884 · apache/doris · GitHub
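The output format of the complex data types array/map/struct now matches the input format and the JSON specification. The main changes from previous versions are that dates and strings are enclosed in double quotes, and null values inside array/map are displayed as null rather than NULL.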
[Fix](Serde) Fix content displayed by complex types in MySQL Client by BePPPower · Pull Request #25946 · apache/doris · GitHub
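As a rough picture of what JSON-consistent output means, the Python sketch below serializes an array and a map with the standard `json` module: strings and dates come out in double quotes, and missing values come out as lowercase `null`. The helper `to_json_text` is hypothetical, and the exact spacing Doris emits may differ from Python's defaults.

```python
import json
from datetime import date


def to_json_text(value) -> str:
    """Serialize a nested Python value the way the JSON specification prescribes."""
    def encode_date(obj):
        # JSON has no date type, so dates are written as double-quoted strings.
        if isinstance(obj, date):
            return obj.isoformat()
        raise TypeError(f"cannot serialize {type(obj).__name__}")

    return json.dumps(value, default=encode_date)


array_value = ["doris", None, date(2023, 12, 14)]
map_value = {"k1": "v1", "k2": None}
print(to_json_text(array_value))
print(to_json_text(map_value))
```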
By default, when the user property resource_tags.location is not set, only nodes in the default resource group can be used; in previous versions any node could be accessed.
[improvement](resource-tag) limit the default user's resource tag to 'default' by morningman · Pull Request #25331 · apache/doris · GitHub
Added the SHOW_VIEW privilege. Users with the SELECT or LOAD privilege can no longer run the SHOW CREATE VIEW statement; the SHOW_VIEW privilege must be granted separately.
[improvement](auth) support show view priv by zddr · Pull Request #25370 · apache/doris · GitHub
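The behavior change can be summarized as a small before/after check. The function name and the privilege strings below are hypothetical labels taken from the wording of this note, not identifiers from the Doris code base, and the sketch ignores any higher-level privileges (such as administrator rights) that might also grant access.

```python
def can_show_create_view(privileges: set, since_2_0_3: bool) -> bool:
    """Decide whether a user may run SHOW CREATE VIEW.

    Before 2.0.3 (as described in this note): SELECT or LOAD was enough.
    From 2.0.3 on: SHOW_VIEW must be granted explicitly.
    """
    if since_2_0_3:
        return "SHOW_VIEW" in privileges
    return bool(privileges & {"SELECT", "LOAD"})


analyst = {"SELECT"}
print(can_show_create_view(analyst, since_2_0_3=False))
print(can_show_create_view(analyst, since_2_0_3=True))
print(can_show_create_view(analyst | {"SHOW_VIEW"}, since_2_0_3=True))
```

After upgrading, accounts that relied on SELECT or LOAD to inspect view definitions would need the new privilege granted.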