Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Deduplicate
Partial Update
Aggregation
基本概念详见官网链接
tableEnv.executeSql("CREATE CATALOG paimon WITH ('type' = 'paimon', 'warehouse'='file:///D://data')");
tableEnv.executeSql("USE CATALOG paimon");
1:准备数据
Row.ofKind(RowKind.INSERT, 1,183.0,15)
Row.ofKind(RowKind.INSERT, 1,11.9,20)
2:建表
CREATE TABLE if not exists MyTable (
product_id BIGINT,
price DOUBLE,
sales BIGINT,
PRIMARY KEY (product_id) NOT ENFORCED
)
3:先插入第一条数据之后的结果
//Row.ofKind(RowKind.INSERT, 1,183.0,15)
+---------------+--------+---------+
| product_id | price | sales |
+---------------+--------+---------+
| 1 | 183.0 | 15 |
+---------------+--------+---------+
4:插入第二条数据的结果
+----+-------------+---------+-------------+
| op | product_id | price | sales |
+----+-------------+---------+-------------+
| +I | 1 | 11.9 | 20 |
+----+-------------+---------+-------------+
//查询最终结果
+----------------+--------+--------+
| product_id | price | sales |
+----------------+--------+--------+
| 1 | 11.9 | 20 |
+----------------+--------+--------+
5:删除操作
+----+-------------+----------+--------+
| op | product_id | price | sales |
+----+-------------+----------+--------+
| -D | 1 | 11.9 | 20 |
+----+-------------+----------+--------+
//查询结果
Empty set
Process finished with exit code 0
流批都支持
deduplicate 是默认的 merge engine 同一主键只保留最后一条数据,如果最后的操作是删除则被删除。简而言之就是支持update和delete功能。
1:准备数据
Row.ofKind(RowKind.INSERT, "李思思", 22,"男",null,"上海")
Row.ofKind(RowKind.INSERT, "汪呜呜", 1,"女",null,null)
Row.ofKind(RowKind.INSERT, "李思思", 22,"男","中国",null)
Row.ofKind(RowKind.INSERT, "汪呜呜", 1,"女",null,"北京")
2:建表
CREATE TABLE if not exists MyTable(
name STRING,
age INT,
sex STRING,
country STRING,
city STRING,
PRIMARY KEY (name,sex) NOT ENFORCED
) PARTITIONED BY (sex)
WITH (
'merge-engine'='partial-update',
'bucket' = '2'
)
3:结果
//第一次插入数据之后查询结果
+---------+------+-----+-----------+----------+
| name | age | sex | country | city |
+---------+------+-----+-----------+----------+
| 汪呜呜 | 1 | 女 | (NULL) | (NULL) |
| 李思思 | 22 | 男 | (NULL) | 上海 |
+---------+------+-----+-----------+----------+
//第二次插入数据
+----+---------+-------------+------+-----------+------------+
| op | name | age | sex | country | city |
+----+---------+-------------+------+-----------+------------+
| +I | 李思思 | 22 | 男 | 中国 | (NULL) |
| +I | 汪呜呜 | 1 | 女 | (NULL) | 北京 |
+----+---------+-------------+------+-----------+------------+
//查询结果
+-----------+-------------+-------+-----------+--------+
| name | age | sex | country | city |
+-----------+-------------+-------+-----------+--------+
| 李思思 | 22 | 男 | 中国 | 上海 |
| 汪呜呜 | 1 | 女 | (NULL) | 北京 |
+-----------+-------------+-------+-----------+--------+
3:删除数据
//不支持删除数据查询会报错
Caused by: java.lang.IllegalArgumentException: Partial update can not accept delete records. Partial delete is not supported!
//设置'partial-update.ignore-delete'='true'参数忽略删除操作可正常读取
//删除
+----+--------+------+------+---------+----------+
| op | name | age | sex | country | city |
+----+--------+------+------+---------+----------+
| -D | 李思思 | 22 | 男 | 中国 | (NULL) |
+----+--------+------+------+---------+----------+
//查询
|-----------+------+------+------------+--------+
| name | age | sex | country | city |
+-----------+------+------+------------+--------+
| 李思思 | 22 | 男 | 中国 | 上海 |
| 汪呜呜 | 1 | 女 | (NULL) | 北京 |
+-----------+------+------+------------+--------+
批模式
流模式报错:Exception in thread “main” java.lang.RuntimeException: Partial update streaming reading is not supported. You can use ‘lookup’ or ‘full-compaction’ changelog producer to support streaming reading.
支持update操作,但是当最后数据为null时不会进行update,不支持delete操作,不支持流读
在’merge-engine’='partial-update’时配合sequence group实现多流的更新写入
1:准备数据
//第一次插入的数据
Row.ofKind(RowKind.INSERT, "李思思", 22,"男",1,null,1,"上海")
Row.ofKind(RowKind.INSERT, "汪呜呜", 1,"女",1,null,1,null)
//第二次插入的数据
Row.ofKind(RowKind.INSERT, "李思思", 32,"男",null,"中国",2,null)
Row.ofKind(RowKind.INSERT, "汪呜呜", 11,"女",1,"中国",2,"北京")
2:建表
CREATE TABLE if not exists MyTable(
name STRING,
age INT,
sex STRING,
version INT,
country STRING,
version2 INT,
city STRING,
PRIMARY KEY (name,sex) NOT ENFORCED
) PARTITIONED BY (sex)
WITH (
'merge-engine'='partial-update',
'fields.version.sequence-group'='age',
'fields.version2.sequence-group'='country,city',
'bucket' = '2'
)
3:结果
//第一次插入数据之后的查询结果
+-------------+-------------+--------+-------------+------------+-------------+----------+
| name | age | sex | version | country | version2 | city |
+-------------+-------------+--------+-------------+------------+-------------+----------+
| 汪呜呜 | 1 | 女 | 1 | (NULL) | 1 | (NULL) |
| 李思思 | 22 | 男 | 1 | (NULL) | 1 | 上海 |
+-------------+-------------+--------+-------------+------------+-------------+----------+
//插入数据
+----+--------+------+-----+-------------+----------+-------------+---------+
| op | name | age | sex | version | country | version2 | city |
+----+--------+------+-----+-------------+----------+-------------+---------+
| +I | 李思思 | 32 | 男 | (NULL) | 中国 | 2 | (NULL) |
| +I | 汪呜呜 | 11 | 女 | 1 | 中国 | 2 | 北京 |
+----+--------+------+-----+-------------+----------+-------------+---------+
//最终查询结果
+---------+------+------+-------------+----------+-------------+---------+
| name | age | sex | version | country | version2 | city |
+---------+------+------+-------------+----------+-------------+---------+
| 汪呜呜 | 11 | 女 | 1 | 中国 | 2 | 北京 |
| 李思思 | 22 | 男 | 1 | 中国 | 2 | (NULL) |
+---------+------+------+-------------+----------+-------------+---------+
4:删除
同样不支持删除操作
批模式
当sequence-group >=原来的sequence-group是会做update,不支持删除操作,要配合’partial-update.ignore-delete’='true’使用,只支持批读取。
sequence-group 的作用是用来解决并发写的问题,这个后续会专门测试
1:准备数据
Row.ofKind(RowKind.INSERT, 1,1.0,1)
Row.ofKind(RowKind.INSERT, 1,12.9,2)
2:建表
CREATE TABLE if not exists MyTable (
product_id BIGINT,
price DOUBLE,
sales BIGINT,
PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
'merge-engine' = 'aggregation',
'fields.price.aggregate-function' = 'max',
'fields.sales.aggregate-function' = 'sum'
)
3:结果
//插入数据
+----+-------------+-------+-------------+
| op | product_id | price | sales |
+----+-------------+-------+-------------+
| +I | 1 | 1.0 | 1 |
+----+-------------+-------+-------------+
//查询结果
+-------------+---------+-------------+
| product_id | price | sales |
+-------------+---------+-------------+
| 1 | 1.0 | 1 |
+-------------+---------+-------------+
//插入数据
+----+-------------+---------+-------------+
| op | product_id | price | sales |
+----+-------------+---------+-------------+
| +I | 1 | 12.9 | 2 |
+----+-------------+---------+-------------+
//最终结果
+-------------+--------+-------------+
| product_id | price | sales |
+-------------+--------+-------------+
| 1 | 12.9 | 3 |
+-------------+--------+-------------+
4:删除
//不支持删除操作,报如下错
Aggregate function 'max' does not support retraction, If you allow this function to ignore retraction messages, you can configure 'fields.${field_name}.ignore-retract'='true'.
//添加参数忽略删除操作
CREATE TABLE if not exists MyTable (
product_id BIGINT,
price DOUBLE,
sales BIGINT,
PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
'merge-engine' = 'aggregation',
'fields.price.aggregate-function' = 'max',
'fields.price.ignore-retract' = 'true',
'fields.sales.ignore-retract' = 'true',
'fields.sales.aggregate-function' = 'sum'
);
批模式
不支持删除操作,只支持批模式读取,按照指定的聚合函数对指定字段进行聚合