笔者使用Flink SQL(jdbc connector)将实时数据写入Clickhouse时,查询Flink官方文档发现flink-connector-jdbc仅支持MySQL,PostgreSQL和Derby。无奈只能上手查阅资料,并扩展源码功能解决。
注:1.11.0版本之后flink-connector-jdbc
DataStream
支持了ClickHouse Sink
① github上下载flink项目源码:https://github.com/apache/flink.git
。
下载的flink源码整个项目比较难一次编译成功,这里提供两个快速解决办法,以供参考:
- 将阿里云仓库镜像放入本地maven settings.xml的中(放在mirrors标签里第一个)
<mirror> <id>nexus-aliyunid> <mirrorOf>*,!jeecg,!jeecg-snapshots,!mapr-releasesmirrorOf> <name>Nexus aliyunname> <url>http://maven.aliyun.com/nexus/content/groups/publicurl> mirror> <mirror> <id>mapr-publicid> <mirrorOf>mapr-releasesmirrorOf> <name>mapr-releasesname> <url>https://maven.aliyun.com/repository/mapr-publicurl> mirror>
- 将flink-connector-jdbc子项目中的flink版本改为已发布的稳定版本号(例:1.12.0),就可以仅打包这个子项目。
② 仿照flink-connector-jdbc支持mysql的情况添加Clickhouse相关代码,共需添加三处:ClickHouseDialect
,ClickHouseRowConverter
,修改JdbcDialects
添加ClickHouseDialect()
。话不多说,上代码。
package org.apache.flink.connector.jdbc.dialect;
import org.apache.flink.connector.jdbc.internal.converter.ClickHouseRowConverter;
import org.apache.flink.connector.jdbc.internal.converter.JdbcRowConverter;
import org.apache.flink.table.types.logical.LogicalTypeRoot;
import org.apache.flink.table.types.logical.RowType;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
/** JDBC dialect for ClickHouse. */
public class ClickHouseDialect extends AbstractDialect {
private static final long serialVersionUID = 1L;
private static final String SQL_DEFAULT_PLACEHOLDER = " :";
private static final int MAX_TIMESTAMP_PRECISION = 6;
private static final int MIN_TIMESTAMP_PRECISION = 1;
private static final int MAX_DECIMAL_PRECISION = 65;
private static final int MIN_DECIMAL_PRECISION = 1;
@Override
public boolean canHandle(String url) {
return url.startsWith("jdbc:clickhouse:");
}
@Override
public JdbcRowConverter getRowConverter(
RowType rowType) {
return new ClickHouseRowConverter(rowType);
}
@Override
public Optional<String> defaultDriverName() {
return Optional.of("ru.yandex.clickhouse.ClickHouseDriver");
}
@Override
public String quoteIdentifier(String identifier) {
return identifier;
}
@Override
public int maxDecimalPrecision() {
return MAX_DECIMAL_PRECISION;
}
@Override
public int minDecimalPrecision() {
return MIN_DECIMAL_PRECISION;
}
@Override
public int maxTimestampPrecision() {
return MAX_TIMESTAMP_PRECISION;
}
@Override
public int minTimestampPrecision() {
return MIN_TIMESTAMP_PRECISION;
}
@Override
public List<LogicalTypeRoot> unsupportedTypes() {
return Arrays.asList(
LogicalTypeRoot.BINARY,
LogicalTypeRoot.TIMESTAMP_WITH_LOCAL_TIME_ZONE,
LogicalTypeRoot.TIMESTAMP_WITH_TIME_ZONE,
LogicalTypeRoot.INTERVAL_YEAR_MONTH,
LogicalTypeRoot.INTERVAL_DAY_TIME,
LogicalTypeRoot.ARRAY,
LogicalTypeRoot.MULTISET,
LogicalTypeRoot.MAP,
LogicalTypeRoot.ROW,
LogicalTypeRoot.DISTINCT_TYPE,
LogicalTypeRoot.STRUCTURED_TYPE,
LogicalTypeRoot.NULL,
LogicalTypeRoot.RAW,
LogicalTypeRoot.SYMBOL,
LogicalTypeRoot.UNRESOLVED
);
}
@Override
public String dialectName() {
return "ClickHouse";
}
}
package org.apache.flink.connector.jdbc.internal.converter;
import org.apache.flink.table.types.logical.RowType;
/**
* Runtime converter that responsible to convert between JDBC object and Flink internal object for
* ClickHouse.
*/
public class ClickHouseRowConverter extends AbstractJdbcRowConverter {
private static final long serialVersionUID = 1L;
@Override
public String converterName() {
return "ClickHouse";
}
public ClickHouseRowConverter(RowType rowType) {
super(rowType);
}
}
org.apache.flink.connector.jdbc.dialect.JdbcDialects
public final class JdbcDialects {
private static final List<JdbcDialect> DIALECTS = Arrays.asList(
new DerbyDialect(),
new MySQLDialect(),
new PostgresDialect(),
new ClickHouseDialect(), //这里是上面的ClickHouseDialect的类对象
new OracleDialect()
);
......
......
}
其它支持JDBC的数据源也可使用此方法继续扩展,比如笔者这里的OracleDialect()。
该方法阿里云文档 “ 使用flink-connector-clickhouse写入ClickHouse ” 中有详细描述,可自行查阅。
注:该connector从Flink 1.12版本开始支持,Flink1.11使用会报错。且该插件maven依赖下载和安装会出问题,可直接下载jar包导入项目Libraries中。或者在github上有相关代码,这里就自行寻找不贴链接了。
CREATE TABLE test_rightTable (
id INT,
name STRING,
gender STRING,
age INT,
address STRING,
phone INT
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:clickhouse://*****:8123/default',
'username' = 'default',
'password' = '******',
'table-name' = 'test_rightTable'
);
CREATE TABLE test_sink_clickhouse_table
logtime STRING,
col1 STRING,
col2 STRING
) WITH (
'connector' = 'clickhouse',
'url' = 'clickhouse://****:8124', //clickhouse的host:port
'table-name' = 'test_sink_clickhouse_table'
)
完整SQL程序可参考Flink 1.11 SQL 快速上手,内含Demo及详细分析和使用过程,亲测可行!
① ClickHouse数据抽取问题
flink-connector-clickhouse只支持加载,不支持抽取,扩展的flink-connector-jdbc是都支持的。
② 库名指定问题
配置url的时候使用clickhouse://****:8124/${database-name}
并不会直接报错,但这里的库名指定是不生效的,如果库名不是default,且报错该表在default下不存在,需要另外使用database-name
参数指定。