A First Look at ClickHouse

Background

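Our company currently stores time-series data in InfluxDB, but InfluxDB has been a real pain: querying a single day of data sends memory usage soaring until the process crashes, and I never got comfortable with its query language either. So I evaluated TDengine and ran a performance comparison between InfluxDB and TDengine.
The result? TDengine was indeed faster than InfluxDB, its memory usage did not balloon, and its SQL felt natural. But its cluster mode was only just getting started, many questions went unanswered, and the community was not active. Support ran through a handful of WeChat groups where everyone fired off questions and nobody was obliged to answer. That felt a bit small-time to me; who watches group chats all day?
In the end we picked neither and made do with what we had. The servers did get scaled up, though, which more or less solved the problem.
Recently I came across ClickHouse, which looks like a good fit for things like user-profile analytics, so I decided to learn it and scout ahead.
I'm writing this post because I ran into a few pitfalls that beginners are likely to hit, and sharing them might save you some detours. I'm new to this myself, so the content is fairly basic.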

Installing and testing by following the official docs went fine. The trouble started when connecting from Java.

1. Remote Connections

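I installed ClickHouse (CH from here on) in a virtual machine and tested from my host machine, so I inevitably had to allow remote connections, as with a lot of server software.
The configuration file is /etc/clickhouse-server/config.xml.
The correct configuration is as follows: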

Leave the following two entries alone and uncomment the :: entry above them:
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
As the comment in the file says, these two are the default values: ::1 is the IPv6 loopback address and 127.0.0.1 is the IPv4 loopback address.

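Setting listen_host to :: makes the server listen on all addresses. Binding it to the machine's own IP address is of course the better choice.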

Restart the server after the change:
systemctl restart clickhouse-server.service

You can verify the result by checking the port.
Before the change:
lsof -i :8123
COMMAND       PID       USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
clickhous 2901438 clickhouse   53u  IPv6 2962808809      0t0  TCP localhost:8123 (LISTEN)
clickhous 2901438 clickhouse   55u  IPv4 2962808813      0t0  TCP localhost:8123 (LISTEN)

Both the IPv6 and the IPv4 socket are bound to localhost.

curl 'http://localhost:8123/'
Ok.
curl 'http://192.168.1.100:8123/'
curl: (7) Failed to connect to 192.168.1.100 port 8123: Connection refused

Queries via localhost work, but queries via the IP address do not.
After the change it works:
lsof -i :8123
COMMAND       PID       USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
clickhous 2923409 clickhouse   58u  IPv6 2963328449      0t0  TCP *:8123 (LISTEN)

curl 'http://192.168.1.100:8123/'
Ok.
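
The same before/after check can be run from the Java side, which helps rule out the network before blaming the client library. This is only a minimal sketch using the JDK: the class name PortProbe and the 2-second timeout are my own choices for illustration, and the default address is the VM address used above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // "Connection refused", timeouts and unknown hosts all end up here
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the VM address from this post; pass another host as the first argument
        String host = args.length > 0 ? args[0] : "192.168.1.100";
        System.out.println("8123 reachable: " + canConnect(host, 8123, 2000));
    }
}
```

A false result here corresponds to the curl failure above: the server is still bound to localhost only, or a firewall is in the way.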

Without this configuration, the error in the Java code looks roughly like this:
com.clickhouse.client.ClickHouseException: 拒绝连接 (Connection refused)

If the restart fails, for example:

systemctl start clickhouse-server.service
Job for clickhouse-server.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status clickhouse-server.service" and "journalctl -xe" for details.

check the startup log at /var/log/clickhouse-server/clickhouse-server.err.log.
I hit this error:
Address already in use: [::]:9000

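That was because I had Hadoop installed and it was already using port 9000. Changing tcp_port in config.xml to 9001 fixes it.
The client then has to specify the port when connecting: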

clickhouse-client --port 9001

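Another error I saw was Address already in use: [::]:8123
That one was puzzling, since lsof showed nothing on the port. The reason is that after the previous failed start, CH does not simply fail and stop. Instead it stays in an activating (auto-restart) state: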

systemctl status clickhouse-server.service
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
   Loaded: loaded (/usr/lib/systemd/system/clickhouse-server.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: protocol) since Sun 2023-06-25 10:34:54 CST; 22s ago
 Main PID: 2903753

So you need to stop CH first and then start it again.
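
To find out which port is the culprit before editing config.xml, a quick bind test can help. Again just a sketch with the JDK: PortCheck is a hypothetical helper, and a failed bind only tells you that something holds the port, not which process it is (lsof is still needed for that).

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortCheck {
    /** Returns true if a listening socket can be bound to the port on all interfaces. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically a BindException: "Address already in use"
            return false;
        }
    }

    public static void main(String[] args) {
        // The two ports that caused trouble above
        for (int port : new int[] {8123, 9000}) {
            System.out.println(port + (isPortFree(port) ? " is free" : " is already in use"));
        }
    }
}
```

Run it on the server itself while CH is stopped; any port reported as in use is held by something else, such as Hadoop in my case.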

A note on the ports CH uses:

    <!-- Port for HTTP API. See also 'https_port' for secure connections.
         This interface is also used by ODBC and JDBC drivers (DataGrip, Dbeaver, ...)
         and by most of web interfaces (embedded UI, Grafana, Redash, ...).
      -->
    <http_port>8123</http_port>

    <!-- Port for interaction by native protocol with:
         - clickhouse-client and other native ClickHouse tools (clickhouse-benchmark, clickhouse-copier);
         - clickhouse-server with other clickhouse-servers for distributed query processing;
         - ClickHouse drivers and applications supporting native protocol
         (this protocol is also informally called as "the TCP protocol");
         See also 'tcp_port_secure' for secure connections.
    -->
    <tcp_port>9000</tcp_port>

    <!-- Compatibility with MySQL protocol.
         ClickHouse will pretend to be MySQL for applications connecting to this port.
    -->
    <mysql_port>9004</mysql_port>

    <!-- Compatibility with PostgreSQL protocol.
         ClickHouse will pretend to be PostgreSQL for applications connecting to this port.
    -->
    <postgresql_port>9005</postgresql_port>

    <!-- HTTP API with TLS (HTTPS).
         You have to configure certificate to enable this interface.
         See the openSSL section below.
    -->
    <!-- <https_port>8443</https_port> -->

    <!-- Native interface with TLS.
         You have to configure certificate to enable this interface.
         See the openSSL section below.
    -->
    <!-- <tcp_port_secure>9440</tcp_port_secure> -->

    <!-- Native interface wrapped with PROXYv1 protocol
         PROXYv1 header sent for every connection.
         ClickHouse will extract information about proxy-forwarded client address from the header.
    -->
    <!-- <tcp_with_proxy_port>9011</tcp_with_proxy_port> -->

    <!-- Port for communication between replicas. Used for data exchange.
         It provides low-level data access between servers.
         This port should not be accessible from untrusted networks.
         See also 'interserver_http_credentials'.
         Data transferred over connections to this port should not go through untrusted networks.
         See also 'interserver_https_port'.
      -->
    <interserver_http_port>9009</interserver_http_port>

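8123 is the HTTP port and 9000 is the native TCP port.
The Java client uses the HTTP protocol, so it connects to 8123.
clickhouse-client uses the native TCP protocol, so it defaults to 9000.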

The MySQL and PostgreSQL ports let you connect to CH with MySQL or PostgreSQL clients, for example:

mysql --protocol tcp -u default -P 9004

2. Connecting from Java

There are quite a few Java APIs for connecting. A quick search turns up roughly three.

The official one:

https://github.com/ClickHouse/clickhouse-java

<dependency>
    <groupId>com.clickhouse</groupId>
    <!-- or clickhouse-grpc-client if you prefer gRPC -->
    <artifactId>clickhouse-http-client</artifactId>
    <version>0.4.6</version>
</dependency>

Reportedly it used to be ru.yandex.clickhouse, which is no longer updated. As of now (June 25, 2023) the one to use is com.clickhouse.

There are also third-party ones, such as:
https://github.com/housepower/ClickHouse-Native-JDBC
https://github.com/blynkkk/clickhouse4j

Either way, all of them wrap and abstract the various protocol interfaces that CH exposes.

I'll only cover the official API here:

  • Java client
  • JDBC Driver
  • R2DBC Driver

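The Java client is the base layer, and the JDBC and R2DBC drivers are built on top of it. JDBC is synchronous and R2DBC is asynchronous. Performance-wise, the bare client is naturally the better choice, since the drivers add a layer on top of it.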

First, the client example from the official docs:
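This example is a bit of a trap. Setting aside the slightly odd syntax of the query example further down, the URL at the top does not need the jdbc:ch:// prefix. With the prefix, the client actually fails: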

16:13:07.163 [ClickHouseScheduler-1] DEBUG com.clickhouse.client.ClickHouseNode - Failed to probe localhost:0
java.net.ConnectException: connect: Address is invalid on local machine, or port is not valid on remote machine

I debugged for a long time without figuring out why it was trying to connect to localhost:0. Then I noticed this line in the code:
ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP);
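Since the protocol is HTTP, maybe the client cannot recognize a URL that starts with jdbc? I changed the URL to start with http and, sure enough, it worked right away.
Here is the code that ran successfully. The connect method used in the official example is also deprecated and has been replaced by read.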


import com.clickhouse.client.*;
import com.clickhouse.data.ClickHouseFormat;
import com.clickhouse.data.ClickHouseRecord;


public class CHClientTest {
    public static void main(String[] args) throws ClickHouseException {
        ClickHouseNodes servers = ClickHouseNodes.of(
                "http://192.168.1.100:8123/tutorial"
                        + "?load_balancing_policy=random&health_check_interval=5000&failover=2");
        ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP);

        ClickHouseResponse response = client.read(servers) // read(...) replaces the deprecated connect(...)
                // you'll have to parse response manually if using a different format
                .format(ClickHouseFormat.RowBinaryWithNamesAndTypes)
                .query("select * from numbers(:limit)")
                .params(1000).executeAndWait();
        ClickHouseResponseSummary summary = response.getSummary();
        long totalRows = summary.getTotalRowsToRead();
        for (ClickHouseRecord r : response.records()) {
            int num = r.getValue(0).asInteger();
            System.out.println(num);
        }

    }
}
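
To avoid repeating the jdbc:ch:// mistake when the same setting is shared between the JDBC driver and the Java client, the URL can be normalized before it is handed to the client. This is a hypothetical helper of my own (EndpointUrls is not part of clickhouse-java), and it assumes the URL points at the server's HTTP port, as in this post.

```java
final class EndpointUrls {
    // JDBC-style prefixes that, per the failure described above, the Java client does not accept
    private static final String[] JDBC_PREFIXES = {"jdbc:clickhouse://", "jdbc:ch://"};

    /** Rewrites a JDBC-style URL to an http:// endpoint; other URLs are returned unchanged. */
    static String toClientEndpoint(String url) {
        for (String prefix : JDBC_PREFIXES) {
            if (url.startsWith(prefix)) {
                return "http://" + url.substring(prefix.length());
            }
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(toClientEndpoint("jdbc:ch://192.168.1.100:8123/tutorial"));
    }
}
```

The returned string is what would be passed to ClickHouseNodes.of(...) in the code above.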

Now the JDBC example:
Here only the password passed to getConnection needs changing. By default, the CH user is default and the password is an empty string.

Here is the code:


import com.clickhouse.client.ClickHouseException;
import com.clickhouse.jdbc.ClickHouseDataSource;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class CHJDBCTest {
    public static void main(String[] args) throws ClickHouseException, SQLException {


        String url = "jdbc:ch://192.168.1.100:8123/tutorial"; // use http protocol and port 8123 by default
// String url = "jdbc:ch://my-server:8443/system?ssl=true&sslmode=strict&sslrootcert=/mine.crt";
        Properties properties = new Properties();
// properties.setProperty("ssl", "true");
// properties.setProperty("sslmode", "NONE"); // NONE to trust all servers; STRICT for trusted only
        ClickHouseDataSource dataSource = new ClickHouseDataSource(url, properties);
        try (Connection conn = dataSource.getConnection("default", "");
             Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM tutorial.visits_v1");
            while (rs.next()) {
                System.out.println(rs.getBigDecimal(1));
            }
        }
    }
}

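By the way, the parentheses after try are the try-with-resources mechanism: resources that implement java.lang.AutoCloseable are declared in the parentheses after try, and they are closed automatically whether the try block completes normally or with an exception.
The part inside the parentheses is the resource specification of the try-with-resources statement. The compiler generates the equivalent of a finally block for us and calls each resource's close method in it.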
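
A small self-contained demo makes the behavior visible. Tracked is a stand-in resource I made up for illustration; in the JDBC code above its role is played by Connection and Statement.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesDemo {
    /** A fake resource that records when it is closed. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Runs a try-with-resources statement and returns the order of events. */
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (Tracked conn = new Tracked("conn", log);
             Tracked stmt = new Tracked("stmt", log)) {
            log.add("body");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            // Both resources are already closed by the time this runs
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Resources are closed in the reverse order of their declaration, and on the failure path they are closed before the catch block runs, so run(true) yields body, close stmt, close conn, caught.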

That wraps up this summary. I'll come back and update it when I learn something new (putting that promise on record).
