How to query table size in ClickHouse
# clickhouse-client enters the interactive client
pda1 :) show databases;
pda1 :) create database test;
pda1 :) use system;
pda1 :) show tables;
pda1 :) exit;
Everything beyond this is just ordinary SQL.
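Inside the client you can already get a rough size overview from system.tables (a minimal sketch; total_rows/total_bytes are NULL for engines that do not track them — the more detailed system.parts query appears at the end of this article):
SELECT name,
       total_rows,
       formatReadableSize(total_bytes) AS size
FROM system.tables
WHERE database = 'test';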
pip install clickhouse_driver
from clickhouse_driver import Client
# Optionally add a connect/receive timeout as well: send_receive_timeout=<seconds>
host = "127.0.0.1"
port = 29000
user = "default"
password = "bigdata"
database = "default"
client = Client(host=host, port=port, user=user, password=password, database=database)
print(client.execute("show databases"))
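clickhouse_driver also supports parameter substitution with %(name)s placeholders, which is safer than string formatting when values come from user input (a minimal sketch, reusing the client above):
# Parameters are passed as a dict; clickhouse_driver escapes them client-side
rows = client.execute(
    "SELECT * FROM system.databases WHERE name = %(name)s",
    {"name": "default"},
)
print(rows)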
You can also connect through SQLAlchemy, but by default SQLAlchemy has no dialect for ClickHouse, so an extra module is required: pip install sqlalchemy_clickhouse==0.2.2. Once installed it works out of the box. Note that the engine.execute() call used below exists only in SQLAlchemy 1.x; it was removed in SQLAlchemy 2.0.
Connecting with an explicit username and password:
from sqlalchemy import create_engine

host = "127.0.0.1"
user = "default"
password = "bigdata"
db = "default"
port = 28123  # HTTP port, not the native TCP port
engine = create_engine(
    'clickhouse://{user}:{password}@{host}:{port}/{db}'.format(
        user=user, password=password, host=host, port=port, db=db))
re = engine.execute("show databases").fetchall()
print(re)
If the password contains a special character such as @, it must be URL-quoted before being placed in the connection string:
from urllib.parse import quote_plus as urlquote
password = urlquote("ClickhouseTest@190")
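A minimal sketch of the complete connection using the quoted password (host, port, user, and database are the same placeholder values as above):
from urllib.parse import quote_plus as urlquote
from sqlalchemy import create_engine

password = urlquote("ClickhouseTest@190")  # '@' becomes '%40', keeping the URL parseable
engine = create_engine(
    'clickhouse://default:{password}@127.0.0.1:28123/default'.format(password=password))
print(engine.execute("show databases").fetchall())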
clickhouse-connect requires Python 3.7 or later.
pip install clickhouse-connect
import clickhouse_connect
host = "127.0.0.1"
port = 28123
user = "default"
password = "bigdata"
database = "default"
client = clickhouse_connect.get_client(host=host, port=port, username=user, password=password)
# To retrieve data with ClickHouse SQL, use the client's query method:
result = client.query('show databases')
re = result.result_set
print(re)
Running commands
# To run a ClickHouse SQL command, use the client's command method:
client.command('CREATE TABLE new_table (key UInt32, value String, metric Float64) ENGINE MergeTree ORDER BY key')
# To insert batch data, use the client's insert method with a two-dimensional array of rows and values:
row1 = [1000, 'String Value 1000', 5.233]
row2 = [2000, 'String Value 2000', -107.04]
data = [row1, row2]
client.insert('new_table', data, column_names=['key', 'value', 'metric'])
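To verify the insert, the rows can be read back with the same query method (a small sketch; result_rows is clickhouse-connect's list-of-tuples view of the result):
result = client.query('SELECT * FROM new_table ORDER BY key')
for row in result.result_rows:
    print(row)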
Using clickhouse_driver
# -*- coding: utf-8 -*-
from clickhouse_driver import Client
# Optionally add a connect/receive timeout as well: send_receive_timeout=<seconds>
host = "127.0.0.1"
port = 29000
user = "default"
password = "bigdata"
database = "test"
client = Client(host=host, port=port, user=user, password=password, database=database)
# (1) Create the table
# str_sql = 'CREATE TABLE IF NOT EXISTS test (id Int32, value String) ENGINE = Memory'
# client.execute(str_sql)
# # (2) Generate test data
# data = [(i, f'value_{i}') for i in range(1000)]
#
# # (3) Batch insert into ClickHouse
# client.execute('INSERT INTO test (id, value) VALUES', data)
# (4) Query the data
str_query = 'select * from test limit 10'
re = client.execute(str_query)
print(re)
ClickHouse is built for batch inserts, not for writing rows one at a time. Ideally every row in a batch should belong to the same partition of the same table, so that each insert creates a single new part; a chunked-insert sketch follows.
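A minimal sketch of chunked inserts with clickhouse_driver, assuming a connected client and the simple test(id, value) table from above; the chunk size of 10000 is an arbitrary illustration, not a tuned value:
def insert_in_chunks(client, rows, chunk_size=10000):
    # One execute() per chunk instead of one per row
    for start in range(0, len(rows), chunk_size):
        client.execute('INSERT INTO test (id, value) VALUES',
                       rows[start:start + chunk_size])

rows = [(i, f'value_{i}') for i in range(100000)]
insert_in_chunks(client, rows)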
(1) Create the database test
create database test;
(2) Create the table batch
CREATE TABLE test.batch
(
`eday` Date,
`edaytime` DateTime,
`groupname` String,
`pdaaddr` String,
`starttime` DateTime64(3),
`step` Float32,
`data` String,
`db_time` DateTime64(3),
INDEX pdaaddr_idx pdaaddr TYPE minmax GRANULARITY 8192,
INDEX starttime_idx starttime TYPE minmax GRANULARITY 8192
)
ENGINE = MergeTree
PARTITION BY eday
PRIMARY KEY (pdaaddr,starttime)
ORDER BY (pdaaddr,starttime)
TTL edaytime + toIntervalHour(6)
SETTINGS index_granularity = 8192,
min_rows_for_compact_part = 3;
# -*- coding: utf-8 -*-
from datetime import datetime
from clickhouse_driver import Client
# Optionally add a connect/receive timeout as well: send_receive_timeout=<seconds>
host = "127.0.0.1"
port = 29000
user = "default"
password = "bigdata"
database = "test"
client = Client(host=host, port=port, user=user, password=password, database=database)
str1 = "2023-06-16 15:30:25.952"
st1 = datetime.strptime(str1, '%Y-%m-%d %H:%M:%S.%f')
# ClickHouse converts the value to each column's declared time type automatically
# (2) Generate one test row
eday = st1 # 2023-06-16
edaytime = st1 # "2023-06-16 15:30:25"
groupname = "testgroup"
pdaaddr = "device1*01*data1"
starttime = st1 # "2023-06-16 15:30:25.952"
step = 778899
data = "78.6742"
db_time = st1 # "2023-06-16 15:30:25.952"
data_list = [eday,edaytime,groupname,pdaaddr,starttime,step,data,db_time]
data_list_batch = [data_list]
# (3) Batch insert into ClickHouse
client.execute('INSERT INTO batch VALUES', data_list_batch)
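To reproduce the measurements below, the same row can be duplicated into a larger batch (a sketch; 10000 matches the smallest batch in the results table):
# 10000 identical rows, all in the same eday partition, inserted in one call
data_list_batch = [data_list] * 10000
client.execute('INSERT INTO batch VALUES', data_list_batch)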
-- Check table sizes
SELECT
    table AS table_name,
    sum(rows) AS row_num,
    formatReadableSize(sum(data_uncompressed_bytes)) AS org_size,
    formatReadableSize(sum(data_compressed_bytes)) AS compress_size,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS compress_ratio
FROM system.parts
WHERE database = 'test' AND active  -- count only active parts; merged-away parts would inflate the totals
GROUP BY table
ORDER BY sum(rows) DESC;
The query returns results like the following (compress_ratio is the compressed size as a percentage of the raw size):
table_name   row_num    org_size     compress_size   compress_ratio
Inserting 10,000 rows at a time:
batch          10000    614.15 KiB   101.95 KiB      17.0
batch          20000    1.20 MiB     203.87 KiB      17.0
batch          30000    1.80 MiB     306.16 KiB      17.0
batch          40000    2.40 MiB     408.11 KiB      17.0
Inserting 100,000 rows at a time:
batch         100000    6.08 MiB     1.04 MiB        17.0
batch         200000    12.17 MiB    2.08 MiB        17.0
batch         300000    18.25 MiB    3.11 MiB        17.0
batch         400000    24.33 MiB    4.15 MiB        17.0
Inserting 1,000,000 rows at a time:
batch        1000000    61.78 MiB    10.75 MiB       17.0
batch        2000000    123.56 MiB   21.50 MiB       17.0
batch        3000000    185.33 MiB   32.25 MiB       17.0
Rule of thumb: 100,000 rows (one second of data in this scenario) occupy roughly 1 MiB compressed.
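If a per-partition breakdown is also needed (useful with the daily PARTITION BY eday above), the same system.parts query can be grouped by partition; a sketch along the same lines:
SELECT
    table,
    partition,
    sum(rows) AS row_num,
    formatReadableSize(sum(data_compressed_bytes)) AS compress_size
FROM system.parts
WHERE database = 'test' AND active
GROUP BY table, partition
ORDER BY partition;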