Column-Oriented OLAP Storage DBMS - 16 - [ClickHouse] Operating ClickHouse from Python

Includes a method for querying table sizes in ClickHouse (section 4.2).

1 Common ClickHouse commands

# Run clickhouse-client to enter the interactive client
pda1 :) show databases;
pda1 :) create database test;
pda1 :) use system;
pda1 :) show tables;
pda1 :) exit;
Everything else is just regular SQL.

2 Operating ClickHouse from Python

2.1 clickhouse-driver (native protocol, port 9000)

pip install clickhouse_driver

from clickhouse_driver import Client

# Optionally pass a timeout as well: send_receive_timeout=<seconds>
host = "127.0.0.1"
port = 29000
user = "default"
password = "bigdata"
database = "default"
client = Client(host=host, port=port, user=user, password=password, database=database)
print(client.execute("show databases"))
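The timeout mentioned in the comment can be passed straight to Client, and clickhouse-driver also supports %(name)s parameter binding in queries. A minimal sketch, assuming the same server settings as above (the 10-second timeout is an arbitrary choice):

from clickhouse_driver import Client

client = Client(
    host="127.0.0.1",
    port=29000,
    user="default",
    password="bigdata",
    database="default",
    send_receive_timeout=10,  # seconds; an assumption, raise for long queries
)
# clickhouse-driver substitutes %(db)s from the parameters dict
rows = client.execute(
    "SELECT name FROM system.databases WHERE name = %(db)s",
    {"db": "default"},
)
print(rows)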

2.2 sqlalchemy_clickhouse (HTTP protocol, port 8123)

You can also connect through SQLAlchemy, but by default SQLAlchemy has no ClickHouse dialect, so an extra module is needed: pip install sqlalchemy_clickhouse==0.2.2. Once installed, it is ready to use.
Connecting with an explicit username and password:

from sqlalchemy import create_engine

host = "127.0.0.1"
user = "default"
password = "bigdata"
db = "default"
port = 28123  # HTTP port

engine = create_engine('clickhouse://{user}:{password}@{host}:{port}/{db}'
                .format(user=user,
                        host=host,
                        password=password,
                        db=db,
                        port=port))
rows = engine.execute("show databases").fetchall()
print(rows)
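Because this is an ordinary SQLAlchemy engine, it also plugs into other tools that accept one. A sketch using pandas (it assumes pandas is installed; sqlalchemy_clickhouse 0.2.2 targets the pre-2.0 SQLAlchemy API, which pd.read_sql works with):

import pandas as pd

# read_sql accepts the SQLAlchemy engine created above
df = pd.read_sql("SELECT name FROM system.databases", engine)
print(df)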

If the password contains the special character @, URL-encode it so the connection URL still parses correctly:

from urllib.parse import quote_plus as urlquote
password = urlquote("ClickhouseTest@190")  # '@' becomes '%40'
# then build the connection URL with the encoded password, as above

2.3 clickhouse-connect

Requires Python 3.7 or later.
pip install clickhouse-connect

import clickhouse_connect

host = "127.0.0.1"
port = 28123
user = "default"
password = "bigdata"
database = "default"
client = clickhouse_connect.get_client(host=host, port=port, username=user, password=password, database=database)
# To retrieve data with ClickHouse SQL, use the client's query method:
result = client.query('show databases')
rows = result.result_set
print(rows)
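clickhouse-connect can also hand results straight to pandas via query_df, assuming pandas is installed:

# query_df returns a pandas DataFrame instead of a QueryResult
df = client.query_df('SELECT name, engine FROM system.databases')
print(df.head())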

Running commands

# To run a ClickHouse SQL command, use the client's command method:
client.command('CREATE TABLE new_table (key UInt32, value String, metric Float64) ENGINE MergeTree ORDER BY key')

# To insert data in batches, use the client's insert method with a two-dimensional array of rows and values:
row1 = [1000, 'String Value 1000', 5.233]
row2 = [2000, 'String Value 2000', -107.04]
data = [row1, row2]
client.insert('new_table', data, column_names=['key', 'value', 'metric'])
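To read the inserted rows back, clickhouse-connect supports server-side parameter binding with {name:Type} placeholders. A minimal sketch against the new_table created above:

# {k:UInt32} is bound server-side from the parameters dict
result = client.query(
    'SELECT key, value, metric FROM new_table WHERE key >= {k:UInt32}',
    parameters={'k': 1000},
)
for row in result.result_rows:
    print(row)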

3 Batch writes

Using clickhouse_driver:

# -*- coding: utf-8 -*-
from clickhouse_driver import Client

# Optionally pass a timeout as well: send_receive_timeout=<seconds>
host = "127.0.0.1"
port = 29000
user = "default"
password = "bigdata"
database = "test"
client = Client(host=host, port=port, user=user, password=password, database=database)

# (1) Create the table
# str_sql = 'CREATE TABLE IF NOT EXISTS test (id Int32, value String) ENGINE = Memory'
# client.execute(str_sql)

# (2) Generate test data
# data = [(i, f'value_{i}') for i in range(1000)]

# (3) Batch-insert into ClickHouse
# client.execute('INSERT INTO test (id, value) VALUES', data)

# (4) Query the data
str_query = 'select * from test limit 10'
rows = client.execute(str_query)
print(rows)
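For datasets too large to build as one list, it helps to feed the driver fixed-size chunks so each INSERT stays a single, reasonably sized block. A sketch against the same test table (the chunk size of 100,000 is an assumption; tune it to your row width and memory):

def insert_in_chunks(client, rows, chunk_size=100_000):
    # Accumulate rows and flush one INSERT per full chunk
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) >= chunk_size:
            client.execute('INSERT INTO test (id, value) VALUES', buf)
            buf = []
    if buf:  # flush the remainder
        client.execute('INSERT INTO test (id, value) VALUES', buf)

insert_in_chunks(client, ((i, f'value_{i}') for i in range(1_000_000)))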

4 Stress testing

ClickHouse is built for batch inserts, not for writing rows one at a time.
Ideally, every batch should target the same partition of the same table.

(1) Create the database test
create database test;
(2) Create the table batch
CREATE TABLE test.batch
(
    `eday` Date,
    `edaytime` DateTime,
    `groupname` String,
    `pdaaddr` String,
    `starttime` DateTime64(3),
    `step` Float32,
    `data` String,
    `db_time` DateTime64(3),
    INDEX pdaaddr_idx pdaaddr TYPE minmax GRANULARITY 8192,
    INDEX starttime_idx starttime TYPE minmax GRANULARITY 8192
)
ENGINE = MergeTree
PARTITION BY eday
PRIMARY KEY (pdaaddr,starttime)
ORDER BY (pdaaddr,starttime)
TTL edaytime + toIntervalHour(6)
SETTINGS index_granularity = 8192,
         min_rows_for_compact_part = 3;

4.1 Simulating a single-row insert

# -*- coding: utf-8 -*-
from datetime import datetime
from clickhouse_driver import Client
# Optionally pass a timeout as well: send_receive_timeout=<seconds>
host = "127.0.0.1"
port = 29000
user = "default"
password = "bigdata"
database = "test"
client = Client(host=host, port=port, user=user, password=password, database=database)

str1 = "2023-06-16 15:30:25.952"
st1 = datetime.strptime(str1, '%Y-%m-%d %H:%M:%S.%f')

# ClickHouse converts timestamps automatically based on the column type

# Build one test row
eday = st1  # 2023-06-16
edaytime = st1  # "2023-06-16 15:30:25"
groupname = "testgroup"
pdaaddr = "device1*01*data1"
starttime = st1  # "2023-06-16 15:30:25.952"
step = 778899
data = "78.6742"
db_time = st1  # "2023-06-16 15:30:25.952"

data_list = [eday,edaytime,groupname,pdaaddr,starttime,step,data,db_time]
data_list_batch = [data_list]

# Insert the one-row batch into ClickHouse
client.execute('INSERT INTO batch VALUES', data_list_batch)

4.2 Simulating a batch insert
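
The batch generator itself is not shown below; here is a minimal sketch that reuses the client from 4.1 (the row contents mirror the single-row example, and the batch size and timing wrapper are assumptions of mine):

import time
from datetime import datetime

def make_batch(n, ts=None):
    # All rows share one eday, so the whole batch lands in a single
    # partition, as recommended at the top of section 4
    ts = ts or datetime.now()
    return [
        [ts, ts, "testgroup", f"device1*01*data{i}", ts, 778899.0, "78.6742", ts]
        for i in range(n)
    ]

t0 = time.time()
client.execute('INSERT INTO batch VALUES', make_batch(100000))
print(f"inserted 100000 rows in {time.time() - t0:.2f}s")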

-- Check table sizes
SELECT
    table AS table_name,
    sum(rows) AS row_num,
    formatReadableSize(sum(data_uncompressed_bytes)) AS org_size,
    formatReadableSize(sum(data_compressed_bytes)) AS compress_size,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS compress_ratio
FROM system.parts
WHERE database='test'
GROUP BY table ORDER BY sum(rows) DESC;

The table-size query returns results like these:

table_name   row_num    org_size      compress_size   compress_ratio (%)

Inserting 10,000 rows per batch:
batch        10000      614.15 KiB    101.95 KiB      17.0
batch        20000      1.20 MiB      203.87 KiB      17.0
batch        30000      1.80 MiB      306.16 KiB      17.0
batch        40000      2.40 MiB      408.11 KiB      17.0

Inserting 100,000 rows per batch:
batch        100000     6.08 MiB      1.04 MiB        17.0
batch        200000     12.17 MiB     2.08 MiB        17.0
batch        300000     18.25 MiB     3.11 MiB        17.0
batch        400000     24.33 MiB     4.15 MiB        17.0

Inserting 1,000,000 rows per batch:
batch        1000000    61.78 MiB     10.75 MiB       17.0
batch        2000000    123.56 MiB    21.50 MiB       17.0
batch        3000000    185.33 MiB    32.25 MiB       17.0

At 100,000 rows per second, compressed storage grows by roughly 1 MiB per second (about 11 bytes per row compressed).
