Step 1: Pull the data out of MySQL into Python
Step 2: Create the ClickHouse table from Python and prepare the data for import
Step 3: Import the data (data files uncompressed)
from clickhouse_driver import Client
import time
import pymysql
import warnings
warnings.filterwarnings('ignore')

# Source: MySQL; target: ClickHouse
pos1 = pymysql.connect(host='192.168.1.235', port=3306, user='root',
                       password='123456', db='0001790455_pos', charset='utf8')
pos = pos1.cursor()
client = Client(host='192.168.1.231', database='test6', user='default', password='')

x = 1000
while x <= 1000:  # raise the upper bound to benchmark larger batch sizes
    client.execute('DROP TABLE IF EXISTS test')
    # Legacy MergeTree(date_column, primary_key, index_granularity) syntax
    createtable = """CREATE TABLE test (
        consumption_id UInt64,
        member_id UInt64,
        fans_id UInt64,
        bill_date Date,
        money Float32,
        people_num UInt8,
        dish_name String,
        created_org UInt8,
        open_id String,
        subscribed_time DateTime,
        unsubscribed_time DateTime,
        source_type UInt8,
        sns_type UInt8,
        is_subscribed UInt8
    ) ENGINE = MergeTree(bill_date, (consumption_id, created_org), 8192)"""

    start = time.time()
    pos.execute("SELECT * FROM bigtable")
    end = time.time()
    print(str(x) + ' rows: MySQL fetch time', end - start)

    # pymysql's default cursor already holds the full result set in memory
    data = [list(row) for row in pos.fetchall()]

    try:
        client.execute(createtable)
        start = time.time()
        client.execute('INSERT INTO test VALUES', data, types_check=True)
        end = time.time()
        print(str(x) + ' rows: ClickHouse insert time', end - start)
        print('')
    except Exception as e:
        print(e)
    x = x * 2
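The script above materializes the entire result set in one Python list before inserting, which is what drives memory and CPU up on large tables. A minimal sketch of a chunked alternative, streaming fixed-size batches from the cursor instead (the `chunk_size` value and the commented usage are illustrative assumptions, not from the original benchmark):

```python
# Sketch: stream rows from a DB-API cursor in fixed-size chunks instead of
# materializing the whole table. chunk_size is an assumed tuning knob.

def iter_chunks(cursor, chunk_size=50_000):
    """Yield lists of rows from a DB-API cursor, chunk_size rows at a time."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield [list(r) for r in rows]

# Hypothetical usage with the connections from the script above (not run here):
# pos.execute("SELECT * FROM bigtable")
# for chunk in iter_chunks(pos):
#     client.execute('INSERT INTO test VALUES', chunk, types_check=True)
```

Note that with pymysql's default cursor the server-side result is still fetched eagerly; pairing this with `pymysql.cursors.SSCursor` would make the streaming end-to-end.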
Timing results (MySQL fetch time, ClickHouse batch insert time, MySQL insert time, ClickHouse row-by-row insert time; "…" = not measured):

Rows        MySQL fetch   ClickHouse batch   MySQL insert   ClickHouse row-by-row
1,000       0.07s         0.04s              0.84s          4.83s
2,000       0.13s         0.06s              1.70s          9.68s
4,000       0.26s         0.11s              4.08s          19.86s
8,000       0.55s         0.23s              8.06s          …
16,000      1.14s         0.49s              15.75s         …
32,000      2.13s         0.80s              31.51s         …
64,000      4.29s         1.65s              63.51s         …
128,000     8.87s         3.16s              126s           …
256,000     17.01s        6.46s              252s           …
512,000     33.58s        12.76s             504s           …
1,020,000   67.16s        25.65s             1008s          …
2,040,000   135.94s       50.67s             2016s          …
3,080,000   200s          95s                …              …
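The numbers above can be turned into rough throughput figures to make the linear scaling concrete (a back-of-the-envelope calculation using the measured rows and seconds from the table, nothing more):

```python
# Rough throughput derived from the measured timings above.
rows = 3_080_000            # 3.08M rows, the largest run
fetch_s, insert_s = 200, 95

mysql_fetch_rate = rows / fetch_s    # ~15,400 rows/s read out of MySQL
ch_insert_rate = rows / insert_s     # ~32,000 rows/s batch-inserted into ClickHouse

# MySQL inserts ran at roughly 1,000 rows/s (504s for 512,000 rows)
mysql_insert_rate = 512_000 / 504
```

So ClickHouse's batch ingest is roughly 30x faster than inserting the same rows back into MySQL, and about twice as fast as MySQL can serve the rows out.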
Both columns grow essentially linearly; on top of that, the time Python spends processing the rows grows as well.
At around one million rows, CPU usage rises sharply.
————————————————————————————
The current speed comparison stands as follows:
MySQL inserted into ClickHouse directly via Python (all data in one batch): 360s
MySQL dumped to CSV, then the CSV imported into ClickHouse: 530s
Row-by-row inserts into ClickHouse are far too slow to be worth considering.
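For reference, the CSV path's Python leg presumably looks like the sketch below (the file name and the loading command are assumptions; the post does not show this code). The second leg would be done outside Python, e.g. `clickhouse-client --query "INSERT INTO test FORMAT CSV" < bigtable.csv`:

```python
# Sketch of the MySQL -> CSV leg of the slower 530s path.
import csv

def dump_rows_to_csv(rows, path):
    """Write an iterable of row tuples to a CSV file ClickHouse can ingest."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(rows)

# Hypothetical usage with the MySQL cursor from the script above:
# dump_rows_to_csv(pos.fetchall(), 'bigtable.csv')
```

The extra serialize-to-disk and parse-from-disk round trip is a plausible reason this path loses to the direct driver insert.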