Batch-loading CSV files into MySQL with Python

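Recently I've been working out how to load CSV files into MySQL with Python (since this was just a test, my CSV has only a few rows). Most of the examples I found online insert the CSV data with SQL INSERT statements. That works, but once the CSV holds tens of millions or hundreds of millions of rows, inserting with INSERT is very, very slow. MySQL has a bulk-load command, LOAD DATA, so I wanted to use it to load CSV files into MySQL in bulk. (If a CSV file is very large, it's best to split it into smaller CSV files and then load them into MySQL one at a time.)

Note: while writing this post I drew on many other people's posts. I meant to credit them with links, but I can no longer find them.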
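For the "split a huge CSV into smaller files" advice, here is a minimal sketch using only the standard `csv` module. The function name `split_csv`, the `_partNNNN.csv` naming scheme and the chunk size are my own illustrative choices, not part of the original script. Each chunk repeats the header line, so every file can be loaded with the same `IGNORE 1 LINES` statement.

```python
import csv
import os


def split_csv(path, rows_per_chunk, out_dir):
    """Split a large CSV into smaller files, repeating the header in each one.

    Hypothetical helper: keeping the header lets every chunk be loaded with
    the same LOAD DATA ... IGNORE 1 LINES statement. Returns chunk paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    chunk_paths = []
    out = None
    try:
        with open(path, newline='', encoding='utf-8') as src:
            reader = csv.reader(src)
            header = next(reader)
            writer = None
            for i, row in enumerate(reader):
                if i % rows_per_chunk == 0:
                    # Close the previous chunk and start a new one with the header
                    if out is not None:
                        out.close()
                    chunk_path = os.path.join(
                        out_dir, '{}_part{:04d}.csv'.format(base, len(chunk_paths)))
                    out = open(chunk_path, 'w', newline='', encoding='utf-8')
                    # lineterminator='\n' matches LINES TERMINATED BY '\n'
                    writer = csv.writer(out, lineterminator='\n')
                    writer.writerow(header)
                    chunk_paths.append(chunk_path)
                writer.writerow(row)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

For example, `split_csv('big.csv', 1000000, 'parts')` would write `parts/big_part0000.csv`, `parts/big_part0001.csv`, and so on, each with at most one million data rows.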

Here is my Python code first (Python 3.5.3, installed on Ubuntu 14.04):

#!/usr/bin/env python3
# coding=utf-8
# load2mysql.py

import csv
import sys
import pymysql

# Command-line arguments
csv_filename = sys.argv[1]  # CSV file
database = sys.argv[2]      # MySQL database name
table_name = sys.argv[3]    # MySQL table name

# Read only the header row; csv.reader also strips the trailing line ending,
# so it does not end up glued to the last column name
with open(csv_filename, 'r', encoding='utf-8', newline='') as f:
    header = next(csv.reader(f))

# Every column is created as VARCHAR(255); names are quoted with backticks
columns = ', '.join('`{}` varchar(255)'.format(name.strip()) for name in header)
create = 'CREATE TABLE IF NOT EXISTS `{}` ({}) DEFAULT CHARSET=utf8'.format(table_name, columns)

# REPLACE only overwrites rows that collide on a PRIMARY/UNIQUE key; this table
# has none, so each run appends rows. '\\n' sends the \n escape to MySQL.
data = ("LOAD DATA LOCAL INFILE '{}' REPLACE INTO TABLE `{}` "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' IGNORE 1 LINES").format(csv_filename, table_name)

# Connect to MySQL. Note: you MUST pass local_infile=1, otherwise LOAD DATA LOCAL fails
conn = pymysql.connect(host='10.10.10.77', port=3306, user='mysql', password='mysql',
                       database=database, charset='utf8', local_infile=1)
try:
    with conn.cursor() as cursor:
        cursor.execute(create)
        cursor.execute(data)
    conn.commit()
finally:
    conn.close()
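If a large CSV has been split into several chunk files, each keeping its own header line, they can be loaded one after another in a loop. The sketch below is an assumption-laden illustration: `load_chunks` is a hypothetical helper, `conn` is assumed to be a PyMySQL connection opened with `local_infile=1` as above, and the LOAD DATA options are copied from the script. Committing after every file means a failure partway through does not roll back the chunks that already loaded.

```python
def load_chunks(conn, chunk_paths, table_name):
    """Load each chunk file with LOAD DATA LOCAL INFILE, committing after each.

    Hypothetical helper: `conn` is assumed to be a PyMySQL connection created
    with local_infile=1. Returns the row count the server reported per file.
    """
    template = ("LOAD DATA LOCAL INFILE '{}' REPLACE INTO TABLE `{}` "
                "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
                "LINES TERMINATED BY '\\n' IGNORE 1 LINES")
    counts = []
    for path in chunk_paths:
        with conn.cursor() as cursor:
            # PyMySQL's execute() returns the number of affected rows
            counts.append(cursor.execute(template.format(path, table_name)))
        # Commit per chunk so an error later on keeps the earlier files
        conn.commit()
    return counts
```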

Run it with (after making the script executable with chmod +x load2mysql.py):

$ ./load2mysql.py test.csv ocnn_online csv_test

Explanation: load2mysql.py is the Python script above, test.csv is the CSV file, ocnn_online is the MySQL database, and csv_test is the MySQL table.
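The script reads its three arguments positionally from `sys.argv`, so a missing argument ends in an `IndexError`. As an optional alternative (my own suggestion, not what the script does), the standard `argparse` module can declare the same three positional arguments and print a usage message instead. The function name `parse_args` is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse: load2mysql.py CSV_FILE DATABASE TABLE (same order as sys.argv above)."""
    parser = argparse.ArgumentParser(
        description='Load a CSV file into MySQL with LOAD DATA LOCAL INFILE')
    parser.add_argument('csv_filename', help='CSV file; its first line is the header')
    parser.add_argument('database', help='MySQL database name')
    parser.add_argument('table_name', help='MySQL table name (created if missing)')
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

With this, `args = parse_args()` replaces the three `sys.argv` lines, and the values are available as `args.csv_filename`, `args.database` and `args.table_name`. Running the script with too few arguments prints a usage line and exits with status 2.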

My CSV file contains:

login_name,password,mac,id,create_date
zhan,123,13,20,2018-03-14 11:17:16
wang,2345,312,21,2018-03-14 11:17:16
li,,24,22,2018-03-14 11:17:16
zhao,fds,4325,23,2018-03-14 11:17:16
qian,324,\\N,24,2018-03-14 11:17:16
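The sample rows exercise two different "missing value" cases: an empty field (`li,,24,...`) and a `\N` field. With LOAD DATA's default escape character, an unquoted `\N` field is read as SQL NULL, while an empty field is stored as an empty string in a VARCHAR column. If the CSV is generated from Python, a minimal sketch like the one below (the helper name `write_rows_for_load` is hypothetical) can write `None` as `\N`. It also sets `lineterminator='\n'`, because `csv.writer` defaults to `\r\n`, which would leave a stray `\r` in the last column under `LINES TERMINATED BY '\n'`.

```python
import csv

NULL_MARKER = r'\N'  # LOAD DATA reads an unquoted \N field as SQL NULL


def write_rows_for_load(path, header, rows):
    """Write a CSV for LOAD DATA: None becomes \\N (NULL), '' stays an empty field."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        # csv.writer defaults to '\r\n'; use '\n' to match LINES TERMINATED BY '\n'
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(header)
        for row in rows:
            writer.writerow([NULL_MARKER if v is None else v for v in row])
```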

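When I first called connect without that last parameter, I kept getting an error and spent a whole afternoon unable to figure out why. The command and the error were:

$ ./load2mysql.py test.csv ocn_online csv_test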
Traceback (most recent call last):
  File "./load2mysql.py", line 37, in <module>
    cursor.execute(data)
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/connections.py", line 893, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/connections.py", line 1103, in _read_query_result
    result.read()
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/connections.py", line 1396, in read
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/connections.py", line 1059, in _read_packet
    packet.check_error()
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/connections.py", line 384, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.5/dist-packages/PyMySQL-0.8.1-py3.5.egg/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1148, 'The used command is not allowed with this MySQL version')

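At first I thought Python 3.x simply didn't support the bulk-load command. After digging through a lot of material online, I stumbled on a blog post that mentioned in passing that this last parameter was needed. I tried it, the error went away, and when I checked the MySQL table the data was there. Very nice. Error 1148 means that LOCAL INFILE is not allowed on the connection, and PyMySQL leaves it disabled unless you pass local_infile=1. There are plenty of explanations online of the LOAD DATA options, so I won't go over them here. One more pitfall: Python 2.x and Python 3.x differ a lot, and many resources online are written for Python 2.x and simply won't run on Python 3.x, which is frustrating.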
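`local_infile=1` fixes the client side, but LOAD DATA LOCAL also needs the server's `local_infile` system variable to be ON (MySQL 8.0, for instance, ships with it OFF). Below is a small sketch of a pre-flight check, assuming an ordinary tuple-returning DB-API cursor such as PyMySQL's default cursor; the function name `server_allows_local_infile` is hypothetical.

```python
def server_allows_local_infile(cursor):
    """Return True if the server's local_infile system variable is ON.

    Assumes a default (tuple-returning) cursor, e.g. conn.cursor() from PyMySQL.
    The client side still needs pymysql.connect(..., local_infile=1).
    """
    cursor.execute("SHOW VARIABLES LIKE 'local_infile'")
    row = cursor.fetchone()  # e.g. ('local_infile', 'ON'), or None if not reported
    return row is not None and str(row[1]).upper() in ('ON', '1')
```

If it returns False, a user with sufficient privileges can enable it with `SET GLOBAL local_infile = 1`.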
