pandas to_sql in detail

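There are already plenty of articles on using the pandas to_sql API, but most only show basic usage and skip over some of the details. This post covers a few of them:
1. Column matching
2. Multi-value inserts
3. Batch inserts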

API description

As usual, here is a detailed look at the API parameters, which mostly means walking through the documentation:

Parameters
----------
name : str
    Name of SQL table.
con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
    Using SQLAlchemy makes it possible to use any DB supported by that
    library. Legacy support is provided for sqlite3.Connection objects. The user
    is responsible for engine disposal and connection closure for the SQLAlchemy
    connectable. See `here \
        <https://docs.sqlalchemy.org/en/13/core/connections.html>`_.

schema : str, optional
    Specify the schema (if database flavor supports this). If None, use
    default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
    How to behave if the table already exists.

    * fail: Raise a ValueError.
    * replace: Drop the table before inserting new values.
    * append: Insert new values to the existing table.

index : bool, default True
    Write DataFrame index as a column. Uses `index_label` as the column
    name in the table.
index_label : str or sequence, default None
    Column label for index column(s). If None is given (default) and
    `index` is True, then the index names are used.
    A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional
    Specify the number of rows in each batch to be written at a time.
    By default, all rows will be written at once.
dtype : dict or scalar, optional
    Specifying the datatype for columns. If a dictionary is used, the
    keys should be the column names and the values should be the
    SQLAlchemy types or strings for the sqlite3 legacy mode. If a
    scalar is provided, it will be applied to all columns.
method : {None, 'multi', callable}, optional
    Controls the SQL insertion clause used:

    * None : Uses standard SQL ``INSERT`` clause (one per row).
    * 'multi': Pass multiple values in a single ``INSERT`` clause.
    * callable with signature ``(pd_table, conn, keys, data_iter)``.

    Details and a sample callable implementation can be found in the
    section :ref:`insert method <io.sql.method>`.

Returns
-------
None or int
    Number of rows affected by to_sql. None is returned if the callable
    passed into ``method`` does not return the number of rows.

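Notes on each parameter:
name: the table name
con: the database connection (a SQLAlchemy engine or connection, or a sqlite3 connection)
schema: the database schema to write to; the default is usually fine
if_exists: what to do if the table already exists (fail: raise a ValueError, append: add rows to the existing table, replace: drop the table and recreate it)
		  One detail matters here: replace drops the table, recreates it, and then inserts, so the previous table structure is lost.
		  The recreated table contains only the columns present in the df, created with default data types. It is best not to use replace:
		  it creates the table automatically, but the column types are inferred rather than chosen by you. Create the table in the database instead, with data types that suit the data.
index: whether to write the DataFrame index as a column
index_label: the column name(s) for the index; only used when index is True, and if omitted the index names are used
chunksize: number of rows written per batch; set it when the data is large, otherwise all rows are written at once and the database connection may time out
dtype: the data type of each column; it is usually better to leave this unset and define the types in the database
method: if not set, a standard INSERT statement is built row by row;
	   'multi' passes multiple rows in a single INSERT ... VALUES statement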
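
The effect of replace on an existing table can be checked with a small sketch. It assumes an in-memory SQLite database accessed through the standard-library sqlite3 module; the users table and its columns are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical table, created by hand with explicit types and constraints.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name VARCHAR(20) NOT NULL, age SMALLINT)"
)


def table_ddl(connection, table):
    """Return the CREATE TABLE statement SQLite has stored for a table."""
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row[0]


ddl_before = table_ddl(conn, "users")

# The DataFrame has no "age" column.
df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# replace drops the hand-written table and recreates it from the DataFrame.
df.to_sql("users", conn, if_exists="replace", index=False)

ddl_after = table_ddl(conn, "users")
columns_after = [row[1] for row in conn.execute("PRAGMA table_info(users)")]

print(ddl_before)     # the original definition, with PRIMARY KEY and NOT NULL
print(ddl_after)      # the definition pandas generated
print(columns_after)  # only the DataFrame's columns remain
```

After the call, the primary key, the NOT NULL constraint, and the age column are all gone, which is why creating the table in the database and writing with append is the safer route.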

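That covers the API itself. Now for the details:
1. Column matching
The most confusing point after reading all those articles is how columns are matched. The API matches DataFrame column names to table column names one to one; column order does not matter. With append, values are added under the columns with matching names, and table columns that have no counterpart in the DataFrame are left empty (NULL). The reverse case does not work: a DataFrame column that does not exist in the table makes the insert fail.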
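
A minimal sketch of this name-based matching, again assuming an in-memory SQLite database; the table and column names are illustrative only.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical pre-existing table with three columns.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")

# Column order differs from the table, and "age" is missing entirely.
df = pd.DataFrame({"name": ["alice", "bob"], "id": [1, 2]})

# append keeps the existing table and matches columns by name.
df.to_sql("users", conn, if_exists="append", index=False)

rows = conn.execute("SELECT id, name, age FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'alice', None), (2, 'bob', None)]
```

Note that index=False is needed here: with the default index=True, pandas would also try to write the index as a column named index, which this table does not have.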

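2. Multi-value inserts
When inserting multiple values per statement, set chunksize as well; otherwise the whole DataFrame goes out in one statement and the connection can easily time out. How large a chunk is workable depends on the database's buffer size.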
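
As a rough sketch, combining the two parameters looks like this. The in-memory SQLite database, the measurements table, and the chunk size of 100 are all assumptions for illustration; a suitable chunksize depends on the database and on the row width.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# 1000 rows of made-up data.
df = pd.DataFrame(
    {"id": range(1000), "value": [i * 0.5 for i in range(1000)]}
)

# Each INSERT statement carries at most 100 rows, so 10 statements are sent
# instead of a single statement holding all 1000 rows.
df.to_sql(
    "measurements",
    conn,
    if_exists="append",
    index=False,
    method="multi",
    chunksize=100,
)

row_count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
print(row_count)  # 1000
```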

3. Batch inserts
For batch inserts, set method='multi'.
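
Besides None and 'multi', the docstring above lists a callable with signature (pd_table, conn, keys, data_iter) as a third option for method. The following is only a sketch of that signature, not a recommended implementation: the function name insert_with_executemany and the items table are hypothetical, and the "?" placeholders are specific to sqlite3.

```python
import sqlite3

import pandas as pd


def insert_with_executemany(pd_table, conn, keys, data_iter):
    """Hypothetical insert callable for a sqlite3 connection.

    pd_table:  pandas' table wrapper (pd_table.name is the table name)
    conn:      the DB-API object pandas passes in for the current transaction
    keys:      list of column names
    data_iter: iterator over the rows of the current chunk
    """
    columns = ", ".join(f'"{key}"' for key in keys)
    placeholders = ", ".join("?" for _ in keys)  # sqlite3 parameter style
    sql = f'INSERT INTO "{pd_table.name}" ({columns}) VALUES ({placeholders})'
    rows = list(data_iter)
    conn.executemany(sql, rows)
    return len(rows)  # optional: lets to_sql report the number of rows


conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# The callable is invoked once per chunk (here: 2 rows, then 1 row).
df.to_sql(
    "items",
    conn,
    if_exists="append",
    index=False,
    method=insert_with_executemany,
    chunksize=2,
)

stored = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
print(stored)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

A callable like this is mainly useful when the database offers a faster or more specific load path than a plain INSERT; for ordinary batch writes, method='multi' is the simpler choice.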

over
