pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=None, compact_ints=None, use_unsigned=None, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)
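filepath_or_buffer : str, file path, file handle, or StringIO
The string may be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For example, a local file could be file://localhost/path/to/table.csv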
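filepath_or_buffer also accepts any object with a read() method. The minimal sketch below uses io.StringIO, an in-memory text buffer, in place of a file on disk. The data is made up for illustration.

```python
import io

import pandas as pd

# An in-memory buffer standing in for a CSV file (illustrative data)
buf = io.StringIO("a,b\n1,2\n3,4")
df = pd.read_csv(buf)
print(df)
```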
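sep : str, separator, default ','
The separator to use. If sep is None, the C engine cannot detect the separator automatically, but the Python parsing engine can. In that case the Python engine is used, and it detects the separator with Python's built-in sniffer tool, csv.Sniffer. Separators longer than one character (other than '\s+') are interpreted as regular expressions and also force the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'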
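The sketch below shows the regular-expression behaviour of multi-character separators. The data is illustrative: fields are separated by ';' with irregular spaces around it. engine='python' is passed explicitly, which avoids pandas' warning about falling back from the C engine.

```python
import io

import pandas as pd

# ';' surrounded by varying amounts of whitespace (illustrative data)
data = "a ; b\n1 ;2\n3; 4"

# A separator longer than one character is treated as a regular expression
df = pd.read_csv(io.StringIO(data), sep=r"\s*;\s*", engine="python")
print(df)
```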
delimiter : str, separator, default None
Alternative argument name for sep.
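delim_whitespace : boolean, default False
Specifies whether whitespace (e.g. ' ' or '\t') is used as the separator. This is equivalent to sep='\s+'. If this option is True, nothing should be passed for the delimiter parameter.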
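The following sketch uses the documented equivalent, sep=r'\s+', on columns separated by a mix of spaces and tabs. The data is illustrative. Recent pandas releases deprecate delim_whitespace in favour of this form, which is why the sketch avoids it.

```python
import io

import pandas as pd

# Columns separated by runs of spaces and tabs (illustrative data)
data = "a  b\tc\n1 2   3\n4\t5 6"

# sep=r"\s+" splits on any run of whitespace, like delim_whitespace=True
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
print(df)
```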
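header : int or list of ints, default 'infer'
Row number(s) to use as the column names and the start of the data. A list of ints produces MultiIndex columns, and rows between the listed ones are skipped.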
Summary of experiments with multi-row headers. Judging from the output, t.txt holds three header-like rows (d open h l c / a b c d e / dd oo hh ll cc) followed by two data rows:

In [21]: a = pd.read_csv('t.txt',header=[1,2])
      a     b     c     d     e
     dd    oo    hh    ll    cc
0  1226  1240  1245  1237  1241
1  1227  1246  1247  1233  1239

In [23]: a = pd.read_csv('t.txt',header=[0,1])
      d  open     h     l     c
      a     b     c     d     e
0    dd    oo    hh    ll    cc
1  1226  1240  1245  1237  1241
2  1227  1246  1247  1233  1239

In [25]: a = pd.read_csv('t.txt',header=[0,2])
In [26]: a
Out[26]:
      d  open     h     l     c
     dd    oo    hh    ll    cc
0  1226  1240  1245  1237  1241
1  1227  1246  1247  1233  1239

In [27]: a = pd.read_csv('t.txt',header=[0,1,2])
In [28]: a
Out[28]:
      d  open     h     l     c
      a     b     c     d     e
     dd    oo    hh    ll    cc
0  1226  1240  1245  1237  1241
1  1227  1246  1247  1233  1239

In [29]: a = pd.read_csv('t.txt',header=[3])
In [30]: a
Out[30]:
Empty DataFrame
Columns: [1226, 1240, 1245, 1237, 1241]
Index: []

In [31]: a = pd.read_csv('t.txt',header=[2])
In [32]: a
Out[32]:
Empty DataFrame
Columns: [dd, oo, hh, ll, cc]
Index: []

In [37]: a = pd.read_csv('t.txt',header=2)
In [38]: a
Out[38]:
     dd    oo    hh    ll    cc
0  1226  1240  1245  1237  1241
1  1227  1246  1247  1233  1239

In [40]: a = pd.read_csv('t.txt',header=1)
In [41]: a
Out[41]:
      a     b     c     d     e
0    dd    oo    hh    ll    cc
1  1226  1240  1245  1237  1241
2  1227  1246  1247  1233  1239

In [42]: a = pd.read_csv('t.txt',header=1,names=['date','open','heigh','low','close'])
In [43]: a
Out[43]:
   date  open heigh   low close
0    dd    oo    hh    ll    cc
1  1226  1240  1245  1237  1241
2  1227  1246  1247  1233  1239
names : array-like of column names, default None
When header=None, names is used as the column names. If header points to a specific row, names replaces that row's labels.
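The sketch below contrasts the two cases on illustrative data that has a header row. The column names col1 and col2 are hypothetical.

```python
import io

import pandas as pd

data = "x,y\n1,2\n3,4"  # illustrative data with a header row

# header=0 + names: the header row "x,y" is consumed and replaced by names
df_replaced = pd.read_csv(io.StringIO(data), header=0, names=["col1", "col2"])

# header=None + names: every line, including "x,y", is treated as data
df_no_header = pd.read_csv(io.StringIO(data), header=None, names=["col1", "col2"])
```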
index_col : int or sequence or False, default None
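index_col selects the column(s) to use as the DataFrame's row labels. Passing a sequence builds a MultiIndex. The minimal sketch below uses illustrative data with a hypothetical 'id' column.

```python
import io

import pandas as pd

data = "id,value\na,1\nb,2"  # illustrative data

# Use the 'id' column as the row labels instead of the default 0, 1, ...
df = pd.read_csv(io.StringIO(data), index_col="id")
print(df.loc["b", "value"])
```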
usecols : array-like or callable, default None
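Return a subset of the columns. If array-like, every element must be either a position (an integer index into the document's columns) or a string that matches a column name, either given by the user in names or inferred from the document's header row(s). For example, a valid array-like usecols argument would be [0, 1, 2] or ['foo', 'bar', 'baz'].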
If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
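The sketch below shows the three accepted forms of usecols on illustrative data. All three select the same two columns. The returned columns follow the file's order, not the order of the list.

```python
import io

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6"  # illustrative data

by_name = pd.read_csv(io.StringIO(data), usecols=["a", "c"])
by_position = pd.read_csv(io.StringIO(data), usecols=[0, 2])
# The callable receives each column name and keeps those returning True
by_callable = pd.read_csv(io.StringIO(data), usecols=lambda col: col.upper() in ["A", "C"])
```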
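as_recarray : boolean, default False
Return a NumPy recarray instead of a DataFrame after parsing the data. If set to True, this option takes precedence over the squeeze parameter. Because row indices are not available in this format, the index_col parameter is ignored. Deprecated since version 0.19.0: call pd.read_csv(...).to_records() instead.
squeeze : boolean, default False
If the parsed data contains only one column, return a Series.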
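Neither parameter is accepted by current read_csv. squeeze was removed in pandas 2.0. The sketch below assumes a recent pandas and gets the same two results by post-processing the DataFrame. The data is illustrative.

```python
import io

import pandas as pd

data = "a\n1\n2"  # single-column illustrative data
df = pd.read_csv(io.StringIO(data))

# Equivalent of squeeze=True: a one-column DataFrame becomes a Series
s = df.squeeze("columns")

# Equivalent of as_recarray=True: a NumPy record array without the row index
rec = df.to_records(index=False)
```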
prefix : str, default None
Prefix to add to the automatically generated column numbers when there is no header, e.g. 'X' for X0, X1, ...
In [11]: b = pd.read_csv('t.txt',prefix='x')
In [12]: b
dd oo hh ll cc
0 1226 1240 1245 1237 1241
1 1227 1246 1247 1233 1239
In [13]: b = pd.read_csv('t.txt',header=None,prefix='x')
In [14]: b
x0 x1 x2 x3 x4
0 dd oo hh ll cc
1 1226 1240 1245 1237 1241
2 1227 1246 1247 1233 1239
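As the transcript shows, prefix only takes effect together with header=None. prefix was removed in pandas 2.0. The sketch below assumes a recent pandas and reproduces the effect with DataFrame.add_prefix. The data is illustrative and modelled on t.txt.

```python
import io

import pandas as pd

data = "dd,oo\n1226,1240"  # illustrative data modelled on t.txt

# header=None numbers the columns 0, 1, ...; add_prefix turns them into x0, x1, ...
df = pd.read_csv(io.StringIO(data), header=None).add_prefix("x")
print(df)
```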
mangle_dupe_cols : boolean, default True
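Duplicate columns are renamed 'X', 'X.1', ..., 'X.N' instead of all being 'X'. Passing False causes data to be overwritten if there are duplicate names in the columns.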
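The sketch below shows the renaming on illustrative data whose header repeats a name. It relies on the default behaviour only, because recent pandas no longer accepts the mangle_dupe_cols argument.

```python
import io

import pandas as pd

data = "a,a,b\n1,2,3"  # the header repeats the name 'a'

df = pd.read_csv(io.StringIO(data))
print(list(df.columns))  # the second 'a' is renamed 'a.1'
```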
dtype : Type name or dict of column -> type, default None
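Data type for the data or for individual columns, e.g. {'a': np.float64, 'b': np.int32}. If converters are specified, they are applied instead of the dtype conversion.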
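The sketch below applies a per-column dtype mapping to illustrative data. It keeps column 'b' as text so that leading zeros survive.

```python
import io

import numpy as np
import pandas as pd

data = "a,b\n1,007\n2,042"  # illustrative data

# Parse 'a' as float64 and keep 'b' as strings (no numeric conversion)
df = pd.read_csv(io.StringIO(data), dtype={"a": np.float64, "b": str})
```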
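engine : {'c', 'python'}, optional
Parser engine to use. The C engine is faster, while the Python engine is currently more feature-complete.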
converters : dict, default None
Dict of functions for converting values in certain columns. Keys can be either integers (column positions) or column labels.
In [20]: b=pd.read_csv('t.txt')
In [21]: b
Out[21]:
     dd    oo    hh    ll    cc
0  1226  1240  1245  1237  1241
1  1227  1246  1247  1233  1239
In [23]: def fun(x):
   ....:     x = int(x) - 1000
   ....:     return x
   ....:
In [30]: b = pd.read_csv('t.txt',converters={1:fun})
In [31]: b
Out[31]:
     dd   oo    hh    ll    cc
0  1226  240  1245  1237  1241
1  1227  246  1247  1233  1239
In [32]: b = pd.read_csv('t.txt',converters={'dd':fun})
In [33]: b
Out[33]:
    dd    oo    hh    ll    cc
0  226  1240  1245  1237  1241
1  227  1246  1247  1233  1239
In [34]: b = pd.read_csv('t.txt',converters={'dd':fun,'ll':fun})
In [35]: b
Out[35]:
    dd    oo    hh   ll    cc
0  226  1240  1245  237  1241
1  227  1246  1247  233  1239
true_values : list, default None
Values to consider as True
false_values : list, default None
Values to consider as False
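The sketch below maps 'yes'/'no' strings to booleans on illustrative data. As I understand it, the column is converted only when every value in it matches true_values or false_values. Otherwise it stays as strings. The sketch therefore lists both words.

```python
import io

import pandas as pd

data = "name,active\nx,yes\ny,no\nz,yes"  # illustrative data

# Every value in 'active' is listed below, so the column becomes boolean
df = pd.read_csv(io.StringIO(data), true_values=["yes"], false_values=["no"])
```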
skipinitialspace : boolean, default False
Skip spaces after the delimiter.
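The sketch below shows the difference on illustrative data written as "a, b" with a space after each comma.

```python
import io

import pandas as pd

data = "a, b\n1, x\n2, y"  # a space follows every comma

raw = pd.read_csv(io.StringIO(data))  # the spaces stay part of the fields
clean = pd.read_csv(io.StringIO(data), skipinitialspace=True)
```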
skiprows : list-like or integer or callable, default None
Line numbers to skip (0-indexed), or the number of lines to skip (int) at the start of the file.
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2], which skips lines 0 and 2 of the file.
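The sketch below compares the three forms on illustrative data whose first line is a title rather than the header. The line numbers are 0 'report header', 1 'a,b', 2 '1,2', 3 '3,4'.

```python
import io

import pandas as pd

data = "report header\na,b\n1,2\n3,4"  # illustrative data

skip_int = pd.read_csv(io.StringIO(data), skiprows=1)     # skip the first line
skip_list = pd.read_csv(io.StringIO(data), skiprows=[0])  # skip line 0 only
# Skip lines 0 and 2: line 1 becomes the header and only "3,4" remains as data
skip_callable = pd.read_csv(io.StringIO(data), skiprows=lambda x: x in [0, 2])
```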
skipfooter : int, default 0
Number of lines at the bottom of the file to skip. Not supported with engine='c'.
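The sketch below drops a trailing summary line from illustrative data. It passes engine='python' because the C engine does not support skipfooter.

```python
import io

import pandas as pd

data = "a,b\n1,2\n3,4\nTOTAL,6"  # the last line is a footer, not data

df = pd.read_csv(io.StringIO(data), skipfooter=1, engine="python")
```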
skip_footer : int, default 0
Deprecated since version 0.19.0: Use the skipfooter parameter instead, as they are identical
nrows : int, default None
Number of rows of the file to read. Useful for reading pieces of large files.
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If a dict is passed, it specifies per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
keep_default_na : bool, default True
If na_values are specified and keep_default_na is False, the default NaN values are overridden; otherwise the values in na_values are appended to the defaults.
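The sketch below shows how the three parameters interact on illustrative data. The data contains a custom marker 'missing' and the default marker 'NA'.

```python
import io

import pandas as pd

data = "a,b\n1,missing\n2,NA\n3,x"  # illustrative data

first_two = pd.read_csv(io.StringIO(data), nrows=2)
# 'missing' is added to the default markers, so both 'missing' and 'NA' become NaN
extra_na = pd.read_csv(io.StringIO(data), na_values=["missing"])
# Defaults are dropped: only 'missing' becomes NaN, 'NA' stays a string
only_custom = pd.read_csv(io.StringIO(data), na_values=["missing"], keep_default_na=False)
```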
na_filter : boolean, default True
Detect missing-value markers (empty strings and the values of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
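The sketch below shows what na_filter=False changes on illustrative data. Empty fields and 'NA' are kept as literal strings instead of becoming NaN. This is why the option is only safe when the data has no missing values.

```python
import io

import pandas as pd

data = "a,b\n1,\n2,NA"  # illustrative data with an empty field and 'NA'

default = pd.read_csv(io.StringIO(data))
unfiltered = pd.read_csv(io.StringIO(data), na_filter=False)
```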
verbose : boolean, default False
Indicate number of NA values placed in non-numeric columns
skip_blank_lines : boolean, default True
If True, skip over blank lines rather than interpreting them as NaN values.
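The sketch below compares the two settings on illustrative single-column data that contains one blank line.

```python
import io

import pandas as pd

data = "a\n1\n\n2"  # a blank line sits between the two values

skipped = pd.read_csv(io.StringIO(data))                        # blank line ignored
kept = pd.read_csv(io.StringIO(data), skip_blank_lines=False)  # blank line -> NaN row
```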
parse_dates : boolean or list of ints or names or list of lists or dict, default False