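1. Common pandas.read_csv parameters
pandas.read_csv reads CSV (comma-separated) files, as well as plain-text and log files, into a DataFrame.
It also supports importing only part of a file and iterating over a file in chunks. For more help, see: http://pandas.pydata.org/pandas-docs/stable/io.html
Parameters: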
In [1]: import pandas as pd
# Read out.log; all other parameters keep their defaults
In [2]: out = pd.read_csv('out.log')
In [3]: out
Out[3]:
book kook
0 joke2 dddd
1 fang3 NaN
2 test1 NaN
3 test2 NaN
4 test3 NaN
5 1997/10/2 NaN
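To try this first example without the file on disk, the sketch below feeds the same rows through io.StringIO. The LOG_TEXT string is an assumption: it is reconstructed from the output above, not copied from the real out.log.

```python
import io

import pandas as pd

# Assumed contents of out.log, reconstructed from the printed DataFrame.
LOG_TEXT = """book,kook
joke2,dddd
fang3
test1
test2
test3
1997/10/2
"""

# Default settings: ',' is the separator and the first line is the header.
out = pd.read_csv(io.StringIO(LOG_TEXT))

print(out.shape)                 # (6, 2)
print(out.columns.tolist())      # ['book', 'kook']
print(out["kook"].isna().sum())  # 5 rows have no second field, so they get NaN
```

Lines with fewer fields than the header are padded with NaN, which is why only the first data row has a value in kook.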
Example 2: read a CSV file of stock data; the result is a DataFrame
In [4]: stock = pd.read_csv('000777.csv')
In [5]: stock
Out[5]:
date code closing high low opening pre_closing zde \
0 2017/1/20 '000777 21.17 21.29 20.90 20.90 20.86 0.31
1 2017/1/19 '000777 20.86 21.14 20.82 21.12 21.12 -0.26
2 2017/1/18 '000777 21.12 21.44 21.09 21.40 21.37 -0.25
3 2017/1/17 '000777 21.37 21.49 20.75 21.17 21.15 0.22
4 2017/1/16 '000777 21.15 22.50 20.28 22.50 22.53 -1.38
5 2017/1/13 '000777 22.53 22.88 22.43 22.71 22.85 -0.32
6 2017/1/12 '000777 22.85 23.53 22.75 23.41 23.51 -0.66
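The code column above carries a leading apostrophe ('000777), presumably added by the exporting tool so that spreadsheets keep the leading zeros. One possible way to clean it while loading is the converters parameter. The sketch below uses two rows taken from the output above (only the columns that are visible there); the helper name strip_apostrophe is made up for this example.

```python
import io

import pandas as pd

# Two sample rows copied from the output above; the real 000777.csv has more columns.
STOCK_TEXT = """date,code,closing,high,low,opening,pre_closing,zde
2017/1/20,'000777,21.17,21.29,20.90,20.90,20.86,0.31
2017/1/19,'000777,20.86,21.14,20.82,21.12,21.12,-0.26
"""


def strip_apostrophe(value):
    """Hypothetical helper: drop the leading apostrophe, keep the zeros."""
    return value.lstrip("'")


# converters maps a column name to a function applied to each raw string value.
stock = pd.read_csv(io.StringIO(STOCK_TEXT), converters={"code": strip_apostrophe})

print(stock["code"].tolist())  # ['000777', '000777']
print(stock["closing"].dtype)  # float64
```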
In [6]: a = pd.read_csv('out.log',sep = '\s')
C:/Anaconda3/Scripts/ipython-script.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
if __name__ == '__main__':
# Whitespace is now the separator, so ',' no longer splits the fields in each row
In [7]: a
Out[7]:
book,kook
0 joke2,dddd
1 fang3
2 test1
3 test2
4 test3
5 1997/10/2
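# Whitespace can be given as the separator in two ways: ' ' (a literal space) or '\s' (a regex matching any whitespace character)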
In [9]: a = pd.read_csv('out.log',sep = ' ')
In [10]: a
Out[10]:
book,kook
0 joke2,dddd
1 fang3
2 test1
3 test2
4 test3
5 1997/10/2
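The two whitespace spellings behave differently once a file really contains runs of spaces. The sketch below uses a made-up two-line sample (WS_TEXT is an assumption, not part of out.log) to compare sep=' ' with sep=r'\s+'. The r prefix keeps Python from interpreting the backslash, and r'\s+' is the one pattern that the warning above says the default C engine handles without falling back.

```python
import io

import pandas as pd

# Hypothetical sample: fields separated by two spaces.
WS_TEXT = "book  kook\njoke2  dddd\n"

# A single literal space: every space is a delimiter, so the second
# space in each pair produces an empty, unnamed column.
single = pd.read_csv(io.StringIO(WS_TEXT), sep=" ")

# r"\s+" treats any run of whitespace as one delimiter.
multi = pd.read_csv(io.StringIO(WS_TEXT), sep=r"\s+")

print(single.columns.tolist())  # ['book', 'Unnamed: 1', 'kook']
print(multi.columns.tolist())   # ['book', 'kook']
```

For any other regular expression, passing engine='python' explicitly avoids the ParserWarning, as the warning message itself suggests.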
In [13]: a = pd.read_csv('out.log',sep = ', ',delimiter='o')
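# In this session delimiter overrode sep, so the sep setting had no effect (newer pandas versions raise ValueError when both are passed)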
In [14]: a
Out[14]:
b Unnamed: 1 k,k Unnamed: 3 k
0 j ke2,dddd NaN NaN NaN
1 fang3 NaN NaN NaN NaN
2 test1 NaN NaN NaN NaN
3 test2 NaN NaN NaN NaN
4 test3 NaN NaN NaN NaN
5 1997/10/2 NaN NaN NaN NaN
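Since delimiter is an alternative name for sep, passing just one of them is the safer habit. The sketch below reproduces the 'o' split with either spelling and checks that the results match. LOG_TEXT is again a reconstruction of out.log from the outputs shown, and the try/except only records how the installed pandas reacts when both arguments are given.

```python
import io

import pandas as pd

# Assumed contents of out.log, reconstructed from the printed DataFrame.
LOG_TEXT = "book,kook\njoke2,dddd\nfang3\ntest1\ntest2\ntest3\n1997/10/2\n"

# Either keyword on its own gives the same split on the letter 'o'.
by_sep = pd.read_csv(io.StringIO(LOG_TEXT), sep="o")
by_delimiter = pd.read_csv(io.StringIO(LOG_TEXT), delimiter="o")

print(by_sep.columns.tolist())
# ['b', 'Unnamed: 1', 'k,k', 'Unnamed: 3', 'k']
print(by_sep.equals(by_delimiter))  # True

# Passing both is version dependent: older releases let delimiter win,
# newer ones are expected to raise ValueError.
try:
    pd.read_csv(io.StringIO(LOG_TEXT), sep=",", delimiter="o")
    both_accepted = True
except ValueError:
    both_accepted = False
print(both_accepted)
```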
In [20]: a = pd.read_csv('out.log',delim_whitespace = True)
In [21]: a
Out[21]:
book,kook
0 joke2,dddd
1 fang3
2 test1
3 test2
4 test3
5 1997/10/2
In [22]: a = pd.read_csv('out.log',delim_whitespace = True,header = None)
In [23]: a
Out[23]:
0
0 book,kook
1 joke2,dddd
2 fang3
3 test1
4 test2
5 test3
6 1997/10/2
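The pandas documentation describes delim_whitespace=True as equivalent to sep='\s+', and recent releases mark delim_whitespace as deprecated, so the regex spelling is likely the more durable one. A minimal sketch combining it with header=None follows; WS_TEXT is a made-up whitespace-delimited sample, not the real out.log.

```python
import io

import pandas as pd

# Hypothetical sample mixing single spaces, runs of spaces, and a tab.
WS_TEXT = "book kook\njoke2   dddd\nfang3\tx\n"

# header=None: the first line is data, and columns are numbered 0, 1, ...
a = pd.read_csv(io.StringIO(WS_TEXT), sep=r"\s+", header=None)

print(a.shape)             # (3, 2)
print(a.columns.tolist())  # [0, 1]
print(a.iloc[0].tolist())  # ['book', 'kook']
```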
In [32]: a = pd.read_csv('out.log',names='ko')
In [33]: a
Out[33]:
k o
0 book kook
1 joke2 dddd
2 fang3 NaN
3 test1 NaN
4 test2 NaN
5 test3 NaN
6 1997/10/2 NaN
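In the session above, names='ko' works because the string is treated as the two names 'k' and 'o'. A list states the intent more clearly, and recent pandas versions appear to require a list-like here (worth checking against your installed version). The sketch below, using a reconstructed LOG_TEXT, also shows how names interacts with header.

```python
import io

import pandas as pd

# Assumed contents of out.log, reconstructed from the printed DataFrame.
LOG_TEXT = "book,kook\njoke2,dddd\nfang3\ntest1\ntest2\ntest3\n1997/10/2\n"

# names alone: the file's own header line is kept as data row 0.
kept = pd.read_csv(io.StringIO(LOG_TEXT), names=["k", "o"])

# names plus header=0: the file's header line is consumed and replaced.
replaced = pd.read_csv(io.StringIO(LOG_TEXT), names=["k", "o"], header=0)

print(kept.shape)                 # (7, 2)
print(kept.iloc[0].tolist())      # ['book', 'kook']
print(replaced.shape)             # (6, 2)
print(replaced.columns.tolist())  # ['k', 'o']
```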
In [45]: a = pd.read_csv('out.log',header=None,prefix='XX',index_col=0)
# Use the first column as the row index
In [46]: a
Out[46]:
XX1
XX0
book kook
joke2 dddd
fang3 NaN
test1 NaN
test2 NaN
test3 NaN
1997/10/2 NaN
In [47]: a = pd.read_csv('out.log',header=None,prefix='XX',index_col=1)
# Use the second column as the row index
In [48]: a
Out[48]:
XX0
XX1
kook book
dddd joke2
NaN fang3
NaN test1
NaN test2
NaN test3
NaN 1997/10/2
In [38]: a = pd.read_csv('out.log',header=None,prefix='XX')
In [39]: a
Out[39]:
XX0 XX1
0 book kook
1 joke2 dddd
2 fang3 NaN
3 test1 NaN
4 test2 NaN
5 test3 NaN
6 1997/10/2 NaN
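The prefix parameter was deprecated in pandas 1.4 and removed in 2.0, so the calls above fail on current releases. One possible replacement, sketched here with a reconstructed LOG_TEXT, is to read with header=None and rename the numbered columns afterwards with DataFrame.add_prefix; set_index then plays the role of index_col.

```python
import io

import pandas as pd

# Assumed contents of out.log, reconstructed from the printed DataFrame.
LOG_TEXT = "book,kook\njoke2,dddd\nfang3\ntest1\ntest2\ntest3\n1997/10/2\n"

# Columns come back as 0 and 1; add_prefix turns them into 'XX0' and 'XX1'.
a = pd.read_csv(io.StringIO(LOG_TEXT), header=None).add_prefix("XX")
print(a.columns.tolist())  # ['XX0', 'XX1']

# Same effect as index_col=1 in the session above.
by_second = a.set_index("XX1")
print(by_second.index.name)        # XX1
print(by_second.columns.tolist())  # ['XX0']
```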
In [49]: a = pd.read_csv('out.log',header=None,prefix='XX',index_col=1,dtype={'XX0':str})
In [50]: a
Out[50]:
XX0
XX1
kook book
dddd joke2
NaN fang3
NaN test1
NaN test2
NaN test3
NaN 1997/10/2
In [51]: a['XX0'].values
Out[51]: array(['book', 'joke2', 'fang3', 'test1', 'test2', 'test3', '1997/10/2'], dtype=object)
In [52]: a['XX0'].values[0]
Out[52]: 'book'
In [53]: type(a['XX0'].values[0])
Out[53]: str
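In the session above the column already held strings, so dtype changed nothing visible. The effect is easier to see on values that look numeric. The sketch below uses a made-up two-column sample (CODE_TEXT and its column names are assumptions for illustration).

```python
import io

import pandas as pd

# Hypothetical sample: a stock code with leading zeros.
CODE_TEXT = "code,price\n000777,21.17\n"

# Default inference reads the code as an integer and drops the zeros.
inferred = pd.read_csv(io.StringIO(CODE_TEXT))

# dtype maps column names to types; str keeps the text exactly as written.
typed = pd.read_csv(io.StringIO(CODE_TEXT), dtype={"code": str})

print(inferred["code"].iloc[0])  # 777
print(typed["code"].iloc[0])     # 000777
```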
In [54]: a = pd.read_csv('out.log',header=None,prefix='XX',index_col=1,skiprows= 1)
# skiprows=1 skips the first line of the file (book,kook)
In [55]: a
Out[55]:
XX0
XX1
dddd joke2
NaN fang3
NaN test1
NaN test2
NaN test3
NaN 1997/10/2
In [56]: a = pd.read_csv('out.log',header=None,prefix='XX',index_col=1,skiprows= 1,nrows=4)
In [57]: a
Out[57]:
XX0
XX1
dddd joke2
NaN fang3
NaN test1
NaN test2
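The partial-import options can be sketched together as follows: an integer skiprows drops that many lines from the top, a list skips specific 0-based line numbers, nrows caps the number of rows read, and chunksize turns the call into an iterator over pieces of the file. LOG_TEXT is a reconstruction of out.log from the outputs shown.

```python
import io

import pandas as pd

# Assumed contents of out.log, reconstructed from the printed DataFrame.
LOG_TEXT = "book,kook\njoke2,dddd\nfang3\ntest1\ntest2\ntest3\n1997/10/2\n"

# Skip the first line, then read only four rows.
head = pd.read_csv(io.StringIO(LOG_TEXT), header=None, skiprows=1, nrows=4)
print(head[0].tolist())  # ['joke2', 'fang3', 'test1', 'test2']

# A list skips specific lines: here only line 1 (the 'joke2,dddd' row).
without_joke = pd.read_csv(io.StringIO(LOG_TEXT), skiprows=[1])
print(without_joke["book"].tolist())
# ['fang3', 'test1', 'test2', 'test3', '1997/10/2']

# chunksize yields DataFrames of at most four rows each.
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(LOG_TEXT), chunksize=4)]
print(chunk_sizes)  # [4, 2]
```

Reading in chunks keeps memory use bounded for large files, because only one piece is held at a time.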