Creating data
Random data
Create a Series; when no index is given, pandas generates a default integer index
import numpy as np
import pandas as pd
s = pd.Series([1,3,5,np.nan,6,8])
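To make the default index concrete, here is a minimal sketch that inspects the Series created above. The labeled variant at the end uses made-up labels ("a", "b", "c") purely for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# With no index argument, pandas assigns a RangeIndex starting at 0.
print(s.index)   # RangeIndex(start=0, stop=6, step=1)

# np.nan is a float, so the integers are upcast and the dtype is float64.
print(s.dtype)   # float64

# An explicit index can be passed instead (these labels are hypothetical).
s_labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])
print(s_labeled["b"])  # 3
```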
Create a DataFrame from a NumPy array, with a date index and labeled columns
dates = pd.date_range('20161010', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df
Out[4]:
                   A         B         C         D
2016-10-10  0.630275  1.081899 -1.594402 -2.571683
2016-10-11 -0.211379 -0.166089 -0.480015 -0.346706
2016-10-12 -0.416171 -0.640860  0.944614 -0.756651
2016-10-13  0.652248  0.186364  0.943509  0.053282
2016-10-14 -0.430867 -0.494919 -0.280717 -1.327491
2016-10-15  0.306519 -2.103769 -0.019832  0.035211
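Here, np.random.randn(6,4) returns a 6x4 array of random samples drawn from the standard normal distribution, so the values differ on every run.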
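Because the values are random, a rerun will not reproduce the numbers shown above. A minimal sketch of how to get repeatable values follows; the seed value 0 is arbitrary, and the names values, values_new and df_demo are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Fixing the seed makes np.random.randn return the same samples on every run.
np.random.seed(0)
values = np.random.randn(6, 4)      # 6 rows, 4 columns
print(values.shape)                 # (6, 4)

# The newer Generator API is an alternative to the legacy global functions.
rng = np.random.default_rng(0)
values_new = rng.standard_normal((6, 4))

# Either array can be wrapped in a DataFrame with labeled columns.
df_demo = pd.DataFrame(values, columns=list("ABCD"))
print(df_demo.shape)                # (6, 4)
```

Note that the two APIs use different generators, so the same seed does not give the same numbers in values and values_new.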
Creating from a dict
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2
Out[20]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
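In the dict form, scalar values ('A', 'B', 'F') are broadcast to the length of the array-like columns, and each column keeps its own dtype. The sketch below checks this on the same df2; the exact dtype names printed for the datetime and string columns can vary between pandas versions, so no output is shown for them.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,                                                  # scalar, broadcast to 4 rows
    "B": pd.Timestamp("20130102"),                             # scalar timestamp, broadcast
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",                                                # scalar string, broadcast
})

print(df2.shape)         # (4, 6)
print(df2.dtypes)        # one dtype per column
print(df2["E"].cat.categories.tolist())   # ['test', 'train']
```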
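Creating from a nested list or NumPy array
data = [[2000, 1, 2],
        [2001, 1, 3]]
df = pd.DataFrame(data,
                  index=['one', 'two'],
                  columns=['year', 'state', 'pop'])
# You can also stack arrays and transpose them before creating the DataFrame.
# data_real_np and ydz_np are assumed to be 1-D arrays of equal length defined elsewhere.
out = np.array([data_real_np, ydz_np]).T
df = pd.DataFrame(out)
# Write without the index column and without a header row.
df.to_csv('final.csv', encoding='utf-8', index=False, header=False)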
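The transpose step can be sketched with small stand-in arrays. The names measured and predicted and their values are hypothetical placeholders, not the original data_real_np and ydz_np. Stacking two length-3 arrays gives shape (2, 3); transposing gives one row per observation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two equally long 1-D arrays.
measured = np.array([1.0, 2.0, 3.0])
predicted = np.array([1.1, 1.9, 3.2])

out = np.array([measured, predicted]).T     # shape (2, 3) becomes (3, 2)
print(out.shape)                            # (3, 2)

# np.column_stack builds the same layout without an explicit transpose.
same = np.column_stack([measured, predicted])
print(np.array_equal(out, same))            # True

df_pairs = pd.DataFrame(out, columns=["measured", "predicted"])

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df_pairs.to_csv(index=False, header=False)
print(csv_text.splitlines())                # ['1.0,1.1', '2.0,1.9', '3.0,3.2']
```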
Creating a Timestamp
There are several ways to construct a Timestamp object
- pd.Timestamp
import pandas as pd
from datetime import datetime as dt
p1=pd.Timestamp(2017,6,19)
p2=pd.Timestamp(dt(2017,6,19,hour=9,minute=13,second=45))
p3=pd.Timestamp("2017-6-19 9:13:45")
print("type of p1:",type(p1))
print(p1)
print("type of p2:",type(p2))
print(p2)
print("type of p3:",type(p3))
print(p3)
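type of p1: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017-06-19 00:00:00
type of p2: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017-06-19 09:13:45
type of p3: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017-06-19 09:13:45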
- to_datetime()
import pandas as pd
from datetime import datetime as dt
p4=pd.to_datetime("2017-6-19 9:13:45")
p5=pd.to_datetime(dt(2017,6,19,hour=9,minute=13,second=45))
print("type of p4:",type(p4))
print(p4)
print("type of p5:",type(p5))
print(p5)
type of p4: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017-06-19 09:13:45
type of p5: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017-06-19 09:13:45
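Both constructors give the same kind of object for a single value. Where they differ is with list input and bad input, which the following sketch illustrates. The variable names and the sample strings are made up for this example.

```python
import pandas as pd

# A single string yields a scalar Timestamp, equal to the one built from fields.
ts = pd.to_datetime("2017-6-19 9:13:45")
print(isinstance(ts, pd.Timestamp))                 # True
print(ts == pd.Timestamp(2017, 6, 19, 9, 13, 45))   # True

# A list of strings yields a DatetimeIndex rather than a single Timestamp.
idx = pd.to_datetime(["2017-06-19", "2017-06-20"])
print(type(idx).__name__)                           # DatetimeIndex

# errors="coerce" turns unparseable input into NaT instead of raising.
bad = pd.to_datetime("not a date", errors="coerce")
print(pd.isna(bad))                                 # True

# Individual fields are available as attributes.
print(ts.year, ts.month, ts.day, ts.hour)           # 2017 6 19 9
```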