Output:
2016-07-14 22:03:47.969000
2016 7 14
datetime stores the date and time down to the microsecond. datetime.timedelta represents the time difference between two datetime objects
delta=datetime.datetime(2011,1,7)-datetime.datetime(2008,6,24,8,15)
print delta
print delta.days
print delta.seconds
Output:
926 days, 15:45:00
926
56700
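The days and seconds attributes are separate components of the same difference, which is easy to misread. The following minimal sketch (Python 3 syntax, standard library only) breaks the seconds part into hours and minutes and compares it with total_seconds(); the variable names are illustrative.

```python
from datetime import datetime

# Same subtraction as in the example above.
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# seconds only covers the leftover part of a day (0 <= seconds < 86400),
# so it has to be combined with days to get the full duration.
hours, remainder = divmod(delta.seconds, 3600)
minutes = remainder // 60
print(delta.days, hours, minutes)                    # 926 15 45

# total_seconds() folds days and seconds into a single number.
total = delta.total_seconds()
print(total == delta.days * 86400 + delta.seconds)   # True
```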
You can add (or subtract) a timedelta, or a multiple of one, to a datetime object; this yields a new, shifted object
from datetime import timedelta
start=datetime.datetime(2011,1,7)
print start+timedelta(12)
print start-2*timedelta(12)
Output:
2011-01-19 00:00:00
2010-12-14 00:00:00
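As a small illustration of the arithmetic above (Python 3 syntax, standard library only): the single positional argument to timedelta is a number of days, and keyword arguments allow other units to be combined. The specific offsets here are chosen only for the example.

```python
from datetime import datetime, timedelta

start = datetime(2011, 1, 7)

# The first positional argument of timedelta is a number of days.
print(timedelta(12) == timedelta(days=12))     # True

# Several units can be combined in one timedelta.
shifted = start + timedelta(days=12, hours=6)
print(shifted)                                 # 2011-01-19 06:00:00

# Multiplying a timedelta by an integer scales it.
earlier = start - 2 * timedelta(days=12)
print(earlier)                                 # 2010-12-14 00:00:00
```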
datetime objects and pandas Timestamp objects can be formatted as strings using str or the strftime method (passing a format specification)
stamp=datetime.datetime(2011,11,3)
print str(stamp)
print stamp.strftime('%Y-%m-%d')
Output:
2011-11-03 00:00:00
2011-11-03
datetime.strptime uses the same format codes to convert strings to dates
value='2011-01-03'
print datetime.datetime.strptime(value,'%Y-%m-%d')
datestrs=['7/6/2011','8/6/2011']
print [datetime.datetime.strptime(x,'%m/%d/%Y') for x in datestrs]
Output:
2011-01-03 00:00:00
[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
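Since strftime and strptime share the same format codes, one format string can drive both directions. The sketch below (Python 3 syntax, standard library only) round-trips the strings from the example and shows what happens when the input does not match the format; the name fmt is illustrative.

```python
from datetime import datetime

fmt = '%m/%d/%Y'
datestrs = ['7/6/2011', '8/6/2011']

# Parse with strptime, then format back with the same codes.
parsed = [datetime.strptime(s, fmt) for s in datestrs]
formatted = [d.strftime(fmt) for d in parsed]
print(formatted)        # ['07/06/2011', '08/06/2011'] (zero-padded on output)

# A string that does not match the format raises ValueError.
try:
    datetime.strptime('2011-01-03', fmt)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```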
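datetime.strptime is the best way to parse a date with a known format. However, writing a format specification every time is tedious, especially for common date formats. In that case, you can use the parser.parse method from the third-party dateutil package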
from dateutil.parser import parse
print parse('2011-01-03')
Output:
2011-01-03 00:00:00
dateutil can parse almost any human-intelligible date representation
print parse('Jan 31,1997 10:45 PM')
Output:
1997-01-31 22:45:00
In international locales, the day often appears before the month; passing dayfirst=True handles this
print parse('6/12/2011',dayfirst=True)
Output:
2011-12-06 00:00:00
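To make the effect of dayfirst concrete, this short sketch (Python 3 syntax, assuming the dateutil package is installed) parses the same ambiguous string both ways. The variable names are illustrative.

```python
from dateutil.parser import parse

s = '6/12/2011'

# By default dateutil reads an ambiguous string as month/day/year.
month_first = parse(s)
# dayfirst=True switches the interpretation to day/month/year.
day_first = parse(s, dayfirst=True)

print(month_first.date(), day_first.date())   # 2011-06-12 2011-12-06
```

The same string yields two dates six months apart, so it is worth being explicit whenever the origin of the data is known.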
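pandas is generally oriented toward working with arrays of dates, whether they are used as an axis index or as a column in a DataFrame. The to_datetime method parses many different kinds of date representations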
print datestrs
print pd.to_datetime(datestrs)
Output:
['7/6/2011', '8/6/2011']
DatetimeIndex(['2011-07-06', '2011-08-06'], dtype='datetime64[ns]', freq=None)
It also handles values that should be treated as missing (None, empty strings)
idx=pd.to_datetime(datestrs+[None])
print idx
print idx[2]
print pd.isnull(idx)
Output:
DatetimeIndex(['2011-07-06', '2011-08-06', 'NaT'], dtype='datetime64[ns]', freq=None)
NaT
[False False True]
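Beyond None, real input often contains strings that are not dates at all. As a hedged sketch (Python 3 syntax; the sample list raw is made up for illustration), passing errors='coerce' to to_datetime turns unparseable entries into NaT instead of raising, after which they can be detected or dropped like any other missing value.

```python
import pandas as pd

# Illustrative input: one valid date, one None, one unparseable string.
raw = ['7/6/2011', None, 'not a date']

# errors='coerce' converts anything that cannot be parsed into NaT.
idx = pd.to_datetime(raw, errors='coerce')
print(idx.isna().tolist())    # [False, True, True]

# NaT entries can be dropped like other missing values.
valid = idx.dropna()
print(len(valid))             # 1
```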
NaT (Not a Time) is pandas's NA value for timestamp data
The most basic kind of time series in pandas is a Series indexed by timestamps
dates=[datetime.datetime(2011,1,2),datetime.datetime(2011,1,5),
datetime.datetime(2011,1,7),datetime.datetime(2011,1,8),
datetime.datetime(2011,1,10),datetime.datetime(2011,1,12)]
ts=Series(np.random.randn(6),index=dates)
print ts
Output:
2011-01-02 -1.106547
2011-01-05 -0.597101
2011-01-07 0.694952
2011-01-08 0.194699
2011-01-10 0.717888
2011-01-12 2.566585
dtype: float64
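These datetime objects are actually stored in a DatetimeIndex, and the variable ts is now a TimeSeries (in recent pandas versions the separate TimeSeries class no longer exists, and ts is simply a Series with a DatetimeIndex)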
print type(ts)
print ts.index
Output:
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
'2011-01-10', '2011-01-12'],
dtype='datetime64[ns]', freq=None)
As with any other Series, arithmetic between differently indexed time series automatically aligns on the dates
print ts+ts[::2]
Output:
2011-01-02 -2.254791
2011-01-05 NaN
2011-01-07 1.692174
2011-01-08 NaN
2011-01-10 2.372520
2011-01-12 NaN
dtype: float64
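Because the example uses random numbers, the alignment is easier to follow with fixed values. The sketch below (Python 3 syntax; the values 1.0 to 6.0 are made up) shows that dates missing from one operand become NaN, and that Series.add with fill_value is one way to avoid that when a default makes sense.

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# ts[::2] keeps every second row; dates absent from it yield NaN.
total = ts + ts[::2]
print(total.isna().tolist())   # [False, True, False, True, False, True]

# add() with fill_value treats a missing operand as 0 instead.
filled = ts.add(ts[::2], fill_value=0)
print(filled.tolist())         # [2.0, 2.0, 6.0, 4.0, 10.0, 6.0]
```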
pandas stores timestamps using NumPy's datetime64 data type at nanosecond resolution:
print ts.index.dtype
Output:
datetime64[ns]
Scalar values from a DatetimeIndex are pandas Timestamp objects:
stamp=ts.index[0]
print stamp
Output:
2011-01-02 00:00:00
stamp=ts.index[2]
print ts[stamp]
Output:
-0.645641748175
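A Timestamp can be used almost anywhere a datetime object is expected, because it is a subclass of datetime.datetime. The sketch below (Python 3 syntax; the values 10.0, 20.0, 30.0 are made up) checks this and looks up the same row with both kinds of key.

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
ts = pd.Series([10.0, 20.0, 30.0], index=dates)

# Elements of the index come back as pandas Timestamp objects.
stamp = ts.index[2]
print(type(stamp).__name__)          # Timestamp
print(isinstance(stamp, datetime))   # True

# A Timestamp and an equal datetime select the same row.
by_stamp = ts[stamp]
by_datetime = ts[datetime(2011, 1, 7)]
print(by_stamp, by_datetime)         # 30.0 30.0
```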
As a convenience, you can also pass a string that can be interpreted as a date
print ts['1/10/2011']
print ts['20110110']
Output:
-0.117236376733
-0.117236376733
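Different spellings of the same date resolve to the same row. A minimal sketch with fixed, made-up values (Python 3 syntax) is shown below; it uses .loc, which performs the same label lookup as the bracket form in the example.

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Three spellings of 10 January 2011, all parsed to the same label.
a = ts.loc['1/10/2011']
b = ts.loc['20110110']
c = ts.loc['2011-01-10']
print(a, b, c)   # 5.0 5.0 5.0
```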
For longer time series, you can pass just a year, or a year and month, to easily select a slice of the data
longer_ts=Series(np.random.randn(1000),
index=pd.date_range('1/1/2000',periods=1000))
print longer_ts
Output:
2000-01-01 1.003063
2000-01-02 -0.947965
2000-01-03 -0.404399
2000-01-04 -0.033821
2000-01-05 -0.654715
2000-01-06 1.223395
2000-01-07 0.149323
2000-01-08 -1.616053
2000-01-09 -1.311268
2000-01-10 0.000778
2000-01-11 -0.441500
2000-01-12 -2.213972
2000-01-13 0.089075
2000-01-14 -0.631931
2000-01-15 0.822742
2000-01-16 -0.722570
2000-01-17 -1.216340
2000-01-18 2.072723
2000-01-19 0.600281
2000-01-20 1.776766
2000-01-21 0.411868
2000-01-22 -0.878912
2000-01-23 -0.374965
2000-01-24 1.062388
2000-01-25 0.638327
2000-01-26 0.061927
2000-01-27 0.393699
2000-01-28 0.188632
2000-01-29 0.854748
2000-01-30 -0.115810
...
2002-08-28 1.356288
2002-08-29 -1.292846
2002-08-30 -0.009617
2002-08-31 0.329827
2002-09-01 0.956538
2002-09-02 1.236926
2002-09-03 0.471744
2002-09-04 -1.549726
2002-09-05 0.417653
2002-09-06 0.559392
2002-09-07 -0.660213
2002-09-08 0.195481
2002-09-09 -1.455739
2002-09-10 -0.366381
2002-09-11 -0.902717
2002-09-12 -1.042264
2002-09-13 0.350009
2002-09-14 2.763109
2002-09-15 0.198505
2002-09-16 1.475192
2002-09-17 0.881241
2002-09-18 -1.856624
2002-09-19 -1.606431
2002-09-20 0.479071
2002-09-21 -1.040308
2002-09-22 -2.047667
2002-09-23 -0.137599
2002-09-24 1.254294
2002-09-25 -0.672502
2002-09-26 -1.347711
Freq: D, dtype: float64
print longer_ts['2001']
Output:
2001-01-01 0.186586
2001-01-02 -0.563347
2001-01-03 0.776158
2001-01-04 -0.772014
2001-01-05 0.222589
2001-01-06 0.313157
2001-01-07 -0.303758
2001-01-08 1.311669
2001-01-09 0.870798
2001-01-10 0.151725
2001-01-11 1.112179
2001-01-12 0.235967
2001-01-13 -0.541240
2001-01-14 -0.207337
2001-01-15 0.353214
2001-01-16 0.452525
2001-01-17 0.339655
2001-01-18 -1.049329
2001-01-19 -1.111571
2001-01-20 0.549896
2001-01-21 0.111973
2001-01-22 0.417121
2001-01-23 0.029097
2001-01-24 1.592062
2001-01-25 -0.890549
2001-01-26 0.742432
2001-01-27 -0.109821
2001-01-28 -0.038566
2001-01-29 1.746662
2001-01-30 1.244180
...
2001-12-02 0.229817
2001-12-03 -0.649869
2001-12-04 -1.451352
2001-12-05 0.806682
2001-12-06 1.204209
2001-12-07 0.584577
2001-12-08 0.266454
2001-12-09 0.093509
2001-12-10 0.462942
2001-12-11 -0.059904
2001-12-12 0.226536
2001-12-13 0.775319
2001-12-14 -1.231252
2001-12-15 -1.829049
2001-12-16 -0.533634
2001-12-17 -1.083400
2001-12-18 0.284043
2001-12-19 1.019876
2001-12-20 1.579437
2001-12-21 -0.164538
2001-12-22 -0.274655
2001-12-23 -0.678584
2001-12-24 0.729727
2001-12-25 -0.019092
2001-12-26 2.027947
2001-12-27 1.181993
2001-12-28 1.032252
2001-12-29 -0.056518
2001-12-30 1.252869
2001-12-31 0.056498
Freq: D, dtype: float64
print longer_ts['2001-05']
Output:
2001-05-01 0.340483
2001-05-02 -1.607938
2001-05-03 -1.361838
2001-05-04 0.665583
2001-05-05 0.301647
2001-05-06 1.667312
2001-05-07 -0.752533
2001-05-08 0.422559
2001-05-09 -0.109185
2001-05-10 0.320016
2001-05-11 -0.212505
2001-05-12 -1.811861
2001-05-13 -1.037873
2001-05-14 0.380419
2001-05-15 -0.272823
2001-05-16 -0.401167
2001-05-17 2.176063
2001-05-18 0.583590
2001-05-19 0.088130
2001-05-20 -0.310341
2001-05-21 -0.499819
2001-05-22 1.271531
2001-05-23 -0.629907
2001-05-24 1.331061
2001-05-25 -0.589642
2001-05-26 -0.391698
2001-05-27 0.253783
2001-05-28 -0.667759
2001-05-29 0.367114
2001-05-30 0.483991
2001-05-31 0.965554
Freq: D, dtype: float64
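The long printouts above make it hard to see exactly which rows a partial string selects. This sketch (Python 3 syntax) rebuilds a series over the same date range with np.arange in place of random data, an assumption made only so the result is reproducible, and checks the size and bounds of each selection.

```python
import numpy as np
import pandas as pd

longer_ts = pd.Series(np.arange(1000.0),
                      index=pd.date_range('1/1/2000', periods=1000))

# A bare year selects every row in that year.
year = longer_ts.loc['2001']
# A year and month selects every row in that month.
month = longer_ts.loc['2001-05']

print(len(year), len(month))                          # 365 31
print(year.index[0].date(), year.index[-1].date())    # 2001-01-01 2001-12-31
```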
Slicing with dates works the same way as it does for a regular Series
print ts[datetime.datetime(2011,1,7):]
Output:
2011-01-07 -0.663891
2011-01-08 0.755214
2011-01-10 0.901104
2011-01-12 -0.561792
dtype: float64
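Because most time series data is ordered chronologically, you can also slice with timestamps that are not contained in the time series to perform a range query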
print ts
print ts['1/6/2011':'1/11/2011']
Output:
2011-01-02 1.375670
2011-01-05 1.220767
2011-01-07 -0.943502
2011-01-08 -0.842676
2011-01-10 -0.585317
2011-01-12 -1.123255
dtype: float64
2011-01-07 -0.943502
2011-01-08 -0.842676
2011-01-10 -0.585317
dtype: float64
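With fixed values the range query is easier to verify. In this sketch (Python 3 syntax; the values are made up), neither endpoint exists in the index, and a second slice shows that endpoints which do exist are included on both sides.

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Neither 6 Jan nor 11 Jan is in the index; rows in between are returned.
window = ts.loc['1/6/2011':'1/11/2011']
print(window.tolist())      # [3.0, 4.0, 5.0]

# Label-based slices include both endpoints when they are present.
inclusive = ts.loc['1/7/2011':'1/10/2011']
print(inclusive.tolist())   # [3.0, 4.0, 5.0]
```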
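You can pass a string date, a datetime, or a Timestamp here. Slicing in this manner produces a view on the source time series, just like slicing a NumPy array. There is also an equivalent instance method, truncate, that slices a time series between two dates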
print ts.truncate(after='1/9/2011')
Output:
2011-01-02 0.304181
2011-01-05 -1.189177
2011-01-07 0.114558
2011-01-08 2.628653
dtype: float64
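To illustrate the equivalence between truncate and label slicing, the sketch below (Python 3 syntax; the values are made up) compares the two and also uses the before argument, which the example above does not show.

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# truncate(after=...) keeps rows up to and including that date.
head = ts.truncate(after='1/9/2011')
print(head.equals(ts.loc[:'1/9/2011']))   # True

# before and after can be combined to keep a window.
middle = ts.truncate(before='1/5/2011', after='1/9/2011')
print(middle.tolist())                    # [2.0, 3.0, 4.0]
```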
All of the above also works on a DataFrame, indexing on its rows
dates=pd.date_range('1/1/2000',periods=100,freq='W-WED')
long_df=DataFrame(np.random.randn(100,4),
index=dates,
columns=['Colorado','Texas','New York','Ohio'])
# .loc replaces .ix, which was removed in pandas 1.0
print long_df.loc['5-2001']
Output:
Colorado Texas New York Ohio
2001-05-02 0.494924 0.349833 1.147751 -0.547143
2001-05-09 -1.877639 0.402772 -1.220391 -1.346147
2001-05-16 0.663365 -0.527024 -0.599353 0.285385
2001-05-23 -1.526113 0.223588 -0.918547 1.953351
2001-05-30 -0.691462 0.805141 1.233492 -0.837095
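The same row selection can be checked with deterministic data. This sketch (Python 3 syntax) rebuilds the weekly Wednesday index from the example, fills the frame with np.arange instead of random numbers (an assumption made for reproducibility), and selects May 2001 with .loc.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.arange(400.0).reshape(100, 4),
                       index=dates,
                       columns=['Colorado', 'Texas', 'New York', 'Ohio'])

# A partial date string passed to .loc selects matching rows.
may = long_df.loc['2001-05']
print(may.shape)                       # (5, 4)
print([d.day for d in may.index])      # [2, 9, 16, 23, 30]
```

The five rows correspond to the five Wednesdays in May 2001, matching the dates in the output above.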