Python for Data Analysis
In [2]: from datetime import datetime
In [3]: now = datetime.now()
In [4]: now
Out[4]: datetime.datetime(2017, 5, 25, 13, 55, 30, 39000)
1: Use the strftime and strptime methods of datetime to convert between strings and dates.
In [5]: stamp = datetime(2011, 1, 3)
In [6]: str(stamp)
Out[6]: '2011-01-03 00:00:00'
In [7]: stamp.strftime('%Y-%m-%d')
Out[7]: '2011-01-03'
In [10]: datetime.strptime('2011-01-03', '%Y-%m-%d')
Out[10]: datetime.datetime(2011, 1, 3, 0, 0)
In [11]: datestrs = ['7/6/2011', '8/6/2011']
In [12]: [datetime.strptime(x,'%m/%d/%Y') for x in datestrs]
Out[12]: [datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
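As a quick check of how the two methods mirror each other, here is a small sketch. The timestamp and the format strings are illustrative choices, not taken from the book. It formats a datetime, parses the text back with the same format, and shows that the same string yields different dates depending on whether %m or %d comes first.

```python
from datetime import datetime

# Format a datetime as text, then parse it back with the same format code.
fmt = '%Y-%m-%d %H:%M'
stamp = datetime(2011, 1, 3, 14, 30)
text = stamp.strftime(fmt)            # '2011-01-03 14:30'
parsed = datetime.strptime(text, fmt)

# Format codes are case-sensitive: %m is the month, %M is the minute.
# The same string is read as 6 July or as 7 June depending on field order.
us = datetime.strptime('7/6/2011', '%m/%d/%Y')
eu = datetime.strptime('7/6/2011', '%d/%m/%Y')

print(text, us.date(), eu.date())
```

Because the format keeps every field of this timestamp, the round trip is lossless here. A format that drops fields (say '%Y-%m-%d') would lose the time of day.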
2: dateutil can parse almost any date representation a human would understand (Chinese-language dates are not supported).
In [13]: from dateutil.parser import parse
In [14]: parse('2011-01-03')
Out[14]: datetime.datetime(2011, 1, 3, 0, 0)
# In the format common internationally, the day usually comes before the month
In [15]: parse('6/12/2011', dayfirst=True)
Out[15]: datetime.datetime(2011, 12, 6, 0, 0)
In [16]: parse('6/12/2011')
Out[16]: datetime.datetime(2011, 6, 12, 0, 0)
In [17]: parse('Jan 31, 1998 10:45 PM')
Out[17]: datetime.datetime(1998, 1, 31, 22, 45)
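The sketch below contrasts the default month-first reading with dayfirst=True on the same ambiguous string, and adds one case where the first number cannot be a month. The '13/6/2011' input is an illustrative addition, not from the book.

```python
from dateutil.parser import parse

# '6/12/2011' is ambiguous: dateutil reads it month-first unless told otherwise.
ambiguous = '6/12/2011'
month_first = parse(ambiguous)               # 12 June 2011
day_first = parse(ambiguous, dayfirst=True)  # 6 December 2011

# 13 cannot be a month, so dateutil treats it as the day even without dayfirst.
unambiguous = parse('13/6/2011')

print(month_first.date(), day_first.date(), unambiguous.date())
```

In practice dayfirst only matters when both leading numbers are 12 or less.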
3: pandas.to_datetime() is typically used to convert a whole collection of date strings at once:
In [22]: datestrs = ['7/6/2011', '8/6/2011']
In [23]: pd.to_datetime(datestrs)
Out[23]: DatetimeIndex(['2011-07-06', '2011-08-06'], dtype='datetime64[ns]', freq=None)
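When the field order of the input is known, to_datetime also accepts a strptime-style format argument, which removes the month/day ambiguity. A minimal sketch that reuses the same two strings:

```python
import pandas as pd

datestrs = ['7/6/2011', '8/6/2011']

# The same strings, read month-first and then day-first via an explicit format.
month_first = pd.to_datetime(datestrs, format='%m/%d/%Y')
day_first = pd.to_datetime(datestrs, format='%d/%m/%Y')

print(month_first)
print(day_first)
```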
Handling missing values (None becomes NaT, "Not a Time"):
import pandas as pd
idx = pd.to_datetime(datestrs + [None])
print(idx)
print(idx[2])
print(pd.isnull(idx))
DatetimeIndex(['2011-07-06', '2011-08-06', 'NaT'], dtype='datetime64[ns]', freq=None)
NaT
[False False  True]
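Missing entries are not the only source of NaT. With errors='coerce', to_datetime also turns strings it cannot parse into NaT instead of raising. The sketch below assumes an illustrative input list containing one None and one junk string, then locates and drops the NaT entries.

```python
import pandas as pd

# Illustrative input: two valid dates, one missing value, one unparseable string.
raw = ['7/6/2011', '8/6/2011', None, 'not a date']

# None becomes NaT automatically; errors='coerce' also maps the bad string to NaT.
idx = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

mask = idx.isna()     # boolean array marking the NaT positions
clean = idx.dropna()  # DatetimeIndex with the NaT entries removed

print(mask)
print(clean)
```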
In [11]: from datetime import datetime
...: import numpy as np
...: import pandas as pd
...: dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
...: datetime(2011, 1, 7), datetime(2011, 1, 8),
...: datetime(2011, 1, 10), datetime(2011, 1, 12)]
...: ts = pd.Series(np.random.randn(6), index=dates)
...: ts
...:
Out[11]:
2011-01-02 0.092908
2011-01-05 0.281746
2011-01-07 0.769023
2011-01-08 1.246435
2011-01-10 1.007189
2011-01-12 -1.296221
dtype: float64
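The transcript that follows shows that the index is stored as a DatetimeIndex whose elements are Timestamp objects. The sketch below checks this directly. It uses made-up fixed values in place of np.random.randn so the result is reproducible.

```python
from datetime import datetime
import pandas as pd

# Fixed, made-up values instead of random ones.
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# pandas converts the list of datetime objects into a DatetimeIndex.
print(type(ts.index).__name__)  # DatetimeIndex
print(repr(ts.index[0]))        # Timestamp('2011-01-02 00:00:00')

# Timestamp subclasses datetime, so it compares equal to the original object.
print(ts.index[0] == dates[0])  # True
```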
In [12]: type(ts)
Out[12]: pandas.core.series.Series
In [13]: ts.index
Out[13]:
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)
In [14]: ts.index[0]
Out[14]: Timestamp('2011-01-02 00:00:00')
In [15]: ts[::2]
Out[15]:
2011-01-02 0.092908
2011-01-07 0.769023
2011-01-10 1.007189
dtype: float64
In [16]: ts + ts[::2]
Out[16]:
2011-01-02 0.185816
2011-01-05 NaN
2011-01-07 1.538045
2011-01-08 NaN
2011-01-10 2.014379
2011-01-12 NaN
dtype: float64
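ts + ts[::2] aligns the two operands on their date labels, so dates present in only one operand come out as NaN. A small sketch with made-up values on a four-day index makes this easy to verify by hand. It also shows the add method's fill_value option, which treats a missing label as 0 instead.

```python
import pandas as pd

# Made-up values on a daily index.
idx = pd.date_range('2011-01-01', periods=4, freq='D')
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# ts[::2] keeps only the 1st and 3rd dates; the other two become NaN in the sum.
summed = ts + ts[::2]
print(summed)

# fill_value=0 substitutes 0 for a label missing from one side.
filled = ts.add(ts[::2], fill_value=0)
print(filled)
```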
In [17]: stamp = ts.index[2]
...: ts[stamp]
...:
Out[17]: 0.76902256761183874
In [18]: ts['1/10/2011']
Out[18]: 1.0071893575830049
In [19]: ts['20110110']
Out[19]: 1.0071893575830049
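As the lookups above suggest, a single date can be addressed in several equivalent ways. A sketch with made-up values that compares a Timestamp, a datetime object and an ISO date string as keys:

```python
from datetime import datetime
import pandas as pd

# Made-up values on three dates.
ts = pd.Series([0.5, 1.5, 2.5],
               index=pd.to_datetime(['2011-01-07', '2011-01-10', '2011-01-12']))

# Three spellings of the same label select the same value.
by_stamp = ts[pd.Timestamp('2011-01-10')]
by_datetime = ts[datetime(2011, 1, 10)]
by_string = ts['2011-01-10']

print(by_stamp, by_datetime, by_string)
```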
In [20]: ts['1/6/2011':'1/11/2011']
Out[20]:
2011-01-07 0.769023
2011-01-08 1.246435
2011-01-10 1.007189
dtype: float64
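In the slice above, neither '1/6/2011' nor '1/11/2011' is in the index, yet the slice still works. The sketch below, which assumes a sorted index and uses made-up values, shows the rule: every date inside the range is selected, and a bound that does exist in the index is included.

```python
import pandas as pd

# Made-up values on a sorted, irregular date index.
ts = pd.Series([10, 20, 30, 40],
               index=pd.to_datetime(['2011-01-02', '2011-01-05',
                                     '2011-01-07', '2011-01-10']))

# The start bound is absent from the index; the end bound is present and included.
window = ts['2011-01-03':'2011-01-07']
print(window)
```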
In [21]: ts.truncate(after='1/9/2011')
Out[21]:
2011-01-02 0.092908
2011-01-05 0.281746
2011-01-07 0.769023
2011-01-08 1.246435
dtype: float64
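truncate takes optional before and after bounds, and neither has to appear in the index. A sketch with made-up values on a sorted index:

```python
import pandas as pd

# Made-up values on a sorted date index.
ts = pd.Series([10, 20, 30, 40],
               index=pd.to_datetime(['2011-01-02', '2011-01-05',
                                     '2011-01-07', '2011-01-10']))

# Keep everything up to a date that is not itself in the index.
head = ts.truncate(after='2011-01-06')

# Both bounds at once keep only the rows in between.
middle = ts.truncate(before='2011-01-03', after='2011-01-08')

print(head)
print(middle)
```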
In [22]: ts[datetime(2011, 1, 7):]
Out[22]:
2011-01-07 0.769023
2011-01-08 1.246435
2011-01-10 1.007189
2011-01-12 -1.296221
dtype: float64
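The open-ended slice above starts at a datetime object. A Timestamp or a date string works just as well as the start label. A sketch with made-up values that checks the three forms agree:

```python
from datetime import datetime
import pandas as pd

# Made-up values on a sorted date index.
ts = pd.Series([10, 20, 30, 40],
               index=pd.to_datetime(['2011-01-02', '2011-01-05',
                                     '2011-01-07', '2011-01-10']))

# Everything from 7 January onward, with three spellings of the start label.
a = ts[datetime(2011, 1, 7):]
b = ts[pd.Timestamp('2011-01-07'):]
c = ts['2011-01-07':]

print(a)
```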
In [23]: longer_ts = pd.Series(np.random.randn(1000),
...: index=pd.date_range('1/1/2000', periods=1000))
In [24]: longer_ts
Out[24]:
2000-01-01 0.274992
...
2002-09-25 0.884111
2002-09-26 -0.608506
Freq: D, dtype: float64
In [25]: longer_ts['2001']
Out[25]:
2001-01-01 -1.308228
...
2001-12-31 -0.502678
Freq: D, dtype: float64
In [26]: longer_ts['2001-05']
Out[26]:
2001-05-01 1.489410
...
2001-05-31 -0.241235
Freq: D, dtype: float64
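Partial-string indexing selects every row whose date falls inside the named period, and month strings also work as slice bounds. The sketch below uses the same 1000-day index with the deterministic values 0 to 999 instead of random numbers, so each selection can be checked by position. It uses .loc, which behaves the same as [] here.

```python
import numpy as np
import pandas as pd

# Deterministic values 0..999: the value equals the row's position.
longer_ts = pd.Series(np.arange(1000),
                      index=pd.date_range('2000-01-01', periods=1000))

year_2001 = longer_ts.loc['2001']            # every day of 2001
may_2001 = longer_ts.loc['2001-05']          # every day of May 2001
spring = longer_ts.loc['2001-03':'2001-05']  # March through May, inclusive

print(len(year_2001), len(may_2001), len(spring))
```

Since 2000 is a leap year, 2001-01-01 sits at position 366.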
In [27]: dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
...: long_df = pd.DataFrame(np.random.randn(100, 4),
...: index=dates,
...: columns=['Colorado', 'Texas',
...: 'New York', 'Ohio'])
...: long_df.loc['5-2001']
...:
Out[27]:
Colorado Texas New York Ohio
2001-05-02 0.927335 1.513906 0.538600 1.273768
2001-05-09 0.667876 -0.969206 1.676091 -0.817649
2001-05-16 0.050188 1.951312 3.260383 0.963301
2001-05-23 1.201206 -1.852001 2.406778 0.841176
2001-05-30 -0.749181 -2.989741 -1.295289 -1.690195
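The same partial-string rule applies to the rows of a DataFrame through .loc. The sketch below rebuilds the weekly (W-WED) frame with deterministic values in place of random ones. It selects May 2001 and then a single column over the same month, going through the column Series first.

```python
import numpy as np
import pandas as pd

# Weekly index anchored on Wednesdays, as in the transcript; values are 0..399.
dates = pd.date_range('2000-01-01', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.arange(400).reshape(100, 4), index=dates,
                       columns=['Colorado', 'Texas', 'New York', 'Ohio'])

may_2001 = long_df.loc['2001-05']             # all Wednesdays in May 2001
texas_may = long_df['Texas'].loc['2001-05']   # one column over the same rows

print(may_2001)
```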