pandas Time Series

"Python for Data Analysis"

Date and Time Data Types and Tools

The Python datetime module

In [2]: from datetime import datetime

In [3]: now = datetime.now()

In [4]: now
Out[4]: datetime.datetime(2017, 5, 25, 13, 55, 30, 39000)

1: Use datetime's strftime and strptime methods to convert between strings and dates.

In [5]: stamp = datetime(2011, 1, 3)

In [6]: str(stamp)
Out[6]: '2011-01-03 00:00:00'

In [7]: stamp.strftime('%Y-%m-%d')
Out[7]: '2011-01-03'

In [10]: datetime.strptime('2011-01-03', '%Y-%m-%d')
Out[10]: datetime.datetime(2011, 1, 3, 0, 0)

In [11]: datestrs = ['7/6/2011', '8/6/2011']

In [12]: [datetime.strptime(x,'%m/%d/%Y') for x in datestrs]
Out[12]: [datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
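The two methods are inverses of each other as long as the same format string is used. The following is a minimal sketch of that round trip; the sample timestamp and the `%H:%M` part of the format are illustrative choices, not taken from the notes above.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 14, 30)

# Format with explicit directives: %Y four-digit year, %m month, %d day,
# %H hour on a 24-hour clock, %M minute
text = stamp.strftime('%Y-%m-%d %H:%M')
print(text)  # 2011-01-03 14:30

# Parsing with the same format string recovers the original value
parsed = datetime.strptime(text, '%Y-%m-%d %H:%M')
print(parsed == stamp)  # True

# A string that does not match the format raises ValueError
try:
    datetime.strptime('2011/01/03', '%Y-%m-%d')
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print('could not parse:', exc)
```

Because strptime needs the exact format up front, it suits input whose layout is known in advance.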

Third-party date parsing

2: dateutil can parse almost any date representation a human can read (Chinese-language dates are an exception).

In [13]: from dateutil.parser import parse

In [14]: parse('2011-01-03')
Out[14]: datetime.datetime(2011, 1, 3, 0, 0)

# In many international formats the day comes before the month
In [15]: parse('6/12/2011', dayfirst=True)
Out[15]: datetime.datetime(2011, 12, 6, 0, 0)

In [16]: parse('6/12/2011')
Out[16]: datetime.datetime(2011, 6, 12, 0, 0)

In [17]: parse('Jan 31, 1998 10:45 PM')
Out[17]: datetime.datetime(1998, 1, 31, 22, 45)
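Since the same string can mean two different dates, it may help to compare both interpretations side by side. This is a small sketch using only `parse` and its `dayfirst` argument; the last string is a made-up input chosen so that the first field cannot be a month.

```python
from dateutil.parser import parse

ambiguous = '6/12/2011'

month_first = parse(ambiguous)               # default: month/day/year
day_first = parse(ambiguous, dayfirst=True)  # day/month/year

print(month_first.month, month_first.day)  # 6 12
print(day_first.month, day_first.day)      # 12 6

# When the first field is larger than 12, dateutil reads it as the day
# even without dayfirst=True, so mixed input can be parsed inconsistently
fallback = parse('13/12/2011')
print(fallback.month, fallback.day)  # 12 13
```

For data where the layout is known, passing an explicit format to strptime avoids this kind of silent guessing.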

pandas date parsing

3: The pandas.to_datetime() function is typically used to parse groups of dates:

In [21]: import pandas as pd

In [22]: datestrs = ['7/6/2011', '8/6/2011']

In [23]: pd.to_datetime(datestrs)
Out[23]: DatetimeIndex(['2011-07-06', '2011-08-06'], dtype='datetime64[ns]', freq=None)

Handling missing values:

idx = pd.to_datetime(datestrs + [None])
print(idx)
print(idx[2])
print(pd.isnull(idx))

DatetimeIndex(['2011-07-06', '2011-08-06', 'NaT'], dtype='datetime64[ns]', freq=None)
NaT
[False False  True]
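NaT (Not a Time) is the pandas null value for timestamps. The sketch below extends the idea to entries that are not dates at all; the `'not a date'` string and the explicit `format` argument are assumptions added for illustration, and `errors='coerce'` is the `to_datetime` option that converts unparseable entries to NaT instead of raising.

```python
import pandas as pd

raw = ['7/6/2011', '8/6/2011', None, 'not a date']

# errors='coerce' turns missing or unparseable entries into NaT
idx = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

# pd.isna (an alias of pd.isnull) flags the NaT positions
mask = pd.isna(idx)
print(mask.tolist())  # [False, False, True, True]

# Keep only the entries that parsed successfully
clean = idx[~mask]
print(clean)
```

Without `errors='coerce'`, the invalid string would raise an exception, which is often the safer default while exploring unfamiliar data.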

Time Series Basics

A Series indexed by timestamps

In [11]: import numpy as np
    ...: import pandas as pd
    ...: from datetime import datetime
    ...: dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
    ...:          datetime(2011, 1, 7), datetime(2011, 1, 8),
    ...:          datetime(2011, 1, 10), datetime(2011, 1, 12)]
    ...: ts = pd.Series(np.random.randn(6), index=dates)
    ...: ts
    ...:
Out[11]:
2011-01-02    0.092908
2011-01-05    0.281746
2011-01-07    0.769023
2011-01-08    1.246435
2011-01-10    1.007189
2011-01-12   -1.296221
dtype: float64

In [12]: type(ts)
Out[12]: pandas.core.series.Series

In [13]: ts.index
Out[13]:
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [14]: ts.index[0]
Out[14]: Timestamp('2011-01-02 00:00:00')

In [15]: ts[::2]
Out[15]:
2011-01-02    0.092908
2011-01-07    0.769023
2011-01-10    1.007189
dtype: float64

In [16]: ts + ts[::2]
Out[16]:
2011-01-02    0.185816
2011-01-05         NaN
2011-01-07    1.538045
2011-01-08         NaN
2011-01-10    2.014379
2011-01-12         NaN
dtype: float64
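Arithmetic between two time series aligns on the dates, so any date missing from one operand yields NaN. The following sketch reproduces that with small fixed values instead of random ones; `Series.add` with `fill_value=0` is shown as one possible way to treat missing dates as zero, not something the notes above use.

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07'])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# ts[::2] keeps only the first and third rows
total = ts + ts[::2]
print(total)
# 2011-01-05 is absent from ts[::2], so its sum is NaN

# Treat dates missing from one side as 0 instead of propagating NaN
filled = ts.add(ts[::2], fill_value=0)
print(filled)
```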

Indexing, Selection, and Subsetting

In [17]: stamp = ts.index[2]
    ...: ts[stamp]
    ...:
Out[17]: 0.76902256761183874

In [18]: ts['1/10/2011']
Out[18]: 1.0071893575830049

In [19]: ts['20110110']
Out[19]: 1.0071893575830049

In [20]: ts['1/6/2011':'1/11/2011']
Out[20]:
2011-01-07    0.769023
2011-01-08    1.246435
2011-01-10    1.007189
dtype: float64
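Note that neither endpoint of the slice above appears in the index. On a sorted DatetimeIndex, a range query only needs the bounds to be comparable dates, and both bounds are inclusive. A small sketch with integer values (chosen here so the result is easy to check) illustrates this:

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series(range(6), index=dates)

# Neither 2011-01-06 nor 2011-01-11 is in the index
window = ts['2011-01-06':'2011-01-11']
print(window.index.strftime('%m-%d').tolist())  # ['01-07', '01-08', '01-10']

# A bound that does exist in the index is included in the result
inclusive = ts['2011-01-07':'2011-01-10']
print(inclusive.tolist())  # [2, 3, 4]

# .loc performs the same label-based slicing
same = ts.loc['2011-01-06':'2011-01-11']
print(same.equals(window))  # True
```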

In [21]: ts.truncate(after='1/9/2011')
Out[21]:
2011-01-02    0.092908
2011-01-05    0.281746
2011-01-07    0.769023
2011-01-08    1.246435
dtype: float64

In [22]: ts[datetime(2011, 1, 7):]
Out[22]:
2011-01-07    0.769023
2011-01-08    1.246435
2011-01-10    1.007189
2011-01-12   -1.296221
dtype: float64

In [23]: longer_ts = pd.Series(np.random.randn(1000),
    ...:                       index=pd.date_range('1/1/2000', periods=1000))

In [24]: longer_ts
Out[24]:
2000-01-01    0.274992
                ...
2002-09-25    0.884111
2002-09-26   -0.608506
Freq: D, dtype: float64

In [25]: longer_ts['2001']
Out[25]:
2001-01-01   -1.308228
                ...
2001-12-31   -0.502678
Freq: D, dtype: float64

In [26]: longer_ts['2001-05']
Out[26]:
2001-05-01    1.489410
                ...
2001-05-31   -0.241235
Freq: D, dtype: float64

In [27]: dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
    ...: long_df = pd.DataFrame(np.random.randn(100, 4),
    ...:                        index=dates,
    ...:                        columns=['Colorado', 'Texas',
    ...:                                 'New York', 'Ohio'])
    ...: long_df.loc['5-2001']
    ...:
Out[27]:
            Colorado     Texas  New York      Ohio
2001-05-02  0.927335  1.513906  0.538600  1.273768
2001-05-09  0.667876 -0.969206  1.676091 -0.817649
2001-05-16  0.050188  1.951312  3.260383  0.963301
2001-05-23  1.201206 -1.852001  2.406778  0.841176
2001-05-30 -0.749181 -2.989741 -1.295289 -1.690195
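Passing only a year or a year and month selects every row inside that period. The sketch below checks the row counts on a daily index like the one used for longer_ts; the integer values and the `value` column name are placeholders. It uses `.loc` throughout, on the assumption that recent pandas versions require `.loc` for this kind of row selection on a DataFrame.

```python
import pandas as pd

idx = pd.date_range('2000-01-01', periods=1000)  # daily frequency
s = pd.Series(range(1000), index=idx)

year = s.loc['2001']      # every day in 2001
month = s.loc['2001-05']  # every day in May 2001
print(len(year), len(month))  # 365 31

# The same partial-string selection works on DataFrame rows through .loc
df = pd.DataFrame({'value': range(1000)}, index=idx)
may_rows = df.loc['2001-05']
print(len(may_rows))  # 31
```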
