Time Series Analysis (Python)

1. Generating Time Series

1.1 Fixed number of periods (period) & time interval (interval)
1.2 Timestamps (timestamp) & time spans (freq)
Time series analysis overview diagram:

[Figure 1: overview diagram]

1.1 Fixed number of periods (period) & time interval (interval)

import pandas as pd
import numpy as np
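## Accepted date formats: 2016 Jul 1, 7/1/2016, 2016-07-01, 2016/07/01
## (1/7/2016 is also accepted, but is read month-first by default, i.e. as 7 January)
## Generate a time series; periods is the number of timestamps to generate,
## freq is the frequency (default 'D'): 'D' = day, 'M' = month end, 'H' = hour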
rng = pd.date_range('2016-07-01', periods = 10, freq = '3D')
rng
### Result
DatetimeIndex(['2016-07-01', '2016-07-04', '2016-07-07', '2016-07-10',
               '2016-07-13', '2016-07-16', '2016-07-19', '2016-07-22',
               '2016-07-25', '2016-07-28'],
              dtype='datetime64[ns]', freq='3D')
[Figure 2: table of freq parameter values]
## Build a series of 20 random values indexed by the dates 2016-01-01 to 2016-01-20
import datetime as dt
time=pd.Series(np.random.randn(20),
           index=pd.date_range(dt.datetime(2016,1,1),periods=20))
print(time)
### Result
2016-01-01   -0.129379
2016-01-02    0.164480
2016-01-03   -0.639117
2016-01-04   -0.427224
2016-01-05    2.055133
2016-01-06    1.116075
2016-01-07    0.357426
2016-01-08    0.274249
2016-01-09    0.834405
2016-01-10   -0.005444
2016-01-11   -0.134409
2016-01-12    0.249318
2016-01-13   -0.297842
2016-01-14   -0.128514
2016-01-15    0.063690
2016-01-16   -2.246031
2016-01-17    0.359552
2016-01-18    0.383030
2016-01-19    0.402717
2016-01-20   -0.694068
Freq: D, dtype: float64
## Truncate: keep the data from 2016-01-10 onward
time.truncate(before='2016-1-10')
### Result
2016-01-10   -0.005444
2016-01-11   -0.134409
2016-01-12    0.249318
2016-01-13   -0.297842
2016-01-14   -0.128514
2016-01-15    0.063690
2016-01-16   -2.246031
2016-01-17    0.359552
2016-01-18    0.383030
2016-01-19    0.402717
2016-01-20   -0.694068
Freq: D, dtype: float64
## Truncate: keep the data up to and including 2016-01-10
time.truncate(after='2016-1-10')
### Result:
2016-01-01   -0.129379
2016-01-02    0.164480
2016-01-03   -0.639117
2016-01-04   -0.427224
2016-01-05    2.055133
2016-01-06    1.116075
2016-01-07    0.357426
2016-01-08    0.274249
2016-01-09    0.834405
2016-01-10   -0.005444
Freq: D, dtype: float64
## Select the value for a single date
print(time['2016-01-15'])
## Result: 0.063690487247
# Slice by date range (both endpoints are included)
print(time['2016-01-15':'2016-01-20'])
### Result
2016-01-15    0.063690
2016-01-16   -2.246031
2016-01-17    0.359552
2016-01-18    0.383030
2016-01-19    0.402717
2016-01-20   -0.694068
Freq: D, dtype: float64
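## Generate a time series from a start date and an end date
## ('M' means month end; newer pandas releases spell this alias 'ME')
data=pd.date_range('2010-01-01','2011-01-01',freq='M')
print(data)
### Result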
DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31', '2010-04-30',
               '2010-05-31', '2010-06-30', '2010-07-31', '2010-08-31',
               '2010-09-30', '2010-10-31', '2010-11-30', '2010-12-31'],
              dtype='datetime64[ns]', freq='M')
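
The two styles used above (start plus periods, and start plus end) describe the same kind of index. The sketch below is only an illustration with made-up variable names; it uses the 'D' and 'MS' (month start) aliases because they are spelled the same way in old and new pandas releases.

```python
import pandas as pd

# Three ways to describe the same five daily timestamps.
by_periods = pd.date_range(start="2016-07-01", periods=5, freq="D")
by_end = pd.date_range(start="2016-07-01", end="2016-07-05", freq="D")
by_end_periods = pd.date_range(end="2016-07-05", periods=5, freq="D")

print(by_periods.equals(by_end))          # True
print(by_periods.equals(by_end_periods))  # True

# With a start and an end, both endpoints are included when they fall on the grid.
# 'MS' places one timestamp on the first day of each month.
month_starts = pd.date_range("2010-01-01", "2011-01-01", freq="MS")
print(len(month_starts))  # 13
print(month_starts[0], month_starts[-1])
```

With freq='M' (month end), as in the example above, the same date range gives only 12 timestamps, because 2011-01-31 falls after the end date.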

1.2 Timestamps (timestamp) & time spans

# Timestamps
pd.Timestamp('2016-07-10')
## Result: Timestamp('2016-07-10 00:00:00')

# More detail can be specified
pd.Timestamp('2016-07-10 10')
## Result: Timestamp('2016-07-10 10:00:00')

pd.Timestamp('2016-07-10 10:15')
## Result: Timestamp('2016-07-10 10:15:00')


# Time spans (Period)
pd.Period('2016-01')
## A fixed span of one month. Result: Period('2016-01', 'M')
pd.Period('2016-01-01')
## A fixed span of one day. Result: Period('2016-01-01', 'D')


# Time offsets
pd.Timedelta('1 day')
## Result: Timedelta('1 days 00:00:00')

# Shift a period by one day
pd.Period('2016-01-01 10:10') + pd.Timedelta('1 day')
## Result: Period('2016-01-02 10:10', 'T')
## Shift a timestamp by one day
pd.Timestamp('2016-01-01 10:10') + pd.Timedelta('1 day')
## Result: Timestamp('2016-01-02 10:10:00')
# Shift a timestamp by 15 ns
pd.Timestamp('2016-01-01 10:10') + pd.Timedelta('15 ns')
## Result: Timestamp('2016-01-01 10:10:00.000000015')
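
Offsets also work in the other direction: subtracting two timestamps gives a Timedelta. A minimal sketch with illustrative variable names:

```python
import pandas as pd

start = pd.Timestamp("2016-01-01 10:10")

# Timedelta can be built from keyword arguments as well as from a string.
later = start + pd.Timedelta(days=1, hours=2)
print(later)  # 2016-01-02 12:10:00

# The difference between two timestamps is a Timedelta.
elapsed = later - start
print(elapsed)
print(elapsed.total_seconds())  # 93600.0 (26 hours)

# A Period can be shifted by an integer number of its own units.
next_month = pd.Period("2016-01", freq="M") + 1
print(next_month)  # 2016-02
```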

# Two ways to generate 10 periods spaced 25 hours apart
p1 = pd.period_range('2016-01-01 10:10', freq = '25H', periods = 10)
p2 = pd.period_range('2016-01-01 10:10', freq = '1D1H', periods = 10)
## p1 and p2 give the same result:
PeriodIndex(['2016-01-01 10:00', '2016-01-02 11:00', '2016-01-03 12:00',
             '2016-01-04 13:00', '2016-01-05 14:00', '2016-01-06 15:00',
             '2016-01-07 16:00', '2016-01-08 17:00', '2016-01-09 18:00',
             '2016-01-10 19:00'],
            dtype='period[25H]', freq='25H')

# Specifying an index
rng = pd.date_range('2016 Jul 1', periods = 10, freq = 'D')
# Use rng as the time index
pd.Series(range(len(rng)), index = rng)
## Result:
2016-07-01    0
2016-07-02    1
2016-07-03    2
2016-07-04    3
2016-07-05    4
2016-07-06    5
2016-07-07    6
2016-07-08    7
2016-07-09    8
2016-07-10    9
Freq: D, dtype: int32

# Enter three periods by hand
periods = [pd.Period('2016-01'), pd.Period('2016-02'), pd.Period('2016-03')]
ts = pd.Series(np.random.randn(len(periods)), index = periods)
ts
## Result:
2016-01   -0.015837
2016-02   -0.923463
2016-03   -0.485212
Freq: M, dtype: float64
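
Instead of listing each Period by hand, the same index can be built with pd.period_range. The sketch below uses made-up values and also shows that a Period covers a span with a start and an end.

```python
import pandas as pd

# Three consecutive monthly periods starting at January 2016.
months = pd.period_range("2016-01", periods=3, freq="M")
values = pd.Series([10, 20, 30], index=months)  # illustrative values

# A Period is a span of time, so it has a start and an end.
first = months[0]
print(first.start_time)  # 2016-01-01 00:00:00
print(first.end_time)

# Convert the period index to timestamps (the start of each period by default).
stamped = values.to_timestamp()
print(stamped.index[0])  # 2016-01-01 00:00:00
```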

# Timestamps and periods can be converted into each other
ts = pd.Series(range(10), pd.date_range('07-10-16 8:00', periods = 10, freq = 'H'))
ts
## Result
2016-07-10 08:00:00    0
2016-07-10 09:00:00    1
2016-07-10 10:00:00    2
2016-07-10 11:00:00    3
2016-07-10 12:00:00    4
2016-07-10 13:00:00    5
2016-07-10 14:00:00    6
2016-07-10 15:00:00    7
2016-07-10 16:00:00    8
2016-07-10 17:00:00    9
Freq: H, dtype: int32

## to_period() converts timestamps to periods
ts_period = ts.to_period()
ts_period
## Result:
2016-07-10 08:00    0
2016-07-10 09:00    1
2016-07-10 10:00    2
2016-07-10 11:00    3
2016-07-10 12:00    4
2016-07-10 13:00    5
2016-07-10 14:00    6
2016-07-10 15:00    7
2016-07-10 16:00    8
2016-07-10 17:00    9
Freq: H, dtype: int32

## The difference between periods and timestamps
ts_period['2016-07-10 08:30':'2016-07-10 11:45'] 
## Result:
2016-07-10 08:00    0
2016-07-10 09:00    1
2016-07-10 10:00    2
2016-07-10 11:00    3
Freq: H, dtype: int32
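## A period slice includes the periods that contain the endpoints (08:30 lies inside the 08:00 period),
## while a timestamp slice keeps only the timestamps inside the range, so it starts at the next full hour
ts['2016-07-10 08:30':'2016-07-10 11:45'] 
## Result: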
2016-07-10 09:00:00    1
2016-07-10 10:00:00    2
2016-07-10 11:00:00    3
Freq: H, dtype: int32
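
The comparison above can be reproduced as a short self-contained check. This is a sketch rather than the original code: it spells the hourly alias in lower case ('h') and writes the date in ISO form, on the assumption that both are accepted by old and new pandas releases.

```python
import pandas as pd

idx = pd.date_range("2016-07-10 08:00", periods=10, freq="h")
ts = pd.Series(range(10), index=idx)
ts_period = ts.to_period()

# Timestamp index: only timestamps inside [08:30, 11:45] are kept.
stamp_slice = ts.loc["2016-07-10 08:30":"2016-07-10 11:45"]
# Period index: the 08:00 period contains 08:30, so it is kept as well.
period_slice = ts_period.loc["2016-07-10 08:30":"2016-07-10 11:45"]

print(list(stamp_slice))   # [1, 2, 3]
print(list(period_slice))  # [0, 1, 2, 3]
```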

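2. Resampling and Interpolation

2.1 Resampling (converting time series data from one frequency to another)
Downsampling
Upsampling
2.2 Interpolation and filling

2.1 Resampling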

import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head()
## Result:
2011-01-01   -1.025562
2011-01-02    0.410895
2011-01-03    0.660311
2011-01-04    0.710293
2011-01-05    0.444985
Freq: D, dtype: float64
## Downsample from days to months, summing the daily values within each month
ts.resample('M').sum()
## Result:
2011-01-31    2.510102
2011-02-28    0.583209
2011-03-31    2.749411
Freq: M, dtype: float64
## Downsample from days to 3-day bins, summing the values within each bin
ts.resample('3D').sum()
## Result:
2011-01-01    0.045643
2011-01-04   -2.255206
2011-01-07    0.571142
2011-01-10    0.835032
2011-01-13   -0.396766
2011-01-16   -1.156253
2011-01-19   -1.286884
2011-01-22    2.883952
2011-01-25    1.566908
2011-01-28    1.435563
2011-01-31    0.311565
2011-02-03   -2.541235
2011-02-06    0.317075
2011-02-09    1.598877
2011-02-12   -1.950509
2011-02-15    2.928312
2011-02-18   -0.733715
2011-02-21    1.674817
2011-02-24   -2.078872
2011-02-27    2.172320
2011-03-02   -2.022104
2011-03-05   -0.070356
2011-03-08    1.276671
2011-03-11   -2.835132
2011-03-14   -1.384113
2011-03-17    1.517565
2011-03-20   -0.550406
2011-03-23    0.773430
2011-03-26    2.244319
2011-03-29    2.951082
Freq: 3D, dtype: float64
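
Because the data above is random, the sums are hard to verify by eye. The sketch below repeats the downsampling on a series of ones, so each sum is simply the number of days in the bin. It uses 'MS' (month start labels) instead of 'M', an assumption made to keep the alias valid across pandas versions.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=90, freq="D")
ones = pd.Series(np.ones(len(rng)), index=rng)

# 90 days split into 3-day bins: 30 bins, each summing to 3.
three_day = ones.resample("3D").sum()
print(len(three_day))           # 30
print(three_day.unique())       # [3.]

# Monthly bins: the sums are the month lengths (Jan, Feb, Mar 2011).
monthly = ones.resample("MS").sum()
print(monthly.tolist())         # [31.0, 28.0, 31.0]

# Several aggregations can be computed at once with agg.
stats = ones.resample("3D").agg(["sum", "mean"])
print(stats.head(2))
```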

## Downsample from days to 3-day bins and take the mean of each bin
day3Ts = ts.resample('3D').mean()
day3Ts
## Result:
2011-01-01    0.015214
2011-01-04   -0.751735
2011-01-07    0.190381
2011-01-10    0.278344
2011-01-13   -0.132255
2011-01-16   -0.385418
2011-01-19   -0.428961
2011-01-22    0.961317
2011-01-25    0.522303
2011-01-28    0.478521
2011-01-31    0.103855
2011-02-03   -0.847078
2011-02-06    0.105692
2011-02-09    0.532959
2011-02-12   -0.650170
2011-02-15    0.976104
2011-02-18   -0.244572
2011-02-21    0.558272
2011-02-24   -0.692957
2011-02-27    0.724107
2011-03-02   -0.674035
2011-03-05   -0.023452
2011-03-08    0.425557
2011-03-11   -0.945044
2011-03-14   -0.461371
2011-03-17    0.505855
2011-03-20   -0.183469
2011-03-23    0.257810
2011-03-26    0.748106
2011-03-29    0.983694
Freq: 3D, dtype: float64

## Upsample from 3-day bins back to daily frequency;
## the new rows are NaN (missing values) and need to be filled or interpolated
print(day3Ts.resample('D').asfreq())
## Result:
2011-01-01    0.015214
2011-01-02         NaN
2011-01-03         NaN
2011-01-04   -0.751735
2011-01-05         NaN
2011-01-06         NaN
2011-01-07    0.190381
2011-01-08         NaN
2011-01-09         NaN
2011-01-10    0.278344
2011-01-11         NaN
2011-01-12         NaN
2011-01-13   -0.132255
2011-01-14         NaN
2011-01-15         NaN
2011-01-16   -0.385418
2011-01-17         NaN
2011-01-18         NaN
2011-01-19   -0.428961
2011-01-20         NaN
2011-01-21         NaN
2011-01-22    0.961317
2011-01-23         NaN
2011-01-24         NaN
2011-01-25    0.522303
2011-01-26         NaN
2011-01-27         NaN
2011-01-28    0.478521
2011-01-29         NaN
2011-01-30         NaN
                ...   
2011-02-28         NaN
2011-03-01         NaN
2011-03-02   -0.674035
2011-03-03         NaN
2011-03-04         NaN
2011-03-05   -0.023452
2011-03-06         NaN
2011-03-07         NaN
2011-03-08    0.425557
2011-03-09         NaN
2011-03-10         NaN
2011-03-11   -0.945044
2011-03-12         NaN
2011-03-13         NaN
2011-03-14   -0.461371
2011-03-15         NaN
2011-03-16         NaN
2011-03-17    0.505855
2011-03-18         NaN
2011-03-19         NaN
2011-03-20   -0.183469
2011-03-21         NaN
2011-03-22         NaN
2011-03-23    0.257810
2011-03-24         NaN
2011-03-25         NaN
2011-03-26    0.748106
2011-03-27         NaN
2011-03-28         NaN
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64
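
The shape of the upsampled result is easier to see on a tiny series. A minimal sketch with made-up values:

```python
import pandas as pd

# Three observations, one every 3 days.
coarse = pd.Series(
    [1.0, 4.0, 7.0],
    index=pd.date_range("2011-01-01", periods=3, freq="3D"),
)

# Upsampling to daily frequency inserts the missing days as NaN.
daily = coarse.resample("D").asfreq()
print(daily)
print(len(daily))               # 7 (2011-01-01 through 2011-01-07)
print(int(daily.isna().sum()))  # 4
```

The upsampled index ends at the last original timestamp, which is why the 30 three-day bins above become 88 daily rows and not 90.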

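2.2 Interpolation and filling

Fill methods:
ffill: fill a missing value with the previous valid value
bfill: fill a missing value with the next valid value
interpolate: fill missing values by interpolation (linear by default)

## Forward fill: carry each value forward by at most one row (limit=1)
day3Ts.resample('D').ffill(limit=1)
## Result: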
2011-01-01    0.015214
2011-01-02    0.015214
2011-01-03         NaN
2011-01-04   -0.751735
2011-01-05   -0.751735
2011-01-06         NaN
2011-01-07    0.190381
2011-01-08    0.190381
2011-01-09         NaN
2011-01-10    0.278344
2011-01-11    0.278344
2011-01-12         NaN
2011-01-13   -0.132255
2011-01-14   -0.132255
2011-01-15         NaN
2011-01-16   -0.385418
2011-01-17   -0.385418
2011-01-18         NaN
2011-01-19   -0.428961
2011-01-20   -0.428961
2011-01-21         NaN
2011-01-22    0.961317
2011-01-23    0.961317
2011-01-24         NaN
2011-01-25    0.522303
2011-01-26    0.522303
2011-01-27         NaN
2011-01-28    0.478521
2011-01-29    0.478521
2011-01-30         NaN
                ...   
2011-02-28    0.724107
2011-03-01         NaN
2011-03-02   -0.674035
2011-03-03   -0.674035
2011-03-04         NaN
2011-03-05   -0.023452
2011-03-06   -0.023452
2011-03-07         NaN
2011-03-08    0.425557
2011-03-09    0.425557
2011-03-10         NaN
2011-03-11   -0.945044
2011-03-12   -0.945044
2011-03-13         NaN
2011-03-14   -0.461371
2011-03-15   -0.461371
2011-03-16         NaN
2011-03-17    0.505855
2011-03-18    0.505855
2011-03-19         NaN
2011-03-20   -0.183469
2011-03-21   -0.183469
2011-03-22         NaN
2011-03-23    0.257810
2011-03-24    0.257810
2011-03-25         NaN
2011-03-26    0.748106
2011-03-27    0.748106
2011-03-28         NaN
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64

## Backward fill: carry each value backward by at most one row (limit=1)
day3Ts.resample('D').bfill(limit=1)
## Result:
2011-01-01    0.015214
2011-01-02         NaN
2011-01-03   -0.751735
2011-01-04   -0.751735
2011-01-05         NaN
2011-01-06    0.190381
2011-01-07    0.190381
2011-01-08         NaN
2011-01-09    0.278344
2011-01-10    0.278344
2011-01-11         NaN
2011-01-12   -0.132255
2011-01-13   -0.132255
2011-01-14         NaN
2011-01-15   -0.385418
2011-01-16   -0.385418
2011-01-17         NaN
2011-01-18   -0.428961
2011-01-19   -0.428961
2011-01-20         NaN
2011-01-21    0.961317
2011-01-22    0.961317
2011-01-23         NaN
2011-01-24    0.522303
2011-01-25    0.522303
2011-01-26         NaN
2011-01-27    0.478521
2011-01-28    0.478521
2011-01-29         NaN
2011-01-30    0.103855
                ...   
2011-02-28         NaN
2011-03-01   -0.674035
2011-03-02   -0.674035
2011-03-03         NaN
2011-03-04   -0.023452
2011-03-05   -0.023452
2011-03-06         NaN
2011-03-07    0.425557
2011-03-08    0.425557
2011-03-09         NaN
2011-03-10   -0.945044
2011-03-11   -0.945044
2011-03-12         NaN
2011-03-13   -0.461371
2011-03-14   -0.461371
2011-03-15         NaN
2011-03-16    0.505855
2011-03-17    0.505855
2011-03-18         NaN
2011-03-19   -0.183469
2011-03-20   -0.183469
2011-03-21         NaN
2011-03-22    0.257810
2011-03-23    0.257810
2011-03-24         NaN
2011-03-25    0.748106
2011-03-26    0.748106
2011-03-27         NaN
2011-03-28    0.983694
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64

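## Linear interpolation: connect neighbouring known points with a straight line.
## For example:
## a line is drawn between 2011-01-01 and 2011-01-04 to obtain the values for 2011-01-02 and 2011-01-03
day3Ts.resample('D').interpolate('linear')
## Result: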
2011-01-01    0.015214
2011-01-02   -0.240435
2011-01-03   -0.496085
2011-01-04   -0.751735
2011-01-05   -0.437697
2011-01-06   -0.123658
2011-01-07    0.190381
2011-01-08    0.219702
2011-01-09    0.249023
2011-01-10    0.278344
2011-01-11    0.141478
2011-01-12    0.004611
2011-01-13   -0.132255
2011-01-14   -0.216643
2011-01-15   -0.301030
2011-01-16   -0.385418
2011-01-17   -0.399932
2011-01-18   -0.414447
2011-01-19   -0.428961
2011-01-20    0.034465
2011-01-21    0.497891
2011-01-22    0.961317
2011-01-23    0.814979
2011-01-24    0.668641
2011-01-25    0.522303
2011-01-26    0.507709
2011-01-27    0.493115
2011-01-28    0.478521
2011-01-29    0.353632
2011-01-30    0.228744
                ...   
2011-02-28    0.258060
2011-03-01   -0.207988
2011-03-02   -0.674035
2011-03-03   -0.457174
2011-03-04   -0.240313
2011-03-05   -0.023452
2011-03-06    0.126218
2011-03-07    0.275887
2011-03-08    0.425557
2011-03-09   -0.031310
2011-03-10   -0.488177
2011-03-11   -0.945044
2011-03-12   -0.783820
2011-03-13   -0.622595
2011-03-14   -0.461371
2011-03-15   -0.138962
2011-03-16    0.183446
2011-03-17    0.505855
2011-03-18    0.276080
2011-03-19    0.046306
2011-03-20   -0.183469
2011-03-21   -0.036376
2011-03-22    0.110717
2011-03-23    0.257810
2011-03-24    0.421242
2011-03-25    0.584674
2011-03-26    0.748106
2011-03-27    0.826636
2011-03-28    0.905165
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64
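
The three fill methods can be compared side by side on a small series whose values are easy to check by hand. This is a sketch with made-up data, not the series used above.

```python
import pandas as pd

coarse = pd.Series(
    [1.0, 4.0, 7.0],
    index=pd.date_range("2011-01-01", periods=3, freq="3D"),
)

# limit=1 fills at most one consecutive missing row per gap.
forward = coarse.resample("D").ffill(limit=1)
backward = coarse.resample("D").bfill(limit=1)
# Linear interpolation fills every missing row on the straight line between neighbours.
linear = coarse.resample("D").interpolate("linear")

print(forward.tolist())   # [1.0, 1.0, nan, 4.0, 4.0, nan, 7.0]
print(backward.tolist())  # [1.0, nan, 4.0, 4.0, nan, 7.0, 7.0]
print(linear.tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

Calling ffill() or bfill() without limit fills every gap completely.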

Sliding Windows

%matplotlib inline 
import matplotlib.pylab
import numpy as np
import pandas as pd
# First generate a random time series
df = pd.Series(np.random.randn(600), index = pd.date_range('7/1/2016', freq = 'D', periods = 600))
## Look at the first five rows
df.head()
## Result:
2016-07-01   -0.192140
2016-07-02    0.357953
2016-07-03   -0.201847
2016-07-04   -0.372230
2016-07-05    1.414753
Freq: D, dtype: float64

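## A sliding window works like a moving average of a stock price: a 5-day moving average, for example,
## uses the mean of the most recent five days as the value for the current day.
## This smooths the curve and makes each value more representative. Here the window is 10 days
r = df.rolling(window = 10)
# Other statistics are available as well, e.g. r.max, r.median, r.std, r.skew, r.sum, r.var
print(r.mean())
## Result: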
2016-07-01         NaN
2016-07-02         NaN
2016-07-03         NaN
2016-07-04         NaN
2016-07-05         NaN
2016-07-06         NaN
2016-07-07         NaN
2016-07-08         NaN
2016-07-09         NaN
2016-07-10    0.300133
2016-07-11    0.284780
2016-07-12    0.252831
2016-07-13    0.220699
2016-07-14    0.167137
2016-07-15    0.018593
2016-07-16   -0.061414
2016-07-17   -0.134593
2016-07-18   -0.153333
2016-07-19   -0.218928
2016-07-20   -0.169426
2016-07-21   -0.219747
2016-07-22   -0.181266
2016-07-23   -0.173674
2016-07-24   -0.130629
2016-07-25   -0.166730
2016-07-26   -0.233044
2016-07-27   -0.256642
2016-07-28   -0.280738
2016-07-29   -0.289893
2016-07-30   -0.379625
                ...   
2018-01-22   -0.211467
2018-01-23    0.034996
2018-01-24   -0.105910
2018-01-25   -0.145774
2018-01-26   -0.089320
2018-01-27   -0.164370
2018-01-28   -0.110892
2018-01-29   -0.205786
2018-01-30   -0.101162
2018-01-31   -0.034760
2018-02-01    0.229333
2018-02-02    0.043741
2018-02-03    0.052837
2018-02-04    0.057746
2018-02-05   -0.071401
2018-02-06   -0.011153
2018-02-07   -0.045737
2018-02-08   -0.021983
2018-02-09   -0.196715
2018-02-10   -0.063721
2018-02-11   -0.289452
2018-02-12   -0.050946
2018-02-13   -0.047014
2018-02-14    0.048754
2018-02-15    0.143949
2018-02-16    0.424823
2018-02-17    0.361878
2018-02-18    0.363235
2018-02-19    0.517436
2018-02-20    0.368020
Freq: D, Length: 600, dtype: float64
## Plot the result
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(15, 5))
## Original series as a red dashed line
df.plot(style='r--')
## Rolling mean as a solid blue line
df.rolling(window=10).mean().plot(style='b')
[Figure 3: original series (red dashed line) and its 10-day rolling mean (blue line)]
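
The first nine rows of the rolling mean above are NaN because a 10-day window is not yet full. The sketch below shows the same effect on a five-value series with a 3-day window, and how the min_periods parameter of rolling relaxes it. The data is made up for illustration.

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2016-07-01", periods=5, freq="D"),
)

# Default: a value is produced only once the window holds 3 observations.
strict = s.rolling(window=3).mean()
print(strict.tolist())   # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1: average whatever is available at the start of the series.
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Whether partial windows are acceptable depends on the analysis: they avoid leading NaN values, but the first few averages are based on fewer observations.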
