Learning Data Analysis: Time Series (2) - Resampling Data

Resampling

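Resampling is the process of converting a time series from one frequency to another. Converting high-frequency data to a lower frequency is downsampling; converting low-frequency data to a higher frequency is upsampling.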

Downsampling

import numpy as np
import pandas as pd
t = pd.DataFrame(np.random.uniform(10, 50, (100, 1)), index=pd.date_range('20170101', periods=100))
t

				0
2017-01-01	42.009320
2017-01-02	27.031279
2017-01-03	14.262344
2017-01-04	29.221443
2017-01-05	22.895785
2017-01-06	26.814856
2017-01-07	19.565748
2017-01-08	47.893685
2017-01-09	43.359727
2017-01-10	39.692903
2017-01-11	24.697132
2017-01-12	16.197977
2017-01-13	27.223503
2017-01-14	38.642974
2017-01-15	46.869408
2017-01-16	12.787360
2017-01-17	11.652354
2017-01-18	35.901554
2017-01-19	17.519836
2017-01-20	44.663240
2017-01-21	44.642161
2017-01-22	37.451723
2017-01-23	21.280267
2017-01-24	44.991936
2017-01-25	49.983729
2017-01-26	44.994922
2017-01-27	26.077919
2017-01-28	25.978752
2017-01-29	14.818770
2017-01-30	14.156555
...	...
2017-03-12	43.302700
2017-03-13	15.709008
2017-03-14	35.354453
2017-03-15	37.885999
2017-03-16	38.062864
2017-03-17	29.039956
2017-03-18	37.100101
2017-03-19	14.473501
2017-03-20	48.391104
2017-03-21	24.301725
2017-03-22	36.347639
2017-03-23	42.361770
2017-03-24	30.042126
2017-03-25	27.018687
2017-03-26	22.962364
2017-03-27	47.031464
2017-03-28	28.647002
2017-03-29	43.053664
2017-03-30	32.750043
2017-03-31	29.264535
2017-04-01	49.336224
2017-04-02	21.064076
2017-04-03	18.191110
2017-04-04	40.548393
2017-04-05	17.578473
2017-04-06	19.759165
2017-04-07	28.063757
2017-04-08	26.345850
2017-04-09	35.661071
2017-04-10	32.292340
100 rows × 1 columns

This is a time series with 100 rows at daily intervals. Changing the interval to months lowers its frequency, which is called downsampling.

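# Change the interval from days to months
# using the resample function; sum() adds up the values within each month
# (pandas 2.2 and later deprecate the alias 'M' in favor of 'ME')
t.resample('M').sum()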
				0
2017-01-31	948.297346
2017-02-28	888.290936
2017-03-31	960.922047
2017-04-30	288.840458
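
Because the data above is random, the monthly totals are hard to check by eye. The following is a minimal sketch, assuming a stand-in frame of all ones (the names `ones`, `monthly_sum`, and `monthly_mean` are illustrative and not from the original), so that each monthly sum equals the number of days that fall in that month. It uses the alias 'MS', which labels each bin with the first day of the month instead of the last, only because that alias is accepted across pandas versions.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: 100 daily values, all equal to 1.0
days = pd.date_range('20170101', periods=100)
ones = pd.DataFrame(np.ones((100, 1)), index=days)

# 'MS' groups rows by calendar month and labels each group with the month's first day
monthly_sum = ones.resample('MS').sum()    # number of days per month in this case
monthly_mean = ones.resample('MS').mean()  # any other aggregation can replace sum()

print(monthly_sum)
print(monthly_mean)
```

The 100 days run from 2017-01-01 to 2017-04-10, so the four sums are 31, 28, 31, and 10, matching the four rows produced by the original example.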

Upsampling

Upsampling is the opposite of downsampling: low-frequency data is converted to a higher frequency.

# Upsampling: start from two weekly rows (every Wednesday)
frame = pd.DataFrame(np.random.randn(2, 4),
                    index=pd.date_range('1/1/2000', periods=2, freq='W-WED'),
                    columns=['上海', '北京', '深圳', '广州'])
frame


				上海		北京		深圳		广州
2000-01-05	1.010248	0.251598	1.131810	0.035474
2000-01-12	-0.221884	1.136224	-0.761822	0.056637

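# asfreq converts to the new frequency without aggregating; going from weekly to daily is upsampling
frame.resample('D').asfreq()
# Dates that have no original value are filled with NaN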


				上海		北京		深圳		广州
2000-01-05	1.010248	0.251598	1.131810	0.035474
2000-01-06		NaN			NaN			NaN			NaN
2000-01-07		NaN			NaN			NaN			NaN
2000-01-08		NaN			NaN			NaN			NaN
2000-01-09		NaN			NaN			NaN			NaN
2000-01-10		NaN			NaN			NaN			NaN
2000-01-11		NaN			NaN			NaN			NaN
2000-01-12	-0.221884	1.136224	-0.761822	0.056637
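
For a regularly spaced index like this one, the same daily grid can also be obtained by calling `asfreq` directly on the frame. The sketch below assumes a small deterministic frame (the names `weekly`, `via_resample`, and `via_asfreq` and the column labels are illustrative) and compares the two routes.

```python
import pandas as pd

# Two weekly rows with fixed values, so the result is reproducible
weekly = pd.DataFrame(
    {'a': [1.0, 2.0], 'b': [10.0, 20.0]},
    index=pd.date_range('1/1/2000', periods=2, freq='W-WED'),
)

# Route 1: resample to daily, then take the values as-is (no aggregation)
via_resample = weekly.resample('D').asfreq()

# Route 2: reindex the frame to a daily frequency directly
via_asfreq = weekly.asfreq('D')

print(via_resample.equals(via_asfreq))
print(via_resample.isna().sum())
```

Both routes produce 8 daily rows, of which the 6 days between the two Wednesdays are NaN in every column.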

# Fill the missing values
frame.resample('D').ffill()  # forward fill: each NaN takes the previous valid value
				上海		北京		深圳		广州
2000-01-05	1.010248	0.251598	1.131810	0.035474
2000-01-06	1.010248	0.251598	1.131810	0.035474
2000-01-07	1.010248	0.251598	1.131810	0.035474
2000-01-08	1.010248	0.251598	1.131810	0.035474
2000-01-09	1.010248	0.251598	1.131810	0.035474
2000-01-10	1.010248	0.251598	1.131810	0.035474
2000-01-11	1.010248	0.251598	1.131810	0.035474
2000-01-12	-0.221884	1.136224	-0.761822	0.056637
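
Forward fill is only one way to populate the new rows. As a rough sketch, assuming a deterministic one-column frame (the names `weekly`, `limited`, `backfilled`, and `interpolated` are illustrative), the following shows three alternatives: capping how far a value is carried forward, filling backward from the next value, and linear interpolation.

```python
import pandas as pd

# Two weekly rows: 0.0 on 2000-01-05 and 7.0 on 2000-01-12
weekly = pd.DataFrame(
    {'value': [0.0, 7.0]},
    index=pd.date_range('1/1/2000', periods=2, freq='W-WED'),
)

# Carry each value forward by at most 2 days; later gaps stay NaN
limited = weekly.resample('D').ffill(limit=2)

# Fill each gap with the next valid value instead of the previous one
backfilled = weekly.resample('D').bfill()

# Insert NaN rows first, then interpolate linearly between known points
interpolated = weekly.resample('D').asfreq().interpolate()

print(limited)
print(backfilled)
print(interpolated)
```

Which option is appropriate depends on the data: forward fill suits values that hold until the next observation, while interpolation assumes the quantity changes smoothly between observations.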
