演示步骤:
0. 读取连续3年的天气数据
pct_change、shift、diff,都实现了跨越多行的数据计算
import pandas as pd
%matplotlib inline
fpath = "./datas/beijing_tianqi/beijing_tianqi_2017-2019.csv"
df = pd.read_csv(fpath, index_col="ymd", parse_dates=True)
df.head(3)
bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel | |
---|---|---|---|---|---|---|---|---|
ymd | ||||||||
2017-01-01 | 5℃ | -3℃ | 霾~晴 | 南风 | 1-2级 | 450 | 严重污染 | 6 |
2017-01-02 | 7℃ | -6℃ | 晴~霾 | 南风 | 1-2级 | 246 | 重度污染 | 5 |
2017-01-03 | 5℃ | -5℃ | 霾 | 南风 | 1-2级 | 320 | 严重污染 | 6 |
# 替换掉温度的后缀℃
df["bWendu"] = df["bWendu"].str.replace("℃", "").astype('int32')
df.head(3)
bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel | |
---|---|---|---|---|---|---|---|---|
ymd | ||||||||
2017-01-01 | 5 | -3℃ | 霾~晴 | 南风 | 1-2级 | 450 | 严重污染 | 6 |
2017-01-02 | 7 | -6℃ | 晴~霾 | 南风 | 1-2级 | 246 | 重度污染 | 5 |
2017-01-03 | 5 | -5℃ | 霾 | 南风 | 1-2级 | 320 | 严重污染 | 6 |
# 新的df,为每个月的平均最高温
df = df[["bWendu"]].resample("M").mean()
# 将索引按照日期升序排列
df.sort_index(ascending=True, inplace=True)
df.head()
bWendu | |
---|---|
ymd | |
2017-01-31 | 3.322581 |
2017-02-28 | 7.642857 |
2017-03-31 | 14.129032 |
2017-04-30 | 23.700000 |
2017-05-31 | 29.774194 |
df.index
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
'2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
'2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31',
'2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
'2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
'2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31',
'2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
'2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
'2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'],
dtype='datetime64[ns]', name='ymd', freq='M')
df.plot()
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-NY4NNjz6-1611064082104)(output_11_1.png)]
pct_change方法直接算好了"(新-旧)/旧"的百分比
官方文档地址:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html
df["bWendu_way1_huanbi"] = df["bWendu"].pct_change(periods=1)
df["bWendu_way1_tongbi"] = df["bWendu"].pct_change(periods=12)
df.head(15)
bWendu | bWendu_way1_huanbi | bWendu_way1_tongbi | |
---|---|---|---|
ymd | |||
2017-01-31 | 3.322581 | NaN | NaN |
2017-02-28 | 7.642857 | 1.300277 | NaN |
2017-03-31 | 14.129032 | 0.848658 | NaN |
2017-04-30 | 23.700000 | 0.677397 | NaN |
2017-05-31 | 29.774194 | 0.256295 | NaN |
2017-06-30 | 30.966667 | 0.040051 | NaN |
2017-07-31 | 31.612903 | 0.020869 | NaN |
2017-08-31 | 30.129032 | -0.046939 | NaN |
2017-09-30 | 27.866667 | -0.075089 | NaN |
2017-10-31 | 17.225806 | -0.381849 | NaN |
2017-11-30 | 9.566667 | -0.444632 | NaN |
2017-12-31 | 4.483871 | -0.531303 | NaN |
2018-01-31 | 1.322581 | -0.705036 | -0.601942 |
2018-02-28 | 4.892857 | 2.699477 | -0.359813 |
2018-03-31 | 14.129032 | 1.887685 | 0.000000 |
shift用于移动数据,但是保持索引不变
官方文档地址:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.shift.html
# 见识一下shift做了什么事情
# 使用pd.concat合并Series列表变成一个大的df
pd.concat(
[df["bWendu"],
df["bWendu"].shift(periods=1),
df["bWendu"].shift(periods=12)],
axis=1
).head(15)
bWendu | bWendu | bWendu | |
---|---|---|---|
ymd | |||
2017-01-31 | 3.322581 | NaN | NaN |
2017-02-28 | 7.642857 | 3.322581 | NaN |
2017-03-31 | 14.129032 | 7.642857 | NaN |
2017-04-30 | 23.700000 | 14.129032 | NaN |
2017-05-31 | 29.774194 | 23.700000 | NaN |
2017-06-30 | 30.966667 | 29.774194 | NaN |
2017-07-31 | 31.612903 | 30.966667 | NaN |
2017-08-31 | 30.129032 | 31.612903 | NaN |
2017-09-30 | 27.866667 | 30.129032 | NaN |
2017-10-31 | 17.225806 | 27.866667 | NaN |
2017-11-30 | 9.566667 | 17.225806 | NaN |
2017-12-31 | 4.483871 | 9.566667 | NaN |
2018-01-31 | 1.322581 | 4.483871 | 3.322581 |
2018-02-28 | 4.892857 | 1.322581 | 7.642857 |
2018-03-31 | 14.129032 | 4.892857 | 14.129032 |
# 环比
series_shift1 = df["bWendu"].shift(periods=1)
df["bWendu_way2_huanbi"] = (df["bWendu"]-series_shift1)/series_shift1
# 同比
series_shift2 = df["bWendu"].shift(periods=12)
df["bWendu_way2_tongbi"] = (df["bWendu"]-series_shift2)/series_shift2
df.head(15)
bWendu | bWendu_way1_huanbi | bWendu_way1_tongbi | bWendu_way2_huanbi | bWendu_way2_tongbi | |
---|---|---|---|---|---|
ymd | |||||
2017-01-31 | 3.322581 | NaN | NaN | NaN | NaN |
2017-02-28 | 7.642857 | 1.300277 | NaN | 1.300277 | NaN |
2017-03-31 | 14.129032 | 0.848658 | NaN | 0.848658 | NaN |
2017-04-30 | 23.700000 | 0.677397 | NaN | 0.677397 | NaN |
2017-05-31 | 29.774194 | 0.256295 | NaN | 0.256295 | NaN |
2017-06-30 | 30.966667 | 0.040051 | NaN | 0.040051 | NaN |
2017-07-31 | 31.612903 | 0.020869 | NaN | 0.020869 | NaN |
2017-08-31 | 30.129032 | -0.046939 | NaN | -0.046939 | NaN |
2017-09-30 | 27.866667 | -0.075089 | NaN | -0.075089 | NaN |
2017-10-31 | 17.225806 | -0.381849 | NaN | -0.381849 | NaN |
2017-11-30 | 9.566667 | -0.444632 | NaN | -0.444632 | NaN |
2017-12-31 | 4.483871 | -0.531303 | NaN | -0.531303 | NaN |
2018-01-31 | 1.322581 | -0.705036 | -0.601942 | -0.705036 | -0.601942 |
2018-02-28 | 4.892857 | 2.699477 | -0.359813 | 2.699477 | -0.359813 |
2018-03-31 | 14.129032 | 1.887685 | 0.000000 | 1.887685 | 0.000000 |
pandas.Series.diff用于新值减去旧值
官方文档:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.diff.html
pd.concat(
[df["bWendu"],
df["bWendu"].diff(periods=1),
df["bWendu"].diff(periods=12)],
axis=1
).head(15)
bWendu | bWendu | bWendu | |
---|---|---|---|
ymd | |||
2017-01-31 | 3.322581 | NaN | NaN |
2017-02-28 | 7.642857 | 4.320276 | NaN |
2017-03-31 | 14.129032 | 6.486175 | NaN |
2017-04-30 | 23.700000 | 9.570968 | NaN |
2017-05-31 | 29.774194 | 6.074194 | NaN |
2017-06-30 | 30.966667 | 1.192473 | NaN |
2017-07-31 | 31.612903 | 0.646237 | NaN |
2017-08-31 | 30.129032 | -1.483871 | NaN |
2017-09-30 | 27.866667 | -2.262366 | NaN |
2017-10-31 | 17.225806 | -10.640860 | NaN |
2017-11-30 | 9.566667 | -7.659140 | NaN |
2017-12-31 | 4.483871 | -5.082796 | NaN |
2018-01-31 | 1.322581 | -3.161290 | -2.00 |
2018-02-28 | 4.892857 | 3.570276 | -2.75 |
2018-03-31 | 14.129032 | 9.236175 | 0.00 |
# 环比
series_diff1 = df["bWendu"].diff(periods=1)
df["bWendu_way3_huanbi"] = series_diff1/(df["bWendu"]-series_diff1)
# 同比
series_diff2 = df["bWendu"].diff(periods=12)
df["bWendu_way3_tongbi"] = series_diff2/(df["bWendu"]-series_diff2)
df.head(15)
bWendu | bWendu_way1_huanbi | bWendu_way1_tongbi | bWendu_way2_huanbi | bWendu_way2_tongbi | bWendu_way3_huanbi | bWendu_way3_tongbi | |
---|---|---|---|---|---|---|---|
ymd | |||||||
2017-01-31 | 3.322581 | NaN | NaN | NaN | NaN | NaN | NaN |
2017-02-28 | 7.642857 | 1.300277 | NaN | 1.300277 | NaN | 1.300277 | NaN |
2017-03-31 | 14.129032 | 0.848658 | NaN | 0.848658 | NaN | 0.848658 | NaN |
2017-04-30 | 23.700000 | 0.677397 | NaN | 0.677397 | NaN | 0.677397 | NaN |
2017-05-31 | 29.774194 | 0.256295 | NaN | 0.256295 | NaN | 0.256295 | NaN |
2017-06-30 | 30.966667 | 0.040051 | NaN | 0.040051 | NaN | 0.040051 | NaN |
2017-07-31 | 31.612903 | 0.020869 | NaN | 0.020869 | NaN | 0.020869 | NaN |
2017-08-31 | 30.129032 | -0.046939 | NaN | -0.046939 | NaN | -0.046939 | NaN |
2017-09-30 | 27.866667 | -0.075089 | NaN | -0.075089 | NaN | -0.075089 | NaN |
2017-10-31 | 17.225806 | -0.381849 | NaN | -0.381849 | NaN | -0.381849 | NaN |
2017-11-30 | 9.566667 | -0.444632 | NaN | -0.444632 | NaN | -0.444632 | NaN |
2017-12-31 | 4.483871 | -0.531303 | NaN | -0.531303 | NaN | -0.531303 | NaN |
2018-01-31 | 1.322581 | -0.705036 | -0.601942 | -0.705036 | -0.601942 | -0.705036 | -0.601942 |
2018-02-28 | 4.892857 | 2.699477 | -0.359813 | 2.699477 | -0.359813 | 2.699477 | -0.359813 |
2018-03-31 | 14.129032 | 1.887685 | 0.000000 | 1.887685 | 0.000000 | 1.887685 | 0.000000 |