33. Pandas计算同比环比指标的3种方法

Pandas计算同比环比指标的3种方法

同比和环比:环比和同比用于描述统计数据的变化情况

  • 环比:表示本次统计段与相连的上次统计段之间的比较。
    • 比如2010年中国第一季度GDP为G2010Q1亿元,第二季度GDP为G2010Q2亿元,则第二季度GDP环比增长(G2010Q2-G2010Q1)/G2010Q1;
  • 同比:即同期相比,表示某个特定统计段今年与去年之间的比较。
    • 比如2009年中国第一季度GDP为G2009Q1亿元,则2010年第一季度的GDP同比增长为(G2010Q1-G2009Q1)/G2009Q1。

演示步骤:
0. 读取连续3年的天气数据

  1. 方法1:pandas.Series.pct_change
  2. 方法2:pandas.Series.shift
  3. 方法3:pandas.Series.diff

pct_change、shift、diff,都实现了跨越多行的数据计算

0. 读取连续3年的天气数据

import pandas as pd
%matplotlib inline
fpath = "./datas/beijing_tianqi/beijing_tianqi_2017-2019.csv"
df = pd.read_csv(fpath, index_col="ymd", parse_dates=True)
df.head(3)
bWendu yWendu tianqi fengxiang fengli aqi aqiInfo aqiLevel
ymd
2017-01-01 5℃ -3℃ 霾~晴 南风 1-2级 450 严重污染 6
2017-01-02 7℃ -6℃ 晴~霾 南风 1-2级 246 重度污染 5
2017-01-03 5℃ -5℃ 南风 1-2级 320 严重污染 6
# 替换掉温度的后缀℃
df["bWendu"] = df["bWendu"].str.replace("℃", "").astype('int32')
df.head(3)
bWendu yWendu tianqi fengxiang fengli aqi aqiInfo aqiLevel
ymd
2017-01-01 5 -3℃ 霾~晴 南风 1-2级 450 严重污染 6
2017-01-02 7 -6℃ 晴~霾 南风 1-2级 246 重度污染 5
2017-01-03 5 -5℃ 南风 1-2级 320 严重污染 6
# 新的df,为每个月的平均最高温
df = df[["bWendu"]].resample("M").mean()
# 将索引按照日期升序排列
df.sort_index(ascending=True, inplace=True)
df.head()
bWendu
ymd
2017-01-31 3.322581
2017-02-28 7.642857
2017-03-31 14.129032
2017-04-30 23.700000
2017-05-31 29.774194
df.index
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31',
               '2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
               '2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31',
               '2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
               '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'],
              dtype='datetime64[ns]', name='ymd', freq='M')
df.plot()

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-NY4NNjz6-1611064082104)(output_11_1.png)]

方法1:pandas.Series.pct_change

pct_change方法直接算好了"(新-旧)/旧"的百分比

官方文档地址:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html

df["bWendu_way1_huanbi"] = df["bWendu"].pct_change(periods=1)
df["bWendu_way1_tongbi"] = df["bWendu"].pct_change(periods=12)
df.head(15)
bWendu bWendu_way1_huanbi bWendu_way1_tongbi
ymd
2017-01-31 3.322581 NaN NaN
2017-02-28 7.642857 1.300277 NaN
2017-03-31 14.129032 0.848658 NaN
2017-04-30 23.700000 0.677397 NaN
2017-05-31 29.774194 0.256295 NaN
2017-06-30 30.966667 0.040051 NaN
2017-07-31 31.612903 0.020869 NaN
2017-08-31 30.129032 -0.046939 NaN
2017-09-30 27.866667 -0.075089 NaN
2017-10-31 17.225806 -0.381849 NaN
2017-11-30 9.566667 -0.444632 NaN
2017-12-31 4.483871 -0.531303 NaN
2018-01-31 1.322581 -0.705036 -0.601942
2018-02-28 4.892857 2.699477 -0.359813
2018-03-31 14.129032 1.887685 0.000000

方法2:pandas.Series.shift

shift用于移动数据,但是保持索引不变

官方文档地址:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.shift.html

# 见识一下shift做了什么事情
# 使用pd.concat合并Series列表变成一个大的df
pd.concat(
    [df["bWendu"], 
     df["bWendu"].shift(periods=1), 
     df["bWendu"].shift(periods=12)],
    axis=1
).head(15)
bWendu bWendu bWendu
ymd
2017-01-31 3.322581 NaN NaN
2017-02-28 7.642857 3.322581 NaN
2017-03-31 14.129032 7.642857 NaN
2017-04-30 23.700000 14.129032 NaN
2017-05-31 29.774194 23.700000 NaN
2017-06-30 30.966667 29.774194 NaN
2017-07-31 31.612903 30.966667 NaN
2017-08-31 30.129032 31.612903 NaN
2017-09-30 27.866667 30.129032 NaN
2017-10-31 17.225806 27.866667 NaN
2017-11-30 9.566667 17.225806 NaN
2017-12-31 4.483871 9.566667 NaN
2018-01-31 1.322581 4.483871 3.322581
2018-02-28 4.892857 1.322581 7.642857
2018-03-31 14.129032 4.892857 14.129032
# 环比
series_shift1 = df["bWendu"].shift(periods=1)
df["bWendu_way2_huanbi"] = (df["bWendu"]-series_shift1)/series_shift1

# 同比
series_shift2 = df["bWendu"].shift(periods=12)
df["bWendu_way2_tongbi"] = (df["bWendu"]-series_shift2)/series_shift2
df.head(15)
bWendu bWendu_way1_huanbi bWendu_way1_tongbi bWendu_way2_huanbi bWendu_way2_tongbi
ymd
2017-01-31 3.322581 NaN NaN NaN NaN
2017-02-28 7.642857 1.300277 NaN 1.300277 NaN
2017-03-31 14.129032 0.848658 NaN 0.848658 NaN
2017-04-30 23.700000 0.677397 NaN 0.677397 NaN
2017-05-31 29.774194 0.256295 NaN 0.256295 NaN
2017-06-30 30.966667 0.040051 NaN 0.040051 NaN
2017-07-31 31.612903 0.020869 NaN 0.020869 NaN
2017-08-31 30.129032 -0.046939 NaN -0.046939 NaN
2017-09-30 27.866667 -0.075089 NaN -0.075089 NaN
2017-10-31 17.225806 -0.381849 NaN -0.381849 NaN
2017-11-30 9.566667 -0.444632 NaN -0.444632 NaN
2017-12-31 4.483871 -0.531303 NaN -0.531303 NaN
2018-01-31 1.322581 -0.705036 -0.601942 -0.705036 -0.601942
2018-02-28 4.892857 2.699477 -0.359813 2.699477 -0.359813
2018-03-31 14.129032 1.887685 0.000000 1.887685 0.000000

方法3. pandas.Series.diff

pandas.Series.diff用于新值减去旧值

官方文档:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.diff.html

pd.concat(
    [df["bWendu"], 
     df["bWendu"].diff(periods=1), 
     df["bWendu"].diff(periods=12)],
    axis=1
).head(15)
bWendu bWendu bWendu
ymd
2017-01-31 3.322581 NaN NaN
2017-02-28 7.642857 4.320276 NaN
2017-03-31 14.129032 6.486175 NaN
2017-04-30 23.700000 9.570968 NaN
2017-05-31 29.774194 6.074194 NaN
2017-06-30 30.966667 1.192473 NaN
2017-07-31 31.612903 0.646237 NaN
2017-08-31 30.129032 -1.483871 NaN
2017-09-30 27.866667 -2.262366 NaN
2017-10-31 17.225806 -10.640860 NaN
2017-11-30 9.566667 -7.659140 NaN
2017-12-31 4.483871 -5.082796 NaN
2018-01-31 1.322581 -3.161290 -2.00
2018-02-28 4.892857 3.570276 -2.75
2018-03-31 14.129032 9.236175 0.00
# 环比
series_diff1 = df["bWendu"].diff(periods=1)
df["bWendu_way3_huanbi"] = series_diff1/(df["bWendu"]-series_diff1)

# 同比
series_diff2 = df["bWendu"].diff(periods=12)
df["bWendu_way3_tongbi"] = series_diff2/(df["bWendu"]-series_diff2)
df.head(15)
bWendu bWendu_way1_huanbi bWendu_way1_tongbi bWendu_way2_huanbi bWendu_way2_tongbi bWendu_way3_huanbi bWendu_way3_tongbi
ymd
2017-01-31 3.322581 NaN NaN NaN NaN NaN NaN
2017-02-28 7.642857 1.300277 NaN 1.300277 NaN 1.300277 NaN
2017-03-31 14.129032 0.848658 NaN 0.848658 NaN 0.848658 NaN
2017-04-30 23.700000 0.677397 NaN 0.677397 NaN 0.677397 NaN
2017-05-31 29.774194 0.256295 NaN 0.256295 NaN 0.256295 NaN
2017-06-30 30.966667 0.040051 NaN 0.040051 NaN 0.040051 NaN
2017-07-31 31.612903 0.020869 NaN 0.020869 NaN 0.020869 NaN
2017-08-31 30.129032 -0.046939 NaN -0.046939 NaN -0.046939 NaN
2017-09-30 27.866667 -0.075089 NaN -0.075089 NaN -0.075089 NaN
2017-10-31 17.225806 -0.381849 NaN -0.381849 NaN -0.381849 NaN
2017-11-30 9.566667 -0.444632 NaN -0.444632 NaN -0.444632 NaN
2017-12-31 4.483871 -0.531303 NaN -0.531303 NaN -0.531303 NaN
2018-01-31 1.322581 -0.705036 -0.601942 -0.705036 -0.601942 -0.705036 -0.601942
2018-02-28 4.892857 2.699477 -0.359813 2.699477 -0.359813 2.699477 -0.359813
2018-03-31 14.129032 1.887685 0.000000 1.887685 0.000000 1.887685 0.000000

你可能感兴趣的:(pandas,数据分析,数据挖掘,人工智能,大数据)