Filling missing values with Lagrange interpolation: error "Passing list-likes to .loc or [] with any missing labels is no longer supported"

While learning Lagrange interpolation, I wrote the following code based on a first article, https://blog.csdn.net/playgoon2/article/details/77051285:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import lagrange

def polyinterp(data, k=5):
    df1 = data.copy()
    print("Original data (with missing values):", '\n', data)
    for j in range(data.shape[1]):
        for i in range(len(df1)):
            if np.isnan(df1.iloc[i, j]):
                # Positions to sample: k before the missing value and k after it
                list1 = list(range(i-k, i)) + list(range(i+1, i+1+k))
                # Drop positions past the end; negative positions are kept,
                # and .iloc counts them from the end of the frame
                list0 = [p for p in list1 if p <= max(df1.index)]
                interdf = df1.iloc[list0, j]            # fetch the neighbouring values
                interdf = interdf[interdf.notnull()]    # drop missing values
                list_x = list(interdf.index)            # corresponding x (index labels)
                list_y = list(interdf.values)           # corresponding y
                f = lagrange(list_x, list_y)            # polynomial through the sampled points
                df1.iloc[i, j] = f(i)
    print("Copy after interpolation:", '\n', df1)
    return df1

def chart_view(df01, df1):
    df1.rename(columns={'y': 'New y'}, inplace=True)
    df01['y'].plot(style='k--')
    df1['New y'].plot(alpha=0.5)
    plt.legend(loc='best')
    plt.show()

if __name__ == '__main__':
    x = np.linspace(0, 10, 11)
    y = x**3 + 10
    data1 = np.vstack((x, y))
    df0 = pd.DataFrame(data1.T, columns=['x', 'y'])
    print(df0)
    df01 = df0.copy()                  # work on a copy
    df01.loc[2:3, "y"] = np.nan        # create missing values
    df1 = df01.copy()

    new_data = polyinterp(df1, 5)      # after interpolation
    chart_view(df01, new_data)         # plot before and after interpolation

Output:

       x       y
0    0.0    10.0
1    1.0    11.0
2    2.0    18.0
3    3.0    37.0
4    4.0    74.0
5    5.0   135.0
6    6.0   226.0
7    7.0   353.0
8    8.0   522.0
9    9.0   739.0
10  10.0  1010.0
Original data (with missing values): 
        x       y
0    0.0    10.0
1    1.0    11.0
2    2.0     NaN
3    3.0     NaN
4    4.0    74.0
5    5.0   135.0
6    6.0   226.0
7    7.0   353.0
8    8.0   522.0
9    9.0   739.0
10  10.0  1010.0
Copy after interpolation: 
        x       y
0    0.0    10.0
1    1.0    11.0
2    2.0    18.0
3    3.0    37.0
4    4.0    74.0
5    5.0   135.0
6    6.0   226.0
7    7.0   353.0
8    8.0   522.0
9    9.0   739.0
10  10.0  1010.0
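The key step in this code is `scipy.interpolate.lagrange`, which returns the polynomial of lowest degree through the given points, L(x) = sum over i of y_i * prod over j != i of (x - x_j) / (x_i - x_j). A minimal sketch of that formula written out by hand, compared with scipy; `lagrange_eval` is a hypothetical helper, and the points reuse the y = x**3 + 10 example with x = 2 and x = 3 treated as missing:

```python
import numpy as np
from scipy.interpolate import lagrange

def lagrange_eval(xs, ys, x):
    """Hypothetical helper: evaluate the Lagrange polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x): 1 at xi, 0 at every other node
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

# Known points of y = x**3 + 10; x = 2 and x = 3 are treated as missing
xs = [0, 1, 4, 5, 6, 7]
ys = [x**3 + 10 for x in xs]

by_hand = [lagrange_eval(xs, ys, x) for x in (2, 3)]
by_scipy = lagrange(xs, ys)(np.array([2, 3]))
print(by_hand, by_scipy)
```

Because the known points lie on a cubic, the polynomial through six of them is that cubic, so both results should be 18 and 37 up to rounding. The scipy documentation warns that `lagrange` is numerically unstable beyond roughly 20 points, which is one reason to keep k small.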

At this point everything seemed fine. However, when I later read another article, https://blog.csdn.net/shener_m/article/details/81706358, and used it to check my code, I found that my results differed from the author's, so I revised the algorithm:

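def polyinterp(data, k=5):
    df1 = data.copy()
    print("Original data (with missing values):", '\n', data)
    for i in range(len(df1)):
        if df1['y'].isnull()[i]:
            # Labels to sample: k before the missing value and k after it
            # index_ = list(range(i-k, i)) + list(range(i+1, i+1+k))  # Series labels are never negative
            # list0 = [j for j in index_ if j in df1['y'].sort_index()]
            # y = df1['y'][list0]
            # Old pandas returned NaN for missing labels here;
            # pandas >= 1.0 raises KeyError because labels such as -1 do not exist
            y = df1['y'][list(range(i-k, i)) + list(range(i+1, i+1+k))]
            y = y[y.notnull()]  # drop NaNs, including those for negative labels
            f = lagrange(y.index, list(y))
            df1.iloc[i, 1] = f(i)
    print("Copy after interpolation:", '\n', df1)
    return df1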

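At first I tested the code in Python IDLE and it ran without problems, but when I moved it to Jupyter I got an unwelcome error that points to the official documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike. The cause is the way the pd.Series is indexed: passing a list of labels to `[]` or `.loc` when some of those labels do not exist was deprecated in pandas 0.21 and raises a KeyError from pandas 1.0 onwards. The difference therefore comes from the pandas version installed in each environment, not from Jupyter itself.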
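The behaviour change can be reproduced on a tiny Series. A minimal sketch, assuming pandas 1.0 or later: the list lookup raises KeyError because -2 and -1 are not labels of the default index, while `Series.reindex` keeps the requested labels and returns NaN for the missing ones, which is what the old lookup used to do.

```python
import pandas as pd

s = pd.Series([10.0, 11.0, None, None, 74.0])   # default RangeIndex 0..4

try:
    s[[-2, -1, 0, 1]]          # label-based lookup: -2 and -1 are not labels
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# reindex fills labels that do not exist with NaN instead of raising
r = s.reindex([-2, -1, 0, 1])
print(r)
```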
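Along the way I also found a serious mistake of my own: label-based indexing on a pd.Series (`s[...]` or `s.loc[...]` on a default integer index) has no reverse indexing. -1 and -2 are treated as labels rather than as positions counted from the end, and a default index has no such labels. Clearly I still have a lot to learn o(╥﹏╥)o. By contrast, `.iloc`, which the first version uses, is positional and does count negative positions from the end, so the two versions can pick different neighbours near the start of the column.

Original data (with missing values):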
x y
0 0 NaN
1 1 NaN
2 2 NaN
3 3 459.0
4 4 NaN
5 5 456.0
6 6 NaN
7 7 448.0
8 8 450.0
9 9 442.0
10 10 450.0
11 11 421.0
12 12 421.0
13 13 421.0
14 14 500.0
15 15 500.0
16 16 500.0
17 17 500.0
18 18 492.0
19 19 492.0
20 20 473.0
21 21 469.0
22 22 469.0
23 23 NaN
24 24 NaN
25 25 453.0
26 26 NaN
27 27 152.0
28 28 70.0
29 29 54.0
30 30 30.0
31 31 23.0
32 32 23.0
33 33 26.0
34 34 149.0
35 35 226.0
36 36 143.0
37 37 317.0
38 38 NaN

Result of the first algorithm:

Copy after interpolation:
x y
0 0 459.000000
1 1 1155.943959
2 2 859.543272
3 3 459.000000
4 4 356.883931
5 5 456.000000
6 6 484.287458
7 7 448.000000
8 8 450.000000
9 9 442.000000
10 10 450.000000
11 11 421.000000
12 12 421.000000
13 13 421.000000
14 14 500.000000
15 15 500.000000
16 16 500.000000
17 17 500.000000
18 18 492.000000
19 19 492.000000
20 20 473.000000
21 21 469.000000
22 22 469.000000
23 23 471.343906
24 24 480.046886
25 25 453.000000
26 26 318.202383
27 27 152.000000
28 28 70.000000
29 29 54.000000
30 30 30.000000
31 31 23.000000
32 32 23.000000
33 33 26.000000
34 34 149.000000
35 35 226.000000
36 36 143.000000
37 37 317.000000
38 38 310.000000
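One difference between the two versions near the start of the column is how negative positions are handled. A minimal sketch on a dummy 39-row frame (the values are placeholders; only the index matters) lists which labels the first version's `.iloc` lookup fetches as neighbours of row 0, with k = 5:

```python
import pandas as pd

n, k, i = 39, 5, 0   # 39 rows as in the data above; neighbours of row 0
df = pd.DataFrame({"y": range(n)}, dtype=float)

positions = list(range(i - k, i)) + list(range(i + 1, i + 1 + k))
positions = [p for p in positions if p < len(df)]   # only the upper bound is checked
picked = df.iloc[positions, 0]                      # -5..-1 are counted from the end
print(list(picked.index))
```

The negative positions -5 to -1 come back as labels 34 to 38, so points from the tail of the column enter the polynomial for row 0, whereas the label-based second version simply finds no such labels.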

Result of the second algorithm:

Copy after interpolation:
x y
0 0 463.500000
1 1 462.000000
2 2 460.410714
3 3 459.000000
4 4 458.253401
5 5 456.000000
6 6 449.101130
7 7 448.000000
8 8 450.000000
9 9 442.000000
10 10 450.000000
11 11 421.000000
12 12 421.000000
13 13 421.000000
14 14 500.000000
15 15 500.000000
16 16 500.000000
17 17 500.000000
18 18 492.000000
19 19 492.000000
20 20 473.000000
21 21 469.000000
22 22 469.000000
23 23 471.343906
24 24 480.046886
25 25 453.000000
26 26 318.202383
27 27 152.000000
28 28 70.000000
29 29 54.000000
30 30 30.000000
31 31 23.000000
32 32 23.000000
33 33 26.000000
34 34 149.000000
35 35 226.000000
36 36 143.000000
37 37 317.000000
38 38 1696.000000
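To run the second version on pandas 1.0 or later, the list lookup can be replaced by `Series.reindex`, which returns NaN for labels that do not exist, just as old pandas did. A minimal sketch, assuming a default 0..n-1 index as in this post; the name `polyinterp_fixed` and its `col` parameter are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import lagrange

def polyinterp_fixed(data, col="y", k=5):
    """Hypothetical helper: fill NaNs in data[col] by Lagrange interpolation
    over up to k labels on each side. Values filled earlier are reused as
    neighbours for later rows, as in the second version."""
    df1 = data.copy()
    s = df1[col].copy()
    for i in range(len(s)):
        if pd.isna(s.loc[i]):
            labels = list(range(i - k, i)) + list(range(i + 1, i + 1 + k))
            # reindex gives NaN for labels that do not exist (negative or past
            # the end) instead of raising KeyError as s[labels] does on pandas >= 1.0
            y = s.reindex(labels)
            y = y[y.notnull()]
            f = lagrange(y.index.to_numpy(dtype=float), y.to_numpy())
            s.loc[i] = f(i)
    df1[col] = s
    return df1

# The 39-row dataset from this post (nan marks a missing value)
nan = np.nan
values = [nan, nan, nan, 459, nan, 456, nan, 448, 450, 442, 450, 421, 421, 421,
          500, 500, 500, 500, 492, 492, 473, 469, 469, nan, nan, 453, nan, 152,
          70, 54, 30, 23, 23, 26, 149, 226, 143, 317, nan]
data = pd.DataFrame({"x": range(39), "y": values})
filled = polyinterp_fixed(data)
print(filled.loc[[0, 1, 38], "y"])
```

On this data, rows 0, 1 and 38 should come out at about 463.5, 462.0 and 1696, matching the second result above.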

I share a little of what I learn every day. If you stop by, please don't forget to leave a like. Thank you! O(∩_∩)O

References:

1. Lagrange interpolation in data analysis: Python interpolation with scipy, lagrange. https://blog.csdn.net/playgoon2/article/details/77051285

2. Lagrange interpolation: Python implementation and application. https://blog.csdn.net/shener_m/article/details/81706358
