随着python版本的更迭,机器学习100天中部分原有的代码无法与较新版本的python相容,为了让大家更方便的运行代码,了解其中原理,笔者将其中需要改正的部分列举出来,供大家参考。
1.ImportError: cannot import name 'Imputer' from 'sklearn.preprocessing'
原代码:
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = "NaN", strategy = "mean", axis = 0)
imputer = imputer.fit(X[ : , 1:3])
X[ : , 1:3] = imputer.transform(X[ : , 1:3])
错误原因:
在0.22版本的sklearn中,imputer不在preprocessing里了,而是在sklearn.impute里。SimpleImputer进行缺失值处理,详细用法见缺失值处理
修改后代码:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = "mean")
imputer = imputer.fit(X[:,1:3])
X[:, 1:3] = imputer.transform(X[:,1:3])
2.TypeError: __init__() got an unexpected keyword argument 'categorical_features'
原代码:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[ : , 0] = labelencoder_X.fit_transform(X[ : , 0])
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
错误原因:
sklearn 版本在0.22.1往后,OneHotEncoder没有参数categorical_features,可以引入ColumnTransformer(),将OneHotEncoder这个转换器放在里边。
修改后代码:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_X = LabelEncoder()
X[ : , 0] = labelencoder_X.fit_transform(X[ : , 0])
onehotencoder = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder = 'passthrough')
X = onehotencoder.fit_transform(X)
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
3.ModuleNotFoundError: No module named 'sklearn.cross_validation'
原代码:
from sklearn.cross_validation import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split( X , Y , test_size = 0.2, random_state = 0)
错误原因:
在sklearn 0.18及以上的版本中,出现了sklearn.cross_validation无法导入的情况,原因是新版本中此包被废弃,只需将 cross_validation 改为 model_selection 即可。
修改后代码:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split( X , Y , test_size = 0.2, random_state = 0)
----------------------------------------------------------------------------------------------
附完整代码:
import numpy as np
import pandas as pd
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:,:-1].values
Y = dataset.iloc[:, 3].values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = "mean")
imputer = imputer.fit(X[:,1:3])
X[:, 1:3] = imputer.transform(X[:,1:3])
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_X = LabelEncoder()
X[ : , 0] = labelencoder_X.fit_transform(X[ : , 0])
onehotencoder = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder = 'passthrough')
X = onehotencoder.fit_transform(X)
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split( X , Y , test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.fit_transform(X_test)
参考文章:
https://blog.csdn.net/qq_44635691/article/details/104374481
https://blog.csdn.net/yunfenglw/article/details/105835732
https://blog.csdn.net/qq_43965708/article/details/115625768