[100 Days of Machine Learning] Day 1 Data Preprocessing: Code Adjustments

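 As Python and its libraries (scikit-learn in particular) have been updated, some of the original code in 100 Days of Machine Learning no longer runs on newer versions. To make the code easier to run and its principles easier to follow, I have listed the parts that need to be changed below for reference.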


1.ImportError: cannot import name 'Imputer' from 'sklearn.preprocessing'

 Original code:

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = "NaN", strategy = "mean", axis = 0)
imputer = imputer.fit(X[ : , 1:3])
X[ : , 1:3] = imputer.transform(X[ : , 1:3])

 Cause of the error:

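        Imputer was deprecated in scikit-learn 0.20 and removed from sklearn.preprocessing in 0.22. Its replacement for handling missing values is SimpleImputer in sklearn.impute. SimpleImputer has no axis parameter (it always imputes column by column), and missing values are specified as np.nan rather than the string "NaN". For detailed usage, see the scikit-learn documentation on imputing missing values.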

Revised code:

import numpy as np
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = "mean")
imputer = imputer.fit(X[:,1:3])
X[:, 1:3] = imputer.transform(X[:,1:3])
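
To see what SimpleImputer does without needing Data.csv, here is a minimal sketch on a hypothetical toy array (the values and the name X_demo are made up for illustration and are not part of the original tutorial):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: columns are age and salary, with one gap in each.
X_demo = np.array([
    [44.0, 72000.0],
    [27.0, np.nan],
    [np.nan, 54000.0],
    [38.0, 61000.0],
])

# SimpleImputer works column by column, so no axis argument is needed.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X_demo)

# statistics_ holds the per-column means learned from the non-missing entries.
print(imputer.statistics_)
print(X_filled)
```

Each missing entry is replaced by the mean of the remaining values in its own column.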

2.TypeError: __init__() got an unexpected keyword argument 'categorical_features'

  

Original code:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[ : , 0] = labelencoder_X.fit_transform(X[ : , 0])

onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()

labelencoder_Y = LabelEncoder()
Y =  labelencoder_Y.fit_transform(Y)

Cause of the error:

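        The categorical_features parameter of OneHotEncoder was deprecated in scikit-learn 0.20 and removed in 0.22. To encode only selected columns, import ColumnTransformer, place the OneHotEncoder transformer inside it, and pass the column indices there. Note that, depending on the data, ColumnTransformer may return either a dense array or a sparse matrix (controlled by its sparse_threshold parameter).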

Revised code:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_X = LabelEncoder()
X[ : , 0] = labelencoder_X.fit_transform(X[ : , 0])

onehotencoder = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder = 'passthrough')
X = onehotencoder.fit_transform(X)

labelencoder_Y = LabelEncoder()
Y =  labelencoder_Y.fit_transform(Y)
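
As a rough illustration of what the ColumnTransformer step produces, the sketch below runs it on a hypothetical toy array (X_demo and its values are assumptions, not the tutorial's dataset). It passes the string column to OneHotEncoder directly, which recent scikit-learn versions accept, and sets sparse_threshold=0 so the result is always a dense array:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: a country column followed by an age column.
X_demo = np.array([
    ["France", 44.0],
    ["Spain", 27.0],
    ["Germany", 30.0],
    ["Spain", 38.0],
], dtype=object)

# One-hot encode column 0 and pass the remaining columns through unchanged.
# sparse_threshold=0 asks ColumnTransformer to always return a dense array.
ct = ColumnTransformer(
    [("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X_demo)

# Categories are sorted, so the first three columns are France, Germany, Spain.
print(ct.named_transformers_["encoder"].categories_)
print(X_encoded)
```

The encoded columns come first and the passed-through columns follow, so the column order of the result differs from the input.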

3.ModuleNotFoundError: No module named 'sklearn.cross_validation'

Original code:

from sklearn.cross_validation import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split( X , Y , test_size = 0.2, random_state = 0)

Cause of the error:

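        The sklearn.cross_validation module was deprecated in scikit-learn 0.18 and removed in 0.20, so the import fails on 0.20 and later. train_test_split now lives in sklearn.model_selection, so simply change cross_validation to model_selection.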

Revised code:

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split( X , Y , test_size = 0.2, random_state = 0)
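
If a script has to run on both old and new installs, one possible approach (a sketch, not something the original tutorial does) is a fallback import. The toy arrays X_demo and Y_demo below are hypothetical and only show the resulting split sizes:

```python
import numpy as np

try:
    # scikit-learn 0.18 and later
    from sklearn.model_selection import train_test_split
except ImportError:
    # Very old installs (before 0.18) only have the legacy module
    from sklearn.cross_validation import train_test_split

# Hypothetical toy data: 10 samples with 2 features, and alternating 0/1 labels.
X_demo = np.arange(20).reshape(10, 2)
Y_demo = np.arange(10) % 2

X_train, X_test, Y_train, Y_test = train_test_split(
    X_demo, Y_demo, test_size=0.2, random_state=0
)

# With test_size=0.2, 2 of the 10 samples go to the test set.
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the shuffle reproducible, and rows of X stay paired with their labels in Y.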

 ----------------------------------------------------------------------------------------------

Full code:

import numpy as np
import pandas as pd

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:,:-1].values
Y = dataset.iloc[:, 3].values

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = "mean")
imputer = imputer.fit(X[:,1:3])
X[:, 1:3] = imputer.transform(X[:,1:3])

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_X = LabelEncoder()
X[ : , 0] = labelencoder_X.fit_transform(X[ : , 0])

onehotencoder = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder = 'passthrough')
X = onehotencoder.fit_transform(X)


labelencoder_Y = LabelEncoder()
Y =  labelencoder_Y.fit_transform(Y)

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split( X , Y , test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
# Reuse the mean and standard deviation learned from the training set; do not refit on the test set
X_test = sc_X.transform(X_test)
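
One detail of the scaling step is worth checking: the scaler is normally fitted on the training set only, and the test set is then transformed with those same statistics. Refitting on the test set would scale the two sets inconsistently. A minimal sketch with hypothetical toy values (not the tutorial's data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with a single feature.
X_train_demo = np.array([[1.0], [2.0], [3.0]])
X_test_demo = np.array([[2.0], [4.0]])

scaler = StandardScaler()
# fit_transform learns the mean and standard deviation from the training set.
X_train_scaled = scaler.fit_transform(X_train_demo)
# transform reuses those training statistics on the test set.
X_test_scaled = scaler.transform(X_test_demo)

print(scaler.mean_, scaler.scale_)
print(X_test_scaled)
```

Here the test value 2.0 maps to 0 because it equals the training mean, which would not be the case if the scaler were refitted on the test set.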


References:

https://blog.csdn.net/qq_44635691/article/details/104374481

https://blog.csdn.net/yunfenglw/article/details/105835732

https://blog.csdn.net/qq_43965708/article/details/115625768
