Fixing the KFold() error in the "Titanic survival data analysis" exercise.
Error message:
TypeError: __init__() got an unexpected keyword argument 'n_folds'
The code that raises the error:
#Import the linear regression class
from sklearn.linear_model import LinearRegression
#Sklearn also has a helper that makes it easy to do cross validation
from sklearn.cross_validation import KFold
#The columns we'll use to predict the target
predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
#Initialize our algorithm class
alg = LinearRegression()
#Generate cross validation folds for the titanic dataset. It returns the row indices corresponding to train and test.
#We set random_state to ensure we get the same splits every time we run this.
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)
predictions = []
for train, test in kf:
    # The predictors we're using to train the algorithm. Note how we only take the rows in the train folds.
    train_predictors = (titanic[predictors].iloc[train,:])
    # The target we're using to train the algorithm.
    train_target = titanic["Survived"].iloc[train]
    # Training the algorithm using the predictors and target.
    alg.fit(train_predictors, train_target)
    # We can now make predictions on the test fold
    test_predictions = alg.predict(titanic[predictors].iloc[test,:])
    predictions.append(test_predictions)
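Cause 1:
from sklearn.cross_validation import KFold
is obsolete (the sklearn.cross_validation module was removed from newer scikit-learn releases) and must be changed to
from sklearn.model_selection import KFold
See the official sklearn documentation for details.
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)
A sklearn update also changed KFold's parameters: n_folds was renamed to n_splits, and the number of rows is no longer passed to the constructor. The line above becomes
kf = KFold(n_splits=3, shuffle=False, random_state=1)
If the first positional argument titanic.shape[0] is kept after the rename, the call fails with TypeError: __init__() got multiple values for argument 'n_splits'
Note that recent scikit-learn releases raise a ValueError when random_state is set while shuffle=False; on those versions, either drop random_state or set shuffle=True.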
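A minimal sketch of the corrected setup, assuming a recent scikit-learn in which KFold lives in sklearn.model_selection. The small titanic DataFrame here is a hypothetical stand-in with made-up values, and only two numeric predictors are used so the block runs on its own. random_state is left out because shuffle stays at its default of False. The loop iterates with kf.split(...), which is the loop change covered in the next point.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Hypothetical stand-in for the titanic DataFrame (values are made up).
titanic = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 1, 3, 2],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 13.00, 51.86, 21.08, 30.07],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1],
})
predictors = ["Pclass", "Fare"]

alg = LinearRegression()

# n_splits replaces n_folds, and the row count is no longer passed here.
# random_state is omitted because shuffle is left at its default (False).
kf = KFold(n_splits=3)

predictions = []
# kf.split() yields (train indices, test indices) for each fold.
for train, test in kf.split(titanic[predictors]):
    train_predictors = titanic[predictors].iloc[train, :]
    train_target = titanic["Survived"].iloc[train]
    alg.fit(train_predictors, train_target)
    test_predictions = alg.predict(titanic[predictors].iloc[test, :])
    predictions.append(test_predictions)

# Without shuffling the test folds are consecutive row blocks,
# so concatenating them gives one prediction per row, in row order.
predictions = np.concatenate(predictions, axis=0)
```

If reproducible shuffled folds are wanted instead, KFold(n_splits=3, shuffle=True, random_state=1) is the usual form; the concatenated predictions then no longer follow the original row order.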
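for train, test in kf:
must also be changed to
for train, test in kf.split(titanic[predictors]):
This splits the rows of titanic[predictors] into the cross-validation folds.
The above was pieced together from a number of Baidu searches; the reference links are as follows: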