Estimating the Prediction Errors of Different Classifiers in R

Description

To compare different classifiers, we run several classification algorithms through 10-fold cross-validation using the errorest function from the ipred package, and check whether the ensemble classifiers outperform a single decision tree.

How to do it

We again use the telecom churn dataset as the input data source to estimate the misclassification rates of the different classifiers.
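The calls below assume that the required packages are loaded and that trainset holds the training portion of the churn data. A minimal setup sketch, assuming the churn data comes from the C50 package and using an illustrative 70/30 split:

```r
# Packages used by the recipes below
library(ipred)         # errorest(), bagging()
library(ada)           # ada() boosting
library(randomForest)  # randomForest()
library(rpart)         # rpart() single decision tree
library(C50)           # assumed source of the churn data

# Illustrative training/test split of the churn data
data(churn)
set.seed(2)
ind = sample(2, nrow(churnTrain), replace = TRUE, prob = c(0.7, 0.3))
trainset = churnTrain[ind == 1, ]
testset  = churnTrain[ind == 2, ]
```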
Estimate the misclassification rate of the bagging model:

churn.bagging = errorest(churn ~ ., data = trainset, model = bagging)
churn.bagging

Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = bagging)

     10-fold cross-validation estimator of misclassification error 

Misclassification error:  0.0549 

Estimate the misclassification rate of the boosting model (the errorest call is reconstructed here to match the Call output below):

churn.boosting = errorest(churn ~ ., data = trainset, model = ada)
churn.boosting

Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = ada)

     10-fold cross-validation estimator of misclassification error 

Misclassification error:  0.0479 

Estimate the misclassification rate of the random forest:

churn.randomforest = errorest(churn ~ ., data = trainset, model = randomForest)
churn.randomforest

Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = randomForest)

     10-fold cross-validation estimator of misclassification error 

Misclassification error:  0.0518 

Define a prediction function, churn.predict1, that returns class labels, and use it to estimate the misclassification rate of a single decision tree:

churn.predict1 = function(object, newdata) { predict(object, newdata = newdata, type = "class") }
churn.tree = errorest(churn ~ ., data = trainset, model = rpart, predict = churn.predict1)
churn.tree

Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = rpart, 
    predict = churn.predict1)

     10-fold cross-validation estimator of misclassification error 

Misclassification error:  0.0648 

How it works

This section uses the errorest function from the ipred package to estimate the misclassification rates of four classifiers: boosting, bagging, random forest, and a single decision tree. errorest performs 10-fold cross-validation for each classifier and computes the misclassification rate of the resulting model. The results show that boosting has the lowest misclassification rate (0.0479), followed by random forest (0.0518) and bagging (0.0549), while the single decision tree performs worst (0.0648).
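The comparison above can be made explicit in code. A short sketch, assuming the four errorest results from this recipe are still in scope and exposing their error components:

```r
# Collect the cross-validated misclassification rates computed above
errors = c(bagging      = churn.bagging$error,
           boosting     = churn.boosting$error,
           randomForest = churn.randomforest$error,
           tree         = churn.tree$error)

# Rank the classifiers from best (lowest error) to worst
sort(errors)
```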
