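Decision trees are particularly useful for classification: once fitted, they can assign a class to new observations, for example whether a patient survives or which treatment to choose. Compared with rules written out as text, a plot shows the classification result far more intuitively, and that is the purpose of visualizing a decision tree.
Core functions: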
rpart.plot()
rattle::fancyRpartPlot()
Method 1: rpart.plot()
Example data: ptitanic (from the rpart.plot package)
ptitanic: the Titanic data without passenger names and other details.
Binary tree (two-class response)
library(pacman) # provides p_load()
p_load(rpart.plot) # also attaches rpart, which supplies rpart()
data("ptitanic")
head(ptitanic,3)
## pclass survived sex age sibsp parch
## 1 1st survived female 29.0000 0 0
## 2 1st survived male 0.9167 1 2
## 3 1st died female 2.0000 1 2
binary.model <- rpart(survived~.,data = ptitanic,cp=0.02)
rpart.plot(binary.model,
  type = 1, # plot style: label every node, not only the leaves
  box.palette = "yellow" # fill color of the node boxes
)
To tell the nodes apart, pass several colors to box.palette:
rpart.plot(binary.model,type = 2,box.palette = c("pink","gray"))
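For comparison with the plots, the same tree can also be printed as text rules. The sketch below refits the binary model so that it runs on its own, then prints it once with rpart's own print method and once with rpart.rules() from rpart.plot. The variable name rules is only illustrative, and the exact output depends on the installed package versions.

```r
library(rpart)       # rpart() and its print method
library(rpart.plot)  # ptitanic data and rpart.rules()

data("ptitanic")
binary.model <- rpart(survived ~ ., data = ptitanic, cp = 0.02)

# Text form of the tree as printed by rpart itself
print(binary.model)

# One row per leaf: the fitted value and the conditions that lead to that leaf
rules <- rpart.rules(binary.model)
print(rules)
```

Reading the rule table next to the plot makes it easier to see how much of the same information the figure conveys at a glance.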
Multiple splits (continuous response)
anova.model <- rpart(Mileage~.,data = cu.summary)
rpart.plot(anova.model,
shadow.col = "gray",
main="miles per gallon\n(continuous response)\n")
Multiple splits (multi-class response)
multi.class.model <- rpart(Reliability~.,data = cu.summary)
rpart.plot(multi.class.model,
main="vehicle reliability\n(multi class response)")
Method 2: rattle::fancyRpartPlot()
p_load(rattle)
p_load(RColorBrewer)
p_load(bitops)
# Set up the data for modelling (the weather data set ships with rattle)
set.seed(42)
ds <- weather
target <- "RainTomorrow"
risk <- "RISK_MM"
ignore <- c("Date", "Location", risk)
vars <- setdiff(names(ds), ignore)
nobs <- nrow(ds)
form <- formula(paste(target, "~ ."))
train <- sample(nobs, 0.7*nobs)
test <- setdiff(seq_len(nobs), train)
actual <- ds[test, target]
risks <- ds[test, risk]
fit <- rpart(form,data = ds[train,vars])
fancyRpartPlot(fit,
  type = 0, # plot style
  palettes = c("Greys","Blues") # RColorBrewer palettes used to color the nodes
)
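The setup above creates a held-out test index but never uses it. As a rough sketch of how the fitted tree could classify those unseen rows, the block below repeats the split and the fit so that it runs on its own, then compares predictions with the observed labels. The names predicted, confusion, and accuracy are illustrative, and it assumes the installed rattle version provides the weather data set with a RainTomorrow column.

```r
library(rpart)   # rpart() and predict()
library(rattle)  # assumed to provide the weather data set

set.seed(42)
ds <- weather
target <- "RainTomorrow"
ignore <- c("Date", "Location", "RISK_MM")
vars <- setdiff(names(ds), ignore)
nobs <- nrow(ds)
train <- sample(nobs, 0.7 * nobs)
test <- setdiff(seq_len(nobs), train)
fit <- rpart(formula(paste(target, "~ .")), data = ds[train, vars])

# Predicted class for each held-out row
predicted <- predict(fit, newdata = ds[test, vars], type = "class")
actual <- ds[[target]][test]

# Cross-tabulate observed against predicted classes
confusion <- table(actual = actual, predicted = predicted)
print(confusion)

# Share of held-out rows that were classified correctly
accuracy <- mean(as.character(predicted) == as.character(actual))
print(accuracy)
```

The accuracy from a single 70/30 split is only a rough indication and will change with the seed.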
References:
rpart.plot: https://CRAN.R-project.org/package=rpart.plot
rattle: https://rattle.togaware.com/