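Note: this article builds a decision tree in R and then improves the appearance of the tree diagram.
The group column is the class label and takes one of two values, case or control; the remaining columns are the predictor variables.
An excerpt of the data: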
> mydata
age sex dishistory index1 index2 index3 index4 index5 index6 group
1 80 male yes 26 4.2 1.4 53.0 50.0 0.10 case
2 80 male yes 22 6.6 2.0 63.1 67.2 1.90 control
3 80 male yes 24 6.9 1.8 58.1 50.3 2.10 control
4 80 male yes 21 6.9 2.9 57.3 53.0 -2.10 control
5 73 male yes 28 4.3 0.9 72.0 70.0 3.00 case
6 73 male yes 22 6.3 0.8 50.2 50.3 -1.40 control
7 73 male yes 25 6.9 1.6 53.1 61.0 1.90 control
8 73 male yes 22 5.6 0.9 53.4 53.0 -1.60 control
9 82 male yes 27 5.7 0.8 69.3 71.2 3.40 case
10 82 male yes 24 7.1 1.8 59.0 51.0 0.40 control
11 82 male yes 22 5.9 2.1 59.1 53.1 -2.90 control
12 82 male yes 21 5.9 3.0 62.4 64.0 -1.30 control
13 77 male yes 23 5.0 1.5 72.3 62.8 1.60 case
14 77 male yes 25 6.1 2.9 49.9 50.1 -1.30 control
15 77 male yes 23 7.2 2.6 54.2 51.0 0.60 control
16 77 male yes 24 5.1 1.7 51.6 64.1 -1.30 control
17 76 male yes 22 4.9 1.9 53.4 51.9 -0.60 case
18 76 male yes 22 6.4 1.5 65.2 60.3 -1.60 control
19 76 male yes 26 6.0 2.3 51.3 67.2 -1.20 control
20 76 male yes 24 6.0 2.6 53.4 51.3 -2.10 control
21 68 male yes 26 4.8 1.8 65.5 67.5 0.10 case
22 68 male yes 22 6.5 2.3 51.3 50.6 -1.90 control
23 68 male yes 20 6.9 1.7 59.6 62.7 0.90 control
24 68 male yes 22 6.9 2.8 57.2 52.3 -1.90 control
25 75 male no 26 6.3 2.6 69.3 62.7 0.40 case
26 75 male no 21 6.4 2.3 53.1 64.9 1.00 control
27 75 male no 22 4.7 1.6 59.1 50.1 -2.30 control
28 75 male no 25 6.9 2.8 57.2 53.0 0.60 control
29 81 male yes 27 4.2 1.4 64.8 63.8 0.20 case
30 81 male yes 20 5.8 2.9 52.9 53.1 -1.30 control
31 81 male yes 22 6.9 2.6 56.2 56.4 -1.60 control
32 81 male yes 21 5.9 1.2 57.4 52.1 -1.90 control
33 69 male yes 27 5.4 1.8 68.2 64.1 0.60 case
34 69 male yes 23 7.2 3.0 68.4 61.2 0.90 control
35 69 male yes 23 9.8 1.8 63.1 49.3 -1.10 control
36 69 male yes 21 7.2 2.5 60.5 62.9 1.50 control
37 84 male yes 22 5.2 2.9 69.8 65.4 0.30 case
38 84 male yes 21 6.2 1.4 51.3 51.3 1.90 control
39 84 male yes 22 6.8 2.9 65.2 60.2 0.10 control
40 84 male yes 24 6.2 2.0 59.1 52.1 1.50 control
41 72 male yes 21 5.1 1.2 71.2 66.1 1.30 case
42 72 male yes 21 7.1 1.7 55.9 49.0 -2.30 control
43 72 male yes 21 7.1 3.0 61.3 62.1 -1.80 control
44 72 male yes 27 5.9 1.7 56.1 55.1 0.90 control
45 77 male yes 22 4.7 2.1 70.6 67.2 1.60 case
46 77 male yes 22 7.2 3.1 64.9 51.4 0.60 control
47 77 male yes 22 5.1 2.6 69.0 53.6 -2.80 control
48 77 male yes 21 7.2 2.9 55.9 65.2 0.40 control
49 80 male yes 26 4.6 1.5 69.0 68.9 0.80 case
50 80 male yes 21 7.1 2.6 68.4 52.8 0.90 control
51 80 male yes 21 6.0 2.6 56.2 55.3 -1.80 control
52 80 male yes 22 7.1 2.8 52.9 50.1 -1.30 control
53 67 male yes 27 5.8 1.4 62.5 69.0 1.40 case
54 67 male yes 21 6.0 2.6 57.3 50.5 -0.20 control
55 67 male yes 23 5.6 2.1 58.1 53.6 -1.40 control
56 67 male yes 22 6.1 1.2 70.0 52.0 -1.82 control
57 76 male yes 24 6.0 1.7 59.8 70.2 1.20 case
58 76 male yes 26 6.1 1.8 57.3 66.2 -0.20 control
59 76 male yes 22 6.5 2.9 68.4 64.0 -2.10 control
60 76 male yes 22 7.1 2.6 55.3 59.7 1.90 control
61 81 male yes 24 5.9 2.4 72.1 61.2 2.10 case
62 81 male yes 21 7.3 3.1 60.5 53.1 0.90 control
63 81 male yes 22 5.2 1.6 56.8 48.9 1.50 control
64 81 male yes 24 5.8 1.0 54.2 50.1 -1.90 control
65 92 male yes 23 5.5 2.6 60.4 60.5 2.00 case
66 91 male yes 21 7.2 2.8 67.5 49.3 -1.82 control
67 92 male yes 24 6.6 2.0 68.4 59.7 -1.30 control
68 92 male yes 20 5.4 2.6 57.3 50.1 0.90 control
69 83 male yes 26 4.4 1.8 69.4 69.3 2.20 case
70 83 male yes 24 7.2 1.6 65.2 62.8 -1.60 control
71 83 male yes 20 6.9 1.2 52.0 60.2 -2.60 control
72 90 female yes 23 6.7 0.9 51.6 52.9 -2.90 control
73 64 male yes 28 5.2 1.7 68.9 58.1 -1.20 case
74 64 male yes 21 6.5 3.2 59.1 51.0 -2.90 control
75 64 male yes 23 6.3 3.0 54.0 51.3 0.80 control
76 64 male yes 22 5.3 1.9 57.2 59.4 -2.30 control
77 78 male yes 28 4.8 2.6 67.5 57.3 -1.80 case
78 78 male yes 20 5.4 1.1 58.7 53.0 0.60 control
79 78 male yes 25 6.4 2.9 52.0 64.0 1.50 control
80 78 male yes 23 7.2 2.9 58.1 59.0 0.80 control
81 70 male yes 29 4.7 1.1 71.6 56.4 -2.60 case
82 70 male yes 22 6.4 2.6 69.0 48.0 -1.10 control
83 70 male yes 23 7.2 2.4 58.3 62.8 0.90 control
84 70 male yes 23 6.5 2.1 64.9 50.1 -1.60 control
85 81 male yes 29 4.8 2.1 71.9 55.2 -2.10 case
86 81 male yes 23 5.6 1.2 53.1 28.1 -1.30 control
87 81 male yes 21 5.6 1.7 57.2 52.1 -0.20 control
88 81 male yes 23 6.2 3.0 61.3 51.3 -1.90 control
89 73 male yes 26 5.2 2.1 72.0 69.1 0.10 case
90 73 male yes 22 6.5 2.8 50.6 62.9 -1.30 control
91 73 male yes 22 6.9 1.6 50.6 49.6 -1.60 control
92 73 male yes 21 6.3 1.0 57.2 68.9 -1.30 control
93 77 male no 27 5.1 2.0 70.9 55.6 0.60 case
94 77 male no 20 7.2 2.6 64.9 62.9 -1.82 control
95 84 female no 27 7.1 1.8 56.2 61.2 -1.90 control
96 84 female no 20 6.4 2.1 54.0 51.0 -2.60 control
97 68 male yes 27 5.6 1.8 70.5 70.0 0.90 case
98 68 male yes 21 6.2 1.6 56.1 50.4 -1.80 control
99 68 male yes 24 5.8 2.7 56.8 50.9 -2.90 control
100 68 male yes 24 6.9 2.4 54.2 61.2 0.80 control
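Note: the rows above are only an excerpt; the full data set contains 392 samples.
Building the decision tree and styling its diagram requires the following R packages: rpart, tibble, bitops, rattle, rpart.plot, and RColorBrewer.
Load them as follows: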
library(rpart)
library(tibble)
library(bitops)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
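# Read the data; stringsAsFactors = TRUE keeps text columns such as group as factors (the default before R 4.0)
mydata <- read.table("D:\\Rprojects\\tree.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)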
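Before modelling, it can help to confirm that the file was parsed as intended. The sketch below is only an illustration: it reads four rows copied from the excerpt above from an inline string instead of the CSV file, and the names `csv_text` and `demo` are hypothetical, not part of the original script.

```r
# Four rows copied from the data excerpt, supplied inline instead of a file
csv_text <- "age,sex,dishistory,index1,index2,index3,index4,index5,index6,group
80,male,yes,26,4.2,1.4,53.0,50.0,0.10,case
80,male,yes,22,6.6,2.0,63.1,67.2,1.90,control
75,male,no,26,6.3,2.6,69.3,62.7,0.40,case
90,female,yes,23,6.7,0.9,51.6,52.9,-2.90,control"

# Same call shape as for the real file, but reading from text
demo <- read.table(text = csv_text, header = TRUE, sep = ",",
                   stringsAsFactors = TRUE)

str(demo)            # column types: numeric indices, factor labels
levels(demo$group)   # the two class labels
table(demo$group)    # class counts in this small sample
```

Running the same `str()` and `table()` checks on `mydata` shows how unbalanced the two classes are before the data is split.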
Use 260 samples as the training set and the remaining 132 as the test set:
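# Draw 260 row indices at random; the split differs on every run unless a seed is set
sub <- sample(1:392, 260)
train <- mydata[sub, ]
test <- mydata[-sub, ]   # the rows that were not drawn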
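Because `sample()` is random, the split and therefore the tree and the confusion matrix change between runs. A minimal sketch of a reproducible split follows; the seed value and the placeholder data frame `toy` are assumptions made for illustration, standing in for `mydata`.

```r
set.seed(2024)   # arbitrary seed, chosen only for this sketch

n_total <- 392   # sample count stated in the article
n_train <- 260   # training-set size used in the article

# Hypothetical placeholder with one row per sample, standing in for mydata
toy <- data.frame(id = seq_len(n_total))

# Derive the index range from the data instead of hard-coding 1:392
train_idx <- sample(seq_len(nrow(toy)), n_train)
toy_train <- toy[train_idx, , drop = FALSE]
toy_test  <- toy[-train_idx, , drop = FALSE]

nrow(toy_train)                                # 260
nrow(toy_test)                                 # 132
length(intersect(toy_train$id, toy_test$id))   # 0, the two sets do not overlap
```

Using `nrow()` rather than a literal 392 keeps the split correct if rows are later added to or removed from the file.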
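Build the decision tree on the training set:
# group is the response; note that sex is not among the predictors in this formula
model <- rpart(group ~ age + dishistory + index1 + index2 + index3 + index4 + index5 + index6, data = train, method = "class")
fancyRpartPlot(model)   # draw the tree with rattle's styled plot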
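The look of the diagram can be adjusted through the arguments of `fancyRpartPlot()`. The sketch below is one possible styling, not the article's own settings: it uses the `kyphosis` data set that ships with rpart as a stand-in for `train`, and the title text and palette names are assumptions. The plotting call is guarded so the script still runs where the plotting packages are not installed.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the article's training set
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

have_plot_pkgs <- requireNamespace("rattle", quietly = TRUE) &&
  requireNamespace("rpart.plot", quietly = TRUE) &&
  requireNamespace("RColorBrewer", quietly = TRUE)

if (have_plot_pkgs) {
  library(rattle)
  library(rpart.plot)
  library(RColorBrewer)
  fancyRpartPlot(
    fit,
    main = "Decision tree (kyphosis demo)",   # title above the tree
    sub = "",                                 # suppress the default subtitle
    palettes = c("Greens", "Oranges")         # one RColorBrewer palette per class
  )
}
```

Each entry in `palettes` names an RColorBrewer sequential palette, and node shading within a class follows the predicted probability, so two contrasting palettes make the two classes easy to tell apart.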
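Evaluate the model on the test set:
x <- subset(test, select = -group)          # predictors only
pred <- predict(model, x, type = "class")   # predicted class labels
k <- test[, "group"]                        # true labels
table(pred, k)                              # rows: predicted, columns: actual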
The resulting confusion matrix (rows are predicted labels, columns are actual labels):
| Predicted \ Actual | case | control |
|---|---|---|
| case | 31 | 1 |
| control | 7 | 93 |
Accuracy: (31+93)/(31+93+7+1) ≈ 0.939
Sensitivity: 31/(31+7) ≈ 0.816
Specificity: 93/(93+1) ≈ 0.989
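The three metrics can also be computed directly from the confusion matrix rather than by hand. This sketch re-enters the counts reported above into a matrix named `cm` (a hypothetical name); with the real objects, `cm <- table(pred, k)` would give the same layout. Here case is treated as the positive class.

```r
# Counts from the article's confusion matrix; rows = predicted, columns = actual
cm <- matrix(c(31, 7, 1, 93), nrow = 2,
             dimnames = list(pred = c("case", "control"),
                             actual = c("case", "control")))

accuracy    <- sum(diag(cm)) / sum(cm)                         # correct / all
sensitivity <- cm["case", "case"] / sum(cm[, "case"])          # TP / actual cases
specificity <- cm["control", "control"] / sum(cm[, "control"]) # TN / actual controls

round(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity), 3)
# accuracy 0.939, sensitivity 0.816, specificity 0.989
```

Specificity divides the correctly predicted controls (93) by all actual controls (93 + 1), which is why it is much higher than sensitivity on this split.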
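That is all for this article. It covered the basic steps for building a decision tree in R and using fancyRpartPlot() to improve the look of the resulting tree diagram; the styling can be adjusted to your own taste.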