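Contents
I. Bayes discriminant analysis (maximum posterior probability; requires a normality assumption)
1. Linear discriminant function: lda(x, ...) (in MASS)
2. Quadratic discriminant function: qda(x, ...)
3. Application: [Example 5.2.3, bankrupt firms]
[Supplement 1] Selecting specific columns of a matrix
[Supplement 2] Comparing resubstitution and cross-validation
II. Fisher discriminant analysis
III. Stepwise discriminant analysis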
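I. Bayes discriminant analysis (maximum posterior probability; requires a normality assumption)
> library(MASS)#MASS ships with R, so it does not need to be installed, but it still has to be loaded with library()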
> ?lda
starting httpd help server ... done
> d5.2.3=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/上机二/examp5.2.3.csv',header=1)
> attach(d5.2.3)#put the data frame on the search path so its columns can be referred to by name
The following objects are masked _by_ .GlobalEnv:
x1, x2, x3
#Distance discriminant with equal covariance matrices (linear discriminant function, equal prior probabilities)
> mod1=lda(g~x1+x2+x3+x4,prior=c(0.5,0.5),d5.2.3)
#Distance discriminant with unequal covariance matrices (quadratic discriminant function)
> mod2=qda(g~x1+x2+x3+x4,prior=c(0.5,0.5),d5.2.3)
> Z=predict(mod1)#classify the original training sample (resubstitution)
> newg1=Z$class#predicted groups
> table(g,newg1)#cross-tabulate true groups against predicted groups
   newg1
g    1  2
  1 18  3
  2  1 24
> W=predict(mod2)
> newg2=W$class
> table(g,newg2)
   newg2
g    1  2
  1 19  2
  2  1 24
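The two resubstitution tables above can be summarised as apparent error rates. A minimal sketch, with the counts typed in by hand from the output above; `error_rate` is a hypothetical helper name, not something from MASS:

```r
# Confusion tables copied from the output above
# (rows: true group g, columns: predicted group)
lda_tab <- matrix(c(18,  3,
                     1, 24), nrow = 2, byrow = TRUE,
                  dimnames = list(g = c("1", "2"), predicted = c("1", "2")))
qda_tab <- matrix(c(19,  2,
                     1, 24), nrow = 2, byrow = TRUE,
                  dimnames = list(g = c("1", "2"), predicted = c("1", "2")))

# Hypothetical helper: share of off-diagonal (misclassified) cases
error_rate <- function(tab) 1 - sum(diag(tab)) / sum(tab)

lda_err <- error_rate(lda_tab)  # 4 of 46 cases misclassified
qda_err <- error_rate(qda_tab)  # 3 of 46 cases misclassified
print(round(c(lda = lda_err, qda = qda_err), 3))
```

Because both models are evaluated on the data they were fitted to, these rates are likely to be optimistic.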
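#Linear discriminant with unequal prior probabilities (0.1, 0.9); note that this overwrites the qda fit stored in mod2
> mod2=lda(cbind(d5.2.3[,1:4]),g,prior=c(0.1,0.9))
> xnew=data.frame(x1=-0.16,x2=-0.1,x3=1.45,x4=0.51)#a one-row data frame holding the new case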
> xnew
     x1   x2   x3   x4
1 -0.16 -0.1 1.45 0.51
> pre=predict(mod2,xnew)
> pre
$class
[1] 2
Levels: 1 2

$posterior
             1         2
[1,] 0.4770096 0.5229904

$x
           LD1
[1,] -1.867176
> pre$class
[1] 2
Levels: 1 2
> pre$posterior
             1         2
[1,] 0.4770096 0.5229904
> detach(d5.2.3)#remove the data frame from the search path
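The effect of the prior on the posterior probabilities can be checked on toy data. The sketch below uses simulated two-group data (made up for illustration, not the bankruptcy data) and passes `data=` instead of calling attach(), which also avoids the masking message shown earlier. It relies on the `prior` argument of `predict()` for lda fits, which overrides the prior stored in the model:

```r
library(MASS)

set.seed(1)
# Simulated two-group data (hypothetical, not the bankruptcy data)
n <- 50
toy <- data.frame(
  g  = factor(rep(c(1, 2), each = n)),
  x1 = c(rnorm(n, 0), rnorm(n, 2)),
  x2 = c(rnorm(n, 0), rnorm(n, 1))
)

# Passing data= avoids attach()/detach() and masking by global variables
fit <- lda(g ~ x1 + x2, data = toy, prior = c(0.5, 0.5))

xnew <- data.frame(x1 = 1, x2 = 0.5)
# Posterior under the prior stored in the fit (equal priors)
post_equal <- predict(fit, xnew)$posterior
# Posterior with the prior overridden at prediction time
post_skewed <- predict(fit, xnew, prior = c(0.1, 0.9))$posterior

print(round(rbind(equal = post_equal[1, ], skewed = post_skewed[1, ]), 3))
```

Raising the prior of group 2 from 0.5 to 0.9 multiplies the posterior odds in favour of group 2 by 9, so the same new case can change class purely because of the prior.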
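[Supplement 1] Selecting specific columns of a matrix
data[,c('x1','x2','x4')]#select columns by name
data[,1:4]#columns 1 to 4
data[,-5]#all columns except the fifth
data[,c(1,2,4)]#select specific columns by position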
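The four indexing forms above can be tried on a small made-up data frame (the values below are placeholders, not the course data). The sketch also shows one detail the list does not mention: selecting a single column returns a plain vector unless `drop = FALSE` is given.

```r
# Small made-up data frame standing in for the course data
data <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9, x4 = 10:12, g = c(1, 1, 2))

by_name  <- data[, c("x1", "x2", "x4")]  # select columns by name
first4   <- data[, 1:4]                  # columns 1 to 4
drop5    <- data[, -5]                   # everything except column 5
by_index <- data[, c(1, 2, 4)]           # select columns by position

# A single column comes back as a vector unless drop = FALSE is used
one_col_vec <- data[, "x1"]
one_col_df  <- data[, "x1", drop = FALSE]

print(names(by_name))
print(names(drop5))
```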
[Supplement 2] Comparing resubstitution and cross-validation
> d5.2.3=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/上机二/examp5.2.3.csv',header=1)
> mod1=lda(d5.2.3[,'g']~x1+x2+x3+x4,prior=c(0.5,0.5),d5.2.3)
> mod12=lda(d5.2.3[,'g']~x1+x2+x3+x4,prior=c(0.5,0.5),d5.2.3,CV=TRUE)#classification by leave-one-out cross-validation
> newg12=mod12$class
> cbind(d5.2.3[,'g'],round(mod12$posterior,3),newg12)#columns: true group (unnamed), posterior probabilities of groups 1 and 2, cross-validated class
         1     2 newg12
1  1 0.998 0.002      1
2  1 0.964 0.036      1
3  1 0.769 0.231      1
4  1 0.778 0.222      1
5  1 0.881 0.119      1
6  1 0.921 0.079      1
7  1 0.747 0.253      1
8  1 0.817 0.183      1
9  1 0.679 0.321      1
10 1 0.902 0.098      1
11 1 0.995 0.005      1
12 1 0.569 0.431      1
13 1 0.514 0.486      1
14 1 0.984 0.016      1
15 1 0.254 0.746      2
16 1 0.121 0.879      2
17 1 0.817 0.183      1
18 1 0.664 0.336      1
19 1 0.811 0.189      1
20 1 0.255 0.745      2
21 1 0.981 0.019      1
22 2 0.131 0.869      2
23 2 0.493 0.507      2
24 2 0.023 0.977      2
25 2 0.215 0.785      2
26 2 0.012 0.988      2
27 2 0.011 0.989      2
28 2 0.323 0.677      2
29 2 0.441 0.559      2
30 2 0.169 0.831      2
31 2 0.452 0.548      2
32 2 0.354 0.646      2
33 2 0.492 0.508      2
34 2 0.967 0.033      1
35 2 0.175 0.825      2
36 2 0.320 0.680      2
37 2 0.173 0.827      2
38 2 0.343 0.657      2
39 2 0.135 0.865      2
40 2 0.642 0.358      1
41 2 0.490 0.510      2
42 2 0.003 0.997      2
43 2 0.242 0.758      2
44 2 0.037 0.963      2
45 2 0.137 0.863      2
46 2 0.000 1.000      2
> table12=table(d5.2.3[,'g'],newg12)
> table12
   newg12
     1  2
  1 18  3
  2  2 23
> round(prop.table(table12,1),3)
   newg12
        1     2
  1 0.857 0.143
  2 0.080 0.920
> Z=predict(mod1)#classification by resubstitution
> newg1=Z$class
> cbind(newg1,newg12)
newg1 newg12
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
[6,] 1 1
[7,] 1 1
[8,] 1 1
[9,] 1 1
[10,] 1 1
[11,] 1 1
[12,] 1 1
[13,] 1 1
[14,] 1 1
[15,] 2 2
[16,] 2 2
[17,] 1 1
[18,] 1 1
[19,] 1 1
[20,] 2 2
[21,] 1 1
[22,] 2 2
[23,] 2 2
[24,] 2 2
[25,] 2 2
[26,] 2 2
[27,] 2 2
[28,] 2 2
[29,] 2 2
[30,] 2 2
[31,] 2 2
[32,] 2 2
[33,] 2 2
[34,] 1 1
[35,] 2 2
[36,] 2 2
[37,] 2 2
[38,] 2 2
[39,] 2 2
[40,] 2 1
[41,] 2 2
[42,] 2 2
[43,] 2 2
[44,] 2 2
[45,] 2 2
[46,] 2 2
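In the comparison above the two methods disagree only on case 40, which resubstitution assigns to group 2 and cross-validation to group 1. What CV=TRUE does can be spelled out as an explicit leave-one-out loop. The sketch below uses R's built-in iris data as a stand-in, since the course CSV is not available here (an assumption), and compares the hand-written loop with the built-in shortcut:

```r
library(MASS)

# iris stands in for the course data here (an assumption)
prior <- c(1, 1, 1) / 3
n <- nrow(iris)

# Leave-one-out by hand: refit without case i, then classify case i
loo_class <- character(n)
for (i in seq_len(n)) {
  fit_i <- lda(Species ~ ., data = iris[-i, ], prior = prior)
  loo_class[i] <- as.character(predict(fit_i, iris[i, ])$class)
}

# Built-in shortcut: CV = TRUE returns leave-one-out classes directly
cv_class <- lda(Species ~ ., data = iris, prior = prior, CV = TRUE)$class

# Resubstitution: classify the same cases that were used for fitting
resub_class <- predict(lda(Species ~ ., data = iris, prior = prior))$class

agreement <- mean(loo_class == as.character(cv_class))
resub_err <- mean(resub_class != iris$Species)
loo_err   <- mean(loo_class != as.character(iris$Species))
print(c(agreement = agreement, resub_err = resub_err, loo_err = loo_err))
```

The loop refits the model n times, so for larger data sets CV=TRUE is the more practical route. The explicit version mainly makes clear that each case is classified by a model that never saw it.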
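II. Fisher discriminant analysis
> d5.4.2=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/上机二/examp5.4.2.csv',header=1)
> ldf=lda(Taxon~x1+x2+x3+x4,d5.4.2,prior=c(1,1,1)/3)
#Printing ldf shows the discriminant functions without centering: the third block of the output holds the discriminant function coefficients, the fourth the contribution of each discriminant function (proportion of trace)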
> ldf
Call:
lda(Taxon ~ x1 + x2 + x3 + x4, data = d5.4.2, prior = c(1, 1,
    1)/3)

Prior probabilities of groups:
        1         2         3
0.3333333 0.3333333 0.3333333

Group means:
     x1    x2    x3    x4
1 50.06 34.28 14.62  2.46
2 59.36 27.70 42.60 13.26
3 65.88 29.74 55.52 20.26

Coefficients of linear discriminants:
           LD1          LD2
x1  0.08293776 -0.002410215
x2  0.15344731 -0.216452123
x3 -0.22012117  0.093192121
x4 -0.28104603 -0.283918785

Proportion of trace:
   LD1    LD2
0.9912 0.0088
> Z=predict(ldf)#classify the original training sample
> newg=Z$class#predicted groups
> table(d5.4.2[,'Taxon'],newg)#cross-tabulate true groups against predicted groups
   newg
     1  2  3
  1 50  0  0
  2  0 48  2
  3  0  1 49
> plot(Z$x)
> text(Z$x[,1],Z$x[,2],d5.4.2[,'Taxon'],adj=-0.8,cex=0.8)#label each point with its group; adj sets the label's offset from the point, cex the label size
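The scores in Z$x and the "Proportion of trace" line can be reproduced by hand from the fitted object. The group means printed above match R's built-in iris data multiplied by 10, so the sketch below uses that as a stand-in for examp5.4.2.csv. This is an assumption about the file, not something the source states:

```r
library(MASS)

# Assumption: iris in millimetres (values x 10) stands in for examp5.4.2.csv
d <- data.frame(iris[, 1:4] * 10, Taxon = as.integer(iris$Species))
names(d)[1:4] <- c("x1", "x2", "x3", "x4")

ldf <- lda(Taxon ~ x1 + x2 + x3 + x4, data = d, prior = c(1, 1, 1) / 3)
Z <- predict(ldf)

# Scores by hand: centre at the prior-weighted mean of the group means,
# then project onto the coefficient matrix ldf$scaling
centre <- colSums(ldf$prior * ldf$means)
scores <- scale(as.matrix(d[, 1:4]), center = centre, scale = FALSE) %*% ldf$scaling

# "Proportion of trace": squared singular values normalised to sum to one
prop_trace <- ldf$svd^2 / sum(ldf$svd^2)

print(max(abs(scores - Z$x)))  # difference between hand-made and predict() scores
print(round(prop_trace, 4))
```

The printed coefficients are applied to the raw variables, and predict() does the centring itself. This is the sense in which the printed discriminant functions are "uncentered".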
> install.packages('MorphoTools2')
> library('MorphoTools2')
> morpho5.4.2=read.morphodata('D:/个人成长/学业/课程/应用多元统计分析/上机/上机二/examp5.4.2.txt')
> can5.4.2=cda.calc(morpho5.4.2)
> round(can5.4.2$coeffs.raw,3)
     Can1   Can2
x1  0.083  0.002
x2  0.153  0.216
x3 -0.220 -0.093
x4 -0.281  0.284
> sc=can5.4.2$objects$scores
III. Stepwise discriminant analysis
Usage
stepdisc.calc(objects,FToEnter=0.15,FToStay=0.15)
Arguments
objects: an object of class morphodata
FToEnter: significance levels for a variable to enter the subset
FToStay: significance levels for a variable to stay in the subset
> stepdisc.calc(morpho5.4.2,FToEnter=0.05,FToStay=0.05)
  Entered Removed Partial R-square     F-value       Pr > F
1      x3               0.94137172 1180.161182 2.856777e-91
2      x2               0.37088193   43.035453 2.029774e-15
3      x4               0.32286458   34.568686 5.296345e-13
4      x1               0.06153651    4.721152 1.032884e-02
Selected characters:
x3, x2, x4, x1
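The partial R-square and F-value columns above appear to correspond to an analysis of covariance. Each candidate variable is regressed on the variables already selected, and the gain from adding the group factor is tested. The sketch below is a base-R illustration of that idea for the first two steps, not the MorphoTools2 implementation. It assumes iris as a stand-in for the data, and `f_to_enter` is a hypothetical helper:

```r
# Assumption: iris stands in for examp5.4.2 (x1..x4 = sepal length,
# sepal width, petal length, petal width)
d <- setNames(iris, c("x1", "x2", "x3", "x4", "Taxon"))

# Hypothetical helper (not MorphoTools2 code): F-to-enter for one candidate
f_to_enter <- function(candidate, selected, data) {
  rhs <- if (length(selected)) paste(selected, collapse = " + ") else "1"
  # Reduced model: candidate explained by the already selected variables only
  reduced <- lm(as.formula(paste(candidate, "~", rhs)), data = data)
  # Full model: the group factor is added
  full <- lm(as.formula(paste(candidate, "~", rhs, "+ Taxon")), data = data)
  cmp <- anova(reduced, full)
  c(partial_r2 = 1 - deviance(full) / deviance(reduced),
    F = cmp$F[2], p = cmp$`Pr(>F)`[2])
}

vars <- c("x1", "x2", "x3", "x4")

# Step 1: nothing selected yet, so this is a one-way ANOVA per variable
step1 <- sapply(vars, f_to_enter, selected = character(0), data = d)
best1 <- names(which.max(step1["F", ]))

# Step 2: remaining variables, adjusted for the variable entered first
rest <- setdiff(vars, best1)
step2 <- sapply(rest, f_to_enter, selected = best1, data = d)
best2 <- names(which.max(step2["F", ]))

print(round(step1, 4))
print(round(step2, 4))
print(c(first = best1, second = best2))
```

A full stepwise procedure would also re-test the variables already in the subset after each entry (the FToStay check), which this sketch leaves out.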