R in Action 学习笔记 - 第九章-Analysis of Variance

One-way ANOVA


> install.packages("multcomp")

> library(multcomp)

> attach(cholesterol)


条件:因为 one-way ANOVA 的因变量需要满足正态分布,并且各组等方差

In a one-way ANOVA, the dependent variable is assumed to be normally distributed, and have equal variance in each group 


Step-1:用QQPlot检查是否符合正态分布条件

use a Q-Q plot to assess the normality assumption 


> install.packages("car")

> library(car)

> qqPlot(lm(response~trt,data=cholesterol),simulate=TRUE,main="Q-Q Plot",labels=FALSE)

#data fall within the 95 percent confidence envelope, 

#suggesting that the normality assumption has been met fairly well 




Step-2: 用 ANOVA

> fit <- aov(response~trt)

> summary(fit)
            Df Sum Sq Mean Sq F value   Pr(>F)    
trt          4 1351.4   337.8   32.43 9.82e-13 ***
Residuals   45  468.8    10.4                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


结论:

The ANOVA F test for treatment (trt) is significant (p < .0001), providing evidence that the five treatments aren't all equally effective


可视化

> library(gplots)

> plotmeans(response~trt,xlab="Treatment",ylab="Response",main="Mean Plot with 95% CI")





Step-3: 用其他test方法可得到相同结论

Bartlett’s test 

> bartlett.test()

Fligner–Killeen test  

> fligner.test()

Brown–Forsythe test 

> hov()


> bartlett.test(response~trt,data=cholesterol)


Bartlett test of homogeneity of variances


data:  response by trt
Bartlett's K-squared = 0.57975, df = 4, p-value = 0.9653#indicates that the variances in the five groups don’t differ significantly 




Step-4:看具体两两组间的差异

tell you which treatments differ from one another 


> TukeyHSD(fit)#TukeyHSD() function provides a test of all pairwise differences between group means 
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = response ~ trt)

$trt
                  diff        lwr       upr     p adj
2times-1time   3.44300 -0.6582817  7.544282 0.1380949#
for1timeand2timesaren’t significantly different from each other (p = 0.138) 
4times-1time   6.59281  2.4915283 10.694092 0.0003542#
the difference between 1time and 4times is significantly different (p < .001) 
drugD-1time    9.57920  5.4779183 13.680482 0.0000003
drugE-1time   15.16555 11.0642683 19.266832 0.0000000
4times-2times  3.14981 -0.9514717  7.251092 0.2050382
drugD-2times   6.13620  2.0349183 10.237482 0.0009611
drugE-2times  11.72255  7.6212683 15.823832 0.0000000
drugD-4times   2.98639 -1.1148917  7.087672 0.2512446
drugE-4times   8.57274  4.4714583 12.674022 0.0000037
drugE-drugD    5.58635  1.4850683  9.687632 0.0030633


可视化

> par(las=2)rotatesthe axis labels 

> par(mar=c(5,8,4,2))increases the left margin area so that the labels fit 

> plot(TukeyHSD(fit))





可视化#reproduces the Tukey HSD test, along with a different graphical representation of the results 

> par(mar=c(5,4,6,2))#increased the top margin to fit the letter array 

> tuk <- glht(fit,linfct=mcp(trt="Tukey"))

#glht() provides a much more comprehensive set of methods for multiple mean comparisons 

#you can use for both linear models and generalized linear models 

> plot(cld(tuk,level=.05),col="lightgrey")#significance level to use 0.05, or 95percent confidence in this case


参考:

http://www.inside-r.org/packages/cran/multcomp/docs/glht

http://www.stat.wmich.edu/wang/664/egs/Rmice.html






图表说明:具有相同字母的两组间没有显著的不同

Groups that have the same letter don’t have significantly different means 

taking the cholesterol-lowering drug in5 mg dosesfour times a day was better than taking a 20 mg dose once per day 

The competitordrugD wasn’t  superior to this four-times-per-day regimen 

competitor drugE was superior to both drugD and all three dosage strategies for our focus drug 




条件:因为 ANOVA 对outlier比较敏感

analysis of variance methodologies can be sensitive to the presence of outliers 



Step-5: 所以要检查是否没有outlier

> outlierTest(fit)

No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
   rstudent unadjusted p-value Bonferonni p
19 2.251149           0.029422           NA #
there’s no indication of outliers in the cholesterol data (NA occurs when p > 1) 



最后:

Taking the Q-Q plot, Bartlett’s test, and outlier test together, the data appear to fit the ANOVA model quite well.  


总结:

1. 先用 qqPlot 看变量是否符合正态分布,car 包

2. 用 aov 再 summary 看多组间是否具有显著差异性,multcomp包

3. 用 ghlt 的 Tukey 看两两间的差异性

4. 用 outlierTest 看是否满足没有 outlier





你可能感兴趣的:(ANOVA,RinAction)