r语言的runmed函数_R语言: 用于多个比较的方差分析(ANOVA)

方差分析在试验科学中有重要的地位,今天谈谈如何用R做方差分析

前提假设:独立、正态、方差齐次(各水平间)

例子:x

数据集用5个因子水平测量,问是否存在差异

光是这样是无法进行分析的,对数据x进行格式转化

b

x a

1 25.6 1

2 22.2 1

3 28.0 1

4 29.8 1

5 24.4 2

6 30.0 2

7 29.0 2

8 27.5 2

9 25.0 3

10 27.7 3

11 23.0 3

12 32.2 3

13 28.8 4

14 28.0 4

15 31.5 4

16 25.9 4

17 20.6 5

18 21.2 5

19 22.0 5

20 21.2 5

在进行方差分析之前先对几条假设进行检验,由于随机抽取,假设总体满足独立、正态,考察方差齐次性(用bartlett检验)

> bartlett.test(x~a,data=b)

Bartlett test of homogeneity of variances

data: x by a

Bartlett's K-squared = 7.0966, df = 4, p-value = 0.1309

方差齐次性符合,因此选用方差分析aov()或者KW检验kruskal.test()均可(。

下面进行方差分析

m1

summary(m1)

Df Sum Sq Mean Sq F value

Pr(>F) a 4 132.0 32.99 4.306 0.0162 *

Residuals 15 114.9 7.66 ---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05

‘.’ 0.1 ‘ ’ 1

从这个结果看出差别显著

接下来考察具体的差异(多重比较),

> TukeyHSD(m1)

Tukey multiple comparisons of means

95%

family-wise confidence level

Fit: aov(formula = x ~ a, data = b)

$a

diff lwr upr p adj

2-1 1.325 -4.718582 7.3685818 0.9584566

3-1 0.575 -5.468582 6.6185818 0.9981815

4-1 2.150 -3.893582 8.1935818 0.8046644

5-1 -5.150 -11.193582 0.8935818 0.1140537

3-2 -0.750 -6.793582 5.2935818

0.9949181

4-2 0.825 -5.218582 6.8685818 0.9926905

5-2 -6.475 -12.518582 -0.4314182 0.0330240

4-3 1.575 -4.468582 7.6185818 0.9251337

5-3 -5.725 -11.768582 0.3185818 0.0675152

5-4 -7.300 -13.343582 -1.2564182 0.0146983

除了5、2和5、4间外,其他之间的差异是不显著的。

用于多个比较的方差分析(ANOVA:Analysis of variance)

ANOVA模型能用于比较多个群组之间的均值,这里使用了参数(parametric)的方法,也就是假设这些群组符合Gaussian分布。以下为例子:

----------------------------------------------------

超市连锁店的经理想看看4个店面的耗电量(千瓦)是否相等。他在每个月底收集数据,持续了6个月,结果如下:

Store A: 65, 48, 66, 75, 70, 55

Store B: 64, 44, 70, 70, 68, 59

Store C: 60, 50, 65, 69, 69, 57

Store D: 62, 46, 68, 72, 67, 56

为了使用ANOVA来验证,我们必须首先验证homoskedasticity,也就是方差的同质性检验。R软件提供了两种检验方法:Bartlett检验,和Fligner-Killeen检验。

---------------------------------------------------

我们先看Bartlett检验。首先我们创建4个向量,然后再组合成一个向量:

>a = c(65, 48, 66, 75, 70, 55)

>b = c(64, 44, 70, 70, 68, 59)

>c = c(60, 50, 65, 69, 69, 57)

>d = c(62, 46, 68, 72, 67, 56)

>dati = c(a, b, c, d)

另外我们再创建一个对应用于标示dati分组的4个水平的factor:>groups = factor(rep(letters[1:4], each = 6))

————————————————————————————————————————————————

【如果各组数据量不等:

>groups = factor(c(rep(1,7),rep(2,5), rep(3,8), rep(4,6)))

>groups = factor(c(rep(“CD”,7),rep(“WL",5), rep("LT",8), rep("BJ",6)))

注:第1组数据结有7个,第2组数据结有5个,第3组数据结有8个,第4组数据结有6个.

>g=factor(rep(1:4, c(8,4,7,6)))>g [1] 1 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4

Levels: 1 2 3 4

>g=factor(rep(c("cd","wl","lt","bj"), c(8,4,7,6)))>g [1] cd cd cd cd cd cd cd cd wl wl wl wl lt lt lt lt lt lt lt bj bj bj bj bj bj

Levels: bj cd lt wl

———————————————————————————————————————————————

这样我们就可以进行Bartlett test了:

>bartlett.test(dati, groups)

Bartlett test of homogeneity of variances

data: dati and groups

Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228

这个函数得到了统计检验的值(K squared)和p-value。因为p-value >

0.05,所以我们可以说这些组的方差是同质的。另一方面,我们也可以比较Barlett的K-squared和查表的chi-square值,使用函数

qchisq,其输入包括alpha值和自由度

>qchisq(0.950, 3)

[1] 7.814728

显然,这里的chi-squared 大于上面计算的Bartlett的K-squared,因此我们接受null hypothesis

H0,即方差都是同质的。

-------------------------------------------------------------------

现在我们试着用Fligner-Killeen test来检测同质性。调用函数的方法和过程都类似:

>a = c(65, 48, 66, 75, 70, 55)

>b = c(64, 44, 70, 70, 68, 59)

>c = c(60, 50, 65, 69, 69, 57)

>d = c(62, 46, 68, 72, 67, 56)

>dati = c(a, b, c, d)

>groups = factor(rep(letters[1:4], each = 6))

>fligner.test(dati, groups)

Fligner-Killeen test of homogeneity of variances

data: dati and groups

Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value =

0.9878

这里的结论也与Bartlett test类似。

----------------------------------------------------------------------------------

已验证了4个群组的同质性,我们就可以来处理ANOVA模型了。

首先拟合模型:

>fit = lm(formula = dati ~ groups)

然后分析ANOVA模型:

>anova (fit)

Analysis of Variance Table

Response: dati

Df Sum Sq Mean Sq F value

Pr(>F)

groups 3 8.46 2.82 0.0327 0.9918

Residuals 20 1726.50 86.33

函数的输出为经典的ANOVA表,数据如下:

Df = degree of freedom,自由度

Sum Sq = deviance (within groups, and

residual),总方差和(分别有groups和residual的)

Mean Sq = variance (within groups, and

residual),平均方差和(分别有groups和residual的)

F value = the value of the Fisher statistic test, so computed

(variance within groups) / (variance residual),统计检验的值

Pr(>F) = p-value

因为p-value大于0.05,我们接受null hypothesis

H0,即4个样本的均值统计相等。我们也可以比较计算的F-vaue和查表的F-value:

>qf(0.950, 20, 3)

[1] 8.66019

查表的F-value大于之前计算的F-value,因此我们接受null hypothesis。

应用实例:当各样本量大小不一时

>

rw[1:10,]

r pop

1  0.0870 CD

2 -0.0910 CD

3  0.0719 CD

4  0.2516 CD

5  0.1878 CD

6 -0.3318 CD

7 -0.4095 CD

8 -0.0852 CD

9 -0.1585 CD

10 -0.2265 CD

> str(rw)'data.frame': 8472 obs. of 2 variables:

$ r : num 0.087 -0.091 0.0719 0.2516 0.1878 ...

$ pop: Factor w/ 5 levels "BJ","CD","CQ",..: 2 2 2 2 2 2 2 2 2 2 ...

>bartlett.test(r~pop,data=rw) Bartlett test of homogeneity of variances

data: r by pop

Bartlett's K-squared = 74.8494, df = 4, p-value = 2.144e-15 # p<0.05,方差不齐

————————————————————————————————————

方差不齐用:

>kruskal.test(r~pop,data=rw)

Kruskal-Wallis rank sum test

data: r by pop

Kruskal-Wallis chi-squared = 43.3389, df = 4, p-value = 8.801e-09

————————————————————————————————————————————

>m1summary(m1) Df Sum Sq Mean Sq F value Pr(>F)

pop 4 2.0 0.4980 12.8 2.19e-10 ***

Residuals 8467 329.5 0.0389

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

>TukeyHSD(m1)

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = r ~ pop, data = rw)

$pop diff  lwr  upr p adj

CD-BJ 0.025972862 -0.076497522 0.12844325 0.9583788

CQ-BJ 0.048778810 -0.123445811  0.22100343 0.9384799

LT-BJ 0.127254524 0.018965243  0.23554380 0.0117902

WL-BJ 0.032694279 -0.069250584  0.13463914 0.9061475

CQ-CD 0.022805947 -0.116720234  0.16233213 0.9918260

LT-CD 0.101281662 0.062130625  0.14043270 0.0000000

WL-CD 0.006721417 -0.007384981  0.02082782 0.6912301

LT-CQ 0.078475714 -0.065378185  0.22232961 0.5700784

WL-CQ -0.016084530 -0.155225218 0.12305616 0.9978590

WL-LT -0.094560244 -0.132314441 -0.05680605 0.0000000

>re plot(re)

你可能感兴趣的:(r语言的runmed函数)