比较两个及两个以上样本的均值差异
方差分析前提
独立性:样本须是相互独立的随机样本
正态性 :样本来自正态分布总体
方差齐性:各总体方差相等,即方差齐
独立性 - 可能不独立的数据
重复测量
观测值间有时间相关关系
观测值间有空间相关关系
正态性检验
Shaprio-Wilks normality test – if your data is mainly unique values
D'Agostino-Pearson normality test – if you have lots of repeated values
Lilliefors normality test – mean and variance are unknown
Spiegelhalter's T' normality test – powerful non-normality is due to kurtosis, but bad if skewness is responsible
Q-Q plot:确定一组数据是否与另一组或者待测统计分布近似相似,如果两组数据分布相似,QQplot为一条直线。
require(tidyverse)
data <- diamonds %>%
select(color,carat) %>%
filter(color == "D")
# 散点图
ggplot(data = data, aes(x = carat)) +
geom_density()
# 与正态分布的QQ图
ggplot(data = data, aes(sample = carat)) +
stat_qq(distribution = qnorm)+ stat_qq_line()
方差齐性检验
Bartlett检验 - 对于服从正态分布的数据最为适用 bartlett.test {stats}
Levene检验 - 相较于Bartlett检验更为稳健 leveneTest {car}
Fligner-Killeen检验 - 不依赖于对分布的假设 fligner.test {stats}
# H0:The variances in each of the groups (samples) are the same
bartlett.test(bartlett.test(carat ~ color, data = diamonds))
# 当分组变量多于一组,需要interaction函数
# interaction(fcategory, partner.status)
# partner.status = low+ low, low + medium, low + high,
# high + low, high + medium, high + high
bartlett.test(carat ~ interaction(cut, color),
data = diamonds)
协方差齐性
Box's M: boxM{biotools}
单因素方差分析
单因素方差分析即单一因素影响的多组样本的因变量均值差异是否显著
H0: the means of the different groups are the same
如:不同喂食剂量下的体重
library(car)
library(tidyverse)
library(agricolae)
set.seed(13)
low <- rnorm(40, mean = 10, sd = 1)
middle <- rnorm(40, mean = 10.3, sd = 1)
high <- rnorm(40, mean = 10.5, sd = 1)
data <- data.frame(low , middle , high )
data <- data %>%
gather(key = "does", value = "weight",
low , middle, high)
data.aov <- aov(weight ~ does, data = data)
summary(data.aov)
# 或者先用线性模型,再用anova
data.lm <- lm(weight ~ does, data = data)
anova(data.lm)
Summary()函数会输出残差和模型,anova()只会输出结果。
summary(data.lm)
多重比较
在单因素方差分析中,结果p-vallue指明几组变量中有的means 有显著性差异,但是并不能指明具体是那些组,因此需要用多重比较确定具体分组中差异显著的区域。
TukeyHSD {stats}
TukeyHSD(data.aov)
- diff: difference between means of the two groups
- lwr, upr: the lower and the upper end point of the confidence interval at 95% (default)
- p adj: p-value after adjustment for the multiple comparisons.
agricolae包多重比较
result <- LSD.test(data.aov , "does", p.adj="bonferroni")
result
# return p value
LSD.test(data.aov , "does", p.adj="bonferroni", group = FALSE)
除LSD外的其他方法:duncan.test, SNK.test, SNK.test, etc
multcomp包多重比较
可以运用 glht() 进行多重比较。glht表示 general linear hypothesis tests
library(multcomp)
summary(glht(data.aov, linfct = mcp(group = "Tukey")))
- model: a fitted model, for example an object returned by aov().
- lincft(): a specification of the linear hypotheses to be tested. Multiple comparisons in ANOVA models are specified by objects returned from the function mcp().
成对t检验
pairwise.t.test(data$weight, data$does, p.adjust.method = "BH")
双因素方差分析
将公式中的x ~ A 改为x ~ A + B(无交互)或x ~ A*B(有交互),仍用aov()
library(car)
library(tidyverse)
library(agricolae)
library(multcomp)
# 无交互anova
litter.aov <- aov(weight ~ gesttime + dose, data = litter)
summary(litter.aov)
# 或者先用线性模型,再用anova
litter.lm <- lm(weight ~ gesttime + dose, data = litter)
summary(litter.lm)
# 多重比较 - 上述多重比较均可运用
result.aov1 <- LSD.test(litter.aov, "dose", p.adj="bonferroni")
TukeyHSD(litter.aov, "dose")
# 有交互anova
name <- seq(1:40)
set.seed(13)
rep1 <- rnorm(40, mean = 0, sd = 1)
rep2 <- rnorm(40, mean = 0, sd = 1)
rep3 <- rnorm(40, mean = 11, sd = 1)
group <- rep(c(1:4), 10)
group <- factor(group)
data <- data.frame(name, rep1, rep2, rep3, group)
data <- data %>%
gather(key = "rep", value = "test",
rep1, rep2, rep3)
str(data)
fit <- aov(test ~ rep*group, data = data)
summary(fit3)
result3 <- LSD.test(fit , "rep", "group", p.adj="bonferroni")
非均衡设计双因素方差分析
非均衡设计( unbalanced design )指各组观测值数目不相同的数据结构。总所周知,有Type-I, Type-II and Type-III sums of squares 可以进行非均衡设计方差分析。
Anova() {car } 利用 Type-III进行非均衡设计双因素方差分析。
多元方差分析
正态性检验
mshapiro.test( ){ mvnormtest }用Shapiro-Wilk检验来检测多元正态性
检验不同Species中的Sepal.Length和Petal.Length有没有显著性差异
# 数据准备
sep.l <- iris$Sepal.Length
pet.l <- iris$Petal.Length
# MANOVA test
iris.manova <- manova(cbind(sep.l, pet.l) ~ Species, data = iris)
summary(iris.manova)
协方差分析
实验个体初始条件不同,考虑这些不可控的因素的方差分析
ancova(){HH}
ancova(formula, data.in = NULL, ..., x, groups)
formula指定模型;data.in指定输入数据框;x为协方差分析中的协变量,在作图时如果formula中不包括x则需要指明;groups指明绘图时分组的条件因子,当formula中的“|”后面不包括变量时使用。