R进行两因素重复测量方差分析并可视化(双组折线图)

在仙桃学术上的生信工具里面,有一个折线图的绘图工具,可以很快速便捷的得出结论并可视化结果,当然不是说这个功能有多强大,而是统计学方法非常专业。
比如用它自带的数据https://bioinfomatics.xiantao.love/biotools/data/demo/free/linePlot/%E6%8A%98%E7%BA%BF%E5%9B%BE.xlsx
通过无脑式的鼠标点击,可得到下面一系列的结果。

折线图

可以看到有很多结果,包括各种统计学描述、各种格式的图片,以及一个demo.R

比如统计描述

组别1 组别2 数目 最小值 最大值 中位数(Median) 四分位距(IQR) 下四分位 上四分位 均值(Mean) 标准差(SD) 标准误(SE)
Time1 Control 12 2.833 3.306 2.967 0.254 2.885 3.139 3.020 0.154 0.044
Time1 Intervene 12 2.624 3.418 3.074 0.297 2.876 3.173 3.032 0.258 0.074
Time2 Control 12 2.733 3.240 2.930 0.187 2.861 3.047 2.957 0.148 0.043
Time2 Intervene 12 3.419 4.318 4.082 0.388 3.824 4.212 4.005 0.285 0.082
Time3 Control 12 2.611 3.414 2.938 0.392 2.805 3.197 2.979 0.246 0.071
Time3 Intervene 12 5.799 6.285 6.065 0.185 5.916 6.101 6.030 0.139 0.040

又比如异常值分析、正态性检验(Shapiro-Wilk normality test)、方差齐性检验(Levene's test)、协方差同质假设(Homogeneity of covariances assumption/Box’s M-test)、球形假设(Mauchly's Test for Sphericity)、两因素单向重复测量数据方差分析(Two-way mixed ANOVA)、单独效应分析(Simple effect)、多重成对比较(Pairwise Comparisons of Estimated Marginal Means)等等专业术语,并且还给了详细的统计学结果。


image.png
image.png
image.png

这种统计学结果让大家很欣慰,那么这些结果都是如何计算出来的呢???


我们可以点开demo.R这个结果,读一读代码
https://bioinfomatics.xiantao.love/biotools/code/open/lineplot.R

现在我们复现一下:

加载需要的包并导入数据

library(ggplot2)
library(reshape2)
library(car)
library(rstatix)
## 导入数据,比如我们把下载的折线图数据保存在桌面
library(readxl)
data <- read_excel("~/Desktop/折线图.xlsx") ## mac系统代码
data # 显示结果
trt Time1 Time2 Time3
Control 3.185968 2.875802 3.414224
Control 2.832632 2.862111 2.801333
Control 3.123533 2.984142 2.909648
Control 2.880927 3.146924 3.256431
Control 3.090936 2.732620 2.966887
Control 2.920949 2.865287 2.820161
Control 3.306013 2.856390 2.806807
Control 2.885842 3.002480 2.611145
Control 2.955919 3.239596 3.182478
Control 2.978572 2.818567 2.734901
Control 3.190647 3.042065 3.001664
Control 2.883867 3.063226 3.240626
Intervene 2.623973 3.771982 6.041055
Intervene 2.663926 3.841685 6.098856
Intervene 3.045286 4.301342 6.108293
Intervene 3.054712 4.065593 6.085687
Intervene 3.127857 4.097633 6.055522
Intervene 3.147686 4.134947 5.926182
Intervene 3.248095 3.419161 5.855298
Intervene 3.326086 4.182629 5.885426
Intervene 3.093890 4.318014 6.151859
Intervene 3.418347 4.303615 6.284906
Intervene 2.935396 3.986879 5.798966
Intervene 2.697790 3.640084 6.073529

新增一列id,id即为数字,用于后续分析,必不可少

data$id <- 1:nrow(data)
trt Time1 Time2 Time3 id
Control 3.185968 2.875802 3.414224 1
Control 2.832632 2.862111 2.801333 2
Control 3.123533 2.984142 2.909648 3
Control 2.880927 3.146924 3.256431 4
Control 3.090936 2.732620 2.966887 5
Control 2.920949 2.865287 2.820161 6
Control 3.306013 2.856390 2.806807 7
Control 2.885842 3.002480 2.611145 8
Control 2.955919 3.239596 3.182478 9
Control 2.978572 2.818567 2.734901 10
Control 3.190647 3.042065 3.001664 11
Control 2.883867 3.063226 3.240626 12
Intervene 2.623973 3.771982 6.041055 13
Intervene 2.663926 3.841685 6.098856 14
Intervene 3.045286 4.301342 6.108293 15
Intervene 3.054712 4.065593 6.085687 16
Intervene 3.127857 4.097633 6.055522 17
Intervene 3.147686 4.134947 5.926182 18
Intervene 3.248095 3.419161 5.855298 19
Intervene 3.326086 4.182629 5.885426 20
Intervene 3.093890 4.318014 6.151859 21
Intervene 3.418347 4.303615 6.284906 22
Intervene 2.935396 3.986879 5.798966 23
Intervene 2.697790 3.640084 6.073529 24

将短数据转换为长数据

data2 <- gather(data, key = "x", value = "value", -trt, -id)

对各组进行统计学描述

data3 <- data2 %>% 
  group_by(trt, x) %>% 
  get_summary_stats(value)
trt x variable n min max median q1 q3 iqr mad mean sd se ci
Control Time1 value 12 2.833 3.306 2.967 2.885 3.139 0.254 0.156 3.020 0.154 0.044 0.098
Control Time2 value 12 2.733 3.240 2.930 2.861 3.047 0.187 0.137 2.957 0.148 0.043 0.094
Control Time3 value 12 2.611 3.414 2.938 2.805 3.197 0.392 0.252 2.979 0.246 0.071 0.156
Intervene Time1 value 12 2.624 3.418 3.074 2.876 3.173 0.297 0.232 3.032 0.258 0.074 0.164
Intervene Time2 value 12 3.419 4.318 4.082 3.824 4.212 0.388 0.327 4.005 0.285 0.082 0.181
Intervene Time3 value 12 5.799 6.285 6.065 5.916 6.101 0.185 0.097 6.030 0.139 0.040 0.088

批量运行正态性检验(Shapiro-Wilk normality test)

for(i in unique(data[,1])){
  data1 <- data[data[,1] == i,]
  print(lapply(data1[,-1], function(x) shapiro.test(x)))
}

$Time1

Shapiro-Wilk normality test

data: x

W = 0.91913, p-value = 0.2788

$Time2

Shapiro-Wilk normality test

data: x

W = 0.88011, p-value = 0.08792

$Time3

Shapiro-Wilk normality test

data: x

W = 0.73897, p-value = 0.002055

$id

Shapiro-Wilk normality test

data: x

W = 0.95933, p-value = 0.7742

批量运行方差齐性检验(Levene's Test)

for(i in unique(data2$x)){
  data1 <- data2[data2$x == i,]
  print(leveneTest(value~trt, data = data1))
}

Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 1.5617 0.2245
22
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 2.5864 0.122
22
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 3.8375 0.06291 .
22


Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

两因素重复测量方差分析

anova_test(data = data2, dv = value, wid = id, within = x, between = trt) 

ANOVA Table (type II tests)

$ANOVA
Effect DFn DFd F p p<.05 ges
1 trt 1 22 510.657 1.02e-16 * 0.918
2 x 2 44 391.826 9.19e-29 * 0.902
3 trt:x 2 44 407.706 4.01e-29 * 0.905

$Mauchly's Test for Sphericity
Effect W p p<.05
1 x 0.966 0.699
2 trt:x 0.966 0.699

$Sphericity Corrections
Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] p[HF]<.05
1 x 0.968 1.94, 42.57 6.65e-28 * 1.059 2.12, 46.61 9.19e-29 *
2 trt:x 0.968 1.94, 42.57 2.98e-28 * 1.059 2.12, 46.61 4.01e-29 *

可视化结果

ggplot(data = data3, aes(x = x, y = mean, color = trt, group = trt)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), color = "black", width = 0.2) +
  geom_line() +
  geom_point() +
  theme_bw()

image.png

当然这个图跟原图不一样,并且没有两两比较,我们可以使用ggpubrrstatix进行二次绘图。

关于rstatix进行统计学分析,其实可以Repeated Measures ANOVA in R: The Ultimate Guide - Datanovia
得到答案,具体的分析,我们暂不说了,只说如何进行统计学分析,并且自动两两比较

res.aov <- anova_test(data = data2, dv = value, wid = id, within = x, between = trt) 
# 组间两两比较
pwc <- data2 %>%
    group_by(x) %>%
    pairwise_t_test(
        value ~ trt, paired = TRUE,
        p.adjust.method = "bonferroni" ## 校正方法,包括 "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".等
    )
pwc
x .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
Time1 value Control Intervene 12 12 -0.1453235 11 8.87e-01 8.87e-01 ns
Time2 value Control Intervene 12 12 -12.3908123 11 1.00e-07 1.00e-07 ****
Time3 value Control Intervene 12 12 -40.9352857 11 0.00e+00 0.00e+00 ****

可以看到已经自动进行了两两比较,我们使用rstatix包的add_xy_position()函数可以添加两两比较列表和x轴y轴位置。

 pwc <- pwc %>% add_xy_position(x = "x")# 按时间分列
pwc
x .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif y.position groups xmin xmax
1 value Control Intervene 12 12 -0.1453235 11 8.87e-01 8.87e-01 ns 3.7834 Control , Intervene 0.8 1.2
2 value Control Intervene 12 12 -12.3908123 11 1.00e-07 1.00e-07 **** 4.6834 Control , Intervene 1.8 2.2
3 value Control Intervene 12 12 -40.9352857 11 0.00e+00 0.00e+00 **** 6.6504 Control , Intervene 2.8 3.2

可视化绘图

library(ggpubr)
ggline(data2,x='x',y='value',
color = 'trt', #按组配色
add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
palette = "aaas" # aaas杂志配色
)+stat_pvalue_manual(pwc, tip.length = 0, hide.ns = TRUE) + # 添加两两比较,隐藏无意义
    labs(
        subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
        caption = get_pwc_label(pwc) # 右下角条件两两比较方法。
    )
image.png

当然,我们还可以继续修改图片,比如全部显示结果,取消显著性标记的下划线,或者显示具体的p值等

ggline(data2,x='x',y='value',
       color = 'trt', #按组配色
       add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
       palette = "aaas",# aaas杂志配色
ggtheme = theme_bw(), #背景
xlab = F,ylab = "Score", #xy轴名称
legend.title="Treatment" ,legend="top", #标签名称和位置
title = "Comparison of two groups at different times" #标题
)+stat_pvalue_manual(pwc, bracket.size = 0) + # 添加两两比较,显示所有结果
    labs(
        subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
        caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
    )
image.png

也可以显示数值

ggline(data2,x='x',y='value',
       color = 'trt', #按组配色
       add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
       palette = "aaas",# aaas杂志配色#背景
       xlab = F,ylab = "Score", #xy轴名称
       legend.title="Treatment" ,legend="right",
       title = "Comparison of two groups at different times" #标题
)+stat_pvalue_manual(pwc, bracket.size = 0,label = "p.adj") + # 添加两两比较,显示所有结果
    labs(
        subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
        caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
    )
image.png

你可能感兴趣的:(R进行两因素重复测量方差分析并可视化(双组折线图))