Paper intensive reading (17): Methods that remove batch effects may lead to exaggerated confidence

Paper title: Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses

Scholar citations: 100

Pages: 11

Published: August 13, 2015

Journal: Biostatistics

Authors: Vegard Nygaard, Einar Andreas Rødland

Abstract:

Removal of, or adjustment for, batch effects or center differences is generally required when such effects are present in data. In particular, when preparing microarray gene expression data from multiple cohorts, array platforms, or batches for later analyses, batch effects can have confounding effects, inducing spurious differences between study groups. Many methods and tools exist for removing batch effects from data. However, when study groups are not evenly distributed across batches, actual group differences may induce apparent batch differences, in which case batch adjustments may bias, usually deflate, group differences. Some tools therefore have the option of preserving the difference between study groups, e.g. using a two-way ANOVA model to simultaneously estimate both group and batch effects. Unfortunately, this approach may systematically induce incorrect group differences in downstream analyses when groups are distributed between the batches in an unbalanced manner. The scientific community seems to be largely unaware of how this approach may lead to false discoveries.

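In short: when groups are unevenly distributed across batches, batch correction methods that preserve group differences can induce incorrect group differences in downstream analyses.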

Conclusions:

  • The size and impact of the problem will depend greatly on how unbalanced the group–batch distribution is: if it is only moderately unbalanced, it need not be a concern, whereas in heavily unbalanced cases it may have a huge influence. In short, moderate imbalance is not a big worry.
  • The impact is also more likely to be notable when used to analyze a large number of features, e.g. a large set of genes, followed by multiple testing corrections such as using the FDR, as the effect is more pronounced for more extreme values.
  • Studies investigating batch effects were mostly recommending ComBat without much concern for potential limitations.
  • The incorporation of ComBat and other batch adjustment methods into various analysis tools (see Supplementary Material) may make them more accessible, but their use less transparent.
  • The main advice, particularly when batch effects are significant, is to ensure a balanced design in which study groups are evenly distributed across batches.
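The advice about balanced designs can be checked before any data are collected by cross-tabulating group against batch. The following is a minimal sketch, not something from the paper: the function names `group_batch_table` and `is_balanced` and the sample labels are hypothetical, and "balanced" is assumed here to mean that every batch has the same group proportions.

```python
from collections import Counter


def group_batch_table(groups, batches):
    """Count samples per (batch, group) cell; returns {batch: {group: count}}."""
    counts = Counter(zip(batches, groups))
    group_levels = sorted(set(groups))
    batch_levels = sorted(set(batches))
    return {b: {g: counts[(b, g)] for g in group_levels} for b in batch_levels}


def is_balanced(table):
    """True if all batches share the same group proportions (assumed definition)."""
    rows = list(table.values())
    ref = rows[0]
    ref_total = sum(ref.values())
    for row in rows[1:]:
        total = sum(row.values())
        for g in ref:
            # Cross-multiply to compare proportions without floating point.
            if row[g] * ref_total != ref[g] * total:
                return False
    return True


# Heavily unbalanced: batch 1 is mostly group A, batch 2 mostly group B.
unbalanced = group_batch_table(["A"] * 9 + ["B"] + ["A"] + ["B"] * 9,
                               [1] * 10 + [2] * 10)
# Balanced: each batch holds five samples of each group.
balanced = group_batch_table(["A", "B"] * 10, [1] * 10 + [2] * 10)

print(unbalanced, is_balanced(unbalanced))
print(balanced, is_balanced(balanced))
```

A strict equality test like this is deliberately conservative; as the conclusions note, a moderately unbalanced design need not be a concern, so in practice one would look at how far the table deviates rather than just whether it does.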

Introduction:

  • A common example is “batch effects” caused by reagents, microarray chips, and other equipment made in batches that may vary in some way, which often have systematic effects on the measurements. (The paper gives several examples of batch effects.)

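  • An alternative two-step procedure has emerged. First the batch effects are estimated and removed, creating a “batch effect free” data set. Then, the statistical analyses are performed on the adjusted data without further consideration of batch effects. Note: everyone would like data processing to split into two steps, first removing batch effects and then running the statistical analysis, which could even be done by different people. In practice it is not that simple.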

  • Unfortunately, as we demonstrate in this paper, when the batch–group design is unbalanced, this approach may be unreliable. In other words, the two-step methods run into trouble when the batch–group design is unbalanced.

  • A possible solution: one may simultaneously estimate batch effects and group differences.

  • Several tools exist for batch-adjusting gene expression data, e.g. the commercial software Partek Genomics Suite, the R packages limma (Smyth and Speed, 2003), ber (Giordan, 2013), and ComBat (Johnson and others, 2007), which is included in the sva package (Leek and others, 2012).

  • Most of these use two-way ANOVA, while ComBat uses an empirical Bayes approach to avoid over-correcting, which is critical for use with small batches. Note: ComBat is suited to small batches.

  • However, the best approach is to ensure a balanced study design from the start, to avoid data analysis problems as well as the loss of statistical power that ensues when batch and group effects need to be disentangled. Note: in other words, balance the groups across batches when the batches are first designed.
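The contrast between the two-step procedure and simultaneous estimation can be illustrated with a small simulation on pure noise. This is a minimal sketch under stated assumptions, not the authors' code: it uses a plain least-squares two-way additive model rather than ComBat, a hypothetical design of 20 samples (batch 1 holds 9 of group 0 and 1 of group 1, batch 2 the reverse), and a hypothetical function name `null_f_statistics`. Since the data contain neither group nor batch effects, a valid F-statistic for the group term should average close to 1.

```python
import numpy as np


def rss(X, Y):
    """Residual sum of squares of each column of Y regressed on design X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return (resid ** 2).sum(axis=0)


def null_f_statistics(n_sims=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical, heavily unbalanced group-batch design.
    group = np.array([0] * 9 + [1] * 1 + [0] * 1 + [1] * 9, dtype=float)
    batch = np.array([0] * 10 + [1] * 10, dtype=float)
    n = group.size
    ones = np.ones(n)
    X_full = np.column_stack([ones, group, batch])  # group + batch
    X_batch = np.column_stack([ones, batch])        # batch only
    X_group = np.column_stack([ones, group])        # group only
    X_null = ones[:, None]                          # intercept only

    # One simulated feature per column: no group effect, no batch effect.
    Y = rng.standard_normal((n, n_sims))

    # One-step: test the group term inside the two-way model (df = 1, n - 3).
    rss_full = rss(X_full, Y)
    f_one = (rss(X_batch, Y) - rss_full) / (rss_full / (n - 3))

    # Two-step: subtract the estimated batch effect while keeping the group
    # term, then run a one-way ANOVA that ignores batch (df = 1, n - 2).
    beta, *_ = np.linalg.lstsq(X_full, Y, rcond=None)
    Y_adj = Y - np.outer(batch, beta[2])
    rss_group = rss(X_group, Y_adj)
    f_two = (rss(X_null, Y_adj) - rss_group) / (rss_group / (n - 2))
    return f_one, f_two


f_one, f_two = null_f_statistics()
print("mean F, two-way ANOVA:      ", f_one.mean())
print("mean F, adjust then one-way:", f_two.mean())
```

In this design the two-step F-statistic is inflated relative to the two-way one by a constant factor of roughly 2.9, because the one-way test on adjusted data ignores the uncertainty in the estimated batch effect. With a balanced design (group uncorrelated with batch) the two statistics would differ only through the residual degrees of freedom.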

Paper outline:

1. Introduction

2. Methods for batch effect correction

2.1 Model for data with batch effects

2.2 Standard batch correction methods

3. Results

3.1 A simple sanity check

3.2 Explanation for the simple two-group comparison

       3.2.1 Group comparison from two-way ANOVA

       3.2.2 Group comparison from one-way ANOVA after batch adjustment  

3.3 Distribution of F-statistic in the general case

3.4 Examples of undesired consequences

4. Discussion

4.1 Motivation for this warning

4.2 Practical advice

Excerpts from the main text:
