6、Marker gene detection

原文链接Chapter 11 Marker gene detection

1、目的

  • we identify the genes that drive separation between clusters. 鉴别直接导致分群结果的marker gene.
  • In the most obvious case, the marker genes for each cluster are a priori associated with particular cell types .即 marker gene 与后面的细胞类型注释直接相关
  • different approaches can be used to consolidate test results into a single ranking of genes for each cluster.
  • 本文主要学习scran包的findmarkers函数其中的t-test方法。
load("fluidigm.clust.RData") 
#加载上一步分析结果
fluidigm.clust
library(scran)  

2、findMarkers

(1)基础用法

markers.fluidigm <- findMarkers(fluidigm.clust,groups=fluidigm.clust$label)
markers.fluidigm
markers.fluidigm$`1`
  • 如上,findMarkers的基本参数为sce对象,clust信息。
  • 其中比较方法默认为t.test


    2-1
关于Top
  • 首先要知道:t.test是基于所有的clust两两之间比较的,而不是一个clust与其它所有clust的均值作比较。
  • 而Top列,以1为例,表示clust1分别与另外三个clust相比差异最显著top1的基因(ranked by p-value)
chosen <- "1"  #查看clust1与其它三个clust的比较
interesting <- markers.fluidigm[[chosen]]
colnames(interesting)
interesting[1:9,1:4]
2-2
  • 热图可视化
best.set <- interesting[interesting$Top <= 3,]
#选取与其它三个clust差异排名前三,共9个基因的矩阵
getMarkerEffects <- function(x, prefix="logFC", strip=TRUE, remove.na.col=FALSE) {
  #prefix="logFC"
  #x=best.set
  regex <- paste0("^", prefix, "\\.")
  i <- grep(regex, colnames(x))
  out <- as.matrix(x[,i])
  
  if (strip) {
    colnames(out) <- sub(regex, "", colnames(out))
  }
  if (remove.na.col) {
    out <- out[,!colAnyNAs(out),drop=FALSE]
  }
  
  out
}#不知为啥下载scran包没有这个函数,找到了函数源代码,同样效果
logFCs <- getMarkerEffects(best.set)
#得到行名为基因,列名为其它三类clust与clust1de1 logFC
library(pheatmap)
pheatmap(logFCs, breaks=seq(-5, 5, length.out=101))
2-3

(2)细节筛选

基于logFC
#仅挑选上调基因
markers.fluidigm.up <- findMarkers(fluidigm.clust, direction="up",
                                   groups=fluidigm.clust$label)
interesting.up <- markers.fluidigm.up[[chosen]]
interesting.up[1:10,1:4]
#logfc>1
markers.fluidigm.up2 <- findMarkers(fluidigm.clust, direction="up", lfc=1,
                                    groups=fluidigm.clust$label)
interesting.up2 <- markers.fluidigm.up2[[chosen]]
interesting.up2[1:10,1:4]

These two settings yield a more focused set of candidate marker genes that are upregulated in cluster 1

best.set <- interesting.up2[interesting.up2$Top <= 5,]
logFCs <- getMarkerEffects(best.set)
library(pheatmap)
pheatmap(logFCs, breaks=seq(-5, 5, length.out=101))
2-4
基于比较方法
  • 如前所述,基础用法是所有clust两两比较;
  • 可以通过设置参数pval.type=修改p.value的比较关系

(1)pval.type="all"

  • A more stringent approach would only consider genes that are differentially expressed in all pairwise comparisons involving the cluster of interest.
  • 即clust1的某gene必须与其它所有clust表达都有差异才认定为DE
markers.fluidigm.up3 <- findMarkers(fluidigm.clust, pval.type="all", 
                                direction="up",
                                groups=fluidigm.clust$label)
interesting.up3 <- markers.fluidigm.up3[[chosen]]
head(interesting.up3)

这种方法看起来很合理,但是也有缺点

  • 此时的p.value为the maximum of the p-values from all pairwise comparisons.
  • A gene will only achieve a low combined p-value if it is strongly DE in all comparisons to other clusters.
  • However, any gene that is expressed at the same level in two or more clusters will simply not be detected.
  • This is likely to discard many interesting genes, especially if the clusters are finely resolved with weak separation.

(2)pval.type="any"

  • will be DE between at least one pair of subpopulations.
  • too generous.

(3)pval.type="some"

  • at least 50% of the individual pairwise comparisons exhibit no DE
  • take the middle-most value as the combined p-value.
  • reasonable

(3)其它参数

block=
  • handle blocking factor
m.out <- findMarkers(fluidigm.clust, 
                     block=fluidigm.clust$Coverage_Type, 
                     direction="up",
                     groups=fluidigm.clust$label) 
demo <- m.out[["1"]] 
demo[demo$Top <= 5,1:4]
test=
  • 默认为t.tests;
  • 可以设定、采用其它方法,例如wilcox,binomial等;具体可参考原文,不再记录。
save(markers.fluidigm, file = "markergene.Rdata")

以上是第十一章Clustering部分的简单流程笔记,主要学习了基于t.test的marker gene detection,其它如wilcox方法见原文Chapter 10 Marker gene detection
本系列笔记基于OSCA全流程的大致流程梳理,详细原理可参考原文。如有错误,恳请指正!
此外还有刘小泽老师整理的全文翻译笔记,详见目录。

你可能感兴趣的:(6、Marker gene detection)