原文链接Chapter 11 Marker gene detection

1、目的

we identify the genes that drive separation between clusters. 鉴别直接导致分群结果的marker gene.
In the most obvious case, the marker genes for each cluster are a priori associated with particular cell types .即 marker gene 与后面的细胞类型注释直接相关
different approaches can be used to consolidate test results into a single ranking of genes for each cluster.
本文主要学习scran包的findmarkers函数其中的t-test方法。

load("fluidigm.clust.RData") 
#加载上一步分析结果
fluidigm.clust
library(scran)

2、`findMarkers`

（1）基础用法

markers.fluidigm <- findMarkers(fluidigm.clust,groups=fluidigm.clust$label)
markers.fluidigm
markers.fluidigm$`1`

如上，findMarkers的基本参数为sce对象，clust信息。
其中比较方法默认为t.test

2-1

关于`Top`列

首先要知道：t.test是基于所有的clust两两之间比较的，而不是一个clust与其它所有clust的均值作比较。
而Top列，以1为例，表示clust1分别与另外三个clust相比差异最显著top1的基因（ranked by p-value）

chosen <- "1"  #查看clust1与其它三个clust的比较
interesting <- markers.fluidigm[[chosen]]
colnames(interesting)
interesting[1:9,1:4]

2-2

热图可视化

best.set <- interesting[interesting$Top <= 3,]
#选取与其它三个clust差异排名前三，共9个基因的矩阵
getMarkerEffects <- function(x, prefix="logFC", strip=TRUE, remove.na.col=FALSE) {
  #prefix="logFC"
  #x=best.set
  regex <- paste0("^", prefix, "\\.")
  i <- grep(regex, colnames(x))
  out <- as.matrix(x[,i])
  
  if (strip) {
    colnames(out) <- sub(regex, "", colnames(out))
  }
  if (remove.na.col) {
    out <- out[,!colAnyNAs(out),drop=FALSE]
  }
  
  out
}#不知为啥下载scran包没有这个函数，找到了函数源代码，同样效果
logFCs <- getMarkerEffects(best.set)
#得到行名为基因，列名为其它三类clust与clust1de1 logFC
library(pheatmap)
pheatmap(logFCs, breaks=seq(-5, 5, length.out=101))

2-3

（2）细节筛选

基于logFC

#仅挑选上调基因
markers.fluidigm.up <- findMarkers(fluidigm.clust, direction="up",
                                   groups=fluidigm.clust$label)
interesting.up <- markers.fluidigm.up[[chosen]]
interesting.up[1:10,1:4]

#logfc>1
markers.fluidigm.up2 <- findMarkers(fluidigm.clust, direction="up", lfc=1,
                                    groups=fluidigm.clust$label)
interesting.up2 <- markers.fluidigm.up2[[chosen]]
interesting.up2[1:10,1:4]

These two settings yield a more focused set of candidate marker genes that are upregulated in cluster 1

best.set <- interesting.up2[interesting.up2$Top <= 5,]
logFCs <- getMarkerEffects(best.set)
library(pheatmap)
pheatmap(logFCs, breaks=seq(-5, 5, length.out=101))

2-4

基于比较方法

如前所述，基础用法是所有clust两两比较；
可以通过设置参数pval.type=修改p.value的比较关系

（1）pval.type="all"

A more stringent approach would only consider genes that are differentially expressed in all pairwise comparisons involving the cluster of interest.
即clust1的某gene必须与其它所有clust表达都有差异才认定为DE

markers.fluidigm.up3 <- findMarkers(fluidigm.clust, pval.type="all", 
                                direction="up",
                                groups=fluidigm.clust$label)
interesting.up3 <- markers.fluidigm.up3[[chosen]]
head(interesting.up3)

这种方法看起来很合理，但是也有缺点

此时的p.value为the maximum of the p-values from all pairwise comparisons.
A gene will only achieve a low combined p-value if it is strongly DE in all comparisons to other clusters.
However, any gene that is expressed at the same level in two or more clusters will simply not be detected.
This is likely to discard many interesting genes, especially if the clusters are finely resolved with weak separation.

（2）pval.type="any"

will be DE between at least one pair of subpopulations.
too generous.

（3）pval.type="some"

at least 50% of the individual pairwise comparisons exhibit no DE
take the middle-most value as the combined p-value.
reasonable

（3）其它参数

`block=`

handle blocking factor

m.out <- findMarkers(fluidigm.clust, 
                     block=fluidigm.clust$Coverage_Type, 
                     direction="up",
                     groups=fluidigm.clust$label) 
demo <- m.out[["1"]] 
demo[demo$Top <= 5,1:4]

`test=`

默认为t.tests；
可以设定、采用其它方法，例如wilcox,binomial等；具体可参考原文，不再记录。

save(markers.fluidigm, file = "markergene.Rdata")

以上是第十一章Clustering部分的简单流程笔记，主要学习了基于t.test的marker gene detection，其它如wilcox方法见原文Chapter 10 Marker gene detection
本系列笔记基于OSCA全流程的大致流程梳理，详细原理可参考原文。如有错误，恳请指正！
此外还有刘小泽老师整理的全文翻译笔记，详见目录。

6、Marker gene detection

1、目的

2、`findMarkers`

（1）基础用法

关于`Top`列

（2）细节筛选

基于logFC

基于比较方法

（3）其它参数

`block=`

`test=`

你可能感兴趣的:(6、Marker gene detection)

6、Marker gene detection

1、目的

2、findMarkers

（1）基础用法

关于Top列

（2）细节筛选

基于logFC

基于比较方法

（3）其它参数

block=

test=

你可能感兴趣的:(6、Marker gene detection)

2、`findMarkers`

关于`Top`列

`block=`

`test=`