RcisTarget: finding the regulators and motifs of a gene set

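Before we start: today I studied the RcisTarget package. Every time I pick up a new package, the painful part is not knowing what the package is for and why I would use it. Today I read teacher Zhou's post at https://www.jianshu.com/p/6e1d71db4220 and felt it was describing me exactly. Below I explain my understanding of RcisTarget from a biology background; if anything is wrong, corrections and criticism are welcome!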

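Foreword: in single-cell analysis we use differentially expressed genes to decide which cell type each cluster is, so every cell type is the result of a particular set of genes (a gene set) being selectively expressed. Gene expression is under transcriptional control (basic molecular biology). Looking only at transcription factors (TFs): different TFs regulate different genes, which in turn gives cells different states or types. Of course, transcriptional regulation involves more than TFs; cis-acting elements are one example.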

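1. What it is for: take B cells as an example. We can run RcisTarget on a gene set that is strongly associated with B cells (that is, the genes that make these cells B cells) to find out which TFs are enriched in, or specific to, B cells. Broadly speaking, RcisTarget predicts the transcription factors behind a gene set. (Another use: if tumor samples contain clearly more T cells than normal samples, we can compare tumor T cells with normal T cells, filter for the genes that differ significantly between the two, and run RcisTarget on that gene set to see which TFs may drive the difference between tumor and normal T cells.)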

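Three inputs are needed: the gene set; the gene-motif rankings, which give the ranking (~score) of all genes for each motif; and the motif annotation that links motifs to transcription factors.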

2. Workflow
2.1 Get the gene list

library(RcisTarget)
library(Seurat)
library(SeuratData)
library(tidyverse)
# Load the Seurat object
scRNA <- readRDS("scRNAsub.rds")

# Get marker genes for every cell type
Idents(scRNA) <- "celltype"
geneLists <- FindAllMarkers(scRNA)
head(geneLists)

# Keep the top 35 marker genes (by avg_log2FC) of the cluster of interest
genelists <- geneLists %>% filter(cluster == 'endothelial-cells') %>% top_n(35, avg_log2FC)
genelists <- genelists$gene  # this can also be made into a list, as in teacher Zhou's post
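
If several cell types are of interest, the marker table can be turned into a named list with one gene set per cluster. The following is a minimal sketch in base R on a made-up marker table. The helper name `top_n_per_cluster` and all values are assumptions for illustration; only the column names `cluster`, `gene` and `avg_log2FC` follow the marker table used above.

```r
# Toy stand-in for a FindAllMarkers() result; the values are made up.
markers <- data.frame(
  cluster    = c("B", "B", "B", "T", "T", "T"),
  gene       = c("CD79A", "MS4A1", "CD19", "CD3E", "CD3D", "IL7R"),
  avg_log2FC = c(3.1, 2.8, 0.4, 2.9, 2.5, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n genes with the highest avg_log2FC in each
# cluster and return them as a named list (one character vector per cluster).
top_n_per_cluster <- function(df, n) {
  df <- df[order(df$cluster, -df$avg_log2FC), ]
  # position of each row within its cluster after sorting
  pos <- ave(seq_along(df$cluster), df$cluster, FUN = seq_along)
  df <- df[pos <= n, ]
  split(df$gene, df$cluster)
}

geneSetList <- top_n_per_cluster(markers, n = 2)
str(geneSetList)
```

As far as I understand the package documentation, a named character list of this shape can be passed wherever a gene set is expected, and the `geneSet` column of the result then tells the sets apart.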

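2.2 Use the motif annotation (motif to TF) shipped with RcisTarget

# Two motif annotation datasets are available; pick the one that matches your species
# mouse:
# data(motifAnnotations_mgi)
# human:
data(motifAnnotations_hgnc)  # human is used here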

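2.3 Import the gene-motif rankings: the ranking (~score) of all genes for each motif

# This input is the hardest one to get: it has to be downloaded from the database site,
# each file is about 1 GB, and downloads are often incomplete, which makes the import fail.
# Option 1: download with R, then read the file
featherURL <- "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather" 
download.file(featherURL, destfile=basename(featherURL)) # saved to the current working directory
# Read it in. If the download is incomplete, R crashes and restarts at this step;
# that is not a problem with your computer's hardware.
motifRankings <- importRankings("hg19-tss-centered-10kb-7species.mc9nr.feather")

# Option 2: read a file downloaded beforehand (mine is stored in the cisTarget_databases folder).
# How does it differ from the file above? hg19 and hg38 are different genome builds, and
# 10kb vs 500bp is the size of the regulatory region searched around each gene's TSS
# (10kb is the larger one). Either an hg19 or an hg38 database can be read in.
motifRankings <- importRankings("cisTarget_databases/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")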
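
Because an incomplete file can crash the session on import, it may help to check the file before reading it. The sketch below uses base R only. The helper name `is_download_complete` is hypothetical, and the expected size is something you would have to look up yourself on the download page. Two related points I am fairly sure of: R's default download timeout is 60 seconds, which is short for a file of about 1 GB, and on Windows `download.file()` may need `mode = "wb"` for binary files such as `.feather`.

```r
# Raise the download timeout (in seconds) for large files; never lower it.
options(timeout = max(3600, getOption("timeout")))

# Hypothetical helper: TRUE only if the file exists and has the expected size.
is_download_complete <- function(path, expected_bytes) {
  file.exists(path) && isTRUE(file.size(path) == expected_bytes)
}

# Demonstration on a small temporary file instead of the real database
tmp <- tempfile(fileext = ".feather")
writeBin(as.raw(1:100), tmp)                      # writes exactly 100 bytes
ok_full      <- is_download_complete(tmp, expected_bytes = 100)
ok_truncated <- is_download_complete(tmp, expected_bytes = 200)
print(c(full = ok_full, truncated = ok_truncated))
```

Only call `importRankings()` once a check like this passes; if the site publishes a checksum, comparing it with `tools::md5sum()` or a similar tool is a stricter test than file size.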

If you would like my reference data, send me a private message.


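The three inputs are now loaded.
2.4 Run the analysis
The whole analysis can be run in one step with cisTarget(), which executes the following steps in order:
(1) motif enrichment analysis,
(2) motif-TF annotation, and
(3) selection of significant genes.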

motifEnrichmentTable_wGenes <- cisTarget(genelists, 
         motifRankings,
         motifAnnot=motifAnnotations_hgnc)  # the whole analysis in one call

2.5 Interpreting the results


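RcisTarget output:
The final output of RcisTarget is a data.table (motifEnrichmentTable_wGenes) with the following information about the motif enrichment:
geneSet: name of the gene set
motif: ID of the motif
NES: normalized enrichment score of the motif in the gene set
AUC: area under the curve (used to calculate the NES)
TFinDB: indicates whether the highlighted TFs are included in the high-confidence annotation (two asterisks) or the low-confidence annotation (one asterisk).
TF_highConf: transcription factors annotated to the motif according to 'motifAnnot_highConfCat'. This is the most important column: it shows which transcription factors the input gene set is enriched for. The results also show that one TF can bind several motifs and so regulate different genes.
TF_lowConf: transcription factors annotated to the motif according to 'motifAnnot_lowConfCat'.
enrichedGenes: genes that are highly ranked for the given motif.
nEnrGenes: number of highly ranked genes
rankAtMax: ranking at the maximum enrichment, used to determine the number of enriched genes.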
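
To get a plain list of TF names out of the TF_highConf column, the evidence tags have to be stripped. This is a small sketch under the assumption that the strings look like `"ELF1; ETV6 (directAnnotation). "`, which is how they appear in the package vignette output as far as I can tell. The helper name `extract_tfs` and the example strings are made up.

```r
# Made-up examples of TF_highConf strings (assumed format, see lead-in)
tf_highconf <- c(
  "ELF1; ETV6 (directAnnotation). ",
  "ELF1 (inferredBy_Orthology). ",
  ""                                   # motifs without annotation are empty
)

# Hypothetical helper: remove "(evidence). " tags, split on ";", de-duplicate
extract_tfs <- function(x) {
  x <- gsub("\\s*\\([^)]*\\)\\.\\s*", "; ", x)
  tfs <- trimws(unlist(strsplit(x, ";", fixed = TRUE)))
  sort(unique(tfs[nzchar(tfs)]))
}

tfs <- extract_tfs(tf_highconf)
print(tfs)
```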

2.6 Plot the results: show the top enriched motifs and their transcription factors (of limited use, to be honest)

motifEnrichmentTable_wGenes_wLogo <- addLogo(motifEnrichmentTable_wGenes)

resultsSubset <- motifEnrichmentTable_wGenes_wLogo[1:10,]    # keep the first 10 rows

library(DT)
datatable(resultsSubset[,-c("enrichedGenes", "TF_lowConf"), with=FALSE], 
          escape = FALSE, # To show the logo
          filter="top", options=list(pageLength=5))


How to read a motif logo: https://zhuanlan.zhihu.com/p/428416814

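Clearly the one-step analysis rarely gives exactly the result we want. The same steps can also be run as separate commands, so that an intermediate output of interest can be analysed on its own. For that reason I recommend the step-by-step analysis.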

3. Step-by-step analysis
3.1 Motif enrichment analysis (work out which motifs the input gene set is enriched for, and score each motif)

# 1. Calculate AUC
motifs_AUC <- calcAUC(genelists, motifRankings)
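
The idea behind the AUC can be illustrated on toy data: for each motif all genes are ranked, a recovery curve counts how many gene-set genes appear within the top r genes, and the area under that curve up to a maximum rank is the score. The code below is a simplified sketch of that idea, not the package's implementation. The function name `recovery_auc`, the normalisation to the range 0 to 1 and the rankings themselves are assumptions for illustration.

```r
# Toy rankings: rank 1 = gene with the best score for that motif (made up)
ranking_motifA <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 4, g5 = 5, g6 = 6, g7 = 7, g8 = 8)
ranking_motifB <- c(g1 = 8, g2 = 3, g3 = 7, g4 = 1, g5 = 2, g6 = 6, g7 = 5, g8 = 4)
gene_set <- c("g1", "g2", "g3")

recovery_auc <- function(ranking, gene_set, max_rank) {
  hits <- ranking[intersect(gene_set, names(ranking))]
  # recovery curve: number of gene-set genes found within the top r genes
  curve <- vapply(seq_len(max_rank), function(r) sum(hits <= r), integer(1))
  # area of the best possible curve, used to scale the result to [0, 1]
  best <- sum(pmin(seq_len(max_rank), length(hits)))
  sum(curve) / best
}

auc_A <- recovery_auc(ranking_motifA, gene_set, max_rank = 4)
auc_B <- recovery_auc(ranking_motifB, gene_set, max_rank = 4)
print(c(motifA = auc_A, motifB = auc_B))
```

Motif A ranks all three gene-set genes at the very top and so gets the highest possible value, while motif B only recovers one of them within the first four ranks.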

3.2 Motif-TF annotation: filter the enriched motifs (by default, motifs with an NES of at least 3 are kept) and annotate them with TFs.

# 2. Select significant motifs, add TF annotation & format as table
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,
                                           nesThreshold=3,
                                           motifAnnot=motifAnnotations_hgnc)
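
The NES used for this filter is, according to the package vignette, the AUC of a motif standardised against the AUCs of all motifs: (AUC - mean) / sd. A toy calculation in base R shows how the threshold of 3 picks out an outlier; the AUC values and motif names are made up for illustration.

```r
# 20 background motifs with similar AUCs plus one clear outlier (made-up values)
auc <- c(seq(0.020, 0.039, length.out = 20), 0.150)
names(auc) <- paste0("motif", seq_along(auc))

# Standardise each AUC against the distribution over all motifs
nes <- (auc - mean(auc)) / sd(auc)

# Keep motifs at or above the threshold, as nesThreshold=3 does above
nes_threshold <- 3
significant <- names(nes)[nes >= nes_threshold]
print(round(nes[significant], 2))
```

Because the score is relative to all motifs in the database, the same AUC can be significant in one database and not in another.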

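3.3 Identify the enriched genes for each motif. One motif may regulate several genes, so checking which genes each motif regulates tells us which genes the TF regulates. This is what it means to say that RcisTarget analysis is motif-based.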

## 3. Identify significant genes for each motif
# (i.e. genes from the gene set in the top of the ranking)
# Note: Method 'iCisTarget' instead of 'aprox' is more accurate, but slower
motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable, 
                                                   geneSets=genelists,
                                                   rankings=motifRankings, 
                                                   nCores=1,
                                                   method="aprox")
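
Once the significant genes are attached, the enrichedGenes column can be unpacked into one row per motif-gene pair, which is convenient for counting or for joining with the TF annotation. This sketch assumes that enrichedGenes holds gene names separated by ";" (as in the vignette output, to my knowledge) and runs on a made-up table with placeholder motif names.

```r
# Made-up stand-in for two rows of motifEnrichmentTable_wGenes
enrichment <- data.frame(
  motif         = c("motif_A", "motif_B"),
  enrichedGenes = c("CD79A;MS4A1;CD19", "CD19;CD74"),
  stringsAsFactors = FALSE
)

# Split each string and repeat the motif ID once per gene
gene_lists <- strsplit(enrichment$enrichedGenes, ";", fixed = TRUE)
motif_gene <- data.frame(
  motif = rep(enrichment$motif, lengths(gene_lists)),
  gene  = unlist(gene_lists),
  stringsAsFactors = FALSE
)

# Genes supported by more than one motif
gene_counts <- table(motif_gene$gene)
print(motif_gene)
print(names(gene_counts)[gene_counts > 1])
```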

3.4 Build a network graph (genes and motifs) from the step-by-step outputs

signifMotifNames <- motifEnrichmentTable$motif[1:3]

incidenceMatrix <- getSignificantGenes(genelists, 
                                       motifRankings,
                                       signifRankingNames=signifMotifNames,
                                       plotCurve=TRUE, maxRank=5000, 
                                       genesFormat="incidMatrix",
                                       method="aprox")$incidMatrix

library(reshape2)
edges <- melt(incidenceMatrix)
edges <- edges[which(edges[,3]==1),1:2]
colnames(edges) <- c("from","to")
library(visNetwork)
motifs <- unique(as.character(edges[,1]))
genes <- unique(as.character(edges[,2]))
nodes <- data.frame(id=c(motifs, genes),   
                    label=c(motifs, genes),    
                    title=c(motifs, genes), # tooltip 
                    shape=c(rep("diamond", length(motifs)), rep("ellipse", length(genes))),
                    color=c(rep("purple", length(motifs)), rep("skyblue", length(genes))))
visNetwork(nodes, edges) %>% visOptions(highlightNearest = TRUE, 
                                        nodesIdSelection = TRUE)  # the network is fully customizable; a TF-to-target-gene network can be drawn the same way
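
The conversion from incidence matrix to edge list is the core of this step, and it can be checked on a toy matrix without reshape2. This is a sketch in base R with made-up motif and gene names; it assumes, as in the code above, that rows are motifs, columns are genes, and a 1 marks a significant gene for that motif.

```r
# Toy incidence matrix: rows = motifs, columns = genes (made-up names)
inc <- matrix(c(1, 0, 1,
                0, 1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("motif_A", "motif_B"),
                              c("gene1", "gene2", "gene3")))

# Row/column positions of every 1, translated back to names
idx <- which(inc == 1, arr.ind = TRUE)
toy_edges <- data.frame(from = rownames(inc)[idx[, "row"]],
                        to   = colnames(inc)[idx[, "col"]],
                        stringsAsFactors = FALSE)

# Number of motifs pointing at each gene
motifs_per_gene <- table(toy_edges$to)
print(toy_edges)
print(motifs_per_gene)
```

A data frame with `from` and `to` columns like `toy_edges` has the same shape as the `edges` object passed to `visNetwork()` above.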

I recommend having a look at teacher Zhou's posts:
https://www.jianshu.com/u/06ae70ef31bc

4.References:
https://cloud.tencent.com/developer/article/1734803
https://www.jianshu.com/p/6e1d71db4220
https://www.jianshu.com/p/fd82f17db2d8
