RcisTarget || 从基因列表到调控网络

单细胞技术正在加强我们对细胞本身的理解。什么是细胞类型，不就是基因选择性表达的结果吗？而基因的选择表达受到一系列的转录调控，在这个意义上，细胞命运背后的驱动力在于各自转录因子表达的程序化及其靶标基因。有的细胞类型里面是不是有其特异的转录因子呢？

我们不止一次提过，细胞的特征由其表达的基因描述。

所以在单细胞数据分析中，比较重要的一步就是拿到某一群细胞的差异基因（一般是 one to others的方法）。然后，由这个基因集来推断细胞的特征，而基因的表达受到转录因子的调节。不同的转录因子调控不同的基因，进而使细胞表现出不同的状态或类型。进一步地说，细胞保持其状态又需要表观遗传信号来维持。拿到基因集之后，我们要描述该亚群的特点就转变为基因集的特征。本文我们继续跟着SCENIC的内核，来基于基因集分析转录因子（Transcription factor ，TF）。在我们应用工具的时候，往往困难的不是如何使用工具，而是为什么要用？

今天我们介绍的RcisTarget是一个R-package用于识别基因列表中转录因子(TF)的结合基序（binding motifs ）。首先我们获得一个基因列表：

library(RcisTarget)
library(Seurat)
library(SeuratData)
library(tidyverse)

mk <-FindAllMarkers( pbmc3k.final)

 mk %>% filter(cluster == 'B') %>% top_n(35,avg_logFC) %>% head()
                    p_val avg_logFC pct.1 pct.2     p_val_adj cluster      gene
CD79A.3      0.000000e+00  2.987583 0.936 0.041  0.000000e+00       B     CD79A
MS4A1.3      0.000000e+00  2.341555 0.855 0.053  0.000000e+00       B     MS4A1
CD79B.3     2.655974e-274  2.413475 0.916 0.142 3.642403e-270       B     CD79B
LINC00926.1 2.397625e-272  1.970393 0.564 0.009 3.288103e-268       B LINC00926
TCL1A.3     9.481783e-271  2.489493 0.622 0.022 1.300332e-266       B     TCL1A
HLA-DQA1.2  2.942395e-266  2.119572 0.890 0.118 4.035201e-262       B  HLA-DQA1

(mk %>% filter(cluster == 'B') %>% top_n(35,avg_logFC))$gene   -> Bcell
(mk %>% filter(cluster == 'Naive CD4 T') %>% top_n(35,avg_logFC))$gene   -> NaiveCD4T


geneLists <- list('Bcell'=Bcell,'NaiveCD4T'=NaiveCD4T)
geneLists
geneLists
$Bcell
 [1] "CD79A"     "MS4A1"     "CD79B"     "LINC00926" "TCL1A"     "HLA-DQA1"  "VPREB3"   
 [8] "HLA-DQB1"  "CD74"      "HLA-DRA"   "FCER2"     "HLA-DPB1"  "BANK1"     "HLA-DRB1" 
[15] "HLA-DPA1"  "TSPAN13"   "HLA-DQA2"  "FCRLA"     "CD37"      "HLA-DRB5"  "HVCN1"    
[22] "PKIG"      "HLA-DOB"   "BLK"       "HLA-DMA"   "GNG7"      "CD72"      "HLA-DMB"  
[29] "P2RX5"     "IGLL5"     "EAF2"      "PPAPDC1B"  "IRF8"      "SMIM14"    "IGJ"      

$NaiveCD4T
 [1] "RPS12"     "RPS27"     "RPS6"      "RPS25"     "RPL9"      "RPL31"     "RPS3A"    
 [8] "LDHB"      "RPS27A"    "MALAT1"    "RPS13"     "RPS29"     "CCR7"      "CD3D"     
[15] "NPM1"      "CD3E"      "NOSIP"     "LEF1"      "PRKCQ-AS1" "PIK3IP1"   "JUNB"     
[22] "TMEM66"    "FHIT"      "CD7"       "IL7R"      "MAL"       "LINC00176" "TCF7"     
[29] "LDLRAP1"   "C6orf48"   "LEPROTL1"  "NDFIP1"    "C12orf57"  "CD8B"      "DNAJB1"

我们知道这B 细胞和 NaiveCD4T高度相关的两个基因集，里面有我们熟悉的B 细胞的marker 。

我们要用RcisTarget 来预测转录因子，就要知道这个是什么？首先是英语词根（cis-）是顺的意思。也就是在我们的讨论体系里面是顺式转录调控因子（cisTarget），这个是什么？

真核生物转录起始过程十分复杂，往往需要多种蛋白因子的协助，转录因子与RNA聚合酶Ⅱ形成转录起始复合体，共同参与转录起始的过程。根据转录因子的作用特点可分为二类；第一类为普遍转录因子,它们与RNA聚合酶Ⅱ共同组成转录起始复合体时，转录才能在正确的位置开始。除TFⅡD以外，还发现TFⅡA，TFⅡB，TFⅡF，TFⅡE，TFⅡH等，它们在转录起始复合体组装的不同阶段起作用。第二类转录因子为组织细胞特异性转录因子，这些TF是在特异的组织细胞或是受到一些类固醇激素\生长因子或其它刺激后，开始表达某些特异蛋白质分子时，才需要的一类转录因子。

顺式作用元件(cis-acting element)存在于基因旁侧序列中能影响基因表达的序列。顺式作用元件包括启动子、增强子、调控序列和可诱导元件等，它们的作用是参与基因表达的调控。顺式作用元件本身不编码任何蛋白质，仅仅提供一个作用位点，要与反式作用因子相互作用而起作用。

作为独立的R程序包，RcisTarget使用特有的数据库。在运行RcisTarget之前，您需要下载并安装相关数据库：RcisTarget需要两种数据库来分析基因list:

1。Gene-motif rankings::提供每个motif的所有基因的排名(~得分)。
2。转录因子的motif注释。

每对基因基序的得分可以用不同的参数来进行。因此，我们提供多个数据库(motif-rankings)，根据以下几种可能性:

Species: Species of the input gene set. Available values: Human (Homo sapiens), mouse (Mus musculus) or fly (Drosophila melanogaster)
Scoring/search space: determine the search space around the transcription start site (TSS). Available values: 500 bp uptream the TSS, 5kbp or 10kbp around the TSS (e.g. 10kbp upstream and 10kbp downstream of the TSS).
Number of orthologous species taken into account to score the motifs (e.g. conservation of regions). Available values: 7 or 10 species

具体的数据库在：https://resources.aertslab.org/cistarget/

作为一个例子，可以在R里面这样下载：

featherURL <- "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather" 
download.file(featherURL, destfile=basename(featherURL)) # saved in current dir

当然这个不容易成功。

RcisTarget执行的所有计算都是基于motif。然而，大多数用户会对TFs可能调节基因集感兴趣。motif与转录因子的关联在一个独立的文件中提供。除了被源数据所注释的基序(即直接注释)外，我们还根据基序相似性和基因序列(例如与被注释基序的其他基因的相似性)推断出了一些进一步的注释，即Motif 的注释文件。

下面用一个小的数据库来演示一下。除了完整的数据库(20k motifs)外，这是一个包含4.6k motifs的子集(人类:rcistarget .hg19. motifdb . cisbpont .500bp)。这些子集可用于示范目的。他们将为现有的主题提供相同的AUC评分。但是，为了得到更准确的结果，我们强烈推荐使用完整版本(~20k motif)，因为motif的归一化富集评分(NES)取决于数据库中motif的总数。

# From Bioconductor
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("RcisTarget.hg19.motifDBs.cisbpOnly.500bp") 

library(RcisTarget.hg19.motifDBs.cisbpOnly.500bp)
# Rankings
data(hg19_500bpUpstream_motifRanking_cispbOnly)
motifRankings <- hg19_500bpUpstream_motifRanking_cispbOnly
motifRankings

Rankings for RcisTarget.
  Organism: human (hg19)
  Number of genes: 22284 (22285 available in the full DB)
  Number of MOTIFS: 4687
** This database includes rankings up to 5050

Subset (4.6k cisbp motifs) of the Human database scoring motifs 500bp upstream the TSS (hg19-500bp-upstream-7species.mc9nr)

motif 注释文件

data(motifAnnotations_hgnc)
 motifAnnotations_hgnc[1:4,1:4]
              motif     TF directAnnotation inferred_Orthology
1:   bergman__Abd-B  HOXA9            FALSE               TRUE
2:    bergman__Aef1   ZNF8            FALSE               TRUE
3:     bergman__Cf2 ZNF853            FALSE               TRUE
4: bergman__EcR_usp  NR1H2            FALSE               TRUE


head(motifAnnotations_hgnc[(directAnnotation==TRUE) 
+                       & 
+                           (TF %in% c("FOXP3")), ])
                             motif    TF directAnnotation inferred_Orthology
1:                    cisbp__M3285 FOXP3             TRUE              FALSE
2:                    cisbp__M3286 FOXP3             TRUE              FALSE
3:                    cisbp__M5474 FOXP3             TRUE              FALSE
4:                    cisbp__M6249 FOXP3             TRUE              FALSE
5: hocomoco__FOXP3_HUMAN.H11MO.0.D FOXP3             TRUE              FALSE
6:      swissregulon__hs__FOXP3.p2 FOXP3             TRUE              FALSE
   inferred_MotifSimil annotationSource                description
1:               FALSE directAnnotation gene is directly annotated
2:               FALSE directAnnotation gene is directly annotated
3:               FALSE directAnnotation gene is directly annotated
4:               FALSE directAnnotation gene is directly annotated
5:               FALSE directAnnotation gene is directly annotated
6:               FALSE directAnnotation gene is directly annotated

一旦加载了基因列表和数据库，cisTarget()就可以使用它们。cisTarget()运行按顺序执行的步骤
(1)motif 富集分析,
(2)motif-TF注释,和
(3)选择的重要基因。

motifEnrichmentTable_wGenes <- cisTarget(geneLists, 
         motifRankings,
         motifAnnot=motifAnnotations_hgnc)
motifEnrichmentTable_wGenes_wLogo <- addLogo(motifEnrichmentTable_wGenes)

resultsSubset <- motifEnrichmentTable_wGenes_wLogo[1:10,]

library(DT)
datatable(resultsSubset[,-c("enrichedGenes", "TF_lowConf"), with=FALSE], 
          escape = FALSE, # To show the logo
          filter="top", options=list(pageLength=5))

也可以将这些步骤作为单独的命令分别运行。对用户感兴趣的一个输出进行分析，或者优化工作流以在多个基因列表上运行它。我们将分步骤来演示。

motif 富集分析

估计每个基序在基因集上的过表达（over-representation）的第一步是计算每对基序-基序集的曲线下面积(AUC)。这是根据基因集对基序排序的恢复曲线计算的(根据基序在其邻近度上的得分，基因依次递减，如motifRanking数据库中提供的那样)。

motifs_AUC <- calcAUC(geneLists, motifRankings, nCores=1)
(motifs_AUC )


AUC for 2 gene-sets and 4687 motifs.
  Organism: human (hg19)
  Subset (4.6k cisbp motifs) of the Human database scoring motifs 500bp upstream the TSS (hg19-500bp-upstream-7species.mc9nr)

AUC matrix preview:
          cisbp__M4240 cisbp__M5090 cisbp__M0288 cisbp__M4475 cisbp__M1434
Bcell       0.01490239    0.0476514  0.012773475   0.06948408   0.02396159
NaiveCD4T   0.00000000    0.0000000  0.009829234   0.01775604   0.03664447

AUC是由geneset提供的图形矩阵。原则上，AUC主要是作为下一步的输入。然而，也有可能探索分数的分布，例如:

par(mfrow = c(1,2))

for(i in c("Bcell",'NaiveCD4T')){
auc <- getAUC(motifs_AUC)[i,]
hist(auc, main=i, xlab="AUC histogram",
     breaks=100, col="#ff000050", border="darkred")
nes3 <- (3*sd(auc)) + mean(auc)
abline(v=nes3, col="red")
}

motif-TF注释

显著性motif的选择是基于归一化富集评分( Normalized Enrichment Score,NES)进行的。每个motif的NES是根据基因集中所有基序的AUC分布[(x-mean)/sd]计算。那些通过给定阈值(默认为3.0)的motifs 被认为是重要的。

motifEnrichmentTable <- addMotifAnnotation(motifs_AUC, nesThreshold=3,
                                           motifAnnot=motifAnnotations_hgnc,
                                           highlightTFs=list(Bcell="RFX5",NaiveCD4T='ZNF274'))
head(motifEnrichmentTable[,-"TF_lowConf", with=FALSE])
   geneSet        motif   NES   AUC highlightedTFs TFinDB                 TF_highConf
1:   Bcell cisbp__M4510 13.80 0.281           RFX5     **   RFX5 (directAnnotation). 
2:   Bcell cisbp__M4546 13.10 0.267           RFX5     **   RFX5 (directAnnotation). 
3:   Bcell cisbp__M4476 12.60 0.258           RFX5     **   RFX5 (directAnnotation). 
4:   Bcell cisbp__M0830  7.02 0.150           RFX5                                   
5:   Bcell cisbp__M4575  6.21 0.134           RFX5     **   RFX5 (directAnnotation). 
6:   Bcell cisbp__M6544  5.83 0.127           RFX5        HIVEP1 (directAnnotation).

找出每个基序富集最佳的基因

从RcisTarget搜索motif 的富集结果，找到一个motif 是“enriched”并不意味着所有的gene-list的motif 高分。这样，工作流程的第三步是确定哪些基因(基因集)对每个重要的motifs是高度排序的。有两种方法来识别这些基因:(1)相当于那些用于iRegulon和i-cisTarget(method=“iCisTarget”方法,如果运行时间不是问题，建议使用该方法),和(2)更快的一种是实现基于一个近似使用平均分布在每一个等级(method=“aprox”方法,扫描多个基因集)。

motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable,
                      rankings=motifRankings, 
                      geneSets=geneLists)

motifEnrichmentTable_wGenes[1:4,]
   geneSet        motif   NES   AUC highlightedTFs TFinDB               TF_highConf
1:   Bcell cisbp__M4510 13.80 0.281           RFX5     ** RFX5 (directAnnotation). 
2:   Bcell cisbp__M4546 13.10 0.267           RFX5     ** RFX5 (directAnnotation). 
3:   Bcell cisbp__M4476 12.60 0.258           RFX5     ** RFX5 (directAnnotation). 
4:   Bcell cisbp__M0830  7.02 0.150           RFX5                                 
                                                                                             TF_lowConf nEnrGenes rankAtMax
1: RFX1; RFX2; RFX3 (inferredBy_MotifSimilarity). RFX6; RFX7 (inferredBy_MotifSimilarity_n_Orthology).         10       184
2:                                                      RFX6 (inferredBy_MotifSimilarity_n_Orthology).         10       184
3:                                                      RFX6 (inferredBy_MotifSimilarity_n_Orthology).         10       324
4:                                                                                                             16      3680
                                                                                                                    enrichedGenes
1:                                              CD74;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5
2:                                              CD74;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5
3:                                              CD74;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5
4: BANK1;EAF2;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQA1;HLA-DQA2;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5;IGJ;PPAPDC1B;TSPAN13

geneSetName <- "Bcell"
selectedMotifs <-sample(motifEnrichmentTable$motif, 3)
par(mfrow=c(2,2))
getSignificantGenes(geneLists[[geneSetName]], 
                    motifRankings,
                    signifRankingNames=selectedMotifs,
                    plotCurve=TRUE, maxRank=5000, genesFormat="none",
                    method="aprox")

图中红线为各motif的恢复曲线平均值，绿线为平均值+标准差，蓝线为当前motif的恢复曲线。当前motif与绿色曲线的最大距离点(mean+sd)，为选择的最大富集等级。所有等级较低的基因都被认为是富集的。

RcisTarget的最终输出是一个包含以下领域中基序富集及其注释信息的表:


    geneSet: Name of the gene set
    motif: ID of the motif
    NES: Normalized enrichment score of the motif in the gene-set
    AUC: Area Under the Curve (used to calculate the NES)
    TFinDB: Indicates whether the highlightedTFs are included within the high confidence annotation (two asterisks) or low confidence annotation (one asterisk).
    TF_highConf: Transcription factors annotated to the motif according to ‘motifAnnot_highConfCat’.
    TF_lowConf: Transcription factors annotated to the motif according to ‘motifAnnot_lowConfCat’.
    enrichedGenes: Genes that are highly ranked for the given motif.
    nErnGenes: Number of genes highly ranked
    rankAtMax: Ranking at the maximum enrichment, used to determine the number of enriched genes.

motifEnrichmentTable_wGenes_wLogo <- addLogo(motifEnrichmentTable_wGenes)

resultsSubset <- motifEnrichmentTable_wGenes_wLogo[1:10,]

library(DT)
datatable(resultsSubset[,-c("enrichedGenes", "TF_lowConf"), with=FALSE], 
          escape = FALSE, # To show the logo
          filter="top", options=list(pageLength=5))

TFs注释富集的motif

注意，TFs是基于提供的motif注释。它们可以作为选择相关motif或对某些TFs进行排序的参考，但motif注释并不意味着表中出现的所有TFs都对基因列表进行调控。

anotatedTfs  <- lapply(split(motifEnrichmentTable_wGenes$TF_highConf,
                            motifEnrichmentTable$geneSet),
                     function(x) {
                           genes <- gsub(" \\(.*\\). ", "; ", x, fixed=FALSE)
                          genesSplit <- unique(unlist(strsplit(genes, "; ")))
                         return(genesSplit)
                      })
anotatedTfs  
$Bcell
 [1] "RFX5"   "HIVEP1" "RELA"   "NFKB1"  "GTF3A"  "ZXDA"   "ZXDB"   "ZXDC"   "EBF1"   "REL"    "RREB1"  "ZNF821" "SOX21"  "SOX9"  
[15] "GFI1B"  "NFKB2"  "PPARA"  "SMAD3"  "PRDM4"  "STAT3"  "NFYA"   "SMAD4"  "NKX2-1" "SRY"    "ETV5"   "MECOM"  "POU2F1" "HIVEP2"
[29] "HIVEP3" "ZNF831" "POU2F2" "SOX3"   "TBX3"   "DUX4"   "TBX1"   "SOX2"  

$NaiveCD4T
 [1] "ERG"    "ETS1"   "ERF"    "FLI1"   "GABPA"  "ELK4"   "ELF1"   "ETV2"   "ETV6"   "FEV"    "ELK1"   "ETV5"   "ETV3"   "ELK3"  
[15] "HSF1"   "ETV1"   "ETV4"   "FOXN1"  "FOXN2"  "FOXN3"  "FOXN4"  "FOXR1"  "FOXR2"  "NR2C2"  "ETV7"   "ELF2"   "ELF4"   "ESRRG" 
[29] "FOXO3"  "SP2"    "ESRRA"  "LEF1"   "TCF7"   "TCF7L1" "TCF7L2" "ZNF274"

到这里，两种细胞类型的基因集转化为相应的转录因子集，也就是由基因表达层面转到了转录调控层面。找到转录因子之后，我们可以在jaspar上面来查看进一步的信息。为什么Bcell 是这些转录因子，而T 又是那些呢？关于人类转录因子，不得不提2018年的一篇综述文章：

所以我们看到的大部分转录因子字母缩写不认识可以或者不知道有什么关系，是因为我们的背景知识不够扎实啊。在这篇综述中：

This review considers how TFs are identified and functionally characterized, principally through the lens of a catalog of over 1,600 likely human TFs and binding motifs for two-thirds of them.

创建网络

signifMotifNames <- motifEnrichmentTable$motif[1:5]

incidenceMatrix <- getSignificantGenes(geneLists$Bcell, 
                                       motifRankings,
                                       signifRankingNames=signifMotifNames,
                                       plotCurve=TRUE, maxRank=5000, 
                                       genesFormat="incidMatrix",
                                       method="aprox")$incidMatrix
incidenceMatrix
             BANK1 BLK CD37 CD72 CD74 CD79A CD79B EAF2 FCER2 FCRLA GNG7 HLA-DMA HLA-DMB HLA-DOB HLA-DPA1 HLA-DPB1 HLA-DQA1
cisbp__M4510     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        0
cisbp__M4546     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        0
cisbp__M4476     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        0
cisbp__M0830     1   0    0    0    0     0     0    1     0     0    0       1       1       1        1        1        1
cisbp__M4575     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        1
             HLA-DQA2 HLA-DQB1 HLA-DRA HLA-DRB1 HLA-DRB5 HVCN1 IGJ IGLL5 IRF8 MS4A1 P2RX5 PKIG PPAPDC1B TCL1A TSPAN13 VPREB3
cisbp__M4510        0        1       1        1        1     0   0     0    0     0     0    0        0     0       0      0
cisbp__M4546        0        1       1        1        1     0   0     0    0     0     0    0        0     0       0      0
cisbp__M4476        0        1       1        1        1     0   0     0    0     0     0    0        0     0       0      0
cisbp__M0830        1        1       1        1        1     0   1     0    0     0     0    0        1     0       1      0
cisbp__M4575        0        1       1        1        1     0   0     0    0     0     0    0        1     0       0      0

看到incidenceMatrix 面熟吗？我们可以用igraph直接构建网络啊:

library(igraph)
mugh<- graph_from_incidence_matrix(incidenceMatrix, directed = F)
mugh

IGRAPH 7199158 UN-B 38 58 -- 
+ attr: type (v/l), name (v/c)
+ edges from 7199158 (vertex names):
 [1] cisbp__M4510--CD74     cisbp__M4510--HLA-DMA  cisbp__M4510--HLA-DMB  cisbp__M4510--HLA-DOB  cisbp__M4510--HLA-DPA1
 [6] cisbp__M4510--HLA-DPB1 cisbp__M4510--HLA-DQB1 cisbp__M4510--HLA-DRA  cisbp__M4510--HLA-DRB1 cisbp__M4510--HLA-DRB5
[11] cisbp__M4546--CD74     cisbp__M4546--HLA-DMA  cisbp__M4546--HLA-DMB  cisbp__M4546--HLA-DOB  cisbp__M4546--HLA-DPA1
[16] cisbp__M4546--HLA-DPB1 cisbp__M4546--HLA-DQB1 cisbp__M4546--HLA-DRA  cisbp__M4546--HLA-DRB1 cisbp__M4546--HLA-DRB5
[21] cisbp__M4476--CD74     cisbp__M4476--HLA-DMA  cisbp__M4476--HLA-DMB  cisbp__M4476--HLA-DOB  cisbp__M4476--HLA-DPA1
[26] cisbp__M4476--HLA-DPB1 cisbp__M4476--HLA-DQB1 cisbp__M4476--HLA-DRA  cisbp__M4476--HLA-DRB1 cisbp__M4476--HLA-DRB5
[31] cisbp__M0830--BANK1    cisbp__M0830--EAF2     cisbp__M0830--HLA-DMA  cisbp__M0830--HLA-DMB  cisbp__M0830--HLA-DOB 
[36] cisbp__M0830--HLA-DPA1 cisbp__M0830--HLA-DPB1 cisbp__M0830--HLA-DQA1 cisbp__M0830--HLA-DQA2 cisbp__M0830--HLA-DQB1
+ ... omitted several edges

更改一些属性

V(mugh)[c(1:5)]$color  = 'red'
V(mugh)[c(6:38)]$color ='green'



E(mugh)[grep("cisbp__M4510",as_ids(E(mugh)))]$color <- "red"
E(mugh)[grep("cisbp__M4546",as_ids(E(mugh)))]$color <- "blue"
E(mugh)[grep("cisbp__M0830",as_ids(E(mugh)))]$color <- "green"
E(mugh)[grep("cisbp__M4575",as_ids(E(mugh)))]$color <- "yellow"
E(mugh)[grep("cisbp__M4476",as_ids(E(mugh)))]$color <- "pink"

画出各种布局的调控网络

layouts <- grep("^layout_", ls("package:igraph"), value=TRUE)[-1] 
# Remove layouts that do not apply to our graph.
layouts <- layouts[!grepl("bipartite|merge|norm|sugiyama|tree", layouts)]

length(layouts)
par(mfrow=c(3,5), mar=c(1,1,1,1))
for (layout in layouts) {
    print(layout)
    l <- do.call(layout, list(mugh)) 
    plot(mugh, edge.arrow.mode=0, layout=l, main=layout) }

我们看到有些基因与所有的motif都没有链接，我们把它们去掉。

library(reshape2)
edges <- melt(incidenceMatrix)
edges <- edges[which(edges[,3]==1),1:2]
colnames(edges) <- c("from","to")


library(visNetwork)
motifs <- unique(as.character(edges[,1]))
genes <- unique(as.character(edges[,2]))
nodes <- data.frame(id=c(motifs, genes),   
                    label=c(motifs, genes),    
                    title=c(motifs, genes), # tooltip 
                    shape=c(rep("diamond", length(motifs)), rep("elypse", length(genes))),
                    color=c(rep("purple", length(motifs)), rep("skyblue", length(genes))))
visNetwork(nodes, edges) %>% visOptions(highlightNearest = TRUE, 
                                        nodesIdSelection = TRUE)

Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT. The Human Transcription Factors. Cell. 2018;175:598–9.
Cell-type–specific || 单细胞文章新范式
细胞身份何以在分裂中得以保持？
https://www.affixes.org/alpha/c/cis-.html
https://www.cnblogs.com/wangprince2017/p/9813731.html
https://zhuanlan.zhihu.com/p/156613539
https://cloud.tencent.com/developer/article/1376739
https://www.cell.com/cell/fulltext/S0092-8674(18)30106-5