scDAPA: detecting alternative polyadenylation (APA) from single-cell transcriptome data

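Polyadenylation (poly(A)) is an important modification step that takes place at the 3' end of a transcript during its maturation. Alternative polyadenylation (APA) is a widespread, fundamental regulatory mechanism in eukaryotes. It increases the complexity of the transcriptome and proteome, and it also affects the function, stability, localization, and translation efficiency of the target RNA. A poly(A) site marks the end of a transcript, so identifying it accurately underpins gene annotation and research on transcriptional regulation. APA is tissue-specific and plays an important role in cell proliferation and differentiation.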

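Alternative polyadenylation (APA) plays a key post-transcriptional regulatory role in the stability and function of eukaryotic mRNAs. Single-cell RNA-seq (scRNA-seq) is a powerful tool for uncovering cell-to-cell heterogeneity in gene expression. The widely used 10x Genomics scRNA-seq protocol builds 3'-enriched libraries, which makes it possible to study APA at single-cell resolution. However, at the time, no computational tool was available for profiling APA from scRNA-seq data.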

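Here we present scDAPA, a software package for detecting and visualizing dynamic APA from scRNA-seq data. scDAPA takes BAM/SAM files and cell cluster labels as input. It detects dynamic APA using a histogram-based method and the Wilcoxon rank-sum test, and it visualizes candidate genes with dynamic APA. Benchmarking results show that scDAPA effectively identifies genes with dynamic APA across different cell populations in scRNA-seq data. Availability: https://scdapa.sourceforge.io.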
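The detection idea above can be sketched in R. It works in two steps: bin the read 3′ end positions of one gene in each cell group into histograms, compare the two distributions, and then test for a location shift with a Wilcoxon rank-sum test.

This is a minimal illustration under stated assumptions, not scDAPA's source code:
- The function name `sdd_sketch` is hypothetical.
- The 100 bp default bin width only mirrors the `bin_size=100` argument used with scDAPAdetect.
- Defining SDD as half the L1 distance between the two normalized histograms, which keeps it in [0, 1], is an assumption.

```r
# Minimal sketch (hypothetical, NOT scDAPA's own code): compare the read 3' end
# positions of one gene between two cell groups.
sdd_sketch <- function(ends1, ends2, bin_size = 100) {
  # Shared bin edges that cover the positions of both groups
  lo <- floor(min(ends1, ends2) / bin_size) * bin_size
  hi <- (floor(max(ends1, ends2) / bin_size) + 1) * bin_size
  breaks <- seq(lo, hi, by = bin_size)

  # Histogram of each group, normalized to the proportion of its reads per bin
  p1 <- hist(ends1, breaks = breaks, right = FALSE, plot = FALSE)$counts / length(ends1)
  p2 <- hist(ends2, breaks = breaks, right = FALSE, plot = FALSE)$counts / length(ends2)

  # ASSUMED SDD definition: half the L1 distance between the two
  # distributions (0 = identical, 1 = no shared bins)
  sdd <- sum(abs(p1 - p2)) / 2

  # Wilcoxon rank-sum test on the raw positions (normal approximation)
  p <- wilcox.test(ends1, ends2, exact = FALSE)$p.value

  list(SDD = sdd, p.value = p)
}

# Toy data (simulated, not real reads): group 2 shifted toward a proximal site
set.seed(1)
g1 <- round(rnorm(200, mean = 1500, sd = 80))
g2 <- round(rnorm(200, mean = 900, sd = 80))
res_toy <- sdd_sketch(g1, g2)
print(res_toy)
```

The two parts answer different questions. The binned comparison measures how much of the 3′ end distribution has moved, and the rank-sum test asks whether that movement is a systematic shift.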

I. Types of APA

(1) 3' UTR APA

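Most APA sites lie in the 3' UTR, a region rich in cis-acting elements. 3' UTR APA affects post-transcriptional gene regulation in many ways, including mRNA stability, mRNA nuclear export and localization, and the localization of the encoded protein.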

Figure 1. Schematic of 3' UTR APA [1]

(2) Upstream region APA (UR-APA)

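UR-APA sites lie upstream of the last exon. UR-APA leads to alternative use of terminal exons, which changes both the mRNA coding sequence and the 3' UTR. Based on the splicing pattern around the polyadenylation site (PAS), UR-APA falls into two classes: skipped terminal exon and composite terminal exon. In the skipped terminal exon type, the terminal exon is skipped. In the composite terminal exon type, the terminal exon arises from the extension of an internal exon.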

Figure 2. Schematic of UR-APA [1]
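
II. Running scDAPA

# Use a clean Python path and activate the conda environment used here
unset PYTHONPATH
source software/miniconda3/bin/activate software/miniconda3/envs/velocyto

# Step 1: extract the reads of each cell cluster from the Cell Ranger BAM,
# using the k-means cluster labels in clusters.csv
10X_RNA/Development/scDAPA/extractReads.sh -r 10X_RNA/Development/velocyto/example/CellRanger/pbmc5k/outs/possorted_genome_bam.bam -c 10X_RNA/Development/velocyto/example/CellRanger/pbmc5k/outs/analysis/clustering/kmeans_10_clusters/clusters.csv -o ./result

# Step 2: extract gene records from the reference GTF
10X_RNA/Development/scDAPA/extractGenes.sh -i 10X_RNA/pipeline2.1/database/10X_Ref/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf -o hg38.gene.gff

# Step 3: annotate read 3' ends to genes (bedtools must be on PATH)
export PATH=bedtools2/bin/:$PATH
10X_RNA/Development/scDAPA/annotate3Ends.sh -d 10X_RNA/Development/scDAPA/example/result/ -g 10X_RNA/Development/scDAPA/example/hg38.gene.gff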

Columns of the .anno files:

Column          Explanation
seqname         Name of the sequence
source          Program that generated this feature
feature         Type of this feature
start           Start position of the feature in the sequence
end             End position of the feature
score           A score between 0 and 1000
strand          Valid entries are "+", "-", or "."
frame           "." if the feature is not a coding exon
gene            Gene ID and name
start of read   Start positions of the reads annotated to this gene, comma-separated
end of read     End positions of the reads annotated to this gene, comma-separated
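To make the .anno layout concrete, here is a hedged R sketch. It parses one line into numeric vectors and derives each read's 3′ end distance from the gene start. That distance is plausibly the per-read quantity behind the mean-length columns of the detection output, though this correspondence is an assumption.

The sketch also assumes:
- Columns are tab-separated, in the order listed in the table.
- The helper names `parse_anno_line` and `three_prime_dist`, and the example line, are made up.
- For minus-strand genes, the read's start coordinate is taken as its 3′ end and the feature's `end` as the gene start.

```r
# Hypothetical helpers; names and the tab-separated layout are assumptions
anno_cols <- c("seqname", "source", "feature", "start", "end", "score",
               "strand", "frame", "gene", "read_starts", "read_ends")

parse_anno_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  stopifnot(length(fields) == length(anno_cols))
  rec <- as.list(setNames(fields, anno_cols))
  rec$start <- as.integer(rec$start)
  rec$end <- as.integer(rec$end)
  # Last two columns: comma-separated read coordinates
  rec$read_starts <- as.integer(strsplit(rec$read_starts, ",", fixed = TRUE)[[1]])
  rec$read_ends <- as.integer(strsplit(rec$read_ends, ",", fixed = TRUE)[[1]])
  rec
}

# Distance from each read's 3' end to the gene's start site (strand handling assumed)
three_prime_dist <- function(rec) {
  if (rec$strand == "-") rec$end - rec$read_starts else rec$read_ends - rec$start
}

# Made-up example line, not real data
line <- paste("chr1", "scDAPA", "gene", "1000", "5000", ".", "+", ".",
              "ENSG_TEST|TEST", "1200,1300,4700", "1290,1390,4790", sep = "\t")
rec <- parse_anno_line(line)
print(three_prime_dist(rec))
```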

Load the results above into the R package scDAPAminer:

> library(scDAPAminer)
> # create an output folder named 'stat'
> dir.create('./stat', showWarnings = FALSE)
> # 1. compare two specific cell groups only
> scDAPAdetect(file1='./result/1.anno',file2='./result/2.anno',type='f2f',output_dir='./stat')
>
> # 2. compare every pair of cell groups stored in the ./result directory
> scDAPAdetect(dir='./result',type='d',output_dir='./stat',bin_size=100,count_cutoff=20)
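With `type='d'`, the number of comparisons grows quadratically with the number of cell groups. The sketch below assumes one `.anno` file per cluster, named like `./result/1.anno`, and one comparison per unordered pair. Under those assumptions, the 10 k-means clusters used in the preprocessing step give 45 comparisons. The file naming and pairing scheme are assumptions, not documented scDAPAdetect behaviour.

```r
# Assumed: one .anno file per k-means cluster, numbered 1..10 (hypothetical naming)
anno_files <- sprintf("./result/%d.anno", 1:10)

# Every unordered pair of groups; each column of 'pairs' is one comparison
pairs <- combn(anno_files, 2)
n_comparisons <- ncol(pairs)   # choose(10, 2) = 45
print(n_comparisons)
print(pairs[, 1])              # first pair: cluster 1 vs cluster 2
```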
Columns of the scDAPAdetect output:

Column     Explanation
chr        Name of the chromosome/scaffold
gene       Gene ID and name
meanlen1   Mean distance from the 3′ ends to the gene's start site in cell group 1
meanlen2   Mean distance from the 3′ ends to the gene's start site in cell group 2
SDD        Site distribution difference, SDD ∈ [0, 1]
p.value    p value of the statistical test
p.adjust   Adjusted p value
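A common next step is to shortlist genes from this table. The R sketch below uses a toy data frame with the same columns, with all values made up, and hypothetical cutoffs (`p.adjust < 0.05`, `SDD > 0.1`). It reads `meanlen2 < meanlen1` as 3′ ends moving toward the gene start, that is, more proximal poly(A) site usage in group 2. That reading is an interpretation, not an scDAPA output. In practice you would load the table written to ./stat instead; its file name and delimiter are not given in this walkthrough.

```r
# Toy table with the scDAPAdetect output columns (values made up)
stats_df <- data.frame(
  chr = c("chr1", "chr2", "chr3"),
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  meanlen1 = c(2100, 1500, 800),
  meanlen2 = c(1400, 1520, 1300),
  SDD = c(0.45, 0.05, 0.30),
  p.value = c(1e-6, 0.6, 1e-4),
  p.adjust = c(3e-6, 0.6, 1.5e-4),
  stringsAsFactors = FALSE
)

# Hypothetical cutoffs; tune them to your data
sig <- subset(stats_df, p.adjust < 0.05 & SDD > 0.1)

# Assumed interpretation: a smaller mean length in group 2 means 3' ends sit
# closer to the gene start there (more proximal poly(A) site usage)
sig$shift <- ifelse(sig$meanlen2 < sig$meanlen1,
                    "proximal in group 2", "distal in group 2")
print(sig[order(sig$p.adjust), c("gene", "SDD", "p.adjust", "shift")])
```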
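> # visualize a candidate gene in two cell groups
> # note: `gtf` must already hold the gene annotation that scDAPAview expects; it is not defined in this walkthrough
> dp = scDAPAview(files=c('./result/1.anno','./result/2.anno'),alt_names=c('cell_A','cell_B'),gtf=gtf,gene_id='ENSG00000160062',legend.position = c(0.2,0.8))
>
> # labs() and theme() used below come from ggplot2
> library(ggplot2)
>
> # customize the colour theme
> library(ggsci)
> dp + scale_colour_aaas()
>
> # customize the legend title
> dp + labs(colour = "Cell type")
>
> # customize the legend position
> dp + theme(legend.position = c(0.6, 0.9))
>
> # customize all three at once
> dp + scale_colour_aaas() + labs(colour = "Cell type") + theme(legend.position = c(0.6, 0.9))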



[1]Tian B, Manley J L. Alternative polyadenylation of mRNA precursors[J]. Nature Reviews Molecular Cell Biology, 2016, 18(1):18.

[2]Abdelghany S E, Hamilton M, Jacobi J L, et al. A survey of the sorghum transcriptome using single-molecule long reads[J]. Nature Communications, 2016, 7:11706.

http://www.frasergen.com/cn/info_173.aspx?itemid=258

Congting Ye, Qian Zhou, Xiaohui Wu, Chen Yu, Guoli Ji, Daniel R Saban, Qingshun Q Li. scDAPA: detection and visualization of dynamic alternative polyadenylation from single cell RNA-seq data. Bioinformatics, btz701. https://doi.org/10.1093/bioinformatics/btz701

Application of high-throughput sequencing technologies in alternative polyadenylation research
