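Polyadenylation (poly(A)) is an important modification added to the 3' end of a transcript during its maturation. Alternative polyadenylation (APA) is a widespread, fundamental regulatory mechanism in eukaryotes. It increases the complexity of the cell's transcriptome and proteome, and it affects the function, stability, localization and translation efficiency of the target RNA. Because a poly(A) site marks the end of a transcript, identifying these sites accurately underpins gene annotation and studies of transcriptional regulation. APA is tissue-specific and plays important roles in cell proliferation and differentiation.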
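APA plays a key post-transcriptional role in regulating the stability and function of eukaryotic mRNAs. Single-cell RNA-seq (scRNA-seq) is a powerful tool for revealing cell-to-cell heterogeneity in gene expression. The 3'-enriched library construction of the widely used 10x scRNA-seq platform makes it possible to study APA at single-cell resolution. However, no computational tool was yet available for investigating APA profiles from scRNA-seq data.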
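Here we present scDAPA, a software package that detects and visualizes dynamic APA from scRNA-seq data. Taking BAM/SAM files and cell-cluster labels as input, scDAPA detects dynamic APA with a histogram-based method and the Wilcoxon rank-sum test, and visualizes candidate genes with dynamic APA. Benchmarking showed that scDAPA effectively identifies genes with dynamic APA across different cell populations from scRNA-seq data. Availability: https://scdapa.sourceforge.io.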
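The detection idea described in the abstract (bin the read 3' ends of a gene into a histogram, compare the two cell groups' distributions, and test for a shift with the Wilcoxon rank-sum test) can be sketched in base R as follows. This is a hypothetical illustration, not scDAPA's code: the function name `compare_3ends` is made up, and defining the site distribution difference (SDD) as half the L1 distance between the two normalized histograms is an assumption that merely reproduces the [0, 1] range that scDAPA reports.

```r
# Hypothetical sketch of a histogram-based APA comparison for one gene.
# pos1, pos2: distances of read 3' ends to the gene start in cell groups 1 and 2.
# ASSUMPTION: SDD = half the L1 distance between normalized histograms (range [0, 1]);
# scDAPA's own definition may differ.
compare_3ends <- function(pos1, pos2, bin_size = 100) {
  # shared bin edges covering the positions of both groups
  lo <- floor(min(pos1, pos2) / bin_size) * bin_size
  hi <- (floor(max(pos1, pos2) / bin_size) + 1) * bin_size
  breaks <- seq(lo, hi, by = bin_size)
  nbins <- length(breaks) - 1
  # relative frequency of 3' ends falling into each bin
  f1 <- tabulate(findInterval(pos1, breaks), nbins = nbins) / length(pos1)
  f2 <- tabulate(findInterval(pos2, breaks), nbins = nbins) / length(pos2)
  sdd <- sum(abs(f1 - f2)) / 2
  # Wilcoxon rank-sum test on the raw distances (normal approximation)
  p <- wilcox.test(pos1, pos2, exact = FALSE)$p.value
  list(meanlen1 = mean(pos1), meanlen2 = mean(pos2), SDD = sdd, p.value = p)
}

# Simulated example: group 2 additionally uses a site about 1 kb further downstream
set.seed(1)
g1 <- rnorm(200, mean = 1500, sd = 50)
g2 <- c(rnorm(200, mean = 1500, sd = 50), rnorm(200, mean = 2500, sd = 50))
res <- compare_3ends(g1, g2)
str(res)
```

Because half of group 2's reads end in bins that group 1 never reaches, the SDD here is at least 0.5, while comparing a group with itself gives 0.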
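1. Types of APA:
(1) 3'UTR-APA
Most APA sites lie in the 3'UTR, which contains cis-acting elements (cis-elements). 3'UTR-APA affects post-transcriptional gene regulation in many ways, including mRNA stability, mRNA nuclear export and localization, and localization of the encoded protein.
(2) Upstream-region APA (UR-APA)
UR-APA sites lie upstream of the last exon. UR-APA leads to alternative usage of terminal exons, changing both the coding sequence and the 3'UTR of the mRNA. According to the splicing pattern around the polyadenylation site (PAS), UR-APA falls into two classes: skipped terminal exon and composite terminal exon. In the skipped-terminal-exon class the terminal exon is skipped, whereas in the composite-terminal-exon class the terminal exon arises from the extension of an internal exon.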
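Running scDAPA (here on the Cell Ranger output of the 10x pbmc5k dataset) takes three preprocessing steps: extract the reads of each cell cluster, extract gene records from the annotation, and annotate read 3' ends to genes.

```bash
unset PYTHONPATH
source software/miniconda3/bin/activate software/miniconda3/envs/velocyto
# 1. split the reads by cell cluster, using the BAM file and the cluster labels
10X_RNA/Development/scDAPA/extractReads.sh -r 10X_RNA/Development/velocyto/example/CellRanger/pbmc5k/outs/possorted_genome_bam.bam -c 10X_RNA/Development/velocyto/example/CellRanger/pbmc5k/outs/analysis/clustering/kmeans_10_clusters/clusters.csv -o ./result
# 2. extract gene records from the GTF (note the space after -i)
10X_RNA/Development/scDAPA/extractGenes.sh -i 10X_RNA/pipeline2.1/database/10X_Ref/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf -o hg38.gene.gff
# 3. annotate read 3' ends to genes; bedtools must be on PATH for this step
export PATH=bedtools2/bin/:$PATH
10X_RNA/Development/scDAPA/annotate3Ends.sh -d 10X_RNA/Development/scDAPA/example/result/ -g 10X_RNA/Development/scDAPA/example/hg38.gene.gff
```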
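The `-c` input of extractReads.sh is Cell Ranger's `clusters.csv`, which, to our understanding, has a `Barcode,Cluster` header and one row per cell. The sketch below uses three made-up barcodes to show how such a file groups cells into the cell groups compared later; it only inspects the input and does not reproduce what extractReads.sh does internally.

```r
# Hypothetical sketch: inspect the cluster-label file passed to extractReads.sh -c.
# The three rows are made up; with real data use read.csv("path/to/clusters.csv").
clusters_txt <- "Barcode,Cluster
AAACCCAAGAAACACT-1,1
AAACCCAAGAAACCAT-1,2
AAACCCAAGAAACCCA-1,1"
clusters <- read.csv(text = clusters_txt, stringsAsFactors = FALSE)

# barcodes grouped by cluster label, i.e. the cell groups compared downstream
barcodes_by_cluster <- split(clusters$Barcode, clusters$Cluster)
sizes <- lengths(barcodes_by_cluster)
print(sizes)  # cluster "1" has 2 cells, cluster "2" has 1 cell
```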
The annotation step produces `.anno` files in the result directory (e.g. `1.anno`, `2.anno`, one per cell group) with the following columns:

Column name | Explanation |
---|---|
seqname | Name of the sequence (chromosome/scaffold) |
source | Program that generated this feature |
feature | Type of this feature |
start | Start position of the feature in the sequence |
end | End position of the feature |
score | A score between 0 and 1000 |
strand | "+", "-", or "." |
frame | "." if the feature is not a coding exon |
gene | Gene ID and name |
start of read | Start positions of the reads annotated to this gene, comma-separated |
end of read | End positions of the reads annotated to this gene, comma-separated |
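To make the record layout concrete, the sketch below parses one `.anno` line and turns the comma-separated read coordinates into distances from each read's 3' end to the gene's start site, the quantity averaged in the `meanlen` columns of the detection output. Everything here is an assumption for illustration: the function name `three_end_lengths` is hypothetical, fields are assumed to be tab-separated in the column order of the table above, and a read's 3' end is taken to be its end coordinate on "+" genes and its start coordinate on "-" genes.

```r
# Hypothetical sketch: read one .anno record and compute the distance from each
# read's 3' end to the gene's start site.
# ASSUMPTIONS: tab-separated fields in the column order of the table above;
# a read's 3' end is its end coordinate on "+" genes and its start on "-" genes.
anno_cols <- c("seqname", "source", "feature", "start", "end", "score",
               "strand", "frame", "gene", "read_start", "read_end")

three_end_lengths <- function(record) {
  f <- setNames(strsplit(record, "\t", fixed = TRUE)[[1]], anno_cols)
  read_start <- as.integer(strsplit(f[["read_start"]], ",", fixed = TRUE)[[1]])
  read_end   <- as.integer(strsplit(f[["read_end"]], ",", fixed = TRUE)[[1]])
  if (f[["strand"]] == "-") {
    # minus strand: the gene starts at its larger coordinate
    as.integer(f[["end"]]) - read_start
  } else {
    read_end - as.integer(f[["start"]])
  }
}

# made-up records for a gene spanning 1000-5000 on each strand
rec_plus  <- paste("chr1", "src", "gene", "1000", "5000", ".", "+", ".",
                   "GENE1", "3900,4400", "3998,4498", sep = "\t")
rec_minus <- paste("chr1", "src", "gene", "1000", "5000", ".", "-", ".",
                   "GENE2", "1200,2000", "1298,2098", sep = "\t")
three_end_lengths(rec_plus)   # 2998 3498
three_end_lengths(rec_minus)  # 3800 3000
```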
Then load these results into the R package scDAPAminer to detect dynamic APA between cell groups:

```r
library(scDAPAminer)
# create the output folder 'stat'
dir.create('./stat', showWarnings = FALSE)

# 1. compare only two specific cell groups (type = 'f2f', file vs. file)
scDAPAdetect(file1 = './result/1.anno', file2 = './result/2.anno', type = 'f2f', output_dir = './stat')

# 2. compare every two cell groups stored in the ./result directory (type = 'd')
scDAPAdetect(dir = './result', type = 'd', output_dir = './stat', bin_size = 100, count_cutoff = 20)
```
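With `type = 'd'` every two cell groups are compared, so the number of comparisons grows quadratically with the number of groups. Assuming one `.anno` file per cluster (the file names below are hypothetical), the 10 clusters of `kmeans_10_clusters` used above give choose(10, 2) = 45 pairwise comparisons:

```r
# Hypothetical file names: one .anno file per cluster, 10 clusters in total
anno_files <- paste0(1:10, ".anno")
# every unordered pair of cell groups, one pair per column
pairs <- combn(anno_files, 2)
ncol(pairs)   # 45, the same as choose(10, 2)
pairs[, 1]    # "1.anno" "2.anno"
```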
The detection results contain the following columns:

Column name | Explanation |
---|---|
chr | Name of the chromosome/scaffold |
gene | Gene ID and name |
meanlen1 | Mean distance from the 3' ends to the gene's start site in cell group 1 |
meanlen2 | Mean distance from the 3' ends to the gene's start site in cell group 2 |
SDD | Site distribution difference, SDD ∈ [0, 1] |
p.value | p-value of the statistical test (Wilcoxon rank-sum test) |
p.adjust | Adjusted p-value |
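One way to shortlist candidate genes from these columns is sketched below on a made-up result table. The thresholds (p.adjust < 0.05, SDD > 0.1) are arbitrary choices rather than scDAPA defaults, and reading the real output file is omitted because its file name is not given here. Comparing meanlen1 with meanlen2 gives the direction of the change; reading a larger meanlen as use of more distal poly(A) sites is an interpretation, not something the output states.

```r
# Made-up detection results with the columns listed above (not real scDAPA output)
res <- data.frame(
  chr      = c("chr1", "chr2", "chr3"),
  gene     = c("GENE_A", "GENE_B", "GENE_C"),
  meanlen1 = c(1800, 950, 2100),
  meanlen2 = c(1500, 960, 2600),
  SDD      = c(0.35, 0.02, 0.28),
  p.value  = c(1e-6, 0.8, 1e-4),
  p.adjust = c(3e-6, 0.8, 1.5e-4),
  stringsAsFactors = FALSE
)
# arbitrary illustrative thresholds
hits <- subset(res, p.adjust < 0.05 & SDD > 0.1)
# larger mean distance in group 2 = 3' ends further from the gene start (interpretation)
hits$shift <- ifelse(hits$meanlen2 > hits$meanlen1,
                     "distal in group 2", "proximal in group 2")
hits[, c("gene", "SDD", "p.adjust", "shift")]
```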
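Candidate genes can then be visualized with scDAPAview. The returned object is a ggplot2 plot, so it can be customized with the usual ggplot2 functions:

```r
library(ggplot2)  # labs() and theme() below come from ggplot2
# `gtf` must already hold the gene annotation; the original example does not show how it is defined
dp <- scDAPAview(files = c('./result/1.anno', './result/2.anno'), alt_names = c('cell_A', 'cell_B'),
                 gtf = gtf, gene_id = 'ENSG00000160062', legend.position = c(0.2, 0.8))

# customize the colour theme
library(ggsci)
dp + scale_colour_aaas()

# customize the legend title
dp + labs(colour = "Cell type")

# customize the legend position
dp + theme(legend.position = c(0.6, 0.9))

# customize all of the above at once
dp + scale_colour_aaas() + labs(colour = "Cell type") + theme(legend.position = c(0.6, 0.9))
```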
http://www.frasergen.com/cn/info_173.aspx?itemid=258
Congting Ye, Qian Zhou, Xiaohui Wu, Chen Yu, Guoli Ji, Daniel R Saban, Qingshun Q Li. scDAPA: detection and visualization of dynamic alternative polyadenylation from single cell RNA-seq data. Bioinformatics, btz701. https://doi.org/10.1093/bioinformatics/btz701
高通量测序技术在可选择性多聚腺苷酸化研究中的应用 (Applications of high-throughput sequencing technologies in alternative polyadenylation research)