scTE: Quantifying Transposable Elements in 10x Genomics Single-Cell RNA-seq

Keywords: transposable element; endogenous retrovirus (ERV); single-cell sequencing analysis; Seurat; scTE.


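Introduction to scTE

Background:

scTE is used here to quantify TEs in 10x single-cell sequencing data, and the result is then imported into Seurat for downstream analysis. scTE comes from the Jiekai lab and was published in Nature Communications in March 2021.
Transposable elements (TEs) make up a majority of a typical eukaryote's genome and contribute to cell heterogeneity in ways that are still unclear. Single-cell sequencing is a powerful tool for exploring cells, but analysis is usually gene-centric and does not address TE expression.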

Methods:

1. Install scTE

# scTE works with python >=3.6.
$ git clone https://github.com/JiekaiLab/scTE.git ## run this from the directory where you want scTE downloaded
$ cd scTE
$ python setup.py install ## install scTE

# Building genome indices
$ scTE_build -g mm10 # Mouse
$ scTE_build -g hg38 # Human

2. Run scTE on the BAM file produced by the 10x pipeline.

$ scTE -i ../run_cellranger_count/run_count_YL002273_S2/outs/possorted_genome_bam.bam -o YL002272_S2 -x /home/ye.liu/yang-secondary/ye/biotools/scTE/mm10.exclusive.idx --hdf5 True -CB CR -UMI UB

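--hdf5 True writes the result in HDF5 format (an .h5ad file). For downstream analysis in Seurat it has to be converted to a Seurat object.
-CB sets the BAM tag that holds the cell barcode. Check which tag your BAM file uses: pass -CB CR for CR, or -CB CB for CB. Cell Ranger BAMs usually carry both: CR is the raw barcode and CB is the error-corrected barcode with a suffix such as -1.
-UMI sets the UMI tag in the same way (in Cell Ranger BAMs, UR is the raw UMI and UB the corrected one).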
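
With several samples, it is easy to mix up the input path and the output prefix. The sketch below derives both from one sample name and only prints the commands (a dry run), so they can be reviewed before anything is executed. The sample names, the runs/ directory layout and the helper name build_scte_cmd are hypothetical; the scTE flags are the ones used in the command above.

```shell
#!/usr/bin/env bash
# Print (do not run) one scTE command per sample.
# Assumption: paths contain no spaces, since they are printed unquoted.
build_scte_cmd() {  # usage: build_scte_cmd BAM OUT_PREFIX INDEX
  printf 'scTE -i %s -o %s -x %s --hdf5 True -CB CR -UMI UB\n' "$1" "$2" "$3"
}

# Hypothetical sample names and directory layout.
for sample in sampleA sampleB; do
  build_scte_cmd "runs/$sample/outs/possorted_genome_bam.bam" "$sample" "mm10.exclusive.idx"
done
```

Once the printed commands look right, they can be piped to a shell or pasted into a job script.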

Inspect an example BAM; the fourth field from the end is the CB tag:

$ samtools view test.bam
A00519:758:HTCCHDSXY:3:2535:21296:19774 16  chr1    14021   0   90M *   0   0   TGGATTTCTATCTCCCTGGCTTGGTGCCAGTTCCTCCAAGTCGATGGCACCTCCCTCCCTCTCAACCACTTGAGCAAACTCCAAGACATC  ,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFF:FFFFF  NH:i:5  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3  RE:A:I  xf:i:0  CR:Z:CTCCCTCCACTGCGAC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:CTCCCTCCACTGCGAC-1 UR:Z:AAGGCGTAGTAG   UY:Z:FFFFFFFFFFFF   UB:Z:AAGGCGTAGTAG
A00519:758:HTCCHDSXY:1:1355:17237:31720 0   chr1    14260   0   90M *   0   0   CTCCCTCTCATCCCAGAGAAACAGGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAGGGACCAACAGGGGCAGGAGGCAGTCACTGACCCC  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:5  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:1  RE:A:I  xf:i:0  CR:Z:TCGTCCACAGTATGAA   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TCGTCCACAGTATGAA-1 UR:Z:GACTTATTTTTT   UY:Z:FFFFFFFFFFFF   UB:Z:GACTTATTTTTT
A00519:758:HTCCHDSXY:3:2227:16703:32080 16  chr1    14411   1   90M *   0   0   TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG  FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3  RE:A:I  xf:i:0  CR:Z:TTGAGTGGTTGTGGCC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TTGAGTGGTTGTGGCC-1 UR:Z:TATAATGCTCAG   UY:Z:FFFFFFFFFFFF   UB:Z:TATAATGCTCAG
A00519:758:HTCCHDSXY:3:2563:23665:33802 16  chr1    14411   1   90M *   0   0   TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG  FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:1  AS:i:88 nM:i:0  RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3  RE:A:I  xf:i:0  CR:Z:TGTTGAGAGGCAATGC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TGTTGAGAGGCAATGC-1 UR:Z:ACGGGTGTGGAG   UY:Z:FFFFFFFFFFFF   UB:Z:ACGGGTGTGGAG
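
Reading tags by eye only covers a handful of records. As a rough check, the sketch below counts how many records carry each barcode and UMI tag. It reads SAM text on stdin and is demonstrated on two made-up records; the function names count_tags and demo_sam and the demo data are illustrative, not part of scTE or samtools.

```shell
#!/usr/bin/env bash
# Count how many SAM records carry each barcode/UMI tag.
# Reads tab-separated SAM text on stdin.
count_tags() {
  awk -F'\t' '
    {
      # Optional tags start at column 12 of a SAM record.
      for (i = 12; i <= NF; i++) {
        tag = substr($i, 1, 2)
        if (tag == "CR" || tag == "CB" || tag == "UR" || tag == "UB") n[tag]++
      }
    }
    END {
      printf "CR=%d CB=%d UR=%d UB=%d\n", n["CR"], n["CB"], n["UR"], n["UB"]
    }'
}

# Two made-up records: the second has raw tags only (no CB/UB).
demo_sam() {
  printf 'r1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tFFFF\tCR:Z:AAAC\tCB:Z:AAAC-1\tUR:Z:GGTT\tUB:Z:GGTT\n'
  printf 'r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tFFFF\tCR:Z:CCCA\tUR:Z:TTAA\n'
}

demo_sam | count_tags   # prints: CR=2 CB=1 UR=2 UB=1
```

On real data the same function can be fed from samtools, for example: samtools view possorted_genome_bam.bam | head -n 10000 | count_tags. A CB count lower than the CR count means some reads have no corrected barcode.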

3. Convert the HDF5 output to a Seurat object

Use the Convert() function from SeuratDisk.

# R
library(SeuratDisk)
library(Seurat)
# Convert the .h5ad file to an .h5seurat file
Convert("../../../YL002272_S1.h5ad", dest = "h5seurat", overwrite = TRUE)

# Then load it into R
Seurat.obj <- LoadH5Seurat("../../../YL002272_S1.h5seurat")

Separate genes and TEs in the count matrix

# R
## Load TE names (one name per line, no header)
te = read.csv('../data/mm10.TEname.txt', sep = '\t', header = FALSE)
## Split features by whether their name appears in the TE name list
is_te = rownames(Seurat.obj) %in% te$V1
Gene = subset(Seurat.obj, features = rownames(Seurat.obj)[!is_te])
TEs = subset(Seurat.obj, features = rownames(Seurat.obj)[is_te])

The TEs object can then be analysed with the usual Seurat workflow.

How to download the mm10.TEname.txt file

# hg38 (the file saved by -O is still gzip-compressed despite its .txt name, hence zcat below)
$ wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz -O hg38.te.txt
$ zcat hg38.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11 | sort | uniq > hg38.TEname.txt

# mm10
$ wget -c http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz -O mm10.te.txt
$ zcat mm10.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11 | sort | uniq > mm10.TEname.txt

# If you also need the class and family for each TE name
# (columns 11-13 of rmsk.txt are repName, repClass, repFamily, in that order)
$ zcat hg38.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11,12,13 | sort | uniq > hg38.TEnamefamilyclass.txt
$ zcat mm10.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11,12,13 | sort | uniq > mm10.TEnamefamilyclass.txt
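
The grep in the commands above matches the class keywords anywhere on the line. A stricter variant, sketched below, tests only the repClass column (12) with awk and prints repName, repClass and repFamily. This is an alternative under that assumption, not the pipeline from the scTE documentation; the function names and the three demo rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Emit unique "repName<TAB>repClass<TAB>repFamily" rows for TE classes,
# testing the class keywords against column 12 only.
te_names_from_rmsk() {
  awk -F'\t' -v OFS='\t' \
    '$12 ~ /LINE|SINE|LTR|DNA|Retroposon/ { print $11, $12, $13 }' | sort -u
}

# Three made-up rows in rmsk.txt column order (hypothetical values).
demo_rmsk() {
  printf '585\t1000\t10\t0\t0\tchr1\t100\t200\t-500\t+\tL1Md_A\tLINE\tL1\t1\t100\t-5\t1\n'
  printf '585\t900\t20\t0\t0\tchr1\t300\t400\t-300\t-\t(CA)n\tSimple_repeat\tSimple_repeat\t1\t50\t0\t2\n'
  printf '585\t800\t30\t0\t0\tchr2\t100\t200\t-500\t+\tL1Md_A\tLINE\tL1\t1\t100\t-5\t3\n'
}

# Keeps the LINE element once and drops the simple repeat.
demo_rmsk | te_names_from_rmsk
```

On the real table this would be used as: zcat mm10.te.txt | te_names_from_rmsk > mm10.TEnamefamilyclass.txt.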

### Note: check this page https://github.com/jphe/scTE/issues/3

References:

https://github.com/JiekaiLab/scTE
https://www.nature.com/articles/s41467-021-21808-x
