RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)...

桓峰基因公众号推出转录组分析教程,有需要生信的老师可以联系我们!转录分析教程整理如下:

RNA 1. 基因表达那些事--基于 GEO

RNA 2. SCI文章中基于GEO的差异表达基因之 limma

RNA 3. SCI 文章中基于T CGA 差异表达基因之 DESeq2

RNA 4. SCI 文章中基于TCGA 差异表达之 edgeR

RNA 5. SCI 文章中差异基因表达之 MA 图

RNA 6. 差异基因表达之-- 火山图 (volcano)

RNA 7. SCI 文章中的基因表达——主成分分析 (PCA)

RNA 8. SCI文章中差异基因表达--热图 (heatmap)

RNA 9. SCI 文章中基因表达之 GO 注释

RNA 10. SCI 文章中基因表达富集之--KEGG

RNA 11. SCI 文章中基因表达富集之 GSEA

RNA 12. SCI 文章中肿瘤免疫浸润计算方法之 CIBERSORT

RNA 13. SCI 文章中差异表达基因之 WGCNA

RNA 14. SCI 文章中差异表达基因之 蛋白互作网络 (PPI)

RNA 15. SCI 文章中的融合基因之 FusionGDB2

RNA 16. SCI 文章中的融合基因之可视化

RNA 17. SCI 文章中的筛选 Hub 基因 (Hub genes)

RNA 18. SCI 文章中基因集变异分析 GSVA

RNA 19. SCI 文章中无监督聚类法 (ConsensusClusterPlus)

RNA 20. SCI 文章中单样本免疫浸润分析 (ssGSEA)

RNA 21. SCI 文章中单基因富集分析

RNA 22. SCI 文章中基于表达估计恶性肿瘤组织的基质细胞和免疫细胞(ESTIMATE)

RNA 23. SCI文章中表达基因模型的风险因子关联图(ggrisk)

RNA 24. SCI文章中基于TCGA的免疫浸润细胞分析 (TIMER)

RNA 25. SCI文章中估计组织浸润免疫细胞和基质细胞群的群体丰度(MCP-counter)

RNA 26. SCI文章中基于转录组数据的基因调控网络推断 (GENIE3)

RNA 27 SCI文章中转录因子结合motif富集到调控网络 (RcisTarget)

这期继续介绍基于RNA-seq 测序原始数据 fastq 来反褶积揭示肿瘤免疫结构的分子和药理学,这个对于自己的测序结果是很方便使用的,但是如果是想用于数据库的fastq来分析就比较麻烦,需要下载整理超大的文件,不是一台电脑能解决的事情,也不是一个小服务器就能处理的,有这方面的需求可以联系桓峰基因代为处理数据,我们看看这个软件包怎么使用。



前    言

quanTIseq 介绍

quanTIseq,一种从大量RNA测序数据中量化十种免疫细胞类型分数的方法。quanTIseq 通过模拟、流式细胞术和免疫组织化学数据在血液和肿瘤样本中广泛验证。对8000个肿瘤样本的quanTIseq分析显示,细胞毒性T细胞浸润与 CXCR3/CXCL9 轴激活的相关性比突变负荷更强,并且基于反卷积的细胞评分在几种实体癌症中具有预后价值。最后,使用 quanTIseq 来展示激酶抑制剂如何调节免疫结构,并揭示不同患者对检查点阻滞剂反应的免疫细胞类型。

RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)..._第1张图片

quanTIseq 可以对肿瘤样本中不同种类免疫细胞的组成进行预测,支持以下10种类型的免疫细胞:

RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)..._第2张图片

软件分析流程

quanTIseq将从肿瘤样本或其他细胞混合物中读取的RNA-seq作为输入FASTQ文件,并通过反卷积量化十种不同免疫细胞类型的比例和异质样本中其他未特征细胞的比例(表1)。quanTIseq分析包括三个步骤(图1):

  1. 使用 Trimmomatic  预处理原始 RNA-seq 数据(单端或双端),以删除 Illumina 接头序列,trim 低质量的 reads,将长读取裁剪到最大长度,并丢弃短 reads。

  2. 以百万分转录本(TPM)和原始计数的方式量化 Kallisto 基因表达。

  3. 表达归一化,基因重新注释,基于约束最小二乘回归的细胞分数反褶积,以及细胞密度的计算。

注:分析可以在单个或多个样本上运行,并且可以在任何步骤启动。例如,可以使用反褶积模块直接分析预先计算的表达式矩阵。

RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)..._第3张图片

使用方法见官网:

https://icbi.i-med.ac.at/software/quantiseq/doc/index.html

软件安装

软件直接在git上下载即可:

https://github.com/icbi-lab/quanTIseq

根据说明安装即可

Mac OS X (based on Docker)
Install Docker (instructions here [https://docs.docker.com/engine/install/]).
Download the "quanTIseq_pipeline.sh" script from here.

Linux(based on Singularity)
Install Singularity (instructions here [https://docs.sylabs.io/guides/3.0/user-guide/]).
Download the "quanTIseq_pipeline.sh" script from here.

数据读取

输入数据为 fastq 原始测序数据,我们可以从git上下载,解压之后会看到有文件 test 文件夹里面的数据如下:

RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)..._第4张图片

RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)..._第5张图片

Input 文件夹中有三个文件,双端测序结果,另一个是单端测序结果,所以这个软件对于测序reads是都可以做分析的:接下来我们使用例子进行测试,我们需要将输入文件写入 rnaSeqInfoFile.txt,作为输入文件。需要注意:

RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)..._第6张图片

--inputfile:输入文件的路径(必选)。输入文件是一个以制表符分隔的文本文件,没有标题,包含有关要分析的RNA-seq数据的信息,每行一个样本。对于每个示例,它必须包含三个列,指定示例的ID、第一个FASTQ文件的路径和用于配对端数据的第二个FASTQ文件的路径。对于单端数据,第三列必须报告字符串“None”。FASTQ文件可以gzip(“。gz”扩展名)。要使用"——pipelinstart =decon"选项运行quanTIseq,输入文件必须是一个以tab分隔的文本文件,其中包含要反卷积的所有样本的基因TPM(或微阵列表达式值)(基因符号在第一列,样本id在第一行)。表达式数据必须是非对数尺度。

sample0	test/Input/rnaseq_sample_1.fastq.gz	test/Input/rnaseq_sample_2.fastq.gz
sample1	test/Input/rnaseq_sample_3.fastq.gz	None

实操起来非常简单只需要一行代码,麻烦就是安装时候需要配置一下docker。

bash quanTIseq_pipeline.sh --inputfile=Input/rnaSeqInfoFile.txt --outputdir=Output

参数说明

参数可以根据实际数据情况调整,因为这个流程需要从fastq开始,因此我们也只用软件自带的数据做测试,参数默认即可!

--pipelinestart: step from which the pipeline should be started as illustrated in Figure 1: (1) "preproc", (2) "expr", or (3) "decon". Note: if "decon", the input file must be a tab-delimited text file with the gene TPM (or microarray expression values) for all samples to be deconvoluted (gene symbols on the first column and sample IDs on the first row). Expression data must be on non-log scale. Default: "preproc".

--tumor: specifies whether expression data are from tumor samples. If TRUE, signature genes with high expression in tumor samples are removed (see [1]). We highly recommend setting "--tumor=TRUE" when analyzing tumor data. Default: FALSE.

--arrays: specifies whether expression data are from microarrays (instead of RNA-seq). If TRUE, the "--rmgenes" parameter is set to "none". Default: FALSE.

--method: deconvolution method to be used: "hampel", "huber", or "bisquare" for robust regression with Huber, Hampel, or Tukey bisquare estimators, respectively, or "lsei" for constrained least squares regression. The fraction of uncharacterized cells ("other") is computed only by the "lsei" method, which estimates cell fractions referred to the total cells in the sample under investigation. For all the other methods, the cell fractions are referred to immune cells considered in the signature matrix. We recommend using the "lsei" method. Default: "lsei".

--mRNAscale: specifies whether cell fractions must be scaled to account for cell-type-specific mRNA content. We highly recommend using the default setting: "--mRNAscale=TRUE". Default: TRUE.

--totalcells: path to a tab-separated text file containing the total cell densities estimated from images of tumor tissue-slides. The file must have no header and contain, on the first column, the sample IDs, and on the second column, the number of total cells per mm2 estimated from tumor images (e.g. from images of haematoxylin and eosin (H&E)-stained tissue slides). quanTIseq computes cell densities for all the cell types of the signature matrix and all the samples present in both the "inputfile" and "totalcell" files. This parameter is optional and, if not set, only the deconvoluted cell fractions are returned.

--rmgenes: specifies which genes must be removed from the signature matrix before running deconvolution. This parameter can be equal to:

"none": no genes are removed; "default": the list of genes reported in this file are removed; A path to a text file containing a list of genes to be removed can be specified. The text file must report one gene symbol per line, as in this example. Can be used to specify noisy genes that are known/thought to bias deconvolution results (e.g. immune genes with very high expression in the sample(s) of interest). We advise using "default" and "none" for the deconvolution of RNA-seq and microarray data, respectively. Default: "default". --rawcounts: specifies whether a file of gene raw counts should be generated in addition to TPM. Default: FALSE.

--prefix: prefix of the output files. Default: "quanTIseq".

--threads: number of threads to be used. Note: kallisto results (gene counts and TPM) can differ slightly depending on the number of threads used. Default: 1.

--phred: encoding of the RNA-seq quality scores for read preprocessing with Trimmomatic: "33" for Phred-33 or "64" for Phred-64. Default: 33.

--adapterSeed: maximum number of seed mismatches for the identification of adapter sequences by Trimmomatic. Default: 2.

--palindromeClip: threshold for palindrome clipping of adapter sequences by Trimmomatic. Default: 30.

--simpleClip: threshold for simple clipping of adapter sequences by Trimmomatic. Default: 10.

--trimLead: minimum Phred quality required by Trimmomatic to keep a base at the start of a read. Bases with lower quality are trimmed. Default: 20.

--trimTrail: minimum Phred quality required by Trimmomatic to keep a base at the end of a read. Bases with lower quality are trimmed. Default: 20.

--minLen: minimum read length required by Trimmomatic to keep a read. Reads shorter than this threshold are discared. Default: 36.

--crop: maximum read length required by Trimmomatic. Longer reads are trimmed to this maximum length by removing bases at the end of the read. Default: 10000.

--avgFragLen: estimated average fragment length required by Kallisto for single-end data. Deafult: 50.

--sdFragLen: estimated standard deviation of fragment length required by Kallisto for single-end data. Deafult: 20.

结果解读

结果输出两个文件:

  1. prefix_cell_fractions.txt:以制表符分隔的文本文件,其中包含quanTIseq估计的单元格分数(样本id位于第一列,单元格类型位于第一行)。只有在“——method=lsei”时才报告未特征细胞的比例(表1)。

Sample	B.cells	Macrophages.M1	Macrophages.M2	Monocytes	Neutrophils	NK.cells	T.cells.CD4	T.cells.CD8	Tregs	Dendritic.cells	Other
sample0	0.00371552192306291	0.0335235118320783	0.0307219602198042	0	0.0327278514869014	0.0164700546865858	0	0.00702707911489273	0.0199395867163995	0	0.855874434020275
sample1	0.00413627370367692	0.0306458852877606	0.0334538767081232	0	0.0353162337129308	0.0146209315948356	0	0.00748693519911548	0.0229305079528753	0	0.851409355840682
  1. prefix_gene_tpm.txt:以制表符分隔的文本文件,其中TPM中的基因表达(基因符号在第一列,样本id在第一行)。当指定"——pipelinstart =decon"选项时,该文件不会生成。

GENE	sample0	sample1
UBE2Q2P2	0	0
SSX9	0	0
CXorf67	0	0
EFCAB8	0	0
SPATA31B1P	0	0
SDR16C6P	0	0
GTPBP6	75.03456	76.78617
EFCAB12	0	0
A1BG	0	0
A1CF	7.571652372	8.619685958
RBFOX1	0	0
GGACT	4.13322	5.32586
A2ML1	0	0
A2M	110.614	126.5759784
A4GALT	1.9126700201938	2.1760300507943
A4GNT	0	0
NPSR1-AS1	0	0
AAAS	30.888404581	29.8392
AACS	17.7155183	18.4985943465
  1. prefix_gene_count.txt:以制表符分隔的文本文件,读取计数中的基因表达(基因符号在第一列,样本id在第一行)。该文件仅在指定"——rawcounts=TRUE"选项时生成。

后续的分析就可以进行一些可视化,可以参考桓峰基因公众号上的绘图等!

桓峰基因,铸造成功的您!

未来桓峰基因公众号将不间断的推出转录组系列生信分析教程,

敬请期待!!

有想进生信交流群的老师可以扫最后一个二维码加微信,备注“单位+姓名+目的”,有些想发广告的就免打扰吧,还得费力气把你踢出去!

RNA 28 SCI 文章中基于RNA-seq数据反褶积揭示肿瘤免疫结构的分子和药理学 (quanTIseq)..._第7张图片

References:

1. Finotello F, et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Medicine, 2019. 11(1):34.

2. A. M. Bolger, M. Lohse, and B. Usadel, "Trimmomatic: a flexible trimmer for Illumina sequence data", Bioinforma. Oxf. Engl., vol. 30, no. 15, pp. 2114–2120, Aug. 2014.

3. N. L. Bray, H. Pimentel, P. Melsted, and L. Pachter, "Near-optimal probabilistic RNA-seq quantification", Nat. Biotechnol., vol. 34, no. 5, pp. 525–527, May 2016.

4. B. Li and C. N. Dewey, "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome", BMC Bioinformatics, vol. 12, p. 323, 2011.

桓峰基因和投必得合作,文章润色优惠85折,需要文章润色的老师可以直接到网站输入领取桓峰基因专属优惠券码:KYOHOGENE,然后上传,付款时选择桓峰基因优惠券即可享受85折优惠哦!https://www.topeditsci.com/

你可能感兴趣的:(人工智能)