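Paper 1: Genomic basis for RNA alterations in cancer
This paper was covered in the previous post. This version adds extensive interpretation and annotation (originally set in grey italics) to help readers follow the original text.
Accepted: 2019-12-11, Nature
Authors: PCAWG Transcriptome Core Group
Link: doi.org/10.1038/s41586-020-1970-0
Abstract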
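Transcript alterations often result from somatic changes in cancer genomes. (Read this sentence carefully: it is the theoretical basis for jointly analysing transcriptome and genome data in cancer. The molecular mechanisms behind transcriptomic changes originate in the genome.)
Various forms of RNA alteration have been described in cancer, including overexpression, altered splicing and gene fusions. (Most of these fall within the scope of transcriptome sequencing, yet in practice we often focus only on expression quantification, differential expression and functional annotation such as GO/KEGG.)
However, these alterations are difficult to attribute to underlying genomic changes, owing to heterogeneity among patients and tumour types and to the relatively small cohorts of patients whose samples have been analysed by both transcriptome and whole-genome sequencing.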
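(Heterogeneity here mainly means (1) genetic differences between individuals and (2) differences within tumour tissue between sampling sites, sampling times and cell subtypes. These differences strongly affect sample design and analysis strategy in tumour studies. For example:
① A tumour WES study needs not only the patient's tumour tissue but also non-tumour tissue from the same patient as a matched control. The matched normal is used to filter out the patient's own germline variants (the DNA sequence variants each of us is born with and that differ from other people's). Without it, one cannot tell whether a mutation is somatic (acquired) or germline (inherited or de novo), that is, somatic mutation vs. germline mutation. Public databases do not record this particular patient's constitutional variants, so the patient's own normal tissue is always required.
② Somatic mutations are mostly random. A single tumour specimen can therefore usually be divided into distinct cell subpopulations that carry different driver mutations and, in turn, express different transcripts (spatial heterogeneity). Studies of tumour progression or evolution may also involve temporal heterogeneity, for example metastatic tissue from late-stage disease, or tissue carrying resistance mutations that emerge after a year of targeted therapy.
In general, though, we assume that the different organs, tissues and cells of one individual carry the same genetic material. Germline variants are therefore unaffected by the choice of material: adjacent normal tissue, whole blood, leukocytes or even buccal cells all work. Sampling for tumour somatic analysis is far more complex and must account for matched normal tissue as well as temporal and spatial heterogeneity.
Because of this heterogeneity, sampling needs to be diverse and the total sample size (the patient cohort) large in order to obtain more reliable predictions of molecular mechanisms.)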
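The matched-normal idea described in point ① can be sketched as a simple set operation. This is only an illustration: the variant keys `(chromosome, position, ref, alt)` and the function name are hypothetical, and real somatic callers use statistical models over read evidence rather than an exact-match filter.

```python
# Minimal sketch, assuming each call is keyed as (chromosome, position, ref, alt).
# The function name and the example coordinates are illustrative only.
def split_somatic_germline(tumor_calls, normal_calls):
    """Split tumour variant calls using the patient's own matched normal."""
    normal = set(normal_calls)
    tumor = set(tumor_calls)
    somatic = sorted(v for v in tumor if v not in normal)   # absent from normal
    germline = sorted(v for v in tumor if v in normal)      # also seen in normal
    return somatic, germline


tumor = [("chr17", 7674220, "C", "T"), ("chr1", 1000, "A", "G")]
normal = [("chr1", 1000, "A", "G")]
somatic, germline = split_somatic_germline(tumor, normal)
print("somatic:", somatic)
print("germline:", germline)
```

The variant shared with the normal sample is treated as germline; the one seen only in the tumour is kept as a somatic candidate.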
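Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA).
Using matched whole-genome sequencing data, we (further) associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms.
(We found that) somatic copy-number alterations were the major drivers of variations in total gene expression and allele-specific expression (ASE).
We identified 649 associations of somatic single-nucleotide variants (SNVs) with gene expression in cis, of which 68.4% involved flanking non-coding regions of the gene.
We (also) found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements.
In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed "bridged" fusions, in which a third genomic location bridges two genes.
(In summary) we observed that transcriptomic alteration signatures differ between cancer types and are associated with variations in DNA mutational signatures. This compendium of RNA alterations in their genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.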
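Introduction
To study alterations in cancer genomes more broadly, particularly in non-coding regions, the PCAWG project was established to analyse a large number of whole genomes contributed to the ICGC and TCGA projects.
(Previously,) the individual projects did not use consistent methods for some key analyses. A major focus of the 16 PCAWG working groups was therefore to analyse the PCAWG data uniformly. For example, the PCAWG Technical Working Group led the collection of raw data, the realignment of whole-genome sequencing data and the implementation of a core somatic mutation calling pipeline. Other PCAWG working groups focused on uniform analyses of copy-number alterations, structural variants, germline variants, mutational signatures and driver-gene identification.
Here we report the PCAWG Transcriptome Working Group's joint analysis of available matched transcriptome and genome profiling for 1,188 samples from 27 tumour types (6 to 154 samples per tumour type, see the figure below; mean: 44). (Given the many kinds of heterogeneity, "matched" here probably means that a homogenate of the same tissue block from the same patient, which removes spatial heterogeneity within the block, was used for separate RNA and DNA extraction followed by transcriptome and genome sequencing, that is, strictly matched. Tumour tissue (including a small number of metastases) necessarily has both RNA and DNA sequenced. Adjacent or other normal/healthy tissue may have only DNA sequenced (to filter out the patient's germline variants), or RNA as well (for differential expression analysis).) This provides, to our knowledge, the largest resource to date of RNA phenotypes and their underlying genetic changes in cancer (Extended Data Fig. 1, Methods, Supplementary Results, Supplementary Table 23).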
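Extended Data Fig. 1 | Pan-cancer expression profiles of 1,188 PCAWG donors
a, Tumour and normal RNA-seq data from 27 histotypes. Total sample counts are shown to the right of the bars. Grey bars indicate matched healthy samples.
b, Numbers of female and male donors.
c, Total numbers of tumours and matched healthy samples in the PCAWG study. A subset of tumours (dark purple) was metastatic.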
S1_covars.xlsx / All_samples_cohort
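Supplementary_Tables (download): https://pan.baidu.com/s/10fTsnVYlk30T9pKIq05cHg
Extraction code: ysx4
In summary, we demonstrate the importance of transcriptome data for understanding how specific DNA alterations contribute to carcinogenesis along different dimensions, and we map the landscape of cancer-associated RNA alterations.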
Cancer-specific germline cis-eQTLs
To investigate the mechanisms underlying the different types of RNA alteration, we first focused on changes in gene expression levels (Extended Data Fig. 2).
(Expression quantitative trait locus (eQTL) analysis is a common, classic way to link two omics layers, the transcriptome and the genome/exome, and belongs to multi-omics research.
An eQTL is a genetic locus that influences the expression level of a gene (most are single-nucleotide polymorphisms, SNPs) and therefore carries biological meaning. GTEx is one of the most comprehensive eQTL resources to date. The analysis tests the association between each SNP and gene expression level, takes the SNP-to-gene distance into account, and so identifies the genes that a SNP regulates.
Cis-eQTLs vs. trans-eQTLs. Expression quantitative trait loci (eQTLs) are genetic variants that influence expression levels of mRNA transcripts. Cis-eQTLs commonly refer to genetic variations that act on local genes, and trans-eQTLs are those that act on distant genes and genes residing on different chromosomes.
a) cis-eQTL, b) trans-eQTL, c) mediated trans-eQTL with a single cis-mediator, and d) mediated trans-eQTL with multiple cis-mediators. Source: Center for Statistical Science, Tsinghua University, https://doi.org/10.1186/s12859-019-2651-6, BMC Bioinformatics
Identification of eQTLs can help advance our understanding of genetics and regulatory mechanisms of gene expression in various organisms. Consistent findings suggest that many genes are regulated by nearby SNPs, and the identified cis-eQTLs are typically close to transcription start sites (TSSs). In contrast to cis-eQTLs, trans-eQTL identification is much more challenging because a greater number of SNP-gene pairs are tested for trans-association. In order to achieve the same power, analysis of trans-eQTLs requires a much larger sample size and/or effect than that in the cis-eQTL analysis. However, trans-eQTLs tend to have weaker effects than cis-eQTLs.
Mediation diagram of the trans-association between rs2239804 and RPL34
Several methods have been developed to improve trans-eQTL detection, such as reducing the multiple-testing burden based on pairwise partial correlations from the gene expression data to increase power, and constructing or selecting variables to control for unmeasured confounders that may lead to spurious association.
eQTL analysis needs at least three files: (1) a sample information file with covariates such as age, sex and ancestry; (2) a gene expression file giving the expression level of every gene in every sample; and (3) a genotype file giving the genotype of every sample.)
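To make the core of an eQTL test concrete, here is a minimal sketch that regresses the expression of one gene on the alternative-allele dosage (0, 1 or 2) of one SNP. The data values and the function name are invented for illustration. Real eQTL pipelines also adjust for the covariates from the sample information file and compute a P value for every SNP-gene pair, which this sketch leaves out.

```python
# Minimal sketch of a single SNP-gene association, assuming genotype is coded
# as alt-allele dosage. No covariates and no P value: illustration only.
def linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


genotype = [0, 0, 1, 1, 2, 2]                 # hypothetical dosages, one per sample
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]   # hypothetical expression values
slope, intercept = linear_regression(genotype, expression)
print(f"effect per alt allele: {slope:.3f}, baseline: {intercept:.3f}")
```

The slope is the estimated change in expression per copy of the alternative allele, which is the effect size an eQTL scan reports.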
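Extended Data Fig. 2 | Overview of the different sources of genetic variation considered in the analyses
a, To analyse cis regulation, standard eQTL methods were used to test mono-allelic single-nucleotide germline variants (SNVs, blue) individually for association with total gene expression. (The blue-dot SNVs occur at exactly the same genomic position across samples; the example in the figure shows three such positions.)
Because somatic SNVs have low recurrence in the cohort (the red-dot SNVs do not share exact genomic positions across samples; the example in the figure shows none), somatic SNVs were aggregated in burden categories according to their position relative to the gene under consideration (for example promoter, 5′ UTR or intron), such as the "local somatic SNV burden" in the figure.
The local SNV burden was then tested with eQTL methods for association with ASE globally across all genes, and with total expression at the level of each gene. Trans effects were estimated by testing total gene expression for association with mutational and epigenetic signatures.
The window size was 1 Mb for all somatic cis-eQTL analyses and 100 kb for the ASE and germline cis-eQTL analyses.
b, Overview of the different datasets and their contribution to the analyses described in a. Arrows indicate dependencies between the individual analyses.
① Germline genotypes were derived from matched healthy whole-genome sequencing (WGS) samples.
② Allele-specific SCNAs (somatic copy-number alterations), mutational signatures and local SNV burden were derived from tumour WGS compared against the unaffected WGS sample (that is, normal-tumour pairs).
③ ASE and total expression (FPKM) were derived from tumour and normal RNA-seq data.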
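We initially considered common germline variants (minor allele frequency (MAF) ≥ 1%) proximal to individual genes (±100 kb) and mapped expression quantitative trait loci (eQTLs) across the cohort (Extended Data Fig. 3, Supplementary Table 1).
This pan-cancer analysis identified 3,532 genes with an eQTL (false discovery rate (FDR) ≤ 5%, hereafter eGenes) (Supplementary Table 2), enriched in regions proximal to transcription start sites (TSSs) (Extended Data Fig. 3).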
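The two variant-selection criteria mentioned above (MAF ≥ 1% and a ±100 kb cis window) can be sketched as follows. The function names are hypothetical, genotypes are assumed to be alternative-allele dosages in diploid samples, and measuring the window from the gene boundaries (rather than from the TSS) is an assumption of this sketch.

```python
# Minimal sketch of the variant filters, under the assumptions stated above.
def minor_allele_frequency(dosages):
    """MAF from alt-allele dosages (0, 1 or 2 per diploid sample)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def in_cis_window(variant_pos, gene_start, gene_end, flank=100_000):
    """True if the variant lies within `flank` bp of the gene body."""
    return gene_start - flank <= variant_pos <= gene_end + flank


def keep_variant(dosages, variant_pos, gene_start, gene_end, min_maf=0.01):
    """Apply both filters to one variant-gene pair."""
    return (minor_allele_frequency(dosages) >= min_maf
            and in_cis_window(variant_pos, gene_start, gene_end))


print(minor_allele_frequency([0, 1, 2, 2]))
print(keep_variant([0, 1, 2, 2], 50_000, 120_000, 130_000))
```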
Supplementary_Tables / S2_eGenes_v2.xlsx / pan-analysis
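Extended Data Fig. 3 | Lead variants of germline eQTLs
(Each row of three panels corresponds to one tumour type. The three plots are typical eQTL outputs, covering P values, the number of lead SNPs and their distance to the TSS.)
To identify cancer-specific regulatory variants, we compared our eQTLs with those from the Genotype-Tissue Expression project (GTEx, whose data generally come from healthy tissue), using a previously described strategy to assess eQTL replication and testing the marginal significance of lead eQTL variants in GTEx tissues (P ≤ 0.01, Bonferroni-adjusted).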
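Although most lead variants could be replicated in GTEx samples (3,110 of 3,532 eQTL variants), we identified 422 eQTLs (422/3,532, ~11.9%) that did not replicate in GTEx tissues, which suggests cancer-specific regulation (Extended Data Fig. 4, Supplementary Table 3). The corresponding eQTL lead variants were enriched in heterochromatic regions (Fig. 1a, the second significance asterisk from the right: *).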
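Fig. 1 | Germline and somatic SNVs associated with gene expression
a, Epigenetics Roadmap enrichment analysis, showing the average fold change across cell lines for each Roadmap factor, for PCAWG-specific eQTLs from the pan-analysis and for eQTLs that replicate in GTEx tissues.
*: P < 0.05/25, one-sided Wilcoxon rank-sum test of PCAWG-specific eQTLs, corrected for the number of Roadmap factors used (25). Data are mean and s.d.
(The other panels are explained later in the text.)
Overall, this analysis shows that the germline framework of gene expression regulation is largely conserved in cancer tissues.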
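Somatic cis-eQTLs in non-coding regions
Previous studies have described non-coding mutations in cancer, particularly in promoter regions, and their regulatory effects on gene expression. Here we investigated possible somatic DNA changes across the whole genome that underlie changes in gene expression.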
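Extended Data Fig. 5 | Cis-mutational somatic burden
a, Total somatic mutational load per cancer type. The median number of SNVs ranges from 1,139 in thyroid adenocarcinoma to 72,804 in skin melanoma.
(The same kind of plot can show the distribution of total somatic mutational load after tumour samples have been classified or grouped.)
(x axis) Shared aliquots (the number of sample aliquots that share an SNV)
b, Number of recurrent somatic SNVs shared by an increasing number of patients. A small subset (≥ 86 SNVs) was detected in more than 1% of the cohort (≥ 12 patients).
(This plot can be tallied from a variant-level, per-sample SNV matrix (heat map) or from VCF files.)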
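The tally behind panel b can be sketched in a few lines. The input format (a dictionary from patient ID to a collection of SNV keys) and all names are assumptions for illustration; in practice the SNV keys would be parsed from per-sample VCF files.

```python
from collections import Counter


# Minimal sketch: count how many SNVs are shared by exactly k patients.
# `snvs_by_patient` is a hypothetical mapping of patient ID -> iterable of SNV keys.
def recurrence_histogram(snvs_by_patient):
    """Map 'number of patients carrying an SNV' -> 'number of such SNVs'."""
    carriers = Counter()
    for snvs in snvs_by_patient.values():
        carriers.update(set(snvs))        # count each patient once per SNV
    return Counter(carriers.values())


calls = {
    "P1": ["chr1:100:A>G", "chr2:200:C>T"],
    "P2": ["chr1:100:A>G", "chr3:300:G>A"],
    "P3": ["chr1:100:A>G", "chr2:200:C>T"],
}
hist = recurrence_histogram(calls)
print(dict(hist))
```

Plotting the histogram cumulatively (SNVs shared by at least k patients) would give the shape of the curve described for panel b.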
We estimated local mutation burden by aggregating SNVs in 2-kb intervals adjacent to genes (flanking regions), together with SNVs in exons and introns (Extended Data Figs. 2, 5, 6).
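A minimal sketch of this aggregation step is shown below, assuming fixed 2-kb windows over a flanking region and a binary burden (1 if a sample carries at least one SNV in the window). The function names and data are hypothetical, and the binary coding is a simplification: Extended Data Fig. 8 refers to a weighted burden, whose definition is given in the paper's Methods and is not reproduced here.

```python
# Minimal sketch of burden aggregation, under the assumptions stated above.
def burden_by_interval(snv_positions_by_sample, region_start, region_end, width=2000):
    """Binary burden per sample for each fixed-width interval in a region."""
    n_bins = (region_end - region_start + width - 1) // width
    burden = {}
    for sample, positions in snv_positions_by_sample.items():
        row = [0] * n_bins
        for pos in positions:
            if region_start <= pos < region_end:
                row[(pos - region_start) // width] = 1
        burden[sample] = row
    return burden


def burden_frequency(burden):
    """Fraction of samples carrying at least one SNV in each interval."""
    rows = list(burden.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


snvs = {"s1": [100, 2500], "s2": [150], "s3": [5999], "s4": []}
burden = burden_by_interval(snvs, region_start=0, region_end=6000)
freq = burden_frequency(burden)
print(freq)
```

Intervals whose burden frequency reaches the 1% threshold mentioned in the figure legends would then be carried forward into association testing.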
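Extended Data Fig. 6 | Somatic mutation rate and burden frequency by type of tested region
a, Number of mutated regions tested per gene with a somatic burden frequency ≥ 1%.
b, Mutation rate per kilobase.
c, Burden frequency by type of tested interval (flanking, exon, intron).
d, Distribution of the distance (bp) from leading intervals (FDR ≤ 5%) to their nearest (left and right) intervals for which the association P value drops by at least one order of magnitude (99% of the distribution is shown).
e, Breakdown of all genomic regions tested (burden frequency ≥ 1%, n = 1,049,102) and of the 567 genomic regions underlying the observed somatic cis-eQTLs at 5% FDR. In the figure, Intronic: eGene introns; Exonic: eGene exons; Flank.: 2-kb flanking regions within 1 Mb of the eGene start and end; Flank.intergenic: flanking regions at genomic positions without gene annotation; Flank.intronic: flanking regions that overlap introns of a nearby gene; Flank.others: flanking regions that partially overlap other annotations of nearby genes.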
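Next, we decomposed the expression variation of individual genes, considering recurrent mutation burden in cis together with cis germline variants and somatic copy-number alterations (SCNAs). This showed that SCNAs were the major driver of expression variation (17%), followed by somatic SNVs in gene flanking regions (1.8%) and germline variants (1.3%) (Fig. 1b).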
Fig. 1b
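b, Variance component analysis of gene expression levels, showing the average proportion of variance explained by different germline and somatic factors for different sets of genes, including the average effect over all factors:
1) all genetic factors (germline and somatic); 2) somatic copy-number alterations; 3) somatic variants in flanking regions; 4) population structure; 5) cis germline effects; 6) somatic intronic and exonic mutation effects.
(Somatic intronic and exonic mutations explain very little of the variance, which is mainly explained by copy-number alterations, non-coding regions and cis germline variants.)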
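We also tested all recurrent mutation burdens for association with gene expression genome-wide. We identified 649 genes with a somatic eQTL (FDR ≤ 5%) (Supplementary Table 5). Of these, 11 associations lay in introns or exons of the corresponding eGene, including genes with known roles in the pathogenesis of specific cancers, such as CDK12 in ovarian cancer and IRF4 in chronic lymphocytic leukaemia (Extended Data Figs. 7, 8).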
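FDR thresholds such as "FDR ≤ 5%" recur throughout this paper. The sketch below shows the Benjamini-Hochberg adjustment, one common way to control the FDR. Whether the authors used exactly this procedure is not stated in the text above, so treat it as an assumption; the P values are invented.

```python
# Minimal sketch of Benjamini-Hochberg adjustment (an assumed FDR procedure).
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from largest P to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.20]           # hypothetical nominal P values
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```

Only tests whose adjusted value stays at or below the chosen threshold are reported, which is how a list such as the 649 somatic eGenes would be obtained from genome-wide nominal P values.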
Supplementary_Tables / S5_somatic_egenes_rev.xlsx / eQTL_results_FDR5%
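Extended Data Fig. 7 | Manhattan plots of seven somatic eGenes associated with a genic lead burden
Extended Data Fig. 8 | Scatter plots of eight somatic eGenes. Plots show the effect of the lead weighted burden on the gene expression residuals (see Methods in the original paper) of these genes. a, CDK12. b, PI4KA. c, IRF4. d, AICDA. e, C11orf73. f, BCL2. g, SGK1. h, TEKT5.
Most eQTLs (68.4%) were associated with flanking non-coding mutation burden (Extended Data Fig. 6e, see above). (This shows that although non-coding regions of the genome do not directly encode the proteins that carry out cellular functions, they are very important for regulating gene expression.)
Next, we considered the eQTLs located in flanking regions (n = 556) and tested for enrichment of cell-type-specific annotations from the Epigenetics Roadmap. We identified 13 enriched annotations (FDR ≤ 10%) (Extended Data Fig. 9, Supplementary Table 6), including poised promoters, weak and active enhancers and heterochromatin, but notably no enrichment of transcription factor binding sites (Supplementary Table 7). (These Roadmap annotations are probably distributed as annotated BED files, which can be combined with this paper's data and tools such as bedtools for further statistics and association.) The enrichment in transcriptionally inactive regions may be due to the increased mutation rate in these regions (Extended Data Fig. 9), which has been reported previously in cancer.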
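Extended Data Fig. 9 | Roadmap epigenome marks that overlap flanking intervals carrying somatic mutation burden
We also examined the functional characteristics of somatic eGenes and observed an enrichment of somatic eQTLs in bivalent promoters of cancer-testis genes (P = 0.04, Fisher's exact test), such as TEKT5 (Fig. 1c, Extended Data Fig. 8h).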
Fig. 1c
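c, Manhattan plot showing nominal P values of association for TEKT5 (highlighted in grey), considering flanking, intronic and exonic intervals. The lead somatic burden is associated with increased TEKT5 expression (P = 1.61 × 10^-6) and overlaps an upstream bivalent promoter (red dot; annotated in 81 Roadmap cell lines, including 8 embryonic stem cell lines, 9 embryonic stem cell-derived lines and 5 induced pluripotent stem cell lines).
In addition, we found a global enrichment (FDR ≤ 10%) of Gene Ontology (GO) categories related to cell differentiation and developmental processes (Supplementary Table 8). Overall, the somatic eQTL analysis identified mostly non-coding regions associated with local gene expression changes, which, like the cancer-specific germline eQTLs, were enriched in transcriptionally inactive regions such as heterochromatin.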
Fig. 1d, 1e
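d, Summary of significant associations between mutational signatures (Sig) and gene expression.
Top: total number of associated genes per signature (FDR ≤ 10%).
Bottom: GO categories or Reactome pathways enriched among the genes associated with each signature (FDR ≤ 10%; significance is colour-coded as -log10 of the adjusted P value).
e, Standardized effect sizes on the presence of allelic expression imbalance (AEI), considering only SCNAs, germline eQTLs, and coding and non-coding mutations. Data are effect size estimates and their standard errors.
-- Translation incomplete; see the original paper for more. The rest of this post mainly covers abstracts, methods and selected figure walkthroughs --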
Fig. 2 | Position-specific effect of somatic mutations on alternative splicing
a, Top, proportion of mutations near exon–intron junctions and at branch sites associated with exon-skipping events. Mutations with an associated splicing change are those for which the percentage spliced in-derived |z-score| is ≥ 3 (dark blue in the figure). Asterisks: intron positions significantly enriched for splicing changes relative to background based on a permutation test. *P < 0.05, **P < 0.01, ***P < 0.001. Bottom, sequence motifs of the regions.
Fig. 2b, 2c
b, Example of an exonization event in the tumour suppressor gene STK11. Top, RNA-seq read coverage of part of the gene is shown in red for the donor carrying the alternative allele and in grey for a random donor with the reference allele. The cassette exon event is shown at the bottom.
c, Enrichment of SINE elements in SAVs (Splicing-associated variants,剪接相关变异) compared to sequence background (BG). Shown for SINE elements overlapping in sense (middle) and antisense (right) directions.
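The |z-score| ≥ 3 criterion in Fig. 2a can be illustrated with a small sketch. It assumes that the z-score compares the percentage spliced in (PSI) of the mutated sample against the PSI values of samples without the mutation. That choice of background, along with the names and numbers below, is an assumption of this sketch and not a description of the paper's exact procedure.

```python
from statistics import mean, stdev


# Minimal sketch of a PSI-derived z-score, under the assumptions stated above.
def psi_z_score(mutant_psi, background_psi):
    """z-score of one sample's PSI relative to a background distribution."""
    return (mutant_psi - mean(background_psi)) / stdev(background_psi)


def has_splicing_change(mutant_psi, background_psi, threshold=3.0):
    """Flag the mutation if |z| reaches the threshold."""
    return abs(psi_z_score(mutant_psi, background_psi)) >= threshold


background = [0.90, 0.92, 0.88, 0.91, 0.89]     # hypothetical PSI, unmutated samples
print(has_splicing_change(0.40, background))    # strong exon skipping
print(has_splicing_change(0.91, background))    # within normal variation
```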
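Fig. 3 | Structural rearrangements associated with RNA fusions
a, Numbers of all detected and novel fusions and their overlap with Cancer Gene Census genes. b, Schematic of bridged fusions. A bridged fusion is a composite fusion formed by a third genomic segment that bridges two genes. In each case only one possible genomic arrangement is depicted, and breakpoints are highlighted with lightning bolts.
Fig. 4 | Global view of DNA and RNA alterations affecting tumours
a, Median number of each type of alteration across histotypes. Histotypes are ordered by hierarchical clustering based on the pattern of different types of alteration. Alt., alternative; non-syn, non-synonymous. Cancer-type abbreviations are listed in Supplementary Table 23.
b, c, Circular representations of the selected genes significantly co-occurred with B2M (b) and PCBP2 (c). Connecting lines indicate the specific types of co-occurrence of alteration pairs. The inner histograms show the frequency of each DNA/RNA alteration type in a different colour.
d, All 74 Cancer Gene Census genes from the Catalogue of Somatic Mutations in Cancer (COSMIC), or PCAWG driver genes, that are frequently and heterogeneously altered at the RNA and DNA levels. Yellow bars: proportion of samples altered at the DNA level; green bars: proportion of samples altered at the RNA level. (The two show opposite trends. One way to read this: once a gene is mutated in a tumour, whether it is expressed becomes a secondary factor that is no longer under selection during cancer evolution, and vice versa. Some genes are destined to carry driver mutations (such as TP53), whereas others are "passively expressed" (such as GAS7), reflecting the effect of driver mutations on the regulation of a series of other genes.) Middle column: proportion of each alteration type observed for the gene.
e, The enrichment of cancer genes within our list of significantly recurrent genes.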
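Paper 2: An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer
Journal: Nature Communications
Year/authors: 2021 / Department of Molecular Medicine, Aarhus University Hospital, Denmark
Link: doi.org/10.1038/s41467-021-22465-w
Abstract
The molecular landscape of non-muscle-invasive bladder cancer (NMIBC) is characterized by large biological heterogeneity and variable clinical outcomes. Here, we performed an integrative multi-omics analysis of patients diagnosed with NMIBC (n = 834). Transcriptomic analysis identified four classes (1, 2a, 2b and 3) that reflect tumour biology and disease aggressiveness. Both transcriptome-based subtyping and the level of chromosomal instability provided independent prognostic value beyond established prognostic clinicopathological parameters. High chromosomal instability, p53-pathway disruption and APOBEC-related mutations were significantly associated with transcriptomic class 2a and poor outcome. RNA-derived immune cell infiltration was associated with chromosomally unstable tumours and was enriched in class 2b. Spatial proteomics analysis confirmed the higher infiltration of class 2b tumours and demonstrated an association between higher immune cell infiltration and lower recurrence rates. Finally, the independent prognostic value of the transcriptomic classes was documented in 1,228 validation samples using a single-sample classification tool. The classifier provides a framework for biomarker discovery and for optimizing treatment and surveillance in next-generation clinical trials.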
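Methods
RNA-Seq data from 438 tumours included in a previous study were reanalysed together with new RNA-Seq data from 97 tumours.
From the discovery samples we created a BCG cohort of 55 patients (clinically, high-risk NMIBC is treated after surgery with adjuvant intravesical Bacillus Calmette–Guérin (BCG) instillation to eradicate residual disease and thereby reduce the frequency of recurrence and progression). The patients met the following criteria: (1) the indication for BCG treatment was high-grade disease, (2) the patient received at least six BCG series, and (3) BCG treatment started within 12 months after TURB (so BCG was given for the analysed tumour). Using the multiple features available in our dataset, the BCG cohort was used to study time to BCG failure. BCG failure-free survival was defined as the time to the first high-grade tumour or the first progression to MIBC after BCG treatment.
Selected figure walkthroughs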
Fig. 1 Transcriptomic classes in NMIBC.
a Consensus matrix for four clusters. Samples are in both rows and columns and pairwise values range from 0 (samples never cluster together; white) to 1 (samples always cluster together; dark blue). (A sample co-clustering matrix; the samples group into four clusters.)
b Comparison between the three UROMOL2016 transcriptomic classes and the UROMOL2021 four-cluster solution (76% of tumors in UROMOL2016 class 1 remained class 1, 92% of tumors in UROMOL2016 class 2 remained class 2a/2b and 67% of tumors in UROMOL2016 class 3 remained class 3). (A comparison of how samples were classified before and after.)
c Kaplan–Meier plot of progression-free survival (PFS) for 530 patients stratified by transcriptomic class. (Progression-free survival curves by transcriptomic class; the four curves correspond to the four classes.)
d Kaplan–Meier plot of recurrence-free survival (RFS) for 511 patients stratified by transcriptomic class. (As above, for recurrence-free survival.)
e, f Clinicopathological information and selected gene expression signatures for all patients stratified by transcriptomic class. Samples are ordered after increasing silhouette score within each class (lowest to highest class correlation). CIS carcinoma in situ, EORTC European Organisation for Research and Treatment of Cancer, EAU European Association of Urology, MIBC muscle-invasive bladder cancer, EMT epithelial-mesenchymal transition. (For example, is the EMT signature obtained by summing the expression values of the EMT gene set in each sample?)
g RNA-based immune score and immune-related gene expression signatures for all patients stratified by transcriptomic class.
h Regulon activity profiles for 23 transcription factors. Samples are ordered after increasing silhouette score within each class (lowest to highest class correlation). Regulons (rows) are hierarchically clustered.
i Regulon activity profiles for potential regulators associated with chromatin remodeling. The most-upregulated regulons within each class are shown. Regulons are hierarchically clustered. P-values were calculated using two-sided Fisher’s exact test for categorical variables, Kruskal–Wallis rank-sum test for continuous variables and two-sided log-rank test for comparing survival curves. Source data are provided as a Source data file.
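Panels c and d are Kaplan–Meier plots. As a reminder of what such a curve computes, here is a minimal sketch of the Kaplan–Meier (product-limit) estimator for one group. The follow-up times and event flags are invented, and the function name is hypothetical. A real analysis would normally rely on an established survival package and add confidence intervals and a log-rank test.

```python
# Minimal sketch of the Kaplan-Meier estimator for a single group.
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if the event (e.g. progression) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:   # group ties at time t
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving                        # events and censored both leave
    return curve


curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Running this once per transcriptomic class and overlaying the step curves would reproduce the layout of panels c and d.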
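Fig. 2 Copy number alterations in NMIBC.
a Genome-wide copy number landscape of 473 tumors stratified by genomic class (GC) 1–3. Gains (gain + high-balanced gain) and losses (loss + high-balanced loss) are summarized to the left of the chromosome band panel. EORTC European Organisation for Research and Treatment of Cancer, EAU European Association of Urology, MIBC muscle-invasive bladder cancer.
b Kaplan–Meier plot of progression-free survival (PFS) for 426 patients stratified by genomic class.
c Kaplan–Meier plot of recurrence-free survival (RFS) for 399 patients stratified by genomic class.
d Kaplan–Meier plot of PFS for patients with a high EORTC risk score (n = 163) stratified by genomic class. P-values were calculated using two-sided log-rank test. Source data are provided as a Source data file.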
Fig. 3 Genomic alterations associated with transcriptomic classes.
a Genomic classes (GCs) compared to transcriptomic classes (n = 303). (A cross-tabulation of the two omics-based classifications, with a statistical test.)
b. 12-gene qPCR-based progression risk score compared to GCs. Colors indicate transcriptomic classes.
c Kaplan–Meier plot of progression-free survival (PFS) for 154 patients (including only class 2a and 2b tumors) stratified by GC.
d. Number of RNA-derived mutations according to transcriptomic classes.
e Landscape of genomic alterations according to transcriptomic classes. Samples are ordered after the combined contribution of the APOBEC-related mutational signatures. Panels: RNA-derived mutational load, relative contribution of four RNA-derived mutational signatures (inferred from 441 tumors having more than 100 single nucleotide variations), selected RNA-derived mutated genes, copy number alterations in selected disease driver genes (derived from SNP arrays). Asterisks indicate p-values below 0.05. Daggers indicate BH-adjusted p-values below 0.05.
f. Comparison of RNA-derived single nucleotide variations to whole-exome sequencing (WES) data from 38 patients for 11,016 mutations in all genes, 280 mutations in the genes most frequently mutated or differentially affected between the classes (n = 82, Supplementary Fig. 5b) and 93 mutations in 19 selected bladder cancer genes (Fig. 3e). Only mutations with > 10 reads in tumor and germline DNA were considered and a mutation was called observed when the frequency of the alternate allele was above 2%.
g. Genomic alterations significantly enriched in one transcriptomic class vs. all others.
h Overview of p53 pathway alterations for all tumors with available copy number data and RNA-Seq data (n = 303).
i Amount of genome altered according to p53 pathway alteration. (In the figure, "intact" means the pathway is unaltered.)
j Number of mutations according to mutations in DNA-damage response (DDR) genes (including TP53, ATM, BRCA1, ERCC2, ATR, MDC1).
k. RNA-based immune score according to GCs.
l RNA-derived mutational load according to GCs.
m Relative contribution of the APOBEC-related mutational signatures according to transcriptomic class.
(Statistical tests used, etc.) P-values were calculated using two-sided Fisher’s exact test for categorical variables, Kruskal–Wallis rank-sum test for continuous variables and two-sided log-rank test for comparing survival curves. For all boxplots, the center line represents the median, box hinges represent first and third quartiles and whiskers represent ± 1.5× interquartile range. Source data are provided as a Source data file.
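Panel f states two rules for comparing RNA-derived mutations with WES: only sites with more than 10 reads in both tumour and germline DNA are considered, and a mutation counts as observed when the alternate allele frequency is above 2%. A minimal sketch of these rules follows. The function and parameter names are hypothetical, and evaluating the allele frequency in the tumour WES reads is an assumption of this sketch.

```python
# Minimal sketch of the panel f validation rules, under the assumptions above.
def observed_in_wes(tumor_alt_reads, tumor_depth, germline_depth,
                    min_depth=10, min_vaf=0.02):
    """Return None if coverage is insufficient, else True/False for 'observed'."""
    if tumor_depth <= min_depth or germline_depth <= min_depth:
        return None                                  # site not considered
    return tumor_alt_reads / tumor_depth > min_vaf


print(observed_in_wes(3, 100, 50))   # 3% alternate allele frequency
print(observed_in_wes(1, 100, 50))   # 1% alternate allele frequency
print(observed_in_wes(5, 8, 50))     # tumour coverage too low
```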
Fig. 4 Spatial proteomics analysis of tumor immune contexture. a Multiplex immunofluorescence staining with Panel 1 (CD3, CD8, and FOXP3) of tumors with high- and low immune infiltration with magnifications of T helper cells (CD3+, CD8− and FOXP3−), a cytotoxic T lymphocyte (CTL; CD3+, CD8+, FOXP3−) and a regulatory T cell (Treg; CD3+, CD8− and FOXP3+). Yellow dashed lines divide the tumor tissue into parenchymal and stromal regions. Scale bar: 20 µm. All protein measurements were performed once for each distinct sample. b Spatial organization of immune cell infiltration and antigen recognition/escape mechanisms (MHC class 1 and PD-L1) with associated data for genomic class, transcriptomic class, and recurrence rate. The immune cells and immune evasion markers are defined as the percentage of positive cells in the different regions (stroma and parenchyma) and normalized using z-scores, z = (x − μ)/σ (equation 1). Columns are sorted by the degree of immune infiltration into the tumor parenchyma in descending order from left to right. c Immune infiltration stratified by transcriptomic class. Immune infiltration is defined as the percentage of total cells in the parenchyma classified as immune cells. The p-value was calculated using two-sided Wilcoxon rank-sum test. d Immune infiltration stratified by recurrence rate. The p-value was calculated by the one-sided Jonckheere–Terpstra test for trend. e Kaplan–Meier plot of recurrence-free survival (RFS) for patients with tumors with few genomic alterations (GC1 + 2) stratified by immune infiltration. P-value was calculated using two-sided log-rank test. f Distribution of CK5/6 and GATA3 positive carcinoma cells stratified by transcriptomic class. Each column represents a patient. The p-value reflects the difference in CK5/6 expression across classes and was calculated by chi-squared test.
For boxplots, the center line represents the median, box hinges represent first and third quartiles and whiskers represent ± 1.5× interquartile range. Source data are provided as a Source data file.
Fig. 5 Prediction models and summary characteristics of classes. a Overview of hazard ratios calculated from univariate Cox regressions of progression-free survival using clinical and molecular features. Black dots indicate hazard ratios and horizontal lines show 95% confidence intervals (CI). Asterisks indicate p-values below 0.05 and the sample sizes, n, used to derive statistics are written to the right. CIS carcinoma in situ, EORTC European Organisation for Research and Treatment of Cancer, EAU European Association of Urology. b Receiver operating characteristic (ROC) curves for predicting progression within 5 years using logistic regression models (n = 301, events = 19). Asterisks indicate significant model improvement compared to the EORTC model (Likelihood ratio test, BH-adjusted p-value below 0.05). AUC area under the curve, CI confidence interval. c Summary characteristics of the transcriptomic classes. Molecular features associated with the classes are mentioned, and suggestions for therapeutic options with potential clinical benefit are listed. MIBC muscle-invasive bladder cancer, EMT epithelial-mesenchymal transition, CTLs cytotoxic T lymphocytes. Source data are provided as a Source data file.
Fig. 6 Validation of transcriptomic classes in independent cohorts. a Summary of classification results and stage distribution for all tumors, tumors with microarray data and tumors with RNA-Seq data (1228 tumors were classified in total and 1225 of these were assigned to a class). b Association of tumor stage, tumor grade and FGFR3 and TP53 mutation status with transcriptomic classes. P-values were calculated using two-sided Fisher’s exact test. c Kaplan–Meier plot of progression-free survival (PFS) for 511 patients stratified by transcriptomic class. The p-value was calculated using two-sided logrank test. d Association of regulon activities (active vs. repressed status) with transcriptomic classes in the UROMOL cohort (including samples with positive silhouette scores, n = 505) and transcriptomic classes in the independent cohorts (pooled). The heatmap illustrates BH-adjusted p-values from two-sided Fisher’s exact tests. e Pathway enrichment scores within transcriptomic classes in the UROMOL cohort (including samples with positive silhouette scores, n = 505) and transcriptomic classes in the independent cohorts (pooled). Asterisks indicate significant association between pathway and class (one class vs. all other classes, two-sided Wilcoxon rank-sum test, BH-adjusted p-value below 0.05). Triangles indicate direction swaps of pathway enrichment in the independent cohorts compared to the UROMOL cohort. GSVA gene set variation analysis. Source data are provided as a Source data file.
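Paper 3: Multi-omics analysis reveals the prognostic value of tumor mutation burden in hepatocellular carcinoma
Date: 2021
Journal: Cancer Cell Int (IF = 6.5)
Link: doi.org/10.1186/s12935-021-02049-w
The paper appears to analyse the transcriptome and the WES data separately, with no association analysis between the two.
Abstract
Background: Hepatocellular carcinoma (HCC) is the sixth most common malignancy worldwide and is highly invasive. Tumor mutation burden (TMB) is an indicator of responsiveness to immunotherapy in many tumour types. However, the role of TMB in the tumor immune microenvironment (TIME) remains unclear.
Methods: Mutation data were analysed with the "maftools" package. Weighted gene co-expression network analysis (WGCNA) was used to identify candidate modules and significant genes associated with TMB value. Differential analysis between TMB subgroups was performed with the R package "limma". Gene Ontology (GO) enrichment analysis was implemented with the "clusterProfiler", "enrichplot" and "ggplot2" packages. A risk score signature was built through systematic bioinformatics analysis. Kaplan–Meier survival curves and receiver operating characteristic (ROC) curves were then analysed to assess prognostic validity. To describe the TIME comprehensively, the XCELL, TIMER, QUANTISEQ, MCPcounter, EPIC, CIBERSORT and CIBERSORT-ABS algorithms were used. The potential role of the risk score in immune checkpoint blockade (ICB) immunotherapy was also explored. HTRA3 expression was measured by quantitative real-time PCR.
Results: TMB value was positively correlated with older age, male sex and early T status. In total, 75 genes at the intersection of TMB-related genes and differentially expressed genes (DEGs) were identified, and they were enriched in extracellular matrix-related pathways. A risk score based on three hub genes significantly affected overall survival (OS) time, immune cell infiltration and ICB-related hub targets. The prognostic effect of the risk score was validated in an external test set. A risk-clinical nomogram was constructed for clinical application. Further analysis confirmed HTRA3 as a prognostic factor in HCC. Finally, TP53 mutation was associated with the risk score but did not interfere with risk score-based prognosis prediction.
Conclusions: A comprehensive analysis of TMB may provide new insights into mutation-driven mechanisms of tumorigenesis and further contribute to personalized immunotherapy and prognosis prediction in HCC.
Keywords: tumor mutation burden, hepatocellular carcinoma, tumor immune microenvironment, immunotherapy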
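TMB is usually expressed as mutations per megabase of sequenced territory, after which samples are split into high- and low-TMB groups for the differential analysis. The sketch below illustrates both steps. The 38 Mb capture size and the median cutoff are assumptions chosen for illustration; the abstract above does not state which values the authors used, and the sample IDs and counts are invented.

```python
from statistics import median


# Minimal sketch of TMB calculation and grouping, under the assumptions above.
def tumor_mutation_burden(n_mutations, capture_size_mb=38.0):
    """Mutations per megabase; 38 Mb is an assumed exome capture size."""
    return n_mutations / capture_size_mb


def split_by_median(tmb_by_sample):
    """Label each sample 'high' or 'low' relative to the cohort median (assumed cutoff)."""
    cutoff = median(tmb_by_sample.values())
    return {s: ("high" if v > cutoff else "low") for s, v in tmb_by_sample.items()}


mutation_counts = {"S1": 38, "S2": 76, "S3": 190}        # hypothetical counts
tmb = {s: tumor_mutation_burden(n) for s, n in mutation_counts.items()}
groups = split_by_median(tmb)
print(tmb)
print(groups)
```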
Fig. 1 Landscape of somatic mutation profiles in HCC samples. A Mutation information of each gene in each sample was shown in the waterfall plot, where different colors with specific annotations at the bottom meant the various mutation types. The barplot above the legend exhibited the number of mutation burden. B Cohort summary plot displaying distribution of variants according to variant classification, type and SNV class. Bottom part (from left to right) indicates mutation load for each sample, variant classification type. A stacked barplot shows top ten mutated genes. C Rainfall plot of TCGA HCC sample TCGA−UB−A7MB−01A−11D−A33Q−10. Each point is a mutation color coded according to SNV class. D Transition and transversion plot displaying distribution of SNVs in HCC classified into six transition and transversion events. Stacked bar plot (bottom) shows distribution of mutation spectra for every sample in the MAF file. E The coincident and exclusive associations across mutated genes. The correlation of TMB with age (F), gender (G) and T status (H).
Fig. 2 Construction of weighted gene co-expression network of HCC samples.
A Sample dendrogram and clinical-traits heatmap was plotted. B Selection of the soft threshold made the index of scale-free topologies reach 0.90 and analysis of the average connectivity of 1–20 soft threshold power. C TMB-related genes with similar expression patterns were merged into the same module using a dynamic tree-cutting algorithm, creating a hierarchical clustering tree. D Heatmap of the correlations between the modules and TMB value (traits). Within every square, the number on the top refers to the coefficient between the TMB level and corresponding module, and the bottom is the P value
Fig. 3 Differential analysis of gene expression data in high- and low-TMB groups and enrichment pathway annotation. A Volcano plot was delineated to visualize the DEGs. Red represented upregulated and green represented downregulated. B Heatmap of top 40 DEGs was drawn to reveal different distribution of expression state, where the colors of red to blue represented alterations from high expression to low expression. C Venn diagram of the hub genes from WGCNA blue module and DEGs. Pathway enrichment analyses of TMB hub genes. D Gene Ontology (GO) enrichment analysis of naïve B cells-related genes: biological processes (BP), cellular components (CC) and molecular function (MF). E KEGG enrichment analysis of naïve B cells-related genes.
Fig. 4 Validation of the prognostic risk signature in discovery group. A Heatmap presents the expression pattern of three hub genes in each patient. B Distribution of multi-genes signature risk score. C The survival status and interval of HCC patients. D Kaplan–Meier curve analysis presenting difference of overall survival between the high-risk and low-risk groups. E Distribution of somatic mutation count. F Univariate Cox regression analyses of overall survival. G Multivariate Cox regression analyses of overall survival.
Fig. 5 Clinical significance of the prognostic risk signature. A Heatmap presents the distribution of clinical feature and corresponding risk score in each sample. Rate of clinical variables subtypes in high or low risk score groups. B Age, C Gender, D WHO grade, E clinical stage, F T status, G N status and H M status
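Paper 4: Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours
Journal: Nature (Letter)
Year: 2018
Link: doi:10.1038/nature25795
Abstract
Analysis of molecular aberrations across multiple cancer types, known as pan-cancer analysis, identifies commonalities and differences in key biological processes that are dysregulated in cancer cells from diverse lineages. Pan-cancer analyses have been performed for adult cancers (refs 1–4 of the original paper) but not for paediatric cancers, which commonly occur in developing mesodermic rather than adult epithelial tissues. Here we present a pan-cancer study of somatic alterations, including single nucleotide variants, small insertions or deletions, structural variations, copy number alterations, gene fusions and internal tandem duplications, in 1,699 paediatric leukaemias and solid tumours across six histotypes, with whole-genome, whole-exome and transcriptome sequencing data processed under a uniform analytical framework. We report 142 driver genes in paediatric cancers, of which only 45% match those found in adult pan-cancer studies; copy number alterations and structural variants constituted the majority (62%) of events. Eleven genome-wide mutational signatures were identified, including one attributed to ultraviolet-light exposure in eight aneuploid leukaemias. Transcription of the mutant allele was detectable for 34% of protein-coding mutations, and 20% exhibited allele-specific expression. These data provide a comprehensive genomic architecture for paediatric cancers and emphasize the need for paediatric cancer-specific development of precision therapies.
Paired tumour and normal samples from 1,699 paediatric cancer patients enrolled in Children's Oncology Group clinical trials were analysed, comprising 689 B-lineage acute lymphoblastic leukaemias (B-ALL), 267 T-lineage ALLs (T-ALL), 210 acute myeloid leukaemias (AML), 316 neuroblastomas (NBL), 128 Wilms tumours and 89 osteosarcomas (Extended Data Fig. 1a–c). All tumour specimens were obtained at initial diagnosis, and 98.5% of patients were 20 years of age or younger (see Methods, Extended Data Fig. 1d).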
Extended Data Figure 1 | Cohort description and workflow. a, Venn diagram of samples analysed by whole-exome (WES), whole genome (CGI) and whole transcriptome (RNA-seq) sequencing in this cohort. b, c, Sample-level sequencing status of the entire cohort (b) and those with WGS data (c, SNP6 for T-ALL). d, Age distribution for each histotype. Median, first and third quartiles are indicated by horizontal bars. Sample sizes are indicated in parentheses. Percentage of cases with age over 20 years are indicated. e, Analytical workflow. The tumour/normal BAM files of WES data were analysed by our in-house pipeline followed by manual quality control. The mutation annotation format files generated by CGI were downloaded from TARGET Data Matrix (see Methods) and analysed by a pipeline developed for this dataset, including SNVs, indels and structural variants. CNA and LOH were analysed using read counts of germline SNPs in the mutation annotation format files. Manual quality control was also performed. For RNA-seq data, the FASTQ files were re-mapped and fusions and ITDs were analysed with CICERO. The resultant mutations were analysed by GRIN (SNVs, indels, CNAs, structural variants and fusions) and MutSigCV (SNVs and indels) to discover 142 recurrently mutated genes. f, One representative sample with chromothripsis for each histotype. CNAs are shown in the inner circle, orange indicates copy gain and blue indicates copy loss. Intra- and interchromosomal rearrangements are shown as green and purple curves, respectively.
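Figure 1 | Somatic mutation rate and signature. The sample size of each histotype is shown in parentheses. a, b, Mutation rate of non-coding SNVs from WGS (a) and of coding SNVs from WGS and WES (b). Red lines indicate the median. a and b are scaled to the total number of samples with WGS (n = 651) and with WGS or WES (n = 1,639), respectively. c, Mutational signatures identified from WGS and T-ALL WES data and their contribution in each histotype. d, Mutation spectrum of representative samples in each histotype. Hypermutators (more than three standard deviations above the mean rate of the corresponding histotype) are marked with asterisks. e, Mean and s.d. of MAF of each signature in each histotype.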
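The hypermutator rule in panel d (more than three standard deviations above the mean rate of the histotype) can be sketched as follows. The use of the sample standard deviation on untransformed rates is an assumption, since the caption does not say whether rates were log-transformed first. The sample IDs and rates are invented.

```python
from statistics import mean, stdev


# Minimal sketch of the hypermutator rule, under the assumptions stated above.
def flag_hypermutators(rates_by_sample, n_sd=3.0):
    """Return sample IDs whose mutation rate exceeds mean + n_sd * s.d."""
    rates = list(rates_by_sample.values())
    cutoff = mean(rates) + n_sd * stdev(rates)
    return sorted(s for s, r in rates_by_sample.items() if r > cutoff)


# One histotype: 19 samples with a typical rate plus one extreme sample.
rates = {f"sample_{i}": 1.0 for i in range(19)}
rates["sample_hyper"] = 50.0
hypermutators = flag_hypermutators(rates)
print(hypermutators)
```

Because an extreme sample inflates both the mean and the standard deviation, this rule only flags outliers in reasonably large groups; with a handful of samples no value can exceed the cutoff.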
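Figure 2 | Candidate driver genes in paediatric cancer. a, Top 100 recurrently mutated genes; the number of cases in each histotype is shown in the same colours as the legend. Asterisks indicate genes not reported in previous adult pan-cancer analyses. b, Statistically significant pairwise relationships (P < 0.05; two-sided Fisher's exact test), either co-occurrence (red) or exclusivity (blue), in each histotype. Gene pairs with Q < 0.05 are marked in dark red (co-occurring) or dark blue (exclusive) to account for the false discovery rate. Significance detected only in WGS + WES samples is marked with asterisks. Numbers of mutated samples are shown in parentheses.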
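The co-occurrence and exclusivity test in panel b is a two-sided Fisher's exact test on a 2×2 table of mutation status for a gene pair. The sketch below implements that test with the standard library. The function names and counts are invented, and the direction call (co-occurring when the odds ratio exceeds 1) is a common convention assumed here.

```python
from math import comb


# Minimal sketch of a two-sided Fisher's exact test for one gene pair.
def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: both genes mutated, b: only gene A, c: only gene B, d: neither.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x samples in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def relationship(a, b, c, d):
    """Assumed convention: odds ratio > 1 means co-occurrence, < 1 exclusivity."""
    return "co-occurrence" if a * d > b * c else "exclusivity"


p_value = fisher_exact_two_sided(8, 2, 1, 9)    # hypothetical counts
print(round(p_value, 5), relationship(8, 2, 1, 9))
```

The P values for all gene pairs would then be adjusted for multiple testing to obtain the Q values mentioned in the caption.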
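Figure 3 | Biological processes with somatic alterations in paediatric cancer
a, Percentage of tumours with at least one driver alteration in each histotype. Tumours analysed by WGS may have point mutations (light grey), CNAs or structural variants (dark grey), or both (black). For T-ALL, CNAs were derived from SNP arrays. b, Percentage of tumours with somatic alterations in each of 21 biological pathways in each histotype; histotypes are ordered as in a. The coloured portion of each pathway bar indicates the percentage of variants in genes absent from three TCGA pan-cancer studies. c, Occurrence of mutations by histotype in the RAS, tyrosine kinase and PI3K pathways.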
Extended Data Figure 4 | Example driver mutations.
a, Diverse mutation types of STAG2. Variants are coloured by histotype as in Fig. 2. Circles and half-moons represent mutations and structural alterations, respectively. Bottom panel shows RNA-seq for an SNV at the −8 position of STAG2 exon 7, which created a de novo splice site resulting in an out-of-frame transcript. b–d, Truncating mutations by deletion or ITD. e, Cohesin complex detected by HotNet2 analysis. f, Samples with mutations in cohesin complex. g–k, Selected examples of singleton oncogenic activation caused by high level amplifications including CDK4 (g), PDGFRA (h), and YAP1 (i) with FPKM and histotype-wise ranks indicated, as well as recurrent co-amplification of MYCN-ALK in two NBL samples (j, k). l, Recurrent MAP3K4 mutation with structural model in N lobe (m). Location of the mutation p.G1366R is indicated by a magenta sphere and the alteration side chain is modelled as a stick. Known activating alterations (p.I1361M and p.M1415I) are shown as teal spheres. GADD45 binding (A1), kinase inhibitor (A2), and kinase domains (B1, B2) are indicated in l. n, ITD in UBTF. o, Fusion of FEV. p, q, Mutations in novel driver genes NIPBL and LEMD3.
Article 5: Longitudinal molecular trajectories of diffuse glioma in adults
Date: 20 November 2019
Journal: Nature
Link: doi.org/10.1038/s41586-019-1775-1
Abstract
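The evolutionary processes that drive universal therapeutic resistance in adult patients with diffuse glioma remain unclear. Here we analysed temporally separated DNA-sequencing data and matched clinical annotation from 222 adult patients with glioma. By analysing mutations and copy numbers across the three major subtypes of diffuse glioma, we found that driver genes detected at the initial stage of disease were retained at recurrence, whereas there was little evidence of recurrence-specific gene alterations. Treatment with alkylating agents resulted in a hypermutator phenotype at different rates across the glioma subtypes, and hypermutation was not associated with differences in overall survival. Acquired aneuploidy was frequently detected in recurrent gliomas characterized by IDH mutation but without co-deletion of chromosome arms 1p/19q, and further converged with acquired alterations in the cell cycle and poor outcomes. The clonal architecture of each tumour remained similar over time, but the presence of subclonal selection was associated with decreased survival. Finally, there were no differences in the levels of immunoediting between initial and recurrent gliomas. Collectively, our results suggest that the strongest selective pressures occur during early glioma development and that current therapies shape this evolution in a largely stochastic manner.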
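Methods
The GLASS dataset includes both unpublished and published sequencing data, as listed in Supplementary Table 1. The cohort comprises exomes from 436 glioma samples (200 patients) and whole genomes from 165 glioma samples (78 patients), with overlapping exome and whole-genome data for 78 glioma samples (38 patients). Matched germline sequencing was available for all patients. The dataset includes 257 sets of at least two time-separated tumour samples, 17 cases of independent recurrence, and 19 patients with at least two geographically distinct tumour portions. More specifically, the dataset includes exome or whole-genome sequencing data for 211 primary gliomas, 234 first recurrences, 32 second recurrences, 11 third recurrences and 1 fourth recurrence (Supplementary Table 7).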
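Article 6: Glioma through the looking GLASS: molecular evolution of diffuse gliomas and the Glioma Longitudinal Analysis Consortium
Date: 2018
Journal: Neuro-Oncology
Link: doi:10.1093/neuonc/noy020
Abstract
Adult diffuse gliomas are a diverse group of brain tumours that inflict a high emotional toll on patients and their families. The Cancer Genome Atlas and similar projects have provided a comprehensive understanding of the somatic alterations and molecular subtypes of glioma at diagnosis. However, gliomas undergo significant cellular and molecular evolution during disease progression. We review the current knowledge on genomic and epigenetic abnormalities in primary tumours and after disease recurrence, highlight the gaps in the literature, and elaborate on the need for a new multi-institutional effort to bridge these knowledge gaps and on how the Glioma Longitudinal Analysis Consortium (GLASS) aims to systematically catalogue the longitudinal changes in gliomas. The GLASS initiative will provide essential insights into the evolution of glioma towards a lethal phenotype, with the potential to reveal targetable (tumour) vulnerabilities and, ultimately, to improve outcomes for a patient population in need.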
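Article 7: Feasibility of whole genome and transcriptome profiling in pediatric and young adult cancers
(Focus on clinical diagnosis and application)
Journal: Nature Communications
Year/Authors: 2022 / Department of Pediatrics, Memorial Sloan Kettering Cancer Center
Link: doi.org/10.1038/s41467-022-30233-7
Abstract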
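The utility of cancer whole genome and transcriptome sequencing (cWGTS) in oncology is increasingly recognized. However, implementation of cWGTS is challenged by the need to deliver results within clinically relevant timeframes, concerns about assay sensitivity, and the reporting and prioritization of findings. In a prospective study, we developed a workflow that reports comprehensive cWGTS results within 9 days. Comparison of cWGTS with diagnostic panel assays demonstrates the potential of cWGTS to capture all clinically reported mutations with comparable sensitivity in a single workflow. Benchmarking identifies a minimum of 80× as the optimal depth for clinical WGS. Integration of germline, somatic DNA and RNA-seq data enables data-driven variant prioritization and reporting, with oncogenic findings reported in 54% more patients than with standard of care. These results establish key technical considerations for implementing cWGTS as an integrated test in clinical oncology.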
Fig. 1 End-to-end cWGTS workflow. a Schematic representation of the end-to-end cWGTS workflow, with information on median-time duration (in hours) for each step, as determined by a time trial over four consecutive batches containing n = 16 tumors and representation of dedicated resources necessary to execute the workflow. b Comparison of best reported turnaround times in literature, from sample collection to results ready for tumor board review. For our study, we show an orange bar denoting median time for n = 16 samples with minimum and maximum times denoted with the error bar. These samples were processed post optimization.
Fig. 2 Analytical validity of cWGTS for clinical biomarkers. a The left barplot depicts the proportion of patients with therapy-informing, oncogenic, or no relevant findings reported by MSK-IMPACT as defined by OncoKb (Levels 1–4). The right barplot shows the breakdown (0,1,2) of the highest level of OncoKb annotation in the study cohort. b Barplot demonstrating breakdown of the highest OncoKb level by the number of informative biomarkers in study cohort. c Barplot demonstrating breakdown of the highest OncoKb level by disease class. d Scatterplot shows the comparison of variant allele frequency (VAF) of MSK-IMPACT variants as reported by MSK-IMPACT (x axis) and absolute VAF estimates by pileup in WGS data (y axis) (Pearson correlation). Discrepant mutations are observed along the x axis. Mutations are color-coded by call status, where Both is called in both assays and ITH is mutations that were not called in higher-depth resequencing and/or had proportion test p-value < 0.05. e Barplot demonstrating breakdown of MSK-IMPACT mutations, observed in both WGS and MSK-IMPACT or only MSK-IMPACT (ITH). f Validation of oncogenic fusions reported by MSK-IMPACT/MSK-Fusion in cWGTS. The asterisk indicates that the SS18-SSX1 that was reported by MSK-Fusion was reported as SS18-SSX2 by RNA-seq and supported by spanning reads in WGS. Main oncotree disease code listed underneath for each patient (ARMS alveolar rhabdomyosarcoma, CHS chondrosarcoma, DLGT diffuse leptomeningeal glioneural tumor, DSRCT desmoplastic small round-cell tumor, ES Ewing sarcoma, MBL medulloblastoma, MFH undifferentiated pleomorphic sarcoma/malignant fibrous histiocytoma/high-grade spindle-cell sarcoma, RCSNOS round-cell sarcoma, NOS, SYNS synovial sarcoma, US undifferentiated sarcoma, USPC undifferentiated sarcoma of the peritoneal cavity). Source data for panels a–e and f are provided in Supplementary Data 4 and 6.
Fig. 3 Assessment of optimal coverage for WGS. a Barplots demonstrating sensitivity of variant detection and 95% confidence intervals (error bars) by coverage depth (100x, 80x, 60x, and 30–40x) from left to right for: 1. clinically relevant events detected by MSK-IMPACT and WGS (n = 220), 2. genome-wide SNVs, 3. genome-wide indels, and 4. genome-wide SVs. Only data from samples with original median coverage >100x (n = 32) are shown. Red dots indicate overall sensitivity of all mutations across all BAMs at the same subsampling level. b Histograms of variant allele frequencies for each subsampling level for a representative sample in the study cohort (H135973), showing loss in sensitivity to detect subclonal mutations at lower sequencing depth of coverage. c Scatterplot of effective local coverage vs VAF in subsampled BAMs for the clinically relevant calls from MSK-IMPACT. Variants called in subsampled BAMs are shown with circles, while the missed variants are denoted with X’s. Trendline shows the cumulative binomial distribution for obtaining at least 2 variant reads, given the effective coverage and variant allele fraction. Source data for panels a, c are provided at the data repository. Raw data for panel b can be accessed at the dbGAP study.
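The trendline in panel c is described as the cumulative binomial probability of obtaining at least 2 variant reads given the effective coverage and the variant allele fraction. A minimal sketch of that calculation is shown below; the function and parameter names are illustrative assumptions, and the model treats reads as independent draws with no sequencing error.

```python
from math import comb


def detection_probability(coverage: int, vaf: float, min_alt_reads: int = 2) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(coverage, vaf)."""
    # Probability of seeing fewer than min_alt_reads variant-supporting reads.
    p_below = sum(
        comb(coverage, k) * vaf**k * (1.0 - vaf) ** (coverage - k)
        for k in range(min(min_alt_reads, coverage + 1))
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Compare the subsampling levels from panel a for a low-VAF (subclonal) variant.
    for depth in (30, 60, 80, 100):
        print(depth, round(detection_probability(depth, 0.05), 3))
```

Under this simple model the probability of seeing two or more supporting reads for a 5% VAF variant falls sharply as depth drops, which is consistent with the loss of subclonal sensitivity described for panel b.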
Fig. 4 Additional relevant findings detected by cWGTS as compared with standard of care. a Heatmap of additional relevant findings by cWGTS colored by what technology (WES, WGS, and RNA-seq) may detect each event. Columns represent patients, while rows are clinical event types. The asterisks for Germline indicate pathogenicity supported by mutational signatures. b (top) Stacked-bar breakdown of patients with clinically relevant findings by assay. The blue areas (solid or meshed) represent patients with relevant findings from targeted sequencing (RNA and DNA), while the orange areas (solid or meshed) are for patients with findings from cWGTS. The blue/orange mesh indicates patients that had relevant findings from both targeted sequencing and WGTS. (bottom) Stacked-bar breakdown of findings specific to cWGTS from the patients in the orange section (solid or meshed) from top. The relevant findings are colored by event type. SV, structural variant. TMB, tumor mutation burden. MSI, microsatellite instability. Small Mut, small mutations, including substitutions and insertion/deletions. Viral, viral integration. Source data for panels a, b are provided in Supplementary Data 3.
Fig. 5 Integration of DNA and RNA findings for variant annotation. a Top panel shows absolute copy number on the y axis and the structural variants (SVs) that result in PAX3-FOXO3 fusion in patient H134768. Lower panel displays RNA fusion product created by the corresponding genomic SVs. b tSNE clustering of methylation data from rhabdomyosarcoma samples color-coded by disease subtype (ARMS: alveolar, ERMS: embryonal, SCRMS: spindle cell, and SRMS: sclerosing). The patient harboring the PAX3-FOXO3 fusion clusters with the ARMS samples. c Top panel shows the chromoplexy event among chromosomes 6, 9, and 18, resulting in the localization of NFIB enhancer to the MYB locus in patient H133676. Lower panel displays H3K27me3 chromatin marks from Drier et al., Nature Genetics 2016.
d Boxplot shows the MYB expression in transcripts per million (TPM) across the cohort. Center line indicates the median and whiskers extend within +/−1.5x the interquartile range (IQR) from the box. The patient with MYB-NFIB event (H133676) is highlighted in orange, demonstrating that the SV event in panel c associates with overexpression of MYB, validating the SV as an enhancer-hijacking event. e Diagram of SV events targeting TP53 gene body in osteosarcoma patients (n = 12, the 13th patient’s event breakpoints fall outside of the gene body). SVs are shown as arrows with absolute copy number on the y axis (gray dots) overlaid over the exonic structure of TP53 (TRA: translocation, DUP: duplication, DEL: deletion, INV: inversion). f Boxplot shows the comparison of TP53 expression in RNA between TP53-rearranged samples and those without any rearrangement with a center line indicating the median and whiskers extending within +/−1.5 x the IQR (two-sided Mann–Whitney U test, p = 1.645e-03). Raw data for panels a–c can be accessed at the dbGAP study. Source data for panel e are provided in Supplementary Data 9. Source data for panels d, f are provided at the data repository.
Fig. 6 Genome-wide distribution and patterns of somatic mutations for four different patients. a Neuroblastoma patient (H135421) harboring a pathogenic germline MUTYH variant (c.924 + 3A > C). b Immature teratoma patient (H135466) with a pathogenic germline PMS2 mutation (c.538-1G > C). c Malignant peripheral nerve sheath tumor patient (H135073) harboring a germline PMS2 variant of unknown significance (VUS) (p.W841*). For each patient, the top panel is a Circos plot showing the different types of somatic mutations along the genome. The outermost ring shows the intermutation distance for all SNVs color-coded by the pyrimidine partner of the mutated base. The middle ring shows small insertions (green) and deletions (red). The innermost ring shows copy number changes, and the arcs show SVs. Middle panel is a barplot showing the absolute number of mutations attributed to the five mutational signatures with the highest exposure in the tumor. Bottom panel is a barplot showing the 96 trinucleotide contexts of SNVs. d Genome-wide distribution and patterns of somatic mutations identified in the patient outside the cohort with recurrent osteosarcoma (H201472). WGS results show the sample is hypermutated, with enrichment in SBS26, T > C mutations, repeat-mediated deletions, and MSI unstable. The patient was found to be harboring a pathogenic PMS2 variant (p.D699H) (repeat deletion: repeat-mediated deletion, m-homology: microhomology-mediated deletion, deletion other: all other deletions, TRA translocation, DUP duplication, DEL deletion, INV inversion). Raw data for this figure can be accessed at the dbGAP study.
Fig. 7 Genome-wide mutational burden in the context of immunotherapy. a Distribution of coding tumor mutational burden (TMB) as assessed by WGS across the cohort (n = 114), colored by treatment status of the patient at the time of sampling. Dotted line indicates median-coding TMB (SNVs and indels) as previously reported by the Zero Childhood Cancer study. Patients are grouped by disease category (NB: neuroblastoma, CNS: central nervous system, C: carcinoma, WT: Wilms tumor, Germ: germ cell tumor, H: hepatoblastoma, O: other). Carcinoma patients C1 and C2 who responded to immunotherapy are labeled. b Distribution of structural variant (SV) (right) and gene fusion (left) burden across the samples with both WGS and RNA-seq available (n = 101). Patient C2 had a poor-quality RNA sample, so clonal fusions from another time point from the same patient are shown. c (top) Genome-wide distribution and patterns of somatic mutations for tumor C1 (H135022), patient with metastatic adrenocortical carcinoma, depicting high SV burden. Circos plots are shown as described in Fig. 6. PET imaging shows resolution of a large pulmonary metastatic lesion (red arrow) following treatment with nivolumab and ipilimumab. d Genome-wide distribution and patterns of somatic mutations for H135462, a 14-year-old with relapsed refractory poorly differentiated clear-cell carcinoma with high TMB and SV burden. Circos plots are shown as described in Fig. 6. PET imaging shows resolution of multiple metastatic lesions (red arrows) following treatment with pembrolizumab. Source data for panels a and b are provided at the data repository. Raw data for panels c, d can be accessed at the dbGAP study.
Fig. 8 Comparison of WGS data from matched fresh frozen tumor tissue and cfDNA. a Coverage values ordered by estimated tumor content in cfDNA. b Estimates of tumor content. c Barplots showing the proportion of de novo mutation calls in cfDNA that are present in the matched fresh frozen tumor broken down by variant type. cfDNA samples with no high-confidence SVs denoted with an asterisk. d Genome-wide distribution and mutation patterns of matched fresh frozen (left) and cfDNA (right) samples for H158182. Circos plots are shown as described in Fig. 6. e Individual-level clonality analysis for H158182. (left) Scatterplot of cancer cell fraction (CCF) values for all substitutions color-coded by the estimated cluster. (middle) Phylogenetic tree representation of clusters annotated with clinically relevant variants. (right) Clone-level mutational signature analysis showing the proportion of mutations attributed to each mutational signature with total numbers of mutations in each cluster shown on the right. Whereas drivers associated with these clones could not be determined, cfDNA-specific SNV calls recapitulated mutation signatures in the FF sample, and were enriched for platinum-associated mutational signatures pointing to the existence of therapy-exposed tumor subclones in circulation. (repeat deletion: repeat-mediated deletion, m-homology: microhomology-mediated deletion, deletion other: all other deletions, TRA: translocation, DUP: duplication, DEL: deletion, INV: inversion). Source data for panels a, b are provided in Supplementary Data 11. Source data for panel c are provided at the data repository. Raw data for panels d, e can be accessed at the dbGAP study.
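Article 8: Glioma progression is shaped by genetic evolution and microenvironment interactions
Source: 2022, Cell
Link: doi.org/10.1016/j.cell.2022.04.038
Abstract
The factors driving therapy resistance in diffuse glioma remain poorly understood. To identify treatment-associated cellular and genetic changes, we analysed RNA and/or DNA sequencing data from the tumour pairs of 304 adult patients with isocitrate dehydrogenase (IDH)-wild-type and IDH-mutant glioma. Tumours recurred in distinct manners that depended on IDH mutation status and were attributable to changes in histological feature composition, somatic alterations and microenvironment interactions. Hypermutation and acquired CDKN2A deletions were associated with an increase in proliferating tumour cells at recurrence in both glioma subtypes, reflecting active tumour growth. IDH-wild-type tumours were more invasive at recurrence, and their tumour cells showed increased expression of neuronal signalling programs, reflecting a possible role for neuronal interactions in promoting glioma progression. Mesenchymal transition was associated with a myeloid cell state defined by specific ligand-receptor interactions with tumour cells. Collectively, these recurrence-associated phenotypes represent potential targets for altering disease progression.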
Figure 1. Longitudinal cellular heterogeneity in glioma
(A) Each column represents an initial (I) and recurrent (R) tumor pair. Pairs are arranged based on the combined representation of the proneural and mesenchymal subtypes in their initial tumors. The first track indicates whole-exome (WXS) or whole-genome sequencing (WGS) data availability. The next three tracks indicate bulk subtype signature representation. Stacked bar plots indicate cell-state composition based on the single-cell-based deconvolution method, CIBERSORTx.
(B) Sankey plot indicating whether the highest-scoring transcriptional subtype changed at recurrence. Numbers in parentheses indicate the number of samples of each subtype: proneural (Pro.), classical (Class.), and mesenchymal (Mes.).
(C) Average cell-state composition of transcriptional subtypes (left) and initial and recurrent tumors by IDH status (right).
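Article 9: Spatially resolved multi-omics deciphers bidirectional tumor-host interdependence in glioblastoma
Date: 2022
Journal: Cancer Cell
Link: doi.org/10.1016/j.ccell.2022.05.009
Abstract
Glioblastomas are malignant tumours of the central nervous system characterized by subclonal diversity and dynamic adaptation of developmental hierarchies. The source of dynamic reorganization within the spatial context of these tumours remains elusive. In this study, we characterized glioblastomas by spatially resolved transcriptomics, metabolomics and proteomics. By deciphering regionally shared transcriptional programs across patients, we infer that glioblastoma is organized by spatial segregation of lineage states and adapts to inflammatory and/or metabolic stimuli, reminiscent of the reactive transformation of mature astrocytes. Integration of metabolic imaging and imaging mass cytometry revealed locoregional tumour-host interdependence, resulting in spatially exclusive adaptive transcriptional programs. Inferred copy-number alterations highlight a spatially cohesive organization of subclones associated with reactive transcriptional programs, confirming that environmental stress gives rise to selection pressure. A model in which glioblastoma stem cells were implanted into human and rodent neocortical tissue, mimicking various environments, confirmed that transcriptional states originate from dynamic adaptation to those environments.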
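Figure 1. Overview of methods and cohort. (A) Illustration of the workflow and cohort for the spatial datasets (left) and overview of the analytical methods used (right). (B) t-distributed stochastic neighbor embedding (tSNE) plot of all integrated spatially resolved transcriptomic spots. Colours reflect individual specimens and patients. Numbers indicate anonymized patient sample IDs. Letters indicate the anatomical origin of the tissue: T, tumour; TI, tumour infiltration; TC, tumour core; C, cortex. (C) Overview of the workflow for predicting tumour cell content (top) and tSNE plot (bottom). Colours indicate the predicted percentage of tumour cell content. (D) Examples of histologically defined regions at different resolutions. (E) Dot plot of the percentage of malignant spots in the stRNA-seq dataset, estimated with an ANN. At the bottom, a bar plot illustrates the distribution of tissue regions across samples.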
Download link for the nine papers above (extraction code: ysx4)
https://pan.baidu.com/s/1BRC6B0GW2UxELN1pSPFzAg