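10X single-cell, single-nucleus, and spatial transcriptomics reveal the spatial architecture and regulatory network of the tumor-microenvironment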

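Hello everyone. Today we continue the series on 10X spatial transcriptomics in tumor research. I have covered this topic many times and stressed the medical importance of the tumor-normal boundary; studying that boundary requires spatial transcriptomics. Today's reference paper is "Spatially resolved transcriptomics reveals the architecture of the tumor-microenvironment interface", published in Nature Communications (NC) in November 2021, impact factor about 14. You may have noticed that, as time goes on, the impact factors of journals publishing spatial transcriptomics papers have been gradually dropping, and more multi-omics analysis is expected: single-cell, spatial, immunofluorescence, and other techniques have to be combined to publish a good paper. I hope researchers who have the resources will make the most of this window.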

Abstract

During tumor progression, cancer cells come into contact with various non-tumor cell types (in other words, the tumor microenvironment), but it is unclear how tumors adapt to these new environments. Here, we integrate spatially resolved transcriptomics, single-cell RNA-seq, and single-nucleus RNA-seq to characterize tumor-microenvironment interactions at the tumor boundary (the techniques used are single-cell, single-nucleus, and spatial transcriptomics; for combining them, see my earlier post on joint 10X single-cell, single-nucleus, VDJ, and spatial transcriptomics analysis identifying immune cell niches in human lung tissue). Using a zebrafish model of melanoma, we identify a distinct “interface” cell state where the tumor contacts neighboring tissues (the junction between tumor and normal regions, which deserves in-depth study). This interface is composed of specialized tumor and microenvironment cells that upregulate a common set of cilia genes, and cilia proteins are enriched only where the tumor contacts the microenvironment (that is, enriched at the interface). Cilia gene expression is regulated by ETS-family transcription factors, which normally act to suppress cilia genes outside of the interface. A cilia-enriched interface is conserved in human patient samples, suggesting it is a conserved feature of human melanoma. Our results demonstrate the power of spatially resolved transcriptomics in uncovering mechanisms that allow tumors to adapt to new environments (a study of how tumors adapt to new environments).

Introduction

As tumors grow and invade into new tissues, they come into contact with various new cell types (as tumors keep expanding, they keep encountering new cell types), but it is poorly understood how these cell–cell interactions allow for successful invasion and tumor progression. In melanoma, these interactions can occur between the tumor cells and a diverse number of cell types (the question is how communication between tumor cells and each cell type differs, and why the tumor can keep invading despite the presence of so many other cell types). In many cases, the tumor cells interact directly with stromal cells such as fibroblasts or immune cells. However, increasing evidence suggests that the repertoire of such interactions is considerably broader, and can include cell types including adipocytes and keratinocytes. Many of these cell interactions can influence tumor cell behavior (so the influence runs in both directions).

There are likely at least two levels of cell–cell interactions that are relevant to cancer: “microenvironmental” interactions in which the tumor cell directly interacts with adjacent non-tumor cells, and “macroenvironmental” interactions, in which the tumor cell indirectly interacts with more distant cells (two modes, direct contact and long-range regulation, which surely differ in many ways). The microenvironment is increasingly appreciated to play a major role in cancer phenotypes, including proliferation, invasion, metastasis, and drug resistance. However, it is debatable whether every cell type that a tumor interacts with is truly part of the microenvironment, since the mechanisms by which these cells influence tumor cell behavior are often unclear (the influence of each cell type on tumor cell behavior). This uncertainty is compounded by the fact that tumor cells themselves are highly heterogeneous (tumors are highly heterogeneous), making it challenging to determine which subset of tumor cells are directly interacting with surrounding non-tumor cells. The macroenvironment may also influence tumor progression, since the tumor cells can interact with other cells in the body at a distance, as recently demonstrated for metabolic coupling between melanoma cells and distant cells in the liver (tumor heterogeneity looks like the key point: some tumor cells directly contact normal cells, while others regulate cell communication at a distance and maintain their own state).

A better understanding of the nature of these cell–cell interactions requires high resolution imaging and analyses of genes expressed by tumor cells as they interact with different cell types (both single-cell and spatial techniques are indispensable). While bulk and single-cell RNA-sequencing approaches have improved our ability to understand cell–cell interactions, these techniques require dissociation of the tissue of interest, resulting in a loss of spatial information. Thus, a comprehensive understanding of how tumor and surrounding cells interact in situ is lacking, at least in part due to the limitations of current RNA-sequencing technologies (spatial methods capture communication between neighboring cells together with their positions, which makes them a very important research tool).

Spatially resolved transcriptomics (SRT) has recently emerged as a way to address the limitations of both bulk and single-cell RNA-seq by preserving tissue architecture, while still profiling the genes expressed by the cell or tissue at high resolution. Current SRT techniques typically either use spatially-barcoded probes to capture and sequence mRNA from tissue sections, or multiple rounds of in situ hybridization, sequencing, and imaging to computationally reconstruct the transcriptional landscape of the cell. In situ hybridization-based SRT techniques allow the user to profile the transcriptional landscape of the cell at cellular or even subcellular resolution, whereas the resolution of techniques that capture and sequence mRNA from sections is limited by the diameter of each capture spot on the SRT array (for example, 55 μm with the current 10× Genomics Visium SRT technology, with a 45 μm gap between spots). However, to overcome the limited spatial resolution of SRT arrays, a number of computational methods to infer single cell resolved gene expression profiles have recently been developed, including SPOTlight (for SPOTlight, see my post "10X single-cell and spatial joint analysis, part 3: SPOTlight"; for a summary of single-cell and spatial integration methods, see "10X single-cell and spatial joint analysis, part 11 (CellTrek)") and Stereoscope. We recently developed a technique to integrate capture probe-based SRT and scRNA-seq to map the transcriptomic and cellular architecture of tumors. This provides a unique opportunity to understand the mechanisms that are driving the cell–cell interactions that occur between the tumor and its immediately adjacent microenvironment (the emphasis is again on communication between neighbors, which is exactly where communication analysis on single-cell data alone falls short).

Here, we integrate SRT, single-cell RNA-seq, and single-nucleus RNA-seq to characterize the transcriptional landscape of melanoma cells as they interact with the immediately adjacent microenvironment (communication between neighboring cells; I have stressed this many times and hope it gets the attention it deserves). Using a zebrafish model of melanoma, we construct a spatially-resolved gene expression atlas of transcriptomic heterogeneity within tumors and surrounding tissues. We discover a histologically invisible but transcriptionally distinct “interface” region where tumors contact neighboring tissues, composed of cells in specialized tumor-like and microenvironment-like states (the cell-type composition of the interface). We uncover enrichment of cilia genes and proteins at the tumor boundary, and find that ETS-family transcription factors regulate cilia gene expression specifically at the interface. We further demonstrate that this distinct “interface” transcriptional state may be conserved in human melanoma, suggesting a conserved mechanism that presents opportunities for halting melanoma invasion and progression.

Results

Spatially resolved transcriptomics reveals the architecture of the melanoma–microenvironment interface.

To investigate the transcriptional landscape of tumors and neighboring tissues in situ with spatial resolution, we processed frozen sections from three adult zebrafish with large, invasive BRAFV600E-driven melanomas for capture probe-based spatially resolved transcriptomics (SRT), using the 10× Genomics Visium platform (again the 10X platform). Although the size of the tissue section used is limited by the size of the Visium array (6.5 × 6.5 mm capture area), zebrafish allow us the unique advantage that a transverse section through an adult fish (~5 mm diameter) fits in its entirety on the array. Zebrafish are thus one of the only vertebrate animals that can be used to study both the tumor and all surrounding tissues (the tumor and its neighboring regions) in their intact forms, without any need for dissection. Our SRT dataset contained transcriptomes for 7281 barcoded array spots across three samples, encompassing 17,317 unique genes. We detected approximately 1000–15,000 transcripts (unique molecular identifiers, UMIs) and 500–3000 unique genes per spot, with somewhat fewer UMIs/genes detected in sample C. Visium array spots within the tumor region typically contained more UMIs than spots in the rest of the tissue, likely at least in part due to higher density of cells within the tumor region (spatial cell density).

We first combined our expression matrices using an anchoring framework to identify common cell states across different datasets (this is an integration of the spatial transcriptomics data across samples).

After community-detection based clustering on our integrated dataset, we inferred the identities of 13 distinct clusters. When we projected the cluster assignments back onto the tissue coordinates and onto the UMAP embeddings for each spot (a UMAP view of the spatial transcriptomics data), we found complex spatial patterns in the data that strongly recapitulated tissue histology. Our Visium data captured multiple microenvironment cell types (muscle, liver, brain, skin, pancreas, heart, intestine, and gills) in addition to the BRAFV600E-driven melanomas. We validated our cluster assignments by plotting onto the Visium array the expression of marker genes that should be expressed exclusively in the tumor (BRAFV600E) (cell types on the tissue are still identified mainly by marker genes), muscle ( ), heart ( ), and nervous system ( ), and observed that expression of these marker genes was restricted to the expected regions of the tissue.
To further characterize the transcriptional architecture of the microenvironment, we asked whether we could leverage publicly available, annotated gene sets to uncover spatially-organized patterns of biological activity across the tissue (that is, enrichment analysis). To this end, we computed the mean expression of genes associated with all zebrafish Gene Ontology (GO) terms, and measured the distance between spots that highly express these genes, reasoning that shorter distances between spots may represent underlying spatial organization of these biological pathways across the tissue. We then compared this distribution to that of a null distribution of distances between random spots, allowing us to identify GO terms with spatially coherent, non-random expression patterns (note how this differs from the usual enrichment of cluster differential genes: it tests regional functional enrichment against the spatial background, much like the idea behind spatially variable genes). Applying this to the tumor region of our samples, we identified several GO terms displaying interesting spatial expression patterns related to tissue structure (GO: extracellular structure organization; p = 2.3 × 10⁻⁸) and the immune system (GO: macrophage migration, p = 7.1 × 10⁻⁴), among others. We performed the same analysis on the microenvironment, and found several notable spatially-organized pathways that function in tumor growth and invasion (GO: lipid import into cell, p = 1.2 × 10⁻⁹⁶; GO: IMP biosynthetic process, p = 2.0 × 10⁻⁴⁰). Together, these data validate our spatially resolved transcriptomics workflow and demonstrate the existence of discrete tumor and microenvironment regions within our SRT dataset (in essence this is still about identifying and delimiting regions, which is very important).
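
To make the spatial enrichment idea above concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the helper names, the toy grid, the choice of the mean pairwise distance as the statistic, and the `top_k` and `n_perm` parameters are all assumptions for illustration. It asks whether the spots that score highest for a gene set sit closer together than randomly chosen spots.

```python
import math
import random

def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of (x, y) points."""
    total, n_pairs = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += math.dist(points[i], points[j])
            n_pairs += 1
    return total / n_pairs

def spatial_coherence_pvalue(coords, scores, top_k=5, n_perm=500, seed=0):
    """Empirical p-value that the top_k highest-scoring spots lie closer
    together than top_k spots drawn at random (hypothetical helper)."""
    ranked = sorted(range(len(coords)), key=lambda i: scores[i], reverse=True)
    observed = mean_pairwise_distance([coords[i] for i in ranked[:top_k]])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(coords, top_k)  # null: random spots
        if mean_pairwise_distance(sample) <= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy 6 x 6 grid of spots; a made-up gene set is "on" only in one corner.
coords = [(x, y) for x in range(6) for y in range(6)]
scores = [5.0 if (x < 2 and y < 2) else 0.1 for x, y in coords]
observed, p_value = spatial_coherence_pvalue(coords, scores, top_k=4)
print(f"mean distance among top spots: {observed:.2f}, empirical p = {p_value:.3f}")
```

A real analysis would run this once per GO term on the mean expression of the term's genes and then correct for multiple testing; the sketch only shows the comparison against a random-spot null.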

The tumor–microenvironment interface is transcriptionally distinct from the surrounding tissues.

We noticed in all of the samples a transcriptionally distinct cluster of array spots that localized to the border between the tumor and the adjacent microenvironment (a separate cluster sits at the junction of tumor and normal tissue), in which specific biological pathways were upregulated (this is the "gene switch" I have talked about before). This “interface” cluster was present in all three samples.

Interestingly, the tissue in this interface region appeared largely indistinguishable from the surrounding microenvironment (muscle), despite it being transcriptionally distinct. We thus hypothesized that this interface cluster represented the region in which the tumor was contacting neighboring tissues (this junction between tumor and normal tissue is taken to be the communication hub where the tumor contacts normal regions). To get a better sense of the transcriptional profile of the interface cluster, we computed the correlation between the averaged transcriptomes of each SRT cluster across all three samples (a correlation calculation). We found that the transcriptional profile of the interface cluster was more correlated with the tumor (R = 0.33) than with muscle (R = 0.06), despite the fact that the tissue in this region histologically resembles muscle with few tumor cells visible (the interface is more similar to the tumor).
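
The correlation step described above can be sketched as follows. This is only an illustration under stated assumptions: the expression values are made up, the four-gene profiles are hypothetical, and the functions `mean_profile` and `pearson` are names introduced here, not taken from the paper.

```python
import math

def mean_profile(spots):
    """Average expression per gene across spots (each spot is an equal-length list)."""
    n = len(spots)
    return [sum(spot[g] for spot in spots) / n for g in range(len(spots[0]))]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up expression of four hypothetical genes in two spots per cluster.
clusters = {
    "tumor": [[9, 8, 1, 0], [10, 7, 2, 1]],
    "interface": [[7, 6, 3, 2], [8, 5, 2, 3]],
    "muscle": [[1, 0, 9, 8], [0, 2, 10, 7]],
}
profiles = {name: mean_profile(spots) for name, spots in clusters.items()}
for other in ("tumor", "muscle"):
    r = pearson(profiles["interface"], profiles[other])
    print(f"interface vs {other}: R = {r:.2f}")
```
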
We next sought to identify genes that may differentiate the interface from muscle (to which it is most similar histologically) and from tumor (to which it is most similar transcriptionally) (a differential expression analysis). We found a number of genes that were upregulated specifically in the interface cluster relative to both tumor and muscle, including, interestingly, a number of uncharacterized genes, genes related to increased transcriptional/translational activity (atf3, eif3ea, and ribosomal genes), and genes related to the microtubule cytoskeleton (tuba1a and tuba1c).

  • Note: The tumor–microenvironment interface is transcriptionally distinct from the surrounding microenvironment. a Interface and muscle-annotated cluster spots projected onto tissue image (n = 3 sections). Insets show the tissue underlying the interface spots (1) and muscle spots (2). b Correlation matrix between average expression profile of SRT clusters across all three datasets. Clusters are ordered by hierarchical clustering of the Pearson’s correlation coefficients and bubble sizes correspond to p-value (−log10) of correlation (two-sided), with p-values < 10⁻³ omitted. Clustering of tumor and interface together is highlighted in the dendrogram (red). c Volcano plot of differentially expressed genes between the interface cluster versus the muscle and tumor clusters. p-values were obtained from the Wilcoxon’s rank sum test (two-sided).
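
For readers who have not seen the rank sum test named in the caption, below is a minimal pure-Python sketch for a single gene. It uses a normal approximation with average ranks for ties and applies no tie or continuity correction, which is an assumption made to keep the example short; the expression values are made up.

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test via a normal approximation.
    Ties receive average ranks; no tie or continuity correction is applied."""
    pooled = sorted((value, group) for group, values in enumerate((a, b)) for value in values)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_a - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p

# Made-up expression of one gene in interface spots versus all other spots.
interface_spots = [5, 6, 7, 8, 9, 10, 11, 12]
other_spots = [1, 2, 3, 4, 1.5, 2.5, 3.5, 4.5]
z, p = rank_sum_test(interface_spots, other_spots)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In a volcano plot such as panel c, this p-value would be paired with a fold change for every gene and adjusted for multiple testing.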

The upregulation of most of these genes was subtle (though statistically significant), which may be due to the somewhat lower cellular resolution of the Visium technology and number of UMIs detected per spot (note: to address this, we further compare the magnitude of changes for these genes in our single cell datasets below) (so resolution is still a problem for spatial transcriptomics). To identify gene expression programs that are enriched specifically at the interface and provide further evidence for the interface as a transcriptionally distinct tissue region, we performed non-negative matrix factorization (NMF) on all microenvironment spots (including both interface and muscle clusters) across all samples (NMF identifies transcriptional programs; I have written about this a lot, see my post on finding target gene sets (factors) with PNMF in 10X single-cell and spatial data). When we projected the NMF factor scores onto each spot, we found that some factors were enriched across all three samples (e.g., factor 2), whereas some were only enriched in one or two of the samples (e.g., factors 4, 11).

These differences may be due to different tissue types present across the three samples. Notably, we also found that multiple factors were specifically enriched at the interface between the tumor and the microenvironment (some factors are enriched mainly at the interface). To investigate the biology underlying the genes contributing to each factor, we looked for significantly enriched GO terms among the top 150 genes contributing to each factor (note how each factor is characterized here). This revealed several factors enriched in muscle-specific genes, as expected, and that the interface factors were enriched in genes functioning in biological processes including membrane bound organelles, protein targeting to organelles and the membrane, and DNA replication. This result suggests a high degree of biological activity within the interface region, with a potential role for membrane bound organelles in signaling within this region. Together, these data uncover a unique “interface” region bordering the tumor, which histologically resembles the microenvironment, transcriptionally resembles tumor, but expresses distinct gene modules that may contribute to tumor–microenvironment cell interactions (the interface shows a unique transcriptional pattern that is crucial for tumor-microenvironment interactions).

  • Note: d Non-negative matrix factorization (NMF) of the microenvironment spots (muscle and interface clusters). Shown are the standardized factor scores for interface-specific NMF factor 7, projected onto microenvironment spots. Arrows denote areas with higher factor scores. e Enriched GO terms for the top 150 scoring genes in NMF factor 7.
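
The NMF step, followed by picking the top-scoring genes per factor, can be illustrated with a small pure-Python sketch. It uses the classic multiplicative updates for the squared-error objective, which is an assumption: the text does not say which NMF algorithm or software was used. The matrix, the gene names g1 to g4, and the helper names are all hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error between v and w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative spots x genes matrix v into w (spots x k) and
    h (k x genes) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h

def top_genes(h, gene_names, factor, n_top):
    """Names of the n_top genes with the largest loading on one factor."""
    order = sorted(range(len(gene_names)), key=lambda j: h[factor][j], reverse=True)
    return [gene_names[j] for j in order[:n_top]]

# Toy matrix: 5 spots x 4 hypothetical genes built from two expression programs.
genes = ["g1", "g2", "g3", "g4"]
v = [[4, 2, 0, 0], [2, 1, 0, 0], [0, 0, 1, 3], [0, 0, 2, 6], [2, 1, 1, 3]]
w, h = nmf(v, k=2)
for factor in range(2):
    print(f"factor {factor}: top genes {top_genes(h, genes, factor, 2)}")
```

In the workflow described above, the rows of `w` would be the per-spot factor scores projected onto the tissue, and the top genes of each factor (150 in the paper) would go into GO enrichment.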

The tumor–microenvironment interface is composed of specialized cell states (cell states at the interface).

Our SRT results so far detail a transcriptionally distinct “interface” region where tumors contact the microenvironment. However, spatially resolved transcriptomics data is limited in resolution by the diameter of each spot on the Visium array (55 μm with current technology). Thus, each array spot probably captures transcripts from multiple cells. As the interface region is, by nature, likely a mixture of tumor and microenvironment cells, we performed single-cell RNA-seq (scRNA-seq; single-cell data are used here to improve resolution) on tumor and non-tumor cells from three adult zebrafish with large melanomas in order to better define the cell states present in the interface. We detected approximately 10,000–75,000 transcripts and 1000–5000 unique genes per cell.

As expected, our scRNA-seq data contained tumor cells as well as various non-tumor cell types, including erythrocytes, keratinocytes, and several types of immune cells (worth checking how the cell annotation was done). We did not identify a muscle cell cluster in our scRNA-seq dataset, likely because adult skeletal muscle is composed of multinucleated muscle fibers that cannot be isolated and encapsulated for droplet-based scRNA-seq (some cell types cannot be captured by 10X single-cell technology; this is one of its weaknesses and a lot of information is lost).

Consistent with our SRT results, clustering of our scRNA-seq data revealed a distinct “interface” cell cluster, which we identified based on the fact that cells in this cluster significantly upregulated the same genes that were upregulated in our SRT interface cluster (p = 1.83 × 10⁻²⁶). The distinct clustering of the interface population was not due to the presence of a significant number of cell doublets within this cluster (doublets were removed with DoubletFinder; see my post on DoubletFinder).

Strikingly, UMAP and principal component analysis of the interface cluster revealed two distinct cell populations, one expressing tumor markers such as BRAFV600E and the other expressing muscle genes such as ckba, with other genes such as the centromere gene stra13 upregulated in both populations. This result suggests that the transcriptionally distinct “interface” region we identified in our SRT data is actually composed of at least two similar, but distinct cell states: a “tumor-like interface” and a “muscle-like interface” (single-cell data are used to resolve the cell-type composition of the interface). The interface region may not be limited to only tumor-like and muscle-like cell states; however, since zebrafish melanomas frequently invade into muscle, this likely contributes to the presence of muscle-like interface cells in our data.

Based on this, we separated the interface cluster into two subclusters (a re-clustering analysis), and confirmed that the two subclusters express anticorrelated (negatively correlated) levels of tumor markers such as BRAFV600E, , and , and muscle markers such as , , and . A common set of genes, including many genes related to the microtubule cytoskeleton and cell proliferation such as , , , and , were upregulated in both subclusters. Both the tumor-like and muscle-like interface cell states were present in both scRNA-seq samples.

The presence of putative “muscle” cells in the interface is particularly notable, in light of the fact that adult skeletal muscle is composed of multinucleated muscle fibers that we were unable to isolate in our scRNA-seq workflow due to their size, evidenced by the lack of a muscle cell cluster in our dataset (here the weakness of single-cell transcriptomics starts to show). This could suggest the presence of mono-nucleated muscle cells, or a hybrid tumor-muscle cell state at the invasive front. Previous work suggests that tumor and immune cells can fuse to create a hybrid cell state that contributes to tumor heterogeneity and metastasis, although tumor-muscle cell fusion has not yet been reported. Together, these data suggest that the interface region is composed of specialized tumor-like and microenvironment-like cell states (this kind of intermediate cell state matters most).

Interface cell states are distinct from neighboring tissues.

Our results so far indicate that we have uncovered an “interface” cell state localized to where the tumor contacts neighboring tissues. However, our scRNA-seq dataset does not contain a muscle cell cluster due to the fact that muscle fibers cannot be encapsulated for scRNA-seq (single-cell transcriptomics cannot capture certain cell types). This makes it difficult to assess whether the specialized muscle cell state found in the interface is truly distinct from muscle that is not in proximity to the tumor. Thus, to effectively compare the interface cell state(s) to other microenvironment cell types/states that cannot be captured with scRNA-seq, we validated our scRNA-seq results by performing single-nucleus RNA-seq (snRNA-seq) on nuclei extracted from three adult zebrafish (single-nucleus transcriptomics is brought in for this analysis), all with large transgenic melanomas. Although snRNA-seq captures only nascent transcripts in the nucleus, which contains only 10–20% of the cell’s mRNA, scRNA-seq and snRNA-seq typically recover the same cell states/types, albeit sometimes in different proportions. After quality control and filtering, our dataset encompassed transcriptomes for 10,527 individual nuclei. We also identified an “interface” cluster in our snRNA-seq dataset (so single-cell and single-nucleus data can validate each other). We identified the interface cluster based on the fact that nuclei in this cluster strongly upregulated genes that were strongly upregulated in the interface cluster in our scRNA-seq dataset, including stmn1a, stra13, plk1, and haus4, and that the interface cluster from our snRNA-seq dataset clustered with the interface cluster from our scRNA-seq dataset when the two datasets were integrated (an integration of single-cell and single-nucleus data; we will look at the method later).

  • Note: Single-nucleus RNA-seq demonstrates that the interface cell states are distinct from the rest of the microenvironment. a snRNA-seq cluster assignments plotted in UMAP space. b Expression of marker genes from the scRNA-seq interface cluster in the snRNA-seq dataset. c Integrated UMAP of the snRNA-seq and scRNA-seq datasets (labeled, top plot) showing colocalization of the two interface clusters (bottom plot).

To interrogate the types of nuclei present in the interface cluster in our snRNA-seq dataset, we performed dimensionality reduction and clustering on the nuclei from the interface cluster, which identified five discrete subclusters. Similar to our scRNA-seq dataset, within the interface cluster in our snRNA-seq dataset we identified subclusters of nuclei that upregulated tumor-specific or muscle-specific genes. The interface cluster in our snRNA-seq dataset also contained other subclusters that did not express tumor-specific or muscle-specific genes (single-cell and single-nucleus data validating each other). Nuclei in these subclusters expressed genes related to other cell types in our snRNA-seq dataset, including immune cells (ctss2.1), liver (fabp10a), and digestive system (ela2). This is in line with recent work showing that melanomas can reprogram microenvironmental cells such as liver cells even when not in physical contact. However, similar to our scRNA-seq and SRT datasets, there were many genes that were specifically upregulated across the interface subclusters that were not upregulated in any other cell type in the snRNA-seq dataset, further suggesting that the “interface” cell state is a distinct transcriptional entity (data from three technologies validating each other; the authors clearly put in a lot of work).

  • Note: d Subcluster assignments and expression of marker genes from the snRNA-seq interface cluster. e Dotplot showing expression of microenvironment cell-type specific genes within the interface subclusters. f Heatmap showing expression of the top 100 genes upregulated across all of the interface subclusters.

Although our snRNA-seq analysis workflow includes multiple processing steps to exclude doublets, including filtering steps based on the number of UMIs per nucleus and removing possible doublets identified by DoubletFinder, to further interrogate whether these tumor-like and microenvironment-like interface nuclei could be attributed to doublets with the corresponding cell type, we quantified the number of UMIs/genes expressed by interface cells/nuclei relative to other cells/nuclei in the dataset. The results were inconclusive: in some cases we quantified significantly more UMIs/genes in interface cells, in some cases we quantified significantly fewer UMIs/genes in interface cells, but in other cases there was no significant difference (this answers the question of whether single-nucleus data also need doublet removal).

Thus, to further investigate the presence of doublets in the interface, we calculated the degree of overlap between genes expressed by the tumor/microenvironment nuclei and genes expressed by the corresponding interface nuclei (a further validation).

Although the tumor-like and microenvironment-like interface clusters expressed some tumor-specific and microenvironment-specific genes, as expected, in most cases there was not a significant degree of overlap between all genes upregulated between both cell states, suggesting that these interface cell states are not caused by doublets (doublet effects are ruled out). We did observe some overlap between all genes expressed by NK cells and macrophages relative to the immune-like interface cells, suggesting that some doublets could be present within the immune-like interface cluster. Notably, tumor-immune cell fusion has been reported in melanoma. Determining whether these potential doublets result from technical or biological reasons will be an important area of future study.

Since our snRNA-seq dataset contained more cells/nuclei and a greater breadth of cell types than our scRNA-seq dataset, we integrated our snRNA-seq data with our Visium SRT data using our recently developed multimodal intersection analysis (MIA) method (MIA integrates the single-nucleus and spatial data; see my earlier post on MIA for joint single-cell and spatial analysis) to confirm the presence of tumor-like and microenvironment-like cell states within the interface region. Notably, our MIA results suggested that the interface regions in our SRT dataset were enriched in cell types including muscle, macrophages, and tumor, in line with our scRNA-seq and snRNA-seq results. The cluster that was most significantly enriched in the interface region was the muscle-like interface cell state, in accordance with the histology of our SRT samples that showed that the interface region closely resembles the surrounding muscle. Together, these results suggest that the interface is composed of tumor and microenvironment cells which upregulate a common gene program that may contribute to tumor-microenvironment cell interactions at the tumor boundary (the interface does look like a key switch).
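
An intersection analysis of this kind boils down to asking whether the genes that mark a spatial region overlap the genes that mark a cell type more than chance would predict. The sketch below shows one common way to score that overlap, a one-sided hypergeometric test. It is an illustration only: the function names, the 50-gene background, and the gene lists are hypothetical, and it does not claim to reproduce the published MIA code.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_set_a, n_set_b, n_overlap):
    """P(overlap >= n_overlap) when n_set_b genes are drawn without replacement
    from n_background genes, n_set_a of which belong to the other gene set."""
    total = comb(n_background, n_set_b)
    upper = min(n_set_a, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def overlap_enrichment(region_genes, cell_type_genes, background):
    """Overlap size and enrichment p-value between two gene lists (hypothetical helper)."""
    universe = set(background)
    region = set(region_genes) & universe
    markers = set(cell_type_genes) & universe
    shared = region & markers
    p = hypergeom_enrichment_p(len(universe), len(markers), len(region), len(shared))
    return len(shared), p

# Made-up gene lists over a hypothetical 50-gene background.
background = [f"gene{i}" for i in range(50)]
interface_region = [f"gene{i}" for i in range(10)]
muscle_like_markers = [f"gene{i}" for i in range(8)] + ["gene40", "gene41"]
immune_markers = [f"gene{i}" for i in range(20, 30)]

for name, markers in (("muscle-like", muscle_like_markers), ("immune", immune_markers)):
    n_shared, p = overlap_enrichment(interface_region, markers, background)
    print(f"{name}: {n_shared} shared genes, p = {p:.2e}")
```

The same overlap test also fits the doublet check described earlier, where a large, significant overlap between a parent cell type and an interface subcluster would hint at doublets.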

As our SRT results suggest that the interface cell state may be modulated by direct cell–cell interactions between tumor and microenvironment cells, we used NicheNet (NicheNet is a good tool for cell communication analysis; see my posts on NicheNet for 10X single-cell and spatial communication analysis, and on spatial correlation analysis combined with CellPhoneDB and NicheNet) to computationally infer interactions between interface cells and the rest of the cells in our snRNA-seq dataset (note that the authors use the single-nucleus data here, not the single-cell data) by identifying potential ligands expressed by interface cells and receptors and target genes in the other cell types. As the NicheNet model is currently designed to work with human genes, we performed this analysis with the human orthologs of the zebrafish genes in our dataset (gene conversion between species). The top ligand predicted to be active in interface nuclei was HMGB2, of which there are two zebrafish orthologs: hmgb2a and hmgb2b. These genes were highly expressed in the interface clusters across our snRNA-seq, scRNA-seq, and SRT datasets. Interestingly, HMGB2 expression has been reported to be correlated with tumor aggressiveness. The predicted receptors for HMGB2 were AR, ITPR1, and CDH1 (fish orthologs: ar, itpr1a, itpr1b, cdh1). Of these potential receptors, cdh1 was the most highly expressed in general across the three datasets. cdh1 was expressed in various microenvironment cell types, including intestinal cells, keratinocytes, and also in some interface cell states. cdh1 (E-cadherin) is a core component of adherens junctions along with α-catenin and β-catenin (pay attention here: the presence of adherens junctions generally indicates that the cell types are spatially co-localized). Interestingly, HMGB2 and β-catenin have been reported to cooperate to promote melanoma progression. These data demonstrate one of likely many signaling interactions that occur between interface cells and other cells adjacent to the tumor.
Taken together, our results suggest that we have identified a putative “interface” cell state in each of our SRT, scRNA-seq, and snRNA-seq datasets, composed of tumor and microenvironment cells which upregulate a common gene program that may contribute to tumor–microenvironment cell interactions at the tumor boundary (the approach used here to study interactions between cell types is well worth borrowing).
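
The species conversion step before NicheNet can be sketched with a small lookup table. The ortholog pairs below are only the ones named in the text; how paralogs such as hmgb2a and hmgb2b are combined is not described, so summing them is an assumption made for this example, as are the function name, the expression values, and the unmapped gene `unmapped_gene1`.

```python
from collections import defaultdict

# Zebrafish -> human ortholog pairs named in the text above.
FISH_TO_HUMAN = {
    "hmgb2a": "HMGB2",
    "hmgb2b": "HMGB2",
    "ar": "AR",
    "itpr1a": "ITPR1",
    "itpr1b": "ITPR1",
    "cdh1": "CDH1",
}

def collapse_to_human(fish_expression, mapping):
    """Sum zebrafish paralog values into their human ortholog (assumed rule);
    genes without a mapping are dropped."""
    human = defaultdict(float)
    for gene, value in fish_expression.items():
        if gene in mapping:
            human[mapping[gene]] += value
    return dict(human)

# Made-up expression values for one cluster.
fish_expression = {"hmgb2a": 2.0, "hmgb2b": 3.0, "cdh1": 1.5, "unmapped_gene1": 9.0}
print(collapse_to_human(fish_expression, FISH_TO_HUMAN))
```

In practice the mapping table would come from an ortholog database rather than being typed by hand, and one-to-many relationships need an explicit rule like the one assumed here.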

Cilia genes and pathways are upregulated at the interface.

To gain further insight into the biological processes underlying the specialized “interface” region identified in our SRT and scRNA-seq data, we performed pre-ranked gene set enrichment analysis (GSEA, which everyone should know, although GSVA is used more often these days), using differentially expressed genes in the scRNA-seq interface cluster, to identify conserved pathways that may be active in interface cells (the enrichment here goes back to the single-cell data, O(∩_∩)O). We noticed that many cilia-related pathways were enriched in the combined interface cluster. This enrichment of cilia-related pathways occurred in both the muscle-like and tumor-like interface cell states. Cilia-related GO terms were also enriched in the SRT interface, as were GO terms related to membrane-bound organelles in the genes contributing to NMF factor 7, which localized to the interface. When we calculated a list of common genes upregulated across the SRT, scRNA-seq, and snRNA-seq interface clusters, several cilia genes were present on this list including ran, tubb4b, stmn1a, and tuba8l4 (upregulated genes shared by all three technologies, which shows that the upregulation is general).

Several recent studies have implicated cilia in an important role in melanoma initiation and progression, although the mechanism by which cilia mediate melanoma progression is unclear. To further investigate a role for cilia at the tumor–microenvironment interface, we scored each cell from our scRNA-seq dataset for relative enrichment of cilia genes, using the “gold standard” SYSCILIA gene list, and quantified a significant upregulation of cilia genes in both interface cell states in our scRNA-seq data, with a particularly strong upregulation in the muscle-like interface cluster. Although cilia genes generally were expressed at relatively low levels in our snRNA-seq dataset, in line with the overall lower expression of most genes in our snRNA-seq data relative to our scRNA-seq data, the most highly upregulated cilia genes in the scRNA-seq interface cluster were also upregulated across the tumor-like and muscle-like cell states in the snRNA-seq interface cluster relative to the tumor and muscle clusters.
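
Scoring each cell for a gene list can be done in several ways, and the text does not say which scoring function was used. The sketch below shows one simple variant as an assumption: the mean expression of the gene set minus the mean of randomly drawn control genes. The cilia genes are the four named earlier in the text; the expression values, background gene names, and function name are made up.

```python
import random

def gene_set_score(cell, gene_set, n_control=3, seed=0):
    """Mean expression of gene_set minus the mean of randomly drawn control
    genes from the rest of the cell's profile (hypothetical scoring rule)."""
    in_set = [g for g in cell if g in gene_set]
    if not in_set:
        raise ValueError("none of the gene set is present in this cell")
    background = sorted(g for g in cell if g not in gene_set)
    rng = random.Random(seed)
    control = rng.sample(background, min(n_control, len(background)))
    set_mean = sum(cell[g] for g in in_set) / len(in_set)
    control_mean = sum(cell[g] for g in control) / len(control)
    return set_mean - control_mean

cilia_genes = {"ran", "tubb4b", "stmn1a", "tuba8l4"}
# Made-up normalized expression for two cells; bg1..bg4 are hypothetical background genes.
interface_cell = {"ran": 4.0, "tubb4b": 5.0, "stmn1a": 6.0, "tuba8l4": 3.0,
                  "bg1": 1.0, "bg2": 2.0, "bg3": 1.5, "bg4": 0.5}
tumor_cell = {"ran": 1.0, "tubb4b": 1.5, "stmn1a": 2.0, "tuba8l4": 0.5,
              "bg1": 1.0, "bg2": 2.0, "bg3": 1.5, "bg4": 0.5}
for name, cell in (("interface", interface_cell), ("tumor", tumor_cell)):
    print(name, round(gene_set_score(cell, cilia_genes, n_control=4), 2))
```

Subtracting a control set keeps cells with globally high counts from looking enriched; published scoring functions usually also match control genes to the expression level of the genes being scored.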

Furthermore, we quantified a clear enrichment of cilia genes such as ran, tubb4b, tuba4l, and gmnn specifically in tumor-like and muscle-like interface cells in our snRNA-seq dataset, and, similar to our scRNA-seq results, all four genes were upregulated more highly in the muscle-like interface cluster than in any of the other interface clusters. Together, these results suggest a potential role for cilia at the tumor–microenvironment interface (upregulated genes shared across technologies; a nice approach, though I am curious what role the cilia genes actually play here).

The tumor–microenvironment interface is ciliated.

Interestingly, previous studies have shown that human and mouse melanomas are not ciliated, although they express cilia genes. To reconcile these models, we stained sections through adult zebrafish with invasive BRAFV600E-driven melanomas for acetylated tubulin, a common cilia marker. Strikingly, we found that although the bulk of the tumor was not ciliated as expected, there was a specific enrichment of cilia at the invasive front of the tumor, where it contacts the muscle (cilia are closely associated with the muscle tissue).

We observed long, acetylated tubulin-positive projections that were often found in the extracellular space spanning tumor and adjacent muscle cells. These projections were not found in the bulk of the tumor or in muscle that was not adjacent to the tumor (what does this tell us? Transcript abundance and protein abundance cannot be equated). These structures did not resemble typical cilia, which we occasionally see on cultured zebrafish melanoma cells expressing a transgenic cilia reporter, as the acetylated tubulin-positive structures we see in vivo are longer and structurally distinct from typical cilia. Determining the nature and function of these structures will be an exciting area of future study.


We could not conclusively determine whether these cilia originated in tumor cells, muscle cells or both cell types, another interesting topic that awaits further study. These data suggest that although the bulk of primary melanomas is not ciliated, cilia are enriched at the tumor–microenvironment interface, where they may facilitate growth of the tumor into surrounding tissues (so the cilia, surprisingly, may be what drives tumor invasion).

  • Note: g Immunofluorescent images of sections through adult zebrafish with invasive melanomas, stained for GFP (tumor cells), acetylated tubulin (cilia), and Hoechst (nuclei), showing the tumor-muscle interface (left), center of the tumor (middle), and distant muscle (right). Arrows denote cilia at the interface. Scale bars, 100 μm. Images are representative from at least three independent experiments. h Inset of region highlighted in g (left). Scale bars, 25 μm.

ETS-family transcription factors regulate cilia gene expression at the interface.

To identify potential regulators of gene expression within the interface, we performed HOMER motif analysis (worth noting) to identify conserved transcription factor (TF) binding motifs enriched in genes differentially expressed in the interface. When we performed de novo motif enrichment analysis on genes differentially expressed in the SRT interface compared to normal muscle, the top-ranked motif was the highly conserved ETS DNA-binding domain, containing a core GGAA/T sequence (p = 1 × 10^−22). The ETS domain was also the top-ranked motif enriched in genes differentially expressed in the SRT interface compared to all other SRT spots (p = 1 × 10^−15), and was the second-ranked motif enriched in genes differentially expressed in the interface cluster identified in our scRNA-seq dataset (p = 1 × 10^−13) and in genes differentially expressed in our snRNA-seq interface cluster (p = 1 × 10^−13). Furthermore, ETS motifs were frequently enriched in both the tumor-like and muscle-like interface subclusters in our scRNA-seq dataset, along with, notably, motifs for RFX-family transcription factors, which regulate ciliogenesis. Although ETS-family transcription factors have not been widely studied in melanoma, they have been reported to function in melanoma invasion and phenotype switching, and are aberrantly upregulated in many types of solid tumors. Interestingly, zebrafish ETS-family transcription factors were downregulated in the interface in each of our scRNA-seq, snRNA-seq, and SRT datasets. (Motif analysis is an important analysis these days; the method itself deserves a close look.)


  • Note: ETS transcription factors may regulate cilia gene expression at the interface. a Results from HOMER de novo motif analysis of differentially expressed genes in the SRT, scRNA-seq, and snRNA-seq interface clusters. b Top ten enriched motifs from HOMER known motif analysis of the scRNA-seq tumor-like (left) and muscle-like (right) interface cell states. a, b p-values calculated using the hypergeometric test (one-tailed). c–e Relative expression of zebrafish ETS genes across the clusters in the scRNA-seq (c), SRT (d), and snRNA-seq (e) datasets. p-values are noted (Wilcoxon rank sum test, two-sided, with Bonferroni’s correction).

To identify potential biological processes that could be regulated by ETS transcription factors at the tumor–microenvironment interface, we investigated putative target genes containing an ETS motif in their promoter. We queried the zebrafish genome for genes with an ETS motif within 500 bp of the transcription start site, filtered these genes to include only those differentially expressed in the tissue/cell state of interest, and performed GSEA on the resulting target gene lists. Surprisingly, within the ETS-target genes in both the SRT and scRNA-seq interface clusters, we again found an enrichment of pathways related to cilia. As ETS TFs are downregulated specifically in the interface, this suggests that ETS-family TFs may act as transcriptional repressors of cilia genes. ETS TFs can act as transcriptional activators and/or repressors depending on gene and context. In support of this model, when we scored each cell in the interface for relative expression of both ETS genes and ETS-target genes, the two were strongly anti-correlated (R = −0.625, p = 9.02 × 10^−27). Collectively, these data suggest that ETS-family transcription factors act as transcriptional repressors of cilia genes in cells at the interface between tumors and […]

[…] fold in our scRNA-seq interface cluster, and classified cells that upregulated these genes as an “interface” population. Similar to our snRNA-seq results, “interface” cells were found across all the major cell types in the human melanoma dataset. For the purposes of statistical power, we focused on interface-like cells from the three largest clusters (tumor, myeloid cells, and T/NK cells). Human cells in an interface-like cell state upregulated many of the same genes upregulated in the interface in our zebrafish datasets, including PLK1, HMGB2, TUBB4B and TPX2. Cilia genes were significantly upregulated across all of the interface cell states, relative to their corresponding tumor/TME cell types. This suggests that a transcriptionally distinct “interface” gene signature may be found in human melanoma. Identifying the human melanoma subtypes (e.g., BRAF, NRAS, c-KIT) in which an interface cell state is found awaits larger datasets of freshly isolated tumors subjected to scRNA-seq and/or SRT. Follow-up analyses determining the roles of specific types of immune and myeloid cells in the interface would also be an interesting area of future study. Together, our results suggest that cell-cell interactions at the tumor–microenvironment interface are accomplished by a subset of specialized tumor and muscle cells, which together upregulate a conserved common gene program characterized by upregulation of cilia genes and downregulation of ETS transcription factors.


Discussion

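Here, spatially resolved, single-cell, and single-nucleus transcriptomics are combined to characterize how tumor cells interact with new tissues in their surroundings, revealing key regulators of how this interface forms. In total, 49,944 transcriptomes were analyzed, covering the expression of 20,589 unique genes across 7281 spatial spots, 2889 zebrafish cells, 10,527 zebrafish nuclei, and 29,247 human cells. The analysis identified a series of spatially patterned gene modules, some of which localize specifically to the interface between the tumor and the surrounding tissue. The interface was found to be composed of specialized tumor and muscle cell states characterized by upregulation of cilia genes and proteins. The analysis further shows that ETS transcription factors regulate cilia gene expression at the interface, and that a distinct “interface” cell population is conserved in human melanoma patient samples. Altogether, the results reveal an “interface” transcriptional state that may mediate melanoma growth into surrounding tissues.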
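The results identify a role for ETS-family transcription factors in mediating cilia gene expression at the interface. In recent years, cilia have been linked to multiple aspects of melanoma biology, but their role in melanoma progression remains unclear. The bulk of melanomas are not ciliated; in fact, the “ciliation index” is gaining traction as a diagnostic tool for distinguishing melanoma from benign nevi. In addition, cilia disassembly has recently been linked to melanoma metastasis, with EZH2-regulated deconstruction of cilia driving metastasis. Paradoxically, although most melanoma cells are not ciliated, many melanomas still express cilia genes. The data add another layer to this complexity: cilia genes are not only specifically upregulated at the interface between tumor and microenvironment but, more importantly, only cells at that interface express high levels of cilia proteins. This raises a question that is still not fully answered, namely what role cilia play at each step of melanoma progression. In primary melanoma growth, it is clear that most cells are unciliated and that EZH2 suppresses these genes. In these models, loss of cilia via EZH2 increases metastasis through enhanced Wnt/β-catenin signaling. The finding that most melanoma cells are not ciliated is consistent with this, but a specific subpopulation of cells at the interface upregulates cilia genes and proteins, and these cells also appear to exist in human melanoma.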
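How can these seemingly contradictory data be reconciled? The data suggest that intact cilia may matter most when tumor cells first encounter new heterotypic cell types in the adjacent environment. Several possibilities could explain why cilia are specifically upregulated at this interface. First, the upregulation of cilia genes and proteins at the interface may be transient, induced by heterotypic cell–cell interactions between tumor and muscle. Primary cilia are key signaling hubs of the cell, regulating pathways such as Hedgehog and TGF-β/BMP, all of which are important in cancer progression and cell–cell communication. NicheNet analysis suggests that distinct ligand/receptor pairs, including HMG-family proteins, may mediate such signaling. A second possibility is that primary cilia act as mechanosensors and play a role in directed cell migration as cells invade new tissue. For example, pioneering work on primary cilia showed that cilia can orient in the direction of migration in 3T3 cells, which has also been seen in the context of wound healing. Finally, the appearance of cilia at the interface may actually be a barrier to systemic metastatic spread, and heterotypic interactions between melanoma and muscle may inhibit progression. Notably, zebrafish melanomas have a low metastasis rate, and skeletal muscle (where we most readily see the interface) is in fact a rare site of metastasis in humans, which is consistent with this possibility. A major effort of future studies will be to delineate the role of cilia at each step of tumorigenesis, which signaling nodes are most critical, and whether cilia act as a barrier to or a promoter of metastasis. Another open and related question is which microenvironment cell types (other than muscle) trigger ciliation of the tumor–microenvironment interface. The snRNA-seq data and the analysis of human patient data suggest that the interface is not limited to tumor and muscle, and that other cell types, such as immune cells or hepatocytes, may also be reprogrammed to adopt this cell state. Recent work has shown that in melanoma, tumor cells can reprogram microenvironment cells such as hepatocytes at a distance. It is not yet clear whether direct physical contact between tumor and microenvironment cells is required to induce an interface-like cell state, or whether longer-range signaling mechanisms may also be at work, but determining the nature of these tumor–microenvironment interactions (metabolic or epigenetic) is an exciting area for future mechanistic studies.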
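The results reveal a role for ETS-family transcription factors in melanoma as potential transcriptional repressors of cilia genes. Although most ETS TFs can act as transcriptional activators, at least four ETS TFs are known to have repressor activity. ETS TFs have well-defined roles in several types of solid tumors, but their role in melanoma has not been studied in depth, although a recent study found that ETS TFs induce a UV damage signature that correlates with increased mutational burden in human melanoma. ETS TFs are broadly involved in many aspects of tumorigenesis, including DNA damage, metabolism, self-renewal, and microenvironment remodeling. However, most (if not all) of these cases have been found to result from aberrant upregulation of ETS genes. In contrast, we find a role for ETS TF downregulation, specifically where the tumor contacts surrounding tissue. It is not yet clear what triggers this downregulation of ETS genes in such a spatially restricted region. Besides acting as transcription factors, ETS proteins also participate in extensive protein–protein interactions, and their activity is regulated by phosphorylation as a result of signaling cascades. MAPK signaling has been reported to regulate ETS, and the MAPK pathway is frequently activated in melanoma. It is not yet clear whether MAPK or other signaling pathways show spatially restricted activation patterns in the tumor and/or microenvironment, but the advent of SRT technology will help address these questions.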
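Although it is not the focus of the paper, the SRT dataset also revealed spatially organized transcriptomic heterogeneity within the tumor itself. In recent years, single-cell transcriptomic approaches have identified a considerable degree of transcriptomic heterogeneity in most (if not all) types of cancer. Tumor heterogeneity generally increases as tumors progress and may be a predictor of poor clinical outcome, as it is considered a major contributor to drug resistance. Studying the underlying causes and the complex clonal relationships within different tumor cell subtypes has proven challenging for many reasons, one of which is the lack of information about the spatial patterns of this heterogeneity. The dataset in the paper serves as a proof of principle for using spatially resolved transcriptomics to identify spatially organized tumor heterogeneity, and lays the groundwork for future studies that use this or other datasets to explore the basis of such spatial patterns.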
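As far as is currently known, the study is the first spatially resolved gene expression atlas of the interface between a tumor and its environment. Although many genes, pathways, and gene modules that are spatially patterned in the tumor and/or environment were found, there are likely more interesting biological phenomena in the dataset that have yet to be identified. Recently, deep learning methods have been applied to histopathology images to yield spatially resolved predictions of molecular alterations, mutations, and prognosis. A logical next step is to extend these approaches by combining deep learning and pattern recognition algorithms with SRT data, in order to identify interesting spatial patterns of gene expression and to predict transcriptomes from histopathology. Ultimately, the integration of transcriptomics, histopathology, and deep learning will extend the utility of SRT and histology datasets and broaden our understanding of cancer cell interactions in vivo.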

Method (we focus on a few key methods)

SRT dimensionality reduction and clustering.

SRT data was processed using R version 3.6.3, Seurat version 3.1.4 (ref. 17), Python version 3.6, and MATLAB 2019b. Data was normalized using SCTransform. The three SRT datasets were integrated using the Seurat SCTransform integration workflow, using 3000 integration features and including all common genes between the three datasets. Principal component analysis and UMAP dimensionality reduction were done using default parameters. Initial clustering was done using the FindClusters function implemented in the Seurat R package with the resolution parameter = 0.8. Tissue types of each cluster were inferred and clusters were further refined by plotting clusters onto the associated histology images and identifying marker genes using the Wilcoxon rank sum test. Expression scores for ETS and cilia gene sets were calculated using the Seurat function AddModuleScore with default parameters. A list of cilia genes was obtained from the SYSCILIA gold standard list (ref. 33). A list of ETS genes was obtained from ref. (Once again, Seurat is used to analyze the spatial transcriptomics data.)

Identification of genes enriched in the SRT interface. (SRT)

To identify genes that were enriched at the interface in the SRT data, we first used the Seurat function FindMarkers and the Wilcoxon rank sum test in order to calculate the average log2 fold change for each gene in our dataset within the interface cluster, relative to all other SRT array spots. We then used the same function to calculate the average log2 fold change of each of these genes within the tumor and muscle clusters. To account for the likely admixture of tumor and muscle cells within the interface region, we defined interface-upregulated genes as: genes with a log2 fold change > 0, log2 fold change in the interface > log2 fold change in the tumor, and log2 fold change in the interface > log2 fold change in the muscle. We defined interface-downregulated genes as: genes with a log2 fold change < 0, log2 fold change in the interface < log2 fold change in the tumor, and log2 fold change in the interface < log2 fold change in the muscle. Finally, we filtered the lists of genes upregulated and downregulated in the interface to only include genes with an adjusted p-value of <0.05. (In essence, this is still a differential expression analysis.)
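
The selection rule above can be sketched as a simple filter. This is a minimal illustration in Python, not the authors' R code: the `GeneStats` record, the function name `classify_interface_genes`, and all numbers are hypothetical, and the fold changes are assumed to have been computed beforehand (for example with FindMarkers).

```python
from typing import Dict, List, NamedTuple, Tuple


class GeneStats(NamedTuple):
    """Per-gene statistics; field names are hypothetical, not Seurat column names."""
    lfc_interface: float  # average log2 fold change in the interface cluster
    lfc_tumor: float      # average log2 fold change in the tumor cluster
    lfc_muscle: float     # average log2 fold change in the muscle cluster
    p_adj: float          # adjusted p-value for the interface comparison


def classify_interface_genes(
    stats: Dict[str, GeneStats], alpha: float = 0.05
) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) interface genes per the rule in the text."""
    up, down = [], []
    for gene, s in stats.items():
        if s.p_adj >= alpha:
            continue  # keep only genes with adjusted p-value < alpha
        if s.lfc_interface > 0 and s.lfc_interface > max(s.lfc_tumor, s.lfc_muscle):
            up.append(gene)
        elif s.lfc_interface < 0 and s.lfc_interface < min(s.lfc_tumor, s.lfc_muscle):
            down.append(gene)
    return sorted(up), sorted(down)


# Made-up values for illustration only.
toy = {
    "geneA": GeneStats(1.2, 0.3, -0.5, 0.001),  # highest in interface: up
    "geneB": GeneStats(0.8, 1.1, -0.2, 0.001),  # higher in tumor: excluded
    "geneC": GeneStats(-0.9, -0.1, 0.4, 0.01),  # lowest in interface: down
    "geneD": GeneStats(1.5, 0.2, 0.1, 0.2),     # not significant: excluded
}
up_genes, down_genes = classify_interface_genes(toy)
print(up_genes, down_genes)
```

Requiring the interface fold change to exceed both the tumor and the muscle values is what guards against a gene looking "interface-enriched" merely because interface spots contain a mixture of tumor and muscle cells.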

Non-negative matrix factorization (NMF). (key point)

After normalization and integration of SRT data (see “Dimensionality reduction and clustering” section), negative values in the integrated expression matrix were set to zero. NMF was performed with a rank of 11. The optimal number of ranks was estimated using the function nmfEstimateRank based on the first rank for which cophenetic starts decreasing and for which RSS presents an inflection point. Factor scores were first z-scored across factors prior to plotting onto array spots.

Analysis of gene ontology (GO) terms with spatially coherent expression patterns. (This spatial enrichment is well worth noting.)

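GO term annotations for Danio rerio were downloaded from BioMart. For each GO term, the average expression of the genes annotated to that term was calculated. Spots highly expressing the GO term were defined as spots in which the expression of these genes exceeded the mean plus two standard deviations (at least 5 such spots were required for the analysis). The Euclidean distances between these spots were then calculated. Next, the Euclidean distances between the same number of random spots were calculated, and this calculation was repeated 100 times to generate a null distribution of distances. The GO term spot distances were then compared with the null distribution using the Wilcoxon rank sum test to compute a p-value.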
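
The distance-based test described above can be sketched as follows. This is a hedged, standard-library illustration rather than the authors' code: the function names, the toy 20 × 20 array, and the use of the sample standard deviation are assumptions. The final Wilcoxon rank sum comparison is left out; the two distance lists could be passed to a rank sum implementation such as `scipy.stats.mannwhitneyu`.

```python
import itertools
import math
import random
from statistics import mean, stdev


def pairwise_distances(points):
    """Euclidean distance between every pair of (x, y) spot coordinates."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]


def high_expression_spots(coords, scores, n_sd=2.0):
    """Spots whose GO-term score exceeds mean + n_sd * SD (sample SD assumed)."""
    cutoff = mean(scores) + n_sd * stdev(scores)
    return [c for c, s in zip(coords, scores) if s > cutoff]


def null_distances(coords, n_spots, n_repeats=100, seed=0):
    """Pairwise distances among randomly drawn spots, pooled over n_repeats draws."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_repeats):
        pooled.extend(pairwise_distances(rng.sample(coords, n_spots)))
    return pooled


# Toy 20 x 20 array: six neighbouring spots express the term, the rest do not.
coords = [(x, y) for x in range(20) for y in range(20)]
hot = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}
scores = [10.0 if c in hot else 0.0 for c in coords]

selected = high_expression_spots(coords, scores)
if len(selected) >= 5:  # the text requires at least 5 spots
    observed = pairwise_distances(selected)
    null = null_distances(coords, len(selected))
    print(mean(observed), mean(null))
```

A spatially coherent term gives observed distances that are clearly shorter than the distances among randomly placed spots, which is what the rank sum test then quantifies.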

Correlation between SRT spots and SRT clusters. (cluster correlation)

For computing the correlation across SRT clusters, we first computed the average expression of each tissue cluster in the integrated expression matrix of our three datasets. We then used the union of the ~1000 variably expressed genes in each individual dataset to obtain a list of ~2300 total variably expressed genes. We then used these genes to compute the Pearson’s correlation and associated p-values.
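
As a rough sketch of this step, the snippet below averages expression per cluster and then computes Pearson's r between cluster profiles. It is an illustration under assumptions: the helper names and the tiny four-gene profiles are made up, the gene selection step is skipped, and p-values (which the authors obtained with `cor.test` in R) are not computed.

```python
import math


def mean_profile(spots):
    """Average expression per gene over a list of spot expression vectors."""
    n = len(spots)
    return [sum(values) / n for values in zip(*spots)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up spot-by-gene values for three clusters (two spots each, four genes).
clusters = {
    "tumor": [[6, 1, 0, 2], [4, 1, 0, 2]],
    "interface": [[4, 2, 1, 2], [4, 2, 1, 2]],
    "muscle": [[0, 1, 5, 3], [0, 1, 5, 3]],
}
profiles = {name: mean_profile(spots) for name, spots in clusters.items()}
r_tumor_interface = pearson(profiles["tumor"], profiles["interface"])
r_tumor_muscle = pearson(profiles["tumor"], profiles["muscle"])
print(round(r_tumor_interface, 3), round(r_tumor_muscle, 3))
```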

GSEA and pathway analysis.

Lists of differentially expressed genes for pathway analysis were created using the Seurat function FindMarkers using the Wilcoxon rank sum test. Ribosomal genes and genes with p-values above 0.05 were removed. Zebrafish genes were converted to their human orthologs using DIOPT, keeping only human orthologs with a DIOPT score >6. In cases where there were multiple zebrafish orthologs for one human gene, the gene with the highest log fold change in expression was used. Pathway analysis and GSEA (ref. 68) were done using the fgsea R package, using the MSigDB GO biological processes and GO cellular component human genesets.
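
The ortholog conversion rule can be sketched like this. It is a minimal illustration with hypothetical gene names and scores, not a DIOPT client: the ortholog table is assumed to be available already as (zebrafish gene, human gene, DIOPT score) tuples, and "highest log fold change" is read literally as the largest signed value.

```python
def to_human_orthologs(lfc, orthologs, min_score=6):
    """Map zebrafish log fold changes onto human genes.

    lfc:       dict of zebrafish gene -> log fold change
    orthologs: iterable of (zebrafish_gene, human_gene, diopt_score)
    Keeps pairs with score > min_score; when several zebrafish genes map to one
    human gene, the one with the highest log fold change wins.
    """
    best = {}
    for zf_gene, human_gene, score in orthologs:
        if score <= min_score or zf_gene not in lfc:
            continue
        value = lfc[zf_gene]
        if human_gene not in best or value > best[human_gene]:
            best[human_gene] = value
    return best


# Hypothetical genes and scores, for illustration only.
lfc = {"zf_a1": 1.5, "zf_a2": 0.4, "zf_b": -0.8, "zf_c": 2.0}
orthologs = [
    ("zf_a1", "HUMAN_A", 9),
    ("zf_a2", "HUMAN_A", 11),  # same human gene, lower fold change: dropped
    ("zf_b", "HUMAN_B", 7),
    ("zf_c", "HUMAN_C", 5),    # DIOPT score not above 6: dropped
]
ranked = to_human_orthologs(lfc, orthologs)
print(ranked)
```

The resulting gene-to-score mapping is the kind of ranked list that a GSEA tool such as fgsea takes as input.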

HOMER motif analysis.

Motif analysis was performed using HOMER (see the reference paper, Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities), using the function findMotifs.pl. Motifs of lengths 8, 10, and 16 were queried within +/− 500 bp of the TSS of differentially expressed genes. Target genes containing the motif of interest were found by filtering the list of differentially expressed genes to contain only those with the desired motif. JASPAR (ref. 73) was used to annotate motifs.

Multimodal intersection analysis (MIA).

To determine cell type enrichment in tissue regions we used MIA, which uses the hypergeometric cumulative distribution to determine the statistical significance of the overlap between cell type specific gene sets and tissue region specific gene sets. We used the intersect between all genes in the SRT count matrix and all genes in the snRNA-seq count matrix as the gene background to calculate the p-value. In parallel, we tested for cell type depletion by computing −log10(1 − p).
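
A minimal sketch of the MIA calculation is given below, using only the standard library. It assumes that p is the upper-tail hypergeometric probability of seeing at least the observed overlap, which is one common reading of the description; the function names and the toy gene sets are hypothetical.

```python
from math import comb, log10


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked genes, n draws)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def mia(cell_type_genes, region_genes, background):
    """Return (enrichment, depletion) scores as -log10(p) and -log10(1 - p)."""
    bg = set(background)
    a = set(cell_type_genes) & bg
    b = set(region_genes) & bg
    p = hypergeom_sf(len(a & b), len(bg), len(a), len(b))
    tiny = 1e-300  # floor to avoid log10(0)
    return -log10(max(p, tiny)), -log10(max(1 - p, tiny))


# Toy example: 100 background genes, 10 cell-type markers, 10 region markers,
# 8 of which are shared.
background = [f"g{i}" for i in range(100)]
cell_type = [f"g{i}" for i in range(10)]
region = [f"g{i}" for i in range(8)] + ["g50", "g51"]
enrichment, depletion = mia(cell_type, region, background)
print(round(enrichment, 2), depletion)
```

Restricting both gene sets to the shared background mirrors the text, where the background is the intersection of genes in the SRT and snRNA-seq count matrices.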

Single-cell analysis

Data was processed using R version 3.6.3 and Seurat version 3.1.4. Cells with fewer than 200 unique genes or >20% mitochondrial reads were filtered out. Expression data was normalized using SCTransform. Datasets were integrated using the Seurat SCTransform integration workflow, with 3000 integration anchors and including all genes expressed in both datasets (15,154 genes). Principal component analysis, UMAP dimensionality reduction, HOMER analysis, GSEA, and pathway analysis were performed as described above. Cluster annotations were performed using the Seurat function FindAllMarkers, in conjunction with marker genes used in previous analyses. Doublets were detected using the doubletFinder R package, using 15 principal components.
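
The cell-level quality filter described above amounts to two thresholds. The sketch below is only an illustration of that rule in Python; the record layout and barcodes are made up, and in practice this filter would be applied to the Seurat object's metadata.

```python
def passes_qc(n_genes, mito_fraction, min_genes=200, max_mito=0.20):
    """Keep cells with at least min_genes detected genes and at most max_mito mitochondrial reads."""
    return n_genes >= min_genes and mito_fraction <= max_mito


# Made-up cells: (barcode, number of detected genes, mitochondrial read fraction).
cells = [
    ("cell_1", 1500, 0.05),  # kept
    ("cell_2", 150, 0.03),   # too few genes
    ("cell_3", 2200, 0.35),  # too many mitochondrial reads
    ("cell_4", 200, 0.20),   # exactly at both thresholds: kept
]
kept = [barcode for barcode, n_genes, mito in cells if passes_qc(n_genes, mito)]
print(kept)
```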

Single-nucleus RNA-seq analysis

Data was processed using R version 3.6.3 and Seurat version 3.1.4. Nuclei with fewer than 200 unique genes, more than 1 million UMIs, predicted doublets and/or >20% mitochondrial reads were filtered out. A putative erythrocyte cluster was also filtered out for quality control reasons, due to the unusual nature of zebrafish erythrocyte nuclei. Expression data was normalized using SCTransform. PCA, UMAP, and HOMER analysis were performed as described above. Potential doublets were detected with doubletFinder and were filtered out before downstream analyses. Cluster annotations were performed using the Seurat function FindAllMarkers, in conjunction with marker genes used in previous analyses. Modeling of ligand–receptor interactions was performed using NicheNet and the nichenetr R package, with the combined interface cluster as the “sender” cell population and all other cells as “receiver”, using a cutoff of 0.1 for determining expressed genes and 0.5 for ligand-target scores. For NicheNet analysis, zebrafish genes were converted to human as described above, using DIOPT, keeping only human orthologs with a DIOPT score >6. In cases where there were multiple zebrafish orthologs for one human gene, the gene with the highest log fold change in expression was used.

Calculation of an interface gene signature.

Genes significantly upregulated in the interface clusters of the SRT, scRNA-seq, and snRNA-seq datasets were calculated using the Seurat function FindMarkers and the Wilcoxon rank sum test. Ribosomal genes (starting with “rps” or “rpl”) were filtered out. The three genelists were then merged to only include common genes present on all three lists.
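
The merging step can be sketched as a set intersection after dropping ribosomal genes. This is an illustration only: the function name is hypothetical, and the membership of each toy list is made up (the gene symbols are simply borrowed from the text, plus a placeholder `gene_x`).

```python
def interface_signature(*gene_lists):
    """Genes present in every list, after removing ribosomal genes (rps*/rpl*)."""
    def is_ribosomal(gene):
        return gene.lower().startswith(("rps", "rpl"))

    gene_sets = [{g for g in genes if not is_ribosomal(g)} for genes in gene_lists]
    return sorted(set.intersection(*gene_sets))


# Made-up upregulated gene lists for the three datasets.
srt_up = ["tubb4b", "ran", "gmnn", "rpl13", "gene_x"]
sc_up = ["ran", "tubb4b", "tuba4l", "rpl13", "rps7"]
sn_up = ["tubb4b", "ran", "rpl13", "gene_x"]

signature = interface_signature(srt_up, sc_up, sn_up)
print(signature)
```

Taking the intersection rather than the union keeps the signature conservative: a gene has to be upregulated at the interface in all three technologies to be included.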

Statistical analysis.

Statistical analysis and figure generation were performed in MATLAB and R (R Foundation for Statistical Computing, 3.6.3). Image processing and analysis were performed in MATLAB and ImageJ (NIH). Unless otherwise noted, p-values were calculated using the Wilcoxon rank sum test, two-sided, with Bonferroni’s correction for multiple groups as necessary (R functions wilcox.test and pairwise.wilcox.test). Pearson correlation coefficients and corresponding p-values were calculated using the R function cor.test.

Life is good, and even better with you.
