gitto-page enrichment

Giotto|| 空间表达数据分析工具箱
20

Seurat 新版教程:分析空间转录组数据(上)
Seurat 新版教程:分析空间转录组数据(下)
scanpy教程:空间转录组数据分析
10X Visium:空间转录组样本制备到数据分析
空间信息在空间转录组中的运用

Giotto, a toolbox for integrative analysis and visualization of spatial expression data

像Seurat一样,Giotto也是一位画家。乔托·迪·邦多纳(Giotto di Bondone 1266年-1337年),意大利画家、雕刻家与建筑师,被认为是意大利文艺复兴时期的开创者,被誉为“欧洲绘画之父”。在乔托的作品当中,可以看出他对于画作中真实空间的表达相当努力,有些壁画甚至还搭配了真实教堂内部的透视感来构图。这也许是Dries实验室选择这个名字作为其开发的空间表达数据分析工具箱的名字吧。

我们知道Seurat和scanpy中均有分析空间转录组的函数,美中不足的是空间信息多是用来作为可视化的画板,没有得到很好的利用。今天我们就跟着Giotto的教程看看,空间表达数据可以做什么以及是如何做到的。

首先,Giotto为常见的单细胞表达数据处理提供了一个灵活的框架,如:

质量控制
归一化
降维
聚类和细胞类型注释
当然, 针对非单细胞分辨率的空间技术,如10X Visium ,Giotto实现了3种算法,通过整合已知基signatures 或单细胞RNAseq注释数据来估计不同细胞类型在空间中的富集(也可以理解为一种映射)。

最重要的是,Giotto进一步利用空间信息形成空间网格和或空间接近网络,用于:

识别空间特异基因
提取连续的空间表达模式
使用HMRF识别离散的空间域
探索细胞类型/细胞类型空间相互作用富集或耗尽
利用空间和配受体表达计算细胞空间相互作用
发现相互作用改变基因(interaction changed genes,ICG):由于与相邻细胞相互作用而改变一种细胞类型表达的基因
最后,Giotto提供了界面版工具来探索空间表达数据:Giotto Viewer(http://spatial.rc.fas.harvard.edu/giotto-viewer/)

具体每一步算法,还是建议把原论文打印出来慢慢研究。这里可以看做是一份demo报告,而且Giotto本身还在发展当中,如果真的用Giotto来分析自己的空间表达数据,开始看其官网:https://rubd.github.io/Giotto_site/index.html

截至2020-12-13,官网已经准备好了15个案例教程,支持分析9种最先进的空间技术,包括原位杂交(seqFISH+, merFISH, osmFISH),测序(Slide-seq, Visium, STARmap)和基于成像的多路复用/蛋白质组学(CyCIF, MIBI, CODEX)。并分别提供了不同的数据集,以便用户学习使用。

Install a Giotto environment (optional) # 学会R包安装
Create a Giotto object # 理清楚Giotto数据结构
Process and filter a Giotto object # 类比单细胞数据分析流程学习
Dimension reduction
Cluster cells or spots
Identify differentially expressed genes
Annotate clusters
Cell-type enrichment or deconvolution per spot # 低分辨率单细胞技术分析
Create a Spatial grid or Network # 空间数据分析
Find genes with a spatially coherent gene expression pattern
Identify genes that are spatially co-expressed
Explore spatial domains with HMRF
Calculate spatial cell-cell interaction enrichment
Find cell-cell interaction changed genes (ICG)
Identify enriched or depleted ligand-receptor interactions in hetero and homo-typic cell interactions
Export Giotto results to use in Giotto viewer # 在线工具

创建Giotto 对象
安装与加载:

library(devtools) # if not installed: install.packages(‘devtools’)
library(remotes) # if not installed: install.packages(‘remotes’)
remotes::install_github(“RubD/Giotto”)

library(Seurat) # 假设大家都会分析单细胞数据了,这个假设坏不坏,坏
library(SeuratData)
library(ggplot2)
library(cowplot)
library(dplyr)
library(Giotto)
library(patchwork)
library(tidyverse)

我们将使用Giotto自带的数据集mini_visium做为演示数据,这样数据较小,运行时间较快。在创建Giotto对象之前首先需要做一步初始化,这一点和metaCELL还是很像。

?createGiottoInstructions
temp_dir = ‘./mini_visium/’
myinstructions = createGiottoInstructions(save_dir = temp_dir,
save_plot = FALSE,
show_plot = F,
python_path = ‘D:\Program Files (x86)\anconda\python.exe’)

可以看到Giotto过程是依赖python的,这一点早期的Seurat差不多,这当然会有些不便之处,至少知道同一个环境下python在哪,需要哪些库需要安装。这是后话,在我们的展示中这都不是问题(有问题的地方我们可以跳过_)。下面我们来创建这个对象:

expr_path = system.file(“extdata”, “visium_DG_expr.txt.gz”, package = ‘Giotto’)
loc_path = system.file(“extdata”, “visium_DG_locs.txt”, package = ‘Giotto’)
mini_visium <- createGiottoObject(raw_exprs = expr_path,
spatial_locs = loc_path,
instructions = myinstructions)

这个会有一个提示:

Consider to install these (optional) packages to run all possible Giotto commands for spatial analyses: scran trendsceek SPARK multinet RTriangle FactoMiner
Giotto does not automatically install all these packages as they are not absolutely required and this reduces the number of dependenciesWarning message:
In createGiottoObject(raw_exprs = expr_path, spatial_locs = loc_path, :
module: community was not found with python path: D:\Program Files (x86)\anconda\python.exe

经验告诉我,community 是用来分群的,我们不用louvain算法来计算距离,用leiden可以的。创建完对象我们当然要看看对象长什么样了。

mini_visium
An object of class giotto
634 genes across 624 samples.

Steps and parameters used:

list()

str(mini_visium,max.level = 2)
Formal class ‘giotto’ [package “Giotto”] with 19 slots
…@ raw_exprs :Formal class ‘dgCMatrix’ [package “Matrix”] with 6 slots
…@ norm_expr : NULL
…@ norm_scaled_expr : NULL
…@ custom_expr : NULL
…@ spatial_locs :Classes ‘data.table’ and ‘data.frame’: 624 obs. of 3 variables:
… …- attr(, “.internal.selfref”)=
…@ cell_metadata :Classes ‘data.table’ and ‘data.frame’: 624 obs. of 1 variable:
… …- attr(
, “.internal.selfref”)=
…@ gene_metadata :Classes ‘data.table’ and ‘data.frame’: 634 obs. of 1 variable:
… …- attr(*, “.internal.selfref”)=
…@ cell_ID : chr [1:624] “AAAGGGATGTAGCAAG-1” “AAATGGCATGTCTTGT-1” “AAATGGTCAATGTGCC-1” “AAATTAACGGGTAGCT-1” …
…@ gene_ID : chr [1:634] “Gna12” “Ccnd2” “Btbd17” “Sox9” …
…@ spatial_network : NULL
…@ spatial_grid : NULL
…@ spatial_enrichment : NULL
…@ dimension_reduction: NULL
…@ nn_network : NULL
…@ images : NULL
…@ parameters : list()
…@ instructions :List of 11
…@ offset_file : NULL
…@ OS_platform : chr “windows”

可以看到也是一个S4对象包含了表达谱和空间信息,需要知道每部分存放的数据是什么。

head(mini_visium@spatial_locs)
sdimx sdimy cell_ID
1: 5477 -4125 AAAGGGATGTAGCAAG-1
2: 5959 -2808 AAATGGCATGTCTTGT-1
3: 4720 -5202 AAATGGTCAATGTGCC-1
4: 5202 -5322 AAATTAACGGGTAGCT-1
5: 4101 -4604 AACAACTGGTAGTTGC-1
6: 5821 -3047 AACAGGAAATCGAATA-1

head(mini_visium@cell_metadata)
cell_ID
1: AAAGGGATGTAGCAAG-1
2: AAATGGCATGTCTTGT-1
3: AAATGGTCAATGTGCC-1
4: AAATTAACGGGTAGCT-1
5: AACAACTGGTAGTTGC-1
6: AACAGGAAATCGAATA-1

head(mini_visium@gene_metadata)
gene_ID
1: Gna12
2: Ccnd2
3: Btbd17
4: Sox9
5: Sez6
6: Serpinf1

有了表达谱和空间信息,我们还需要图像数据,也加载进来。

1. read image

png_path = system.file(“extdata”, “deg_image.png”, package = ‘Giotto’)
mg_img = magick::image_read(png_path)
mg_img

2. test and modify image alignment

mypl = spatPlot(mini_visium, return_plot = T, point_alpha = 0.8)
orig_png = createGiottoImage(gobject = mini_visium, mg_object = mg_img, name = ‘image’,
xmax_adj = 450, xmin_adj = 550,
ymax_adj = 200, ymin_adj = 200)
mypl_image = addGiottoImageToSpatPlot(mypl, orig_png)
mypl_image

3. add images to Giotto object

image_list = list(orig_png)
mini_visium = addGiottoImage(gobject = mini_visium,
images = image_list)
showGiottoImageNames(mini_visium)
The following images are available: image
[1] “image”

图像数据在:

mini_visium@images

至此,一个空间表达数据对象就构建完成了,我们可以类比Seurat或scanpy的数据结构来看看Giotto的数据结构,以便在下游分析的时候随心所欲地提取数据。

表达数据标准分析
空间表达数据首先是表达数据,所以之前我们学到的单细胞表达数据的分析方法框架是完全可以用的。所谓,降维聚类必知必会嘛。先查看数据分析执行基本的数据质控。

explore gene and cell distribution

p1 <- filterDistributions(mini_visium, detection = ‘genes’) + ggtitle(" Gene Distributions")
p2 <- filterDistributions(mini_visium, detection = ‘cells’)+ ggtitle(" Cell Distributions")
p1 + p2

显示有多少基因和细胞在组合阈值时丢失。

filterCombinations(mini_visium,
expression_thresholds = c(1),
gene_det_in_min_cells = c(20, 20, 50, 50),
min_det_genes_per_cell = c(100, 200, 100, 200))

$results
threshold gene_detected_in_min_cells min_detected_genes_per_cell combination removed_genes removed_cells
1: 1 20 100 20-100 1 2
2: 1 20 200 20-200 1 75
3: 1 50 100 50-100 43 2
4: 1 50 200 50-200 43 77

$ggplot

之后,我们进行数据过滤以及均一化等操作。

filter and normalize

mini_visium <- filterGiotto(gobject = mini_visium,
expression_threshold = 1,
gene_det_in_min_cells = 50,
min_det_genes_per_cell = 100,
expression_values = c(‘raw’),
verbose = T)
mini_visium <- normalizeGiotto(gobject = mini_visium, scalefactor = 6000, verbose = T)
mini_visium <- addStatistics(gobject = mini_visium)

mini_visium <- calculateHVG(gobject = mini_visium)

看看基因和细胞的信息:

head(mini_visium@cell_metadata)
cell_ID nr_genes perc_genes total_expr
1: AAAGGGATGTAGCAAG-1 226 38.24027 992.0223
2: AAATGGCATGTCTTGT-1 283 47.88494 1123.8751
3: AAATGGTCAATGTGCC-1 247 41.79357 1024.8760
4: AAATTAACGGGTAGCT-1 221 37.39425 971.3199
5: AACAACTGGTAGTTGC-1 306 51.77665 1157.2925
6: AACAGGAAATCGAATA-1 337 57.02200 1227.8457

head(mini_visium@gene_metadata)
gene_ID nr_cells perc_cells total_expr mean_expr mean_expr_det hvg
1: Gna12 501 80.54662 1984.1670 3.1899791 3.960413 no
2: Ccnd2 356 57.23473 1250.6225 2.0106471 3.512984 no
3: Btbd17 252 40.51447 783.0241 1.2588812 3.107238 no
4: Sox9 308 49.51768 1015.9613 1.6333783 3.298576 no
5: Sez6 504 81.02894 2179.5787 3.5041458 4.324561 no
6: Serpinf1 105 16.88103 312.4149 0.5022748 2.975380 no

可以直接用函数获得上述信息:

pDataDT(mini_visium)
fDataDT(mini_visium)

此时的Giotto对象结构:

591 genes across 622 samples.

Steps and parameters used:

$0_subset
cells removed genes removed
2 43

$1_filter
used expression values gene expression threshold minimum # of genes detected per cell
“raw” “1” “100”
minimum times a gene is detected over all cells
“50”

$2_normalize
normalization method normalized to library size scalefactor log-normalized logbase
“standard” “yes” “6000” “yes” “2”
log offset genes scaled cell scaled if both, order of scaling
“1” “yes” “yes” “first_genes”

$3_gene_stats
expression values used detection_threshold
“normalized” “0”

$4_cell_stats
expression values used detection_threshold
“normalized” “0”

$5_hvg
method used expression values1 expression values2 expression values3 reversed log scale
“cov_groups” “normalized” “scaled” “custom” “FALSE”
logbase expression threshold number of expression groups threshold for z-score difference in variance
“2” “0” “20” “1.5” “0.1”
name for HVGs
“hvg”

执行降维。

?runPCA
mini_visium <- runPCA(gobject = mini_visium)
screePlot(mini_visium, ncp = 30)

每个PC的解释度

提取PCA坐标,对于UMAP和tsne也可以类比,方便自己画图谱。

head(mini_visium@dimension_reduction c e l l s cells cellspca p c a pca pcacoordinates[1:4,1:4])
Dim.1 Dim.2 Dim.3 Dim.4
AAAGGGATGTAGCAAG-1 0.9011678 -0.3335627 -0.14288217 -0.90345159
AAATGGCATGTCTTGT-1 3.8096179 3.4784611 0.61075503 0.06560415
AAATGGTCAATGTGCC-1 -8.0655096 0.4639553 -0.35425095 -1.02290334
AAATTAACGGGTAGCT-1 -7.3829491 1.3098899 0.04309618 -1.28969309

关于特征选择的PCA信息Giotto提供的并不像Seurat那么多。

可视化降维:

ppca <- plotPCA(gobject = mini_visium)
mini_visium <- runUMAP(mini_visium, dimensions_to_use = 1:10)
pumap<- plotUMAP(gobject = mini_visium)
mini_visium <- runtSNE(mini_visium, dimensions_to_use = 1:10)
ptsne <- plotTSNE(gobject = mini_visium)

下面是标准分析的核心了:聚类发现细胞亚群

mini_visium <- createNearestNetwork(gobject = mini_visium, dimensions_to_use = 1:10, k = 15)
mini_visium <- doLeidenCluster(gobject = mini_visium, resolution = 0.4, n_iterations = 1000)

查看聚类结果

head(mini_visium@cell_metadata)
cell_ID nr_genes perc_genes total_expr leiden_clus
1: AAAGGGATGTAGCAAG-1 226 38.24027 992.0223 3
2: AAATGGCATGTCTTGT-1 283 47.88494 1123.8751 4
3: AAATGGTCAATGTGCC-1 247 41.79357 1024.8760 2
4: AAATTAACGGGTAGCT-1 221 37.39425 971.3199 2
5: AACAACTGGTAGTTGC-1 306 51.77665 1157.2925 2
6: AACAGGAAATCGAATA-1 337 57.02200 1227.8457 4

查看聚类结果

visualize UMAP cluster results

plotUMAP(gobject = mini_visium, cell_color = ‘leiden_clus’, show_NN_network = T, point_size = 2.5)

visualize UMAP and spatial results

spatDimPlot(gobject = mini_visium, cell_color = ‘leiden_clus’, spat_point_shape = ‘voronoi’)

spatial voronoi plot with selected clusters

p1 <- spatPlot(mini_visium, point_shape = ‘voronoi’, cell_color =‘leiden_clus’, select_cell_groups = c(1,2,3))

spatial voronoi plot without showing not selected clusters

p2 <- spatPlot(mini_visium, point_shape = ‘voronoi’, cell_color =‘leiden_clus’, select_cell_groups = c(1,2,3), show_other_cells = F)

spatial voronoi plot without showing not selected cells, but showing the voronoi borders

p3 <- spatPlot(mini_visium, point_shape = ‘voronoi’, cell_color =‘leiden_clus’, select_cell_groups = c(1,2,3,4), show_other_cells = F,
vor_border_color = ‘black’)

p1+p2+ p3

亚群间差异基因,提供三种差异分析分析方法: “scran”, “gini”, “mast”,我们只使用其中的一种作为示例。

?findMarkers_one_vs_all
mast_markers = findMarkers_one_vs_all(gobject = mini_visium,
method = ‘mast’,
expression_values = ‘normalized’,
cluster_column = ‘leiden_clus’)

head(mast_markers)
genes Pr(>Chisq) coef ci.hi ci.lo fdr cluster ranking
1: Chgb 1.647262e-34 -2.103924 -1.749995 -2.457852 7.864001e-32 1_vs_others 1
2: Ogfrl1 2.661252e-34 -2.073549 -1.724172 -2.422927 7.864001e-32 1_vs_others 2
3: Prkcg 1.066031e-33 -2.215784 -1.843982 -2.587586 2.100080e-31 1_vs_others 3
4: Fxyd7 1.788067e-31 -1.915721 -1.564124 -2.267317 2.641869e-29 1_vs_others 4
5: Camkv 6.043544e-31 -1.934645 -1.586670 -2.282620 6.140771e-29 1_vs_others 5
6: Dpp6 6.234285e-31 -2.225360 -1.883683 -2.567037 6.140771e-29 1_vs_others 6

差异基因可视化

topgenes_mast = mast_markers[, head(.SD, 2), by = ‘cluster’]$genes # 这里的.SD 用的很溜啊
topgenes_mast
[1] “Chgb” “Ogfrl1” “Tcf7l2” “Prkcd” “Ddn” “Ttr” “Ngef” “Arpp19” “Prdm8” “Cpne6” “Spink8” “Wfs1” “Ccn2” “Cplx3”
[15] “Ccdc153” “Rarres2”
violinPlot(mini_visium, genes = topgenes_mast, cluster_column = ‘leiden_clus’,
strip_text = 10, strip_position = ‘right’)

topgenes_mast = mast_markers[, head(.SD, 6), by = ‘cluster’]$genes
topgenes_mast

plotMetaDataHeatmap(mini_visium, selected_genes = topgenes_mast,
metadata_cols = c(‘leiden_clus’))

根据差异基因注释细胞类型,这当然是一个需要谨慎考虑的,这里为了方便演示,只是取一个基因名作为其细胞类型。

clusters_cell_types = paste0(mast_markers[, head(.SD, 1), by = ‘cluster’] g e n e s , " c e l l s " ) c l u s t e r s c e l l t y p e s n a m e s ( c l u s t e r s c e l l t y p e s ) = 1 : l e n g t h ( u n i q u e ( m i n i v i s i u m @ c e l l m e t a d a t a genes,"_cells") clusters_cell_types names(clusters_cell_types) = 1:length(unique(mini_visium@cell_metadata genes,"cells")clusterscelltypesnames(clusterscelltypes)=1:length(unique(minivisium@cellmetadataleiden_clus))
mini_visium@cell_metadata
mini_visium = annotateGiotto(gobject = mini_visium, annotation_vector = clusters_cell_types,
cluster_column = ‘leiden_clus’, name = ‘cell_types’)

check new cell metadata

pDataDT(mini_visium)

           cell_ID nr_genes perc_genes total_expr leiden_clus   cell_types

1: AAAGGGATGTAGCAAG-1 226 38.24027 992.0223 3 Ddn_cells
2: AAATGGCATGTCTTGT-1 283 47.88494 1123.8751 4 Ngef_cells
3: AAATGGTCAATGTGCC-1 247 41.79357 1024.8760 2 Tcf7l2_cells
4: AAATTAACGGGTAGCT-1 221 37.39425 971.3199 2 Tcf7l2_cells
5: AACAACTGGTAGTTGC-1 306 51.77665 1157.2925 2 Tcf7l2_cells

618: TTGTAATCCGTACTCG-1 313 52.96108 1179.7196 2 Tcf7l2_cells
619: TTGTATCACACAGAAT-1 240 40.60914 999.6419 4 Ngef_cells
620: TTGTCGTTCAGTTACC-1 219 37.05584 991.4468 3 Ddn_cells
621: TTGTGGCCCTGACAGT-1 202 34.17936 924.0617 1 Chgb_cells
622: TTGTTCAGTGTGCTAC-1 232 39.25550 1019.5997 6 Spink8_cells

空间数据分析
这里我们开始切入纳入空间位置之后的分析,其中之一就是,对低分辨率的技术,利用signatures 矩阵看空间中细胞类型的富集情况。可以说是与单细胞数据的联合分析,也可以说是把spot看作是局部bulk数据,对spot的反卷积(deconvolution),用以推断每个spot的细胞类型组成。

如果有signatures 可以用之计算每个空间位置基因标记富集分数。Giotto采用三种富集方法进行富集分析:

PAGE (Parametric Analysis of Gene Set Enrichment)
在这种方法中,一组已知的细胞型特异性标记基因作为输入。目的是评估这些基因是否在每个点(spot)比其他点表达得更高。具体地说,对于每个点,我们定义一个与一组标记基因相对应的富集分数,如下所示。首先,对于整个基因组中的每个基因,我们利用该位点的表达值相对于所有位点的平均表达值来计算该基因的表达折叠变化。

Enrichment analysis based on rank of gene expression
在这种方法中,不需要已知的标记基因列表。相反,将使用外部单细胞RNAseq数据集以及每个细胞的单元类型注释作为输入。

利用超几何分布的富集分析
这种方法也需要一组已知的细胞类型特异性标记基因作为输入,但它通过简单地使用超几何检验来评估富集程度。基于标记基因注释和基因表达值二值化,将所有基因划分为4个不重叠的类别,来构建列联表。后者是由每个位点前5%的表达基因决定的。根据列联表计算出一个p值。这里的富集分数被定义为-log10(p-value)。

cell type signatures

combination of all marker genes identified

sign_matrix_path = system.file(“extdata”, “sig_matrix.txt”, package = ‘Giotto’)
brain_sc_markers = data.table::fread(sign_matrix_path) # file don’t exist in data folder
sig_matrix = as.matrix(brain_sc_markers[,-1]); rownames(sig_matrix) = brain_sc_markers$Event
sig_matrix[1:4,1:4]
Astro_ependymal Cerebellum Cholinergic_monoaminergic Cortex_hippocampus_thalamus
Cartpt 0 0 1 0
Gm15469 0 0 1 0
Pmch 0 0 1 0
Ly6a 0 0 0 0

由于我们草率地找了个signatures 所以富集可能会不准,但是作为演示是可以的。

mini_visium = runSpatialEnrich(mini_visium,
sign_matrix = sig_matrix,
enrich_method = ‘PAGE’) #default = ‘PAGE’

heatmap of enrichment versus annotation (e.g. clustering result)

cell_types = colnames(sig_matrix)
plotMetaDataCellsHeatmap(gobject = mini_visium,
metadata_cols = ‘leiden_clus’,
value_cols = cell_types,
spat_enr_names = ‘PAGE’,
x_text_size = 8, y_text_size = 8)

查看富集分数

enrichment_results = mini_visium@spatial_enrichment$PAGE
head(enrichment_results)
cell_ID Astro_ependymal Cerebellum Cortex_hippocampus_thalamus Cortex_hippocampus di_mesencephalon_1 di_mesencephalon_2 Hindbrain
1: AAAGGGATGTAGCAAG-1 -1.2800440 -0.5864989 0.917972068 -0.05840974 -2.0124908 2.44315201 0.6100475
2: AAATGGCATGTCTTGT-1 -1.1898333 -0.8364881 1.146730319 2.94314936 -0.3084605 -0.07627204 0.2909754
3: AAATGGTCAATGTGCC-1 -0.3672426 -0.7726506 -0.961627156 -3.13654976 2.3252251 0.04925334 1.0702687
4: AAATTAACGGGTAGCT-1 0.2906555 0.6332150 0.004221154 -4.66096648 2.3323610 1.60580233 1.5530766
5: AACAACTGGTAGTTGC-1 2.1619438 -1.4829558 -2.718809976 -5.05188952 0.2268543 -1.41909263 -0.4450371
6: AACAGGAAATCGAATA-1 -0.4275308 -0.2356467 1.354997024 4.45432092 -1.2874840 -0.79001109 1.4877626
Olfactory_bulb Oligo_dendrocyte Peptidergic Vascular
1: -2.0100819 -0.4722108 1.0807732 -0.7906277
2: 0.1148558 -1.8963290 -1.3098433 1.0049731
3: -0.1948654 0.4321902 1.2949379 2.3258985
4: -1.4450167 0.8517501 0.4966094 0.0358209
5: -0.5760702 2.3737223 1.1574013 0.4736639
6: -0.6069400 -1.3063133 -0.5210430 -0.8693017

可视化结果

enrich_cell_types = colnames(enrichment_results)
enrich_cell_types = enrich_cell_types[enrich_cell_types != ‘cell_ID’]

enrich_cell_types

spatplot

?spatCellPlot
spatCellPlot(gobject = mini_visium, spat_enr_names = ‘PAGE’,
cell_annotation_values = enrich_cell_types,show_image=T,show_legend=F,
cow_n_col = 3,coord_fix_ratio = NULL, point_size = 1)

创建空间网格,看局部空间的异质性。

mini_visium <- createSpatialGrid(gobject = mini_visium,
sdimx_stepsize = 300,
sdimy_stepsize = 300,
minimum_padding = 50)
showGrids(mini_visium)
?spatPlot
spatPlot(gobject = mini_visium, show_grid = T, point_size = 1.5,
show_image=T,group_by=“cell_types”)

提取网格和相关数据点

annotated_grid = annotateSpatialGrid(mini_visium)
annotated_grid
sdimx sdimy cell_ID gr_x_loc gr_y_loc gr_loc
1: 5477 -4125 AAAGGGATGTAGCAAG-1 gr_x_9 gr_y_5 gr_57
2: 5959 -2808 AAATGGCATGTCTTGT-1 gr_x_11 gr_y_9 gr_107
3: 4720 -5202 AAATGGTCAATGTGCC-1 gr_x_6 gr_y_2 gr_18
4: 5202 -5322 AAATTAACGGGTAGCT-1 gr_x_8 gr_y_1 gr_8
5: 4101 -4604 AACAACTGGTAGTTGC-1 gr_x_4 gr_y_4 gr_40

618: 4996 -5442 TTGTAATCCGTACTCG-1 gr_x_7 gr_y_1 gr_7
619: 6303 -2688 TTGTATCACACAGAAT-1 gr_x_12 gr_y_10 gr_120
620: 5202 -3885 TTGTCGTTCAGTTACC-1 gr_x_8 gr_y_6 gr_68
621: 5340 -3406 TTGTGGCCCTGACAGT-1 gr_x_8 gr_y_7 gr_80
622: 5615 -4125 TTGTTCAGTGTGCTAC-1 gr_x_9 gr_y_5 gr_57

annotated_grid_metadata = annotateSpatialGrid(mini_visium,
cluster_columns = c(‘leiden_clus’, ‘cell_types’, ‘nr_genes’))

annotated_grid_metadata

           cell_ID sdimx sdimy gr_x_loc gr_y_loc gr_loc leiden_clus   cell_types nr_genes

1: AAAGGGATGTAGCAAG-1 5477 -4125 gr_x_9 gr_y_5 gr_57 3 Ddn_cells 226
2: AAATGGCATGTCTTGT-1 5959 -2808 gr_x_11 gr_y_9 gr_107 4 Ngef_cells 283
3: AAATGGTCAATGTGCC-1 4720 -5202 gr_x_6 gr_y_2 gr_18 2 Tcf7l2_cells 247
4: AAATTAACGGGTAGCT-1 5202 -5322 gr_x_8 gr_y_1 gr_8 2 Tcf7l2_cells 221
5: AACAACTGGTAGTTGC-1 4101 -4604 gr_x_4 gr_y_4 gr_40 2 Tcf7l2_cells 306

618: TTGTAATCCGTACTCG-1 4996 -5442 gr_x_7 gr_y_1 gr_7 2 Tcf7l2_cells 313
619: TTGTATCACACAGAAT-1 6303 -2688 gr_x_12 gr_y_10 gr_120 4 Ngef_cells 240
620: TTGTCGTTCAGTTACC-1 5202 -3885 gr_x_8 gr_y_6 gr_68 3 Ddn_cells 219
621: TTGTGGCCCTGACAGT-1 5340 -3406 gr_x_8 gr_y_7 gr_80 1 Chgb_cells 202
622: TTGTTCAGTGTGCTAC-1 5615 -4125 gr_x_9 gr_y_5 gr_57 6 Spink8_cells 232

创建空间网络,我们可以用Delaunay", “kNN” 两种方法。

mini_visium = createSpatialNetwork(gobject = mini_visium, minimum_k = 2, maximum_distance_delaunay = 400)
mini_visium = createSpatialNetwork(gobject = mini_visium, minimum_k = 2, method = ‘kNN’, k = 10)
showNetworks(mini_visium)
The following images are available: Delaunay_network kNN_network
[1] “Delaunay_network” “kNN_network”

查看其中一个网络的信息,看到一个有from to 文件,这就可以用igraph来构建网络了啊。

mini_visium@spatial_network D e l a u n a y n e t w o r k Delaunay_network DelaunaynetworknetworkDT
from to sdimx_begin sdimy_begin sdimx_end sdimy_end distance weight
1: AAAGGGATGTAGCAAG-1 TCAAACAACCGCGTCG-1 5477 -4125 5340 -4125 137.0000 0.007299270
2: AAAGGGATGTAGCAAG-1 ACGATCATACATAGAG-1 5477 -4125 5546 -4244 137.5573 0.007269700
3: AAAGGGATGTAGCAAG-1 TTGTTCAGTGTGCTAC-1 5477 -4125 5615 -4125 138.0000 0.007246377
4: AAAGGGATGTAGCAAG-1 ATCGACTCTTTCCGTT-1 5477 -4125 5408 -4005 138.4233 0.007224219
5: AAAGGGATGTAGCAAG-1 GTAAGCGGGCAGTCAG-1 5477 -4125 5546 -4005 138.4233 0.007224219

1787: TTCAAGCCGAGCTGAG-1 TTGTATCACACAGAAT-1 6372 -2808 6303 -2688 138.4233 0.007224219
1788: TTCGACGGGAAGGGCG-1 TTCGCACTCGCGTGCT-1 4239 -4125 4308 -4245 138.4233 0.007224219
1789: TTCTTAGTGGCTCAGA-1 TTGTGGCCCTGACAGT-1 5408 -3287 5340 -3406 137.0584 0.007296161
1790: TTCTTGTAACCTAATG-1 TTGGCTCGCATGAGAC-1 3620 -4005 3757 -4005 137.0000 0.007299270
1791: TTGCACGGAGCAGCAC-1 TTGTCGTTCAGTTACC-1 5271 -3766 5202 -3885 137.5573 0.007269700

我们还是直接用Giotto来绘制吧,懒,哼。

p1 <- spatPlot(gobject = mini_visium, show_network = T,
network_color = ‘blue’, spatial_network_name = ‘Delaunay_network’,
point_size = 2.5, cell_color = ‘leiden_clus’) + ggtitle(“Delaunay_network”)

p2 <- spatPlot(gobject = mini_visium, show_network = T,
network_color = ‘blue’, spatial_network_name = ‘kNN_network’,
point_size = 2.5, cell_color = ‘leiden_clus’) + ggtitle(“kNN_network”)

p1+ p2

Giotto目前有五种不同的方法来识别空间连续表达模式(spatially coherent expression pattern)。三个先前发表的方法SpatialDE 、Trendsceek 和SPARK 可以与SpatialDE、Trendsceek和SPARK函数一起运行。Giotto介绍有两种新方法(“kmeans”, “rank”)是基于对空间网络中相邻细胞二值化表达数据的统计。

首先,对于每个基因,表达值采用kmeans聚类(k = 2)或简单的rank阈值设定(默认为30%)进行二值化,这是两种方法唯一的区别。接下来,根据相邻细胞间的二值化表达式值计算列联表,并将其用作Fisher精确检验的输入,以获得概率比估计值和p值。在这种情况下,如果一个基因通常在近端或邻近的细胞中被发现高表达,则被认为是空间基因。除了每个基因的胜率和p值外,还计算并提供了基因的平均表达量、高表达细胞数和hub细胞数。一个hub细胞被认为是一个高表达感兴趣的基因的细胞,它有多个高表达该基因的邻近细胞。用户可以利用这些特征进一步对具有不同特征的空间基因进行排序和探索。由函数 BinSpect (Binary Spatial extract)实现。

?binSpect
km_spatialgenes = binSpect(mini_visium)
km_spatialgenes

   genes       p.value   estimate   adj.p.value        score  av_expr high_expr

1: Ddn 1.249839e-282 19.1956483 7.386550e-280 1.246001e+04 6.031918 426
2: Shox2 6.120417e-223 24.2751032 7.234333e-221 1.242072e+04 3.984479 118
3: Hpca 8.836558e-253 13.1227650 2.611203e-250 7.616126e+03 6.964470 378
4: Zic1 5.861216e-226 12.2212691 8.659947e-224 6.338144e+03 4.284136 211
5: Mobp 6.841076e-234 10.7129452 1.347692e-231 5.751587e+03 7.660916 318

587: Plod1 8.569300e-01 1.0134093 8.627694e-01 1.564695e-01 3.052250 229
588: Tmem119 8.710065e-01 0.9854772 8.754504e-01 1.361002e-01 3.126011 180
589: Fn1 8.757291e-01 0.9871367 8.787028e-01 1.309915e-01 3.300124 193
590: Ndufa4l2 9.561515e-01 0.9885798 9.577720e-01 4.432688e-02 3.200996 115
591: Vsir 1.000000e+00 0.9989055 1.000000e+00 0.000000e+00 3.037544 177

可视化基因

spatGenePlot(mini_visium, expression_values = ‘scaled’,
genes = km_spatialgenes[1:4]$genes,
point_shape = ‘border’, point_border_stroke = 0.1,
show_network = F, network_color = ‘lightgrey’, point_size = 2.5,
cow_n_col = 2)

rank_spatialgenes = binSpect(mini_visium, bin_method = ‘rank’)
spatGenePlot(mini_visium, expression_values = ‘scaled’,
genes = rank_spatialgenes[1:4]$genes,
point_shape = ‘border’, point_border_stroke = 0.1,
show_network = F, network_color = ‘lightgrey’, point_size = 2.5,
cow_n_col = 2)

计算空间相关分数(calculate spatial correlation scores)。地理学第一定律:Everything is related to everything else, but near things are more related to each other. (任何事物都是与其他事物相关的,只不过相近的事物关联更紧密)。这一定律告诉我们,空间位置对对象间的相关性是有决定作用的。了识别空间共表达基因(Spatial co-expression patterns)的稳健模式,detectSpatialCorGenes和clusterSpatialCorGenes 可以用于识别的单个空间基因。第一个函数通过网格平均法或k近邻法在空间上平滑基因表达,然后计算基因到基因的相关性(默认=皮尔逊)分数。

ext_spatial_genes = km_spatialgenes[1:100]$genes
ext_spatial_genes

   genes       p.value   estimate   adj.p.value        score  av_expr high_expr

1: Ddn 1.249839e-282 19.1956483 7.386550e-280 1.246001e+04 6.031918 426
2: Shox2 6.120417e-223 24.2751032 7.234333e-221 1.242072e+04 3.984479 118
3: Hpca 8.836558e-253 13.1227650 2.611203e-250 7.616126e+03 6.964470 378
4: Zic1 5.861216e-226 12.2212691 8.659947e-224 6.338144e+03 4.284136 211
5: Mobp 6.841076e-234 10.7129452 1.347692e-231 5.751587e+03 7.660916 318

587: Plod1 8.569300e-01 1.0134093 8.627694e-01 1.564695e-01 3.052250 229
588: Tmem119 8.710065e-01 0.9854772 8.754504e-01 1.361002e-01 3.126011 180
589: Fn1 8.757291e-01 0.9871367 8.787028e-01 1.309915e-01 3.300124 193
590: Ndufa4l2 9.561515e-01 0.9885798 9.577720e-01 4.432688e-02 3.200996 115
591: Vsir 1.000000e+00 0.9989055 1.000000e+00 0.000000e+00 3.037544 177

检测空间相关的基因

spat_cor_netw_DT = detectSpatialCorGenes(mini_visium,
method = ‘network’, spatial_network_name = ‘Delaunay_network’,
subset_genes = ext_spatial_genes)

spat_cor_netw_DT$cor_DT
gene_ID variable spat_cor expr_cor cordiff spatrank exprrank rankdiff
1: 2010300C02Rik 2010300C02Rik 1.0000000 1.0000000 0.0000000 1 1 0
2: 2010300C02Rik Icam5 0.8697253 0.5558625 0.3138629 2 5 -3
3: 2010300C02Rik Itpka 0.8631855 0.6118914 0.2512941 3 2 1
4: 2010300C02Rik Ppp1r1a 0.8290775 0.5070328 0.3220448 4 8 -4
5: 2010300C02Rik Ddn 0.8159786 0.5619251 0.2540535 5 4 1

9996: Zic1 Itpka -0.6782534 -0.4374862 -0.2407671 96 94 2
9997: Zic1 Vxn -0.7007696 -0.4780233 -0.2227463 97 96 1
9998: Zic1 Sst -0.7252043 -0.3920385 -0.3331659 98 84 14
9999: Zic1 Rprml -0.7293975 -0.5113883 -0.2180092 99 100 -1
10000: Zic1 Dkk3 -0.7297394 -0.4979835 -0.2317558 100 99 1

相关基因的聚类与可视化。

2. cluster correlation scores

spat_cor_netw_DT = clusterSpatialCorGenes(spat_cor_netw_DT, name = ‘spat_netw_clus’, k = 8)
heatmSpatialCorGenes(mini_visium, spatCorObject = spat_cor_netw_DT, use_clus_name = ‘spat_netw_clus’)

可视化和过滤空间相关的基因

top_netw_spat_cluster = showSpatialCorGenes(spat_cor_netw_DT, use_clus_name = ‘spat_netw_clus’,
selected_clusters = 6, show_top_genes = 1)
top_netw_spat_cluster
gene_ID variable spat_cor expr_cor cordiff spatrank exprrank rankdiff clus
1: Amotl1 Amotl1 1 1 0 1 1 0 6
2: Grm4 Grm4 1 1 0 1 1 0 6
3: Lef1 Lef1 1 1 0 1 1 0 6
4: Nexn Nexn 1 1 0 1 1 0 6
5: Ntng1 Ntng1 1 1 0 1 1 0 6
6: Patj Patj 1 1 0 1 1 0 6
7: Plekhg1 Plekhg1 1 1 0 1 1 0 6
8: Prkcd Prkcd 1 1 0 1 1 0 6
9: Prkch Prkch 1 1 0 1 1 0 6
10: Rab37 Rab37 1 1 0 1 1 0 6
11: Rgs16 Rgs16 1 1 0 1 1 0 6
12: Shox2 Shox2 1 1 0 1 1 0 6
13: Slc17a6 Slc17a6 1 1 0 1 1 0 6
14: Synpo2 Synpo2 1 1 0 1 1 0 6
15: Tcf7l2 Tcf7l2 1 1 0 1 1 0 6
16: Tnnt1 Tnnt1 1 1 0 1 1 0 6
17: Vav3 Vav3 1 1 0 1 1 0 6
18: Vipr2 Vipr2 1 1 0 1 1 0 6
19: Zfhx3 Zfhx3 1 1 0 1 1 0 6
20: Zic1 Zic1 1 1 0 1 1 0 6

cluster_genes_DT = showSpatialCorGenes(spat_cor_netw_DT, use_clus_name = ‘spat_netw_clus’, show_top_genes = 1)
cluster_genes = cluster_genes_DT c l u s ; n a m e s ( c l u s t e r g e n e s ) = c l u s t e r g e n e s D T clus; names(cluster_genes) = cluster_genes_DT clus;names(clustergenes)=clustergenesDTgene_ID
cluster_genes
mini_visium = createMetagenes(mini_visium, gene_clusters = cluster_genes, name = ‘cluster_metagene’)
spatCellPlot(mini_visium,
spat_enr_names = ‘cluster_metagene’,
cell_annotation_values = netw_ranks$clusters,
point_size = 1.5, cow_n_col = 3,show_image=T,show_legend=F)

如前所述,空间域(Spatial domain detection)用隐马尔可夫随机场(HMRF)模型识别 。简而言之,HMRF是一个基于图形的模型,它将每个细胞的状态推断为细胞的内在状态(从细胞自身的基因表达载体推断)和细胞的外在状态(基于细胞周边的状态分布)的联合概率。在我们的例子中状态的概念是空间域。邻域图定义了邻居细胞的影响程度,以及定义细胞间相互作用强度的参数beta。HMRF将每个cell分配给k个空间域中的一个(k由用户定义)。

hmrf_folder = paste0(temp_dir,’/’,‘11_HMRF/’)
if(!file.exists(hmrf_folder)) dir.create(hmrf_folder, recursive = T)

?doHMRF

perform hmrf

my_spatial_genes = km_spatialgenes[1:100]$genes
HMRF_spatial_genes = doHMRF(gobject = mini_visium,
expression_values = ‘scaled’,
spatial_genes = my_spatial_genes,
spatial_network_name = ‘Delaunay_network’,
k = 8,
betas = c(28,2,2),
output_folder = paste0(hmrf_folder, ‘/’, ‘Spatial_genes_brain/SG_top100_k8_scaled’))

提示:

expression_matrix.txt already exists at this location, will be overwritten

spatial_genes.txt already exists at this location, will be overwritten

spatial_network.txt already exists at this location, will be overwritten

spatial_cell_locations.txt already exists at this location, will be overwritten
[1] “D:\Program Files (x86)\anconda\python.exe E:/software/R/R-4.0.2/library/Giotto/python/reader2.py -l “./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_cell_locations.txt” -g “./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_genes.txt” -n “./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_network.txt” -e “./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/expression_matrix.txt” -o “./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/result.spatial.zscore” -a test -k 8 -b 28 2 2 -t 1e-10 -z none -s 100 -i 100”

可见内置的是python实现的,我们用system函数在R中运行它。

system(“D:\Program Files (x86)\anconda\python.exe E:/software/R/R-4.0.2/library/Giotto/python/reader2.py -l "
./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_cell_locations.txt” -g "
./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_genes.txt" -n "
./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/spatial_network.txt" -e "
./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/expression_matrix.txt" -o "
./mini_visium//11_HMRF//Spatial_genes_brain/SG_top100_k8_scaled/result.spatial.zscore"
-a test -k 8 -b 30 2 2 -t 1e-10 -z none -s 100 -i 100")

127

check and select hmrf

myviewHMRFresults2D <- edit(viewHMRFresults2D)
environment(myviewHMRFresults2D) <- environment(viewHMRFresults2D)
debug(myviewHMRFresults2D)
?viewHMRFresults2D
viewHMRFresults2D(gobject = mini_visium,
HMRFoutput = HMRF_spatial_genes,
k = 8, betas_to_view = 2,
point_size = 2)

for(i in seq(28, 30, by = 2)) {
myviewHMRFresults2D(gobject = mini_visium,
HMRFoutput = HMRF_spatial_genes,
k = 8, betas_to_view = i,
point_size = 2)
}

myaddHMRF <- edit(addHMRF)
environment(myaddHMRF) <- environment(addHMRF)
mini_visium = myaddHMRF(gobject = mini_visium,
HMRFoutput = HMRF_spatial_genes,
k = 8, betas_to_add = c(28),
hmrf_name = ‘HMRF’)

mini_visium@cell_metadata
giotto_colors = getDistinctColors(8)
giotto_colors
names(giotto_colors) = 1:8
head(mini_visium@cell_metadata)
spatPlot(gobject = mini_visium, cell_color = ‘HMRF_k8_b.28’,
point_size = 3, coord_fix_ratio = 1, cell_color_code = giotto_colors)

识别细胞与细胞的相互作用改变了基因( cell-to-cell Interaction Changed Genes,ICG),即由于接近其他细胞类型而差异表达的基因。为了无偏性地识别与特定细胞型相互作用相关的所有潜在基因表达变化,Giotto实施了4项差异基因分析来识别这种相互作用改变的基因(ICG),包括t检验、limma检验、Wilcoxon秩和检验和空间排列检验。对于每一种细胞类型,我们将注释的细胞分成两个互补的亚组,其中一个亚组包含邻近另一种特定细胞类型的细胞的子集。这些组间差异表达的基因是通过上述每一种统计检验来鉴别的。若要调整多个假设检验,则通过重新随机化细胞类型中的细胞来创建背景空分布。这种分析可以用函数findInteractionChangedGenes或findICG实现。

gene_metadata = fDataDT(mini_visium)
high_expressed_genes = gene_metadata[mean_expr_det > 4]$gene_ID

high_expressed_genes

identify genes that are associated with proximity to other cell types

?findICG
ICGscoresHighGenes = findICG(gobject = mini_visium,
selected_genes = high_expressed_genes,
spatial_network_name = ‘Delaunay_network’,
cluster_column = ‘cell_types’,
diff_test = ‘permutation’,
adjust_method = ‘fdr’,
nr_permutations = 500,
do_parallel = T, cores = 2)

colnames(ICGscoresHighGenes$CPGscores)
[1] “genes” “sel” “other” “log2fc” “diff” “p.value” “p.adj” “perm_sel” “perm_other”
[10] “perm_log2fc” “perm_diff” “cell_type” “int_cell_type” “nr_select” “int_nr_select” “nr_other” “int_nr_other” “unif_int”
[19] “spec_int” “type_int”

可视化细胞接近基因得分

map(c( “cell_barplot”, “cell-cell”,
“dotplot”),function(x){
plotCellProximityGenes(mini_visium, cpgObject = ICGscoresHighGenes, method = x)
}) %>% cowplot::plot_grid(plotlist = .,ncol=3)

过滤 Interaction Changed Gene scores

ICGscoresFilt = filterICG(ICGscoresHighGenes,
min_cells = 2, min_int_cells = 2, min_fdr = 0.1,
min_spat_diff = 0.1, min_log2_fc = 0.1, min_zscore = 1)

过滤并可视化ICG基因

filter genes

?filterICG
ICGscoresFilt = filterICG(ICGscoresHighGenes,
min_cells = 2, min_int_cells = 2, min_fdr = 0.1,
min_spat_diff = 0.1, min_log2_fc = 0.1, min_zscore = 1)

ICGscoresFilt C P G s c o r e s CPGscores CPGscoresother

visualize subset of interaction changed genes (ICGs)

random subset

ICG_genes = topgenes_mast = mast_markers[, head(.SD, 1), by = ‘cluster’]$genes # c(‘Cpne2’, ‘Scg3’, ‘Cmtm3’, ‘Cplx1’, ‘Lingo1’)
ICG_genes_types = clusters_cell_types
names(ICG_genes) = ICG_genes_types

?plotICG
plotICG(gobject = mini_visium,
cpgObject = ICGscoresHighGenes,
source_type = ‘Tcf7l2_cells’,
source_markers = c(“Scg3”,“Nptx1”,“Stx1a”),
ICG_genes = ICG_genes)

基因表达在细胞附近发生的变化可以用几种方法来考量,配体受体对(Spatially informed ligand-receptor pairing)是其中的一个。为了研究细胞在微环境中是如何交流的,Giotto可以从现有的数据库中整合已知的配体-受体信息。通过计算这类基因对在两种细胞类型的相邻细胞中空间共表达情况,它可以估计出在两种细胞类型的相互作用,哪些配体-受体对可能更多(或更少)用于通信。这是在spatCellCellcom函数中实现的,更具体地说,对于每对配体-受体,计算出每对细胞类型的细胞-细胞通信评分S。

我们读入Giotto自带的配体受体对数据,在您自己的研究中可以根据需要准备或收集目的配受体。

LR_data = data.table::fread(system.file(“extdata”, “mouse_ligand_receptors.txt”, package = ‘Giotto’))
LR_data

 mouseLigand mouseReceptor

1: A2m Lrp1
2: Aanat Mtnr1a
3: Aanat Mtnr1b
4: Adam12 Itga9
5: Adam12 Itgb1

1063: Wnt7b Lrp5
1064: Xcl1 Xcr1
1065: Xcl2 Xcr1
1066: Yars Cxcr1
1067: Zp3 Mertk

先通过基因表达数据对细胞-细胞间的交流进行评分,这里没有用到空间数据,该方法用于模拟基于scRNAseq的CCI分析。

基因匹配

LR_data[, ligand_det := ifelse(mouseLigand %in% mini_visium@gene_ID, T, F)]
LR_data[, receptor_det := ifelse(mouseReceptor %in% mini_visium@gene_ID, T, F)]
LR_data_det = LR_data[ligand_det == T & receptor_det == T]
select_ligands = LR_data_det m o u s e L i g a n d s e l e c t r e c e p t o r s = L R d a t a d e t mouseLigand select_receptors = LR_data_det mouseLigandselectreceptors=LRdatadetmouseReceptor

get statistical significance of gene pair expression changes based on expression

expr_only_scores = exprCellCellcom(gobject = mini_visium,
cluster_column = ‘cell_types’,
random_iter = 500,
gene_set_1 = select_ligands,
gene_set_2 = select_receptors)

用空间数据对细胞通讯评分。

spatial_all_scores = spatCellCellcom(mini_visium,
spatial_network_name = ‘Delaunay_network’,
cluster_column = ‘cell_types’,
random_iter = 500,
gene_set_1 = select_ligands,
gene_set_2 = select_receptors,
adjust_method = ‘fdr’,
do_parallel = T,
cores = 4,
verbose = ‘none’)

spatial_all_scores[1:2,]
LR_comb lig_cell_type lig_expr ligand rec_cell_type rec_expr receptor LR_expr lig_nr rec_nr rand_expr av_diff log2fc pvalue
1: Npy-Npy2r Prdm8_cells 4.163227 Npy Spink8_cells 2.721694 Npy2r 6.884920 8 5 5.202572 1.6823484 0.3975514 0.052
2: Npy-Npy2r Ngef_cells 3.345273 Npy Prdm8_cells 2.803618 Npy2r 6.148892 5 3 6.866103 -0.7172109 -0.1567514 0.520
LR_cell_comb p.adj PI
1: Prdm8_cells–Spink8_cells 0.104 0.39077973
2: Ngef_cells–Prdm8_cells 0.688 -0.02545824

select top LR

selected_spat = spatial_all_scores[p.adj <= 0.5 & abs(log2fc) > 0.1 & lig_nr >= 2 & rec_nr >= 2]
selected_spat
data.table::setorder(selected_spat, -PI)

top_LR_ints = unique(selected_spat[order(-abs(PI))] L R c o m b ) [ 1 : 33 ] t o p L R c e l l i n t s = u n i q u e ( s e l e c t e d s p a t [ o r d e r ( − a b s ( P I ) ) ] LR_comb)[1:33] top_LR_cell_ints = unique(selected_spat[order(-abs(PI))] LRcomb)[1:33]topLRcellints=unique(selectedspat[order(abs(PI))]LR_cell_comb)[1:33]
?plotCCcomHeatmap
plotCCcomHeatmap(gobject = mini_visium,
comScores = spatial_all_scores,
selected_LR = top_LR_ints,
selected_cell_LR = top_LR_cell_ints,
show = ‘LR_expr’)

plotCCcomDotplot(gobject = mini_visium,
comScores = spatial_all_scores,
selected_LR = top_LR_ints,
selected_cell_LR = top_LR_cell_ints,
cluster_on = ‘PI’)

结合空间数据的细胞通讯。

?combCCcom
comb_comm = combCCcom(spatialCC = spatial_all_scores,
exprCC = expr_only_scores)

for top 1 expression ranks, you recover 0 % of the highest spatial rank
for top 10 expression ranks, you recover 29.55 % of the highest spatial rank
for top 20 expression ranks, you recover 57.95 % of the highest spatial rank

comb_comm[1:2]
LR_comb LR_cell_comb lig_cell_type rec_cell_type ligand receptor lig_expr_spat rec_expr_spat LR_expr_spat lig_nr_spat rec_nr_spat
1: Npy-Npy2r Ccdc153_cells–Ccdc153_cells Ccdc153_cells Ccdc153_cells Npy Npy2r 3.764760 0.9497822 4.714542 11 11
2: Npy-Npy2r Ccdc153_cells–Chgb_cells Ccdc153_cells Chgb_cells Npy Npy2r 3.472081 0.4239559 3.896037 17 11
rand_expr_spat av_diff_spat log2fc_spat pvalue_spat p.adj_spat PI_spat lig_expr rec_expr LR_expr lig_nr rec_nr rand_expr av_diff log2fc
1: 4.429262 0.28527998 0.08812256 0.102 0.204 0.06083716 3.552652 0.8706337 4.423286 106 106 3.334204 1.0890821 0.3973956
2: 3.861855 0.03418194 0.01239384 0.844 0.844 0.00091290 3.552652 0.2954424 3.848095 106 119 3.343469 0.5046257 0.1972939
pvalue p.adj PI LR_expr_rnk LR_spat_rnk exprPI_rnk spatPI_rnk
1: 0.168 0.336 0.18823066 12 9 19 8
2: 0.404 0.808 0.01826718 19 22 25 21

从张大表中我们可以提取很多可以可视化的信息。

top differential activity levels for ligand receptor pairs

plotRankSpatvsExpr(gobject = mini_visium,
comb_comm,
expr_rnk_column = ‘exprPI_rnk’,
spat_rnk_column = ‘spatPI_rnk’,
midpoint = 10)

predict maximum differential activity

plotRecovery(gobject = mini_visium,
comb_comm,
expr_rnk_column = ‘exprPI_rnk’,
spat_rnk_column = ‘spatPI_rnk’,
ground_truth = ‘spatial’)
percentage explained = 0.6022727

我们自己画画二者之间的关系。

p1 <- ggplot(comb_comm,aes(LR_spat_rnk ,LR_expr_rnk))+ geom_point() + theme_bw()

p2 <- ggplot(comb_comm,aes(exprPI_rnk ,spatPI_rnk))+ geom_point()+ theme_bw()
p1+p2

并用我们的igraph绘制网络图。

comb_commdf S O U R C E < − s t r s p l i t ( c o m b c o m m d f SOURCE <- str_split(comb_commdf SOURCE<strsplit(combcommdfLR_cell_comb,"–",simplify = T)[,1]
comb_commdf T A R G E T < − s t r s p l i t ( c o m b c o m m d f TARGET <- str_split(comb_commdf TARGET<strsplit(combcommdfLR_cell_comb,"–",simplify = T)[,2]

comb_commdf
library(igraph)
?graph_from_data_frame
colnames(graph_from_data_frame)
?recode
comb_commdf <- comb_commdf %>% relocate(c(“SOURCE”, “TARGET”,“LR_expr_rnk”, “LR_spat_rnk”, “exprPI_rnk”, “spatPI_rnk”),.before =“LR_comb”)

net<- graph_from_data_frame(comb_commdf)

plot(net)
allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB",
“#98FB98”,"#F08080","#1E90FF","#7CFC00","#FFFF00",
“#808000”,"#FF00FF","#FA8072","#7B68EE","#9400D3",
“#800080”,"#A0522D","#D2B48C","#D2691E","#87CEEB",
“#40E0D0”,"#5F9EA0","#FF1493",
“#FFE4B5”,"#8A2BE2","#228B22","#E9967A","#4682B4",
“#32CD32”,"#F0E68C","#FFFFE0","#EE82EE","#FF6347",
“#6A5ACD”,"#9932CC","#8B008B","#8B4513","#DEB887")

karate_groups <- cluster_optimal(net)
coords <- layout_in_circle(net, order =
order(membership(karate_groups))) # 设置网络布局
E(net) w i d t h < − E ( n e t ) width <- E(net) width<E(net)LR_expr_rnk /E(net)$LR_spat_rnk # 边点权重(粗细)

net2 <- net # 复制一份备用

for (i in 1: length(unique(comb_commdfKaTeX parse error: Expected '}', got 'EOF' at end of input: …que(comb_commdfSOURCE),function(x) {
get.edge.ids(net,vp = c(unique(comb_commdfKaTeX parse error: Expected 'EOF', got '}' at position 18: …URCE)[i],x)) }̲)%>% unlist()]color <- allcolour[i]
} # 这波操作谁有更好的解决方案?

plot(net, edge.arrow.size=.1,
edge.curved=0.2,
vertex.color=allcolour,
vertex.frame.color="#555555",
vertex.label.color=“black”,
layout = coords,
vertex.label.cex=.7)

可见,模型建好之后,可视化的工作就是提取相关的信息就可以了。作为本次演示的结尾,我们看看如果有目标cell、spot如何画出它的边界,毕竟还是想以空间结尾。

nb_annot = findNetworkNeighbors(mini_visium,
spatial_network_name = ‘Delaunay_network’,
source_cell_ids = unique(mini_visium@spatial_network k N N n e t w o r k kNN_network kNNnetworknetworkDT$to)[sample(500,30)],
)

mini_visium = addCellMetadata(mini_visium, new_metadata = nb_annot, by_column = T, column_cell_ID = ‘cell_ID’)
?spatPlot
spatPlot(mini_visium, point_shape = ‘voronoi’, cell_color =‘nb_cells’,show_image=T,
cell_color_code = c(source = ‘blue’, target = ‘red’, both = ‘yellow’, others = ‘lightgrey’ , neighbor=‘green’))

@Article{,
author = {Ruben Dries and Qian Zhu and Rui Dong and Chee-Huat Linus Eng and Huipeng Li and Kan Liu and Yuntian Fu and Tianxiao Zhao and Arpan Sarkar and Rani E George and Nico Pierson and Long Cai and Guo-Cheng Yuan},
title = {Giotto, a toolbox for integrative analysis and visualization of spatial expression data},
journal = {bioRxiv},
year = {2020},
doi = {10.1101/701680},
url = {https://doi.org/10.1101/701680},
}

Updating immune cell deconvolution for the spatial genomics era
https://www.drieslab.com/
https://rubd.github.io/Giotto/

你可能感兴趣的:(数据挖掘,r语言)