2022空间转录组分析02：stLearn-空间临近通讯

关于本教程的数据说明，见：https://www.jianshu.com/p/2ac31d3a5560

stLearn是python语言编写的，更新速度贼快，网上很多笔记教程已经运行不通了，前两天一口气更新了三个版本：

image-20220303192945678.png

今天再看的时候这三个版本已经没啦，又又又更新了一个：

image-20220303193212764.png

本教程基于0.4.0版本，请安装：pip install stlearn==0.4.0

分析基本原理

image-20220303193424070.png

主要用于鉴定空间转录组切片上的热点区域，即CCI hotspot，cell cell interaction hotspot，这个区域具有特征：These regions with high cell diversity and L-R activities are considered as hotspots in the tissue with most likely cell-cell interaction activities。

第一点：细胞类型多；第二点：配体-受体共表达。因此，在分析之前，最好已经对空间转录组数据进行了spot细胞类型注释。

分析步骤：

1.加载已知的配体受体基因对
2.鉴定配受体对显著互作的spot
3.对于每个L-R对和每个细胞类型-细胞类型组合：count the instances where neighbours of a signficant spot for that LR pair link two given cell types
4.扰动细胞类型标签，鉴定显著的互作(pvalue<0.05)
5.可视化CCI结果

一、配体-受体分析

stLearn CCI(Cell-Cell Interaction)流程的第一步是配体受体(LR)分析。
这个分析从候选配体-受体数据库中鉴定配体-受体相互作用的显著spot。
运行时将严重依赖于可用的数据集和计算资源; 请注意，分析支持多线程。

1. 数据加载和预处理

数据目录结构：

image-20220303194611493.png

注意：LR分析不需要log1p数据

conda activate stlearn

import stlearn as st
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Loading raw data #
data_dir = "./"
data = st.Read10X(data_dir)
data.var_names_make_unique()
st.add.image(adata=data,
             imgpath=data_dir+"spatial/tissue_hires_image.png",
             library_id="V1_Breast_Cancer_Block_A_Section_1", visium=True)

# Basic normalisation #
st.pp.filter_genes(data, min_cells=3)
st.pp.normalize_total(data) # NOTE: no log1p

作者提示到：对于上面的细胞注释类型信息，是使用Seurat生成的。数据下载：tutorials

对于每一个spot可以没有细胞类型打分，如果有细胞类型打分，do need to add the dominant cell type to the adata.obs slot with the same key as the cell type scores added to the adata.uns slot。

# Adding the label transfer results,  #
spot_mixtures = pd.read_csv(f'{data_dir}/tutorials/label_transfer_bc.txt', index_col=0, sep='\t')
labels = spot_mixtures.loc[:,'predicted.id'].values.astype(str)
spot_mixtures = spot_mixtures.drop(['predicted.id','prediction.score.max'],axis=1)
spot_mixtures.columns = [col.replace('prediction.score.', '')
                         for col in spot_mixtures.columns]

# Note the format
print(labels)
print(spot_mixtures)
# Check is in correct order
print('Spot mixture order correct?: ', np.all(spot_mixtures.index.values==data.obs_names.values)) 

# NOTE: using the same key in data.obs & data.uns
# Adding the dominant cell type labels per spot
data.obs['cell_type'] = labels 
data.obs['cell_type'] = data.obs['cell_type'].astype('category')
# Adding the cell type scores
data.uns['cell_type'] = spot_mixtures 

st.pl.cluster_plot(data, use_label='cell_type')
plt.savefig('./s1.cluster_plot.jpg')
plt.close()

细胞注释后的标签图：

image-20220303195018625.png

2.运行配体受体分析

运行：st.tl.cci.run这一步耗时比较久，建议在集群上进行分析，n_pairs教程推荐10000次

# Loading the LR databases available within stlearn (from NATMI)
lrs = st.tl.cci.load_lrs(['connectomeDB2020_lit'], species='human')
print(len(lrs))

# Running the analysis #
# min_spots：Filter out any LR pairs with no scores for less than min_spots
# distance： None defaults to spot+immediate neighbours; distance=0 for within-spot mode
# n_pairs： Number of random pairs to generate; low as example, recommend ~10,000
# n_cpus：Number of CPUs for parallel. If None, detects & use all available.
# 这一步耗时比较久，如果选择n_pairs=10000，可能要好几个小时，可以多选择几个cpu
st.tl.cci.run(data, lrs, min_spots = 20, distance=None, n_pairs=100, n_cpus=15)

# A dataframe detailing the LR pairs ranked by number of significant spots.
lr_info = data.uns['lr_summary']
print('\n', lr_info)

Spot neighbour indices保存在data.obsm['spot_neighbours'] 与 data.obsm['spot_neigh_bcs']中

结果保存在：

lr_scores 保存在 adata.obsm['lr_scores']
p_vals 保存在 adata.obsm['p_vals']
p_adjs 保存在 adata.obsm['p_adjs']
-log10(p_adjs) 保存在 adata.obsm['-log10(p_adjs)']
lr_sig_scores 保存在 adata.obsm['lr_sig_scores']

每个spot结果在adata.obsm的列与adata.uns['lr_summary']中的行名一样且顺序一致

LR结果的summary在data.uns['lr_summary']中：

image-20220223220122573.png

3.P-value矫正

可以使用不同的方法矫正 p 值; p 值已经通过运行 st.tl.cci.run 进行了矫正。不同的矫正方法差别在于correct_axis参数，可以是LR，spot或者none。

st.tl.cci.adj_pvals(data, correct_axis='spot', pval_adj_cutoff=0.05, adj_method='fdr_bh')

4.LRs排序并可视化

# Showing the rankings of the LR from a global and local perspective.
# Ranking based on number of significant hotspots.
st.pl.lr_summary(data, n_top=500)
plt.savefig('./s2.lr_summary500.jpg')
plt.close()

st.pl.lr_summary(data, n_top=50, figsize=(10,3))
plt.savefig('./s2.lr_summary50.jpg')
plt.close()

根据每个LR显著spots数展示LR top 50, 500：

image-20220303195900213.png

image-20220303195846511.png

5.诊断图

LR 分析的一个关键是识别显著的hotspot时控制 LR 的表达水平和频率，所以我们的诊断图应该是hotspot区的配受体对与其表达水平和表达频率之间不相关。

下面的诊断图可以检查这个情况；如果相关，需要设置更大的n_pairs扰动次数。

st.pl.lr_diagnostics(data, figsize=(10,2.5))
plt.savefig('./s2.lr_diagnostics.jpg')
plt.close()

左图: LR 表达水平(LR 对中非零点平均基因中位数表达)与 LR 排名的关系
右图: LR 表达频率(LR 对中每个基因的平均零点比例)与 LR 排名之间的关系

这个例子中，LR 表达频率和significant spots 数量之间的有比较弱的相关性，因此，这n _ pairs 参数应该设置得更高，以创建更准确的背景分布(文献中中使用了n _ pairs=10,000)。

image-20220303195624707.png

st.pl.lr_n_spots(data, n_top=50, figsize=(11, 4),max_text=100)
plt.savefig('./s2.lr_n_spots50.jpg')
plt.close()

st.pl.lr_n_spots(data, n_top=500, figsize=(11, 4), max_text=100)
plt.savefig('./s2.lr_n_spots500.jpg')
plt.close()

根据每个配受体对在多少个spot里面lr打分显著进行排序，展示top500、top50：

image-20220303195308738.png

image-20220303195249915.png

6.Biologic Plots (可选)

配受体对的功能富集分析，结果保存在：adata.uns['lr_go']

r_path = "/share/nas5/zhangj/biosoft/miniconda3/envs/R4/bin/Rscript"
st.tl.cci.run_lr_go(data, r_path)

# save
st.pl.lr_go(data, lr_text_fp={'weight': 'bold', 'size': 10}, rot=15, figsize=(12,3.65),n_top=15, show=False)
plt.savefig('./s3.lr_go.jpg')
plt.close()

二、L-R可视化

LR _ result _ plot 在空间数据中绘制指定 LR 对的分析结果，需要的值存储在：data.obsm中，包括: lr_scores, p_vals, p_adjs, -log10(p_adjs), lr_sig_scores

# Just choosing one of the top from lr_summary
best_lr = data.uns['lr_summary'].index.values[0]
best_lr
stats = ['lr_scores', 'p_vals', 'p_adjs', '-log10(p_adjs)']
fig, axes = plt.subplots(ncols=len(stats), figsize=(16,6))
for i, stat in enumerate(stats):
    st.pl.lr_result_plot(data, use_result=stat, use_lr=best_lr, show_color_bar=False, ax=axes[i])
    axes[i].set_title(f'{best_lr} {stat}')

plt.savefig('./s4.lr_result_plot.jpg')
plt.close()

fig, axes = plt.subplots(ncols=2, figsize=(8,6))
st.pl.lr_result_plot(data, use_result='-log10(p_adjs)', use_lr=best_lr, show_color_bar=False, ax=axes[0])
st.pl.lr_result_plot(data, use_result='lr_sig_scores', use_lr=best_lr, show_color_bar=False, ax=axes[1])
axes[0].set_title(f'{best_lr} -log10(p_adjs)')
axes[1].set_title(f'{best_lr} lr_sig_scores')
plt.savefig('./s4.lr_result_plot-1.jpg')
plt.close()

这两张图一般是文献中出现的比较多的图，选择的配受体对的共表达打分，显著性：

image-20220303200316954.png

打分结合显著性后的结果：

image-20220303200344652.png

三、LR 说明可视化

这些图旨在帮助解释细胞之间cross-talk的方向性

st.pl.lr_plot(data, best_lr, inner_size_prop=0.1, outer_mode='binary', pt_scale=5, use_label=None, show_image=True,sig_spots=False)
plt.savefig('./s5.lr_plot.jpg')
plt.close()

st.pl.lr_plot(data, best_lr, outer_size_prop=1, outer_mode='binary', pt_scale=20, use_label=None, show_image=True,sig_spots=True)
plt.savefig('./s5.lr_plot-1.jpg')
plt.close()

红色为配体，绿色为受体，蓝色为共表达。有助于了解配体/受体在哪里以及在多大程度上表达。理想情况是：受体位于细胞表面，配体从细胞表面渗透出来。

image-20220303200512620.png

具有显著性的spot：

image-20220303200620632.png

看表达连续值情况：

# All spots #
st.pl.lr_plot(data, best_lr,
              inner_size_prop=0.04, middle_size_prop=.07, outer_size_prop=.4,
              outer_mode='continuous', pt_scale=60,
              use_label=None, show_image=True,
              sig_spots=False)
plt.savefig('./s5.lr_plot-2.jpg')
plt.close()

# Only significant spots #
st.pl.lr_plot(data, best_lr,
              inner_size_prop=0.04, middle_size_prop=.07, outer_size_prop=.4,
              outer_mode='continuous', pt_scale=60,
              use_label=None, show_image=True,
              sig_spots=True)
plt.savefig('./s5.lr_plot-3.jpg')
plt.close()

这仅在放大并希望同时显示细胞信息和互作方向时有用：

st.pl.lr_plot(data, best_lr,
              inner_size_prop=0.08, middle_size_prop=.3, outer_size_prop=.5,
              outer_mode='binary', pt_scale=50,
              show_image=True, arrow_width=10, arrow_head_width=10,
              sig_spots=True, show_arrows=True)
plt.savefig('./s5.lr_plot-4.jpg')
plt.close()

这个图使用交互式绘图效果更好，交互式绘图参考：https://stlearn.readthedocs.io/en/latest/tutorials/Interactive_plot.html

这里看不出来图上的细节：

image-20220303200837897.png

我们还可以观察到配体或受体在哪里与占优势的点细胞类型同时表达/共表达：

st.pl.lr_plot(data, best_lr,
              inner_size_prop=0.08, middle_size_prop=.3, outer_size_prop=.5,
              outer_mode='binary', pt_scale=150,
              use_label='cell_type', show_image=True,
              sig_spots=True)
plt.savefig('./s5.lr_plot-5.jpg')
plt.close()

外点显示配体(红色)、受体(绿色)和共表达(蓝色)的表达。内点由占参与互作的主要细胞类型着色：

image-20220303201156865.png

四、细胞间互作分析CCIs

Calls significant celltype-celltype interactions based on cell-type data randomisation。
确定了LR 共表达的显著spot区域后，现在可以确定显著的互作细胞类型，st.tl.cci.run_cci参数含义如下：

min_spots: Specifies the minimum number of spots where LR score present to include in subsequent analysis
spot_mixtures: # If True will use the label transfer scores, 前面导入的细胞类型打分，so spots can have multiple cell types if score>cell_prop_cutoff
cell_prop_cutoff: # Spot considered to have cell type if score>0.2
sig_spots: # Only consider neighbourhoods of spots which had significant LR scores.
n_perms: # Permutations of cell information to get background, recommend ~1000

# Running the counting of co-occurence of cell types and LR expression hotspots #
# Spot cell information either in data.obs or data.uns
# 这一步运行也会比较耗时，可以适当增加n_perms次数，推荐使用1000
st.tl.cci.run_cci(data, 'cell_type', min_spots=3, spot_mixtures=True, cell_prop_cutoff=0.2, sig_spots=True, n_perms=100 )

Significant counts of cci_rank interactions for each LR pair stored in dictionary per_lr_cci_cell_type。

五、诊断图: 检查互作与细胞类型种类数的相关性

理论上cci_rank与cell type 种类应该无关，如果有相关，可以增加st.tl.cci.run_cci函数中的n_perms值，增大扰动次数。

st.pl.cci_check(data, 'cell_type')
plt.savefig('./s6.cci_check.jpg')
plt.close()

image-20220303202059103.png

六、CCI 可视化

如果你更喜欢使用R来进行绘图，可以保存 anndata 对象中的邻接矩阵，并使用 CellChat 包的可视化功能来实现可视化。

# Visualising the no. of interactions between cell types across all LR pairs #
pos_1 = st.pl.ccinet_plot(data, 'cell_type', return_pos=True)
plt.savefig('./s7.ccinet_plot.jpg')
plt.close()

# Just examining the cell type interactions between selected pairs #
lrs = data.uns['lr_summary'].index.values[0:3]
for best_lr in lrs[0:3]:
    st.pl.ccinet_plot(data, 'cell_type', best_lr, min_counts=2,figsize=(10,7.5), pos=pos_1)
    plt.savefig(fname=f's8.ccinet_plot_{best_lr}.jpg')
    plt.close()

总互作：

image-20220303202346307.png

圈的大小：某个细胞群参与的spot互作数；箭头大小：两个细胞群间的总互作数：

image-20220303202248454.png

CCI和弦图

可视化少数细胞类型之间的互作时和弦图非常有用

st.pl.lr_chord_plot(data, 'cell_type')
plt.savefig('./s9.lr_chord_plot.jpg')
plt.close()

lrs = data.uns['lr_summary'].index.values[0:3]
for lr in lrs:
    st.pl.lr_chord_plot(data, 'cell_type', lr)
    plt.savefig(fname=f'./s9.lr_chord_plot_{lr}.jpg')
    plt.close()

总的和弦图：

image-20220303202504778.png

单个配受体：

image-20220303202520652.png

Heatmap Visualisations

LR-CCI-Map

我们还提供了一些热度图的可视化，这样你就可以同时观察每个celltype-celltype 的多个 LR 对之间的相互作用

# This will automatically select the top interacting CCIs and their respective LRs #
st.pl.lr_cci_map(data, 'cell_type', lrs=None, min_total=100, figsize=(20,8))
plt.savefig('./s10.lr_cci_map.jpg')
plt.close()

# You can also put in your own LR pairs of interest #
lrs = data.uns['lr_summary'].index.values[0:11]
st.pl.lr_cci_map(data, 'cell_type', lrs=lrs, min_total=100, figsize=(20,12))
plt.savefig('./s10.lr_cci_map-1.jpg')
plt.close()

自动选取top LR对：

image-20220303202621290.png

也可以用户指定配体受体后：

image-20220303202950530.png

CCI Maps

Heatmap visualising sender->receivers of cell type interactions

# of interactions 指的是一个点与接受细胞类型表达配体和源细胞类型表达受体在同一邻域的次数。

st.pl.cci_map(data, 'cell_type')
plt.savefig('./s11.cci_map.jpg')
plt.close()

lrs = data.uns['lr_summary'].index.values[0:3]
for lr in lrs[0:3]:
    st.pl.cci_map(data, 'cell_type', lr)
    plt.savefig(fname='s11.cci_map_{lr}.jpg')
    plt.close()

总的：

image-20220303202645039.png

指定配受体的：

image-20220303203034988.png

七、Spatial cell type interactions

j结合LR分析与CCI分析，我们现在可以看到这些细胞在组织的哪个部位相互通讯。

best_lr = lrs[0]

### This will plot with simple black arrows ####
st.pl.lr_plot(data, best_lr, outer_size_prop=1, outer_mode=None,
              pt_scale=40, use_label='cell_type', show_arrows=True,
              show_image=True, sig_spots=False, sig_cci=True,
              arrow_head_width=4,
              arrow_width=1, cell_alpha=.8 )
plt.savefig('./s12.lr_plot_{best_lr}.jpg')
plt.close()

### This will colour the spot by the mean LR expression in the spots connected by arrow
st.pl.lr_plot(data, best_lr, outer_size_prop=1, outer_mode=None,
              pt_scale=10, use_label='cell_type', show_arrows=True,
              show_image=True, sig_spots=False, sig_cci=True,
              arrow_head_width=4, arrow_width=2,arrow_cmap='YlOrRd', arrow_vmax=1.5)
plt.savefig('./s12.lr_plot_{best_lr}.jpg')
plt.close()

# 保存数据
# 设置备份地址
data.filename = './LR.h5ad'
# 查看是否备份成功
print(data.isbacked) # True

image-20220303203238622.png

八、Visualisation Tips

空间切片中的 CCIs 具有丰富的信息，以上哪种可视化方式会有用，将取决于你想要强调的生物学和关键方面。根据我们的经验，显示感兴趣的 LR 统计数据，然后使用细胞类型信息和箭头绘制图表是很有用的。

在后一种情节中，最好是突出显示感兴趣的区域(如上面的箭头) ，然后放大显示。因此，你可以突出显示预计将发生有趣 cci 的代表区域。请参阅“ Interactive stLearn”教程，使用交互式散景应用程序轻松快速地创建此类可视化效果。

参考资料：

春节福利 | 空间转录组细胞通讯分析指南待查收
官网：https://stlearn.readthedocs.io/en/latest/tutorials/stLearn-CCI.html
官网数据：https://github.com/BiomedicalMachineLearning/stLearn
翻译版本：https://www.jianshu.com/p/b47d406861ce