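Endocrine Pancreas
Pancreatic endocrinogenesis, in which lineage commitment resolves into four fates: α, β, δ and ε cells.
The data are from Bastidas-Ponce et al. (2018), which also provides further details.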
[1]:
import scvelo as scv
scv.logging.print_version()
Running scvelo 0.2.0 (python 3.8.2) on 2020-05-15 00:58.
[2]:
scv.settings.verbosity = 3 # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True # set max width size for presenter view
scv.settings.set_figure_params('scvelo') # for beautified visualization
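Load and clean the data
The following analysis is based on the built-in pancreas dataset.
To run velocity analysis on your own data, read your file (loom, h5ad, xlsx, csv, tab, txt …) into an AnnData object with adata = scv.read('path/file.loom', cache=True).
If you want to merge your loom file into an already existing AnnData object, use scv.utils.merge(adata, adata_loom).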
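The two calls above can be combined into one small helper. This is only a sketch: the helper name load_for_velocity and its arguments are hypothetical, scvelo must be installed before the function is called, and the file path is whatever your own data requires.

```python
def load_for_velocity(path, existing=None):
    """Read a file into AnnData and optionally merge it into an existing object.

    `load_for_velocity` is a hypothetical helper name, not part of scvelo.
    """
    # Imported inside the function so that defining the helper does not
    # require scvelo; calling it does.
    import scvelo as scv

    adata_loom = scv.read(str(path), cache=True)
    if existing is None:
        return adata_loom
    # Merge the spliced/unspliced layers into the already existing object.
    return scv.utils.merge(existing, adata_loom)
```

Calling load_for_velocity('path/file.loom') corresponds to the first case in the text, and passing an AnnData object as existing corresponds to the merge case.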
[3]:
adata = scv.datasets.pancreas()
[4]:
# show proportions of spliced/unspliced abundances
scv.utils.show_proportions(adata)
adata
Abundance of ['spliced', 'unspliced']: [0.83 0.17]
[4]:
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
var: 'highly_variable_genes'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
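To make the reported proportions concrete, here is a minimal sketch of one way such numbers can be obtained. It assumes the proportions are per-cell fractions averaged over cells; the function name and the toy counts are made up for illustration and are not taken from scvelo.

```python
def layer_proportions(spliced, unspliced):
    """Mean per-cell fraction of spliced and unspliced counts.

    Both arguments are cells x genes count matrices given as lists of lists.
    """
    fractions = []
    for s_row, u_row in zip(spliced, unspliced):
        s_total, u_total = sum(s_row), sum(u_row)
        total = s_total + u_total
        if total == 0:
            continue  # skip empty cells to avoid dividing by zero
        fractions.append((s_total / total, u_total / total))
    if not fractions:
        raise ValueError("no cell with nonzero counts")
    n = len(fractions)
    return (sum(f[0] for f in fractions) / n, sum(f[1] for f in fractions) / n)


# Toy counts: 3 cells x 2 genes (made-up numbers).
spliced = [[8, 0], [3, 3], [9, 0]]
unspliced = [[1, 1], [1, 1], [0, 1]]
proportions = layer_proportions(spliced, unspliced)
```

The two returned values always sum to one, which matches the form of the output shown above.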
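Preprocess the data
The necessary preprocessing consists of:
- gene selection by detection (detected with a minimum number of counts) and by high variability (dispersion).
- normalizing every cell by its initial size and logarithmizing X.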
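The two steps listed above can be sketched in plain Python. The sketch rests on assumptions that the text does not confirm: "shared" counts are read as spliced plus unspliced counts summed over cells in which both are detected, and normalization scales each cell to the median library size. Selection by dispersion is left out, and all function names and numbers are illustrative.

```python
import math
from statistics import median


def shared_counts(spliced, unspliced, gene):
    """Spliced + unspliced counts of one gene, summed over cells where both are nonzero."""
    return sum(
        s_row[gene] + u_row[gene]
        for s_row, u_row in zip(spliced, unspliced)
        if s_row[gene] > 0 and u_row[gene] > 0
    )


def normalize_and_log(counts):
    """Scale every cell to the median library size, then apply log(1 + x)."""
    sizes = [sum(row) for row in counts]
    target = median(size for size in sizes if size > 0)
    return [
        [math.log1p(value * target / size) if size > 0 else 0.0 for value in row]
        for row, size in zip(counts, sizes)
    ]


# Toy data: 3 cells x 3 genes (made-up numbers).
spliced = [[10, 0, 5], [20, 1, 9], [30, 0, 10]]
unspliced = [[2, 1, 4], [4, 0, 1], [6, 0, 3]]

min_shared_counts = 20  # same threshold as in the tutorial call
kept = [g for g in range(3) if shared_counts(spliced, unspliced, g) >= min_shared_counts]
filtered = [[row[g] for g in kept] for row in spliced]
log_x = normalize_and_log(filtered)
```

In this toy example the second gene is dropped because it is never detected in both layers of the same cell.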
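Further, we need the first and second order moments (means and uncentered variances) computed among nearest neighbors in PCA space. The first order moments are needed for deterministic velocity estimation, while stochastic estimation also requires the second order moments.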
[5]:
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
Filtered out 20801 genes that are detected in less than 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
finished (0:00:03) --> added
'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
finished (0:00:00) --> added
'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)
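The moments computed in this step can be illustrated for a single gene. This is a simplified sketch: it weights all neighbors equally and counts each cell as its own neighbor, whereas the log above indicates that the library bases the moments on connectivities. The coordinates stand in for PCA coordinates and all numbers are made up.

```python
import math


def knn_moments(coords, abundances, k):
    """First and second order moments of one gene over each cell's k nearest neighbors.

    coords:     one point per cell (stand-in for PCA coordinates)
    abundances: one value per cell for a single gene
    """
    first, second = [], []
    for point in coords:
        # Rank all cells by distance to the current cell (the cell itself comes first).
        order = sorted(range(len(coords)), key=lambda j: math.dist(point, coords[j]))
        hood = order[:k]
        first.append(sum(abundances[j] for j in hood) / k)
        second.append(sum(abundances[j] ** 2 for j in hood) / k)
    return first, second


# Toy data: two well-separated pairs of cells.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
spliced_gene = [1, 3, 10, 14]
ms, second_moment = knn_moments(coords, spliced_gene, k=2)
```

The uncentered second moment minus the squared first moment gives the local variance, for example 5 - 2**2 = 1 for the first cell.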
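Compute velocity and velocity graph
Gene-specific velocities are obtained by fitting a ratio between precursor (unspliced) and mature (spliced) mRNA abundances that explains the steady states (constant transcriptional state) well, and then computing how the observed abundances deviate from what is expected in steady state. (We will soon release a version that no longer relies on the steady-state assumption.)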
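The idea in the paragraph above can be sketched for one gene: fit a ratio gamma such that unspliced ≈ gamma × spliced, then take the deviation from that line as the velocity. The sketch uses an ordinary least-squares fit through the origin over all cells, which is an assumption made for simplicity and may differ from the fit scvelo performs; the numbers are made up.

```python
def steady_state_ratio(unspliced, spliced):
    """Slope of the least-squares line through the origin, u ≈ gamma * s."""
    denom = sum(s * s for s in spliced)
    if denom == 0:
        raise ValueError("spliced abundances are all zero")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom


def deviation_from_steady_state(unspliced, spliced, gamma):
    """Velocity per cell: observed unspliced minus the steady-state expectation."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Toy abundances of one gene in five cells.
spliced = [0.0, 1.0, 2.0, 3.0, 4.0]
unspliced = [0.0, 1.0, 1.0, 1.0, 2.0]

gamma = steady_state_ratio(unspliced, spliced)
velocity = deviation_from_steady_state(unspliced, spliced, gamma)
```

Cells above the fitted line get a positive velocity (more unspliced mRNA than the steady state predicts) and cells below it get a negative one.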
Every tool has its plotting counterpart. The results of scv.tl.velocity, for instance, can be visualized using scv.pl.velocity.
[6]:
scv.tl.velocity(adata)
computing velocities
finished (0:00:01) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
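scv.tl.velocity_graph computes the cosine correlation of potential cell transitions with the velocity vector in high-dimensional space. The resulting velocity graph has dimension n_obs × n_obs and summarizes the possible cell state changes (given by a transition from one cell to another) that are well explained by the velocity vectors. The graph is used, for instance, to project the velocities into a low-dimensional embedding.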
[7]:
scv.tl.velocity_graph(adata)
computing velocity graph
finished (0:00:12) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
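What the velocity graph stores can be sketched on a toy example: for every ordered pair of cells, the cosine between the displacement from cell i to cell j and the velocity of cell i. The sketch is a simplification under stated assumptions: it uses plain cosine similarity without any centering, it is dense, and it scores all pairs rather than only neighbors. All names and numbers are illustrative.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else sum(x * y for x, y in zip(a, b)) / norm


def toy_velocity_graph(states, velocity_vectors):
    """n x n matrix: entry [i][j] scores the transition i -> j against cell i's velocity."""
    n = len(states)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta = [xj - xi for xi, xj in zip(states[i], states[j])]
            graph[i][j] = cosine(delta, velocity_vectors[i])
    return graph


# Three cells in a two-gene expression space (made-up numbers).
states = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
velocity_vectors = [(1.0, 0.0), (-1.0, 1.0), (0.0, 0.0)]
graph = toy_velocity_graph(states, velocity_vectors)
```

A score near 1 means the transition points where the velocity points; the third cell has zero velocity, so none of its transitions is supported.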
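Plot results
Velocities are projected onto any embedding specified by basis. Three visualizations are available: at the single-cell level, at the grid level, or as a streamline plot, as shown here.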
[8]:
scv.pl.velocity_embedding_stream(adata, basis='umap')
computing velocity embedding
finished (0:00:00) --> added
'velocity_umap', embedded velocity vectors (adata.obsm)
[9]:
scv.pl.velocity_embedding(adata, basis='umap', arrow_length=2, arrow_size=1.5, dpi=150)
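One way to picture the projection is to turn the velocity-graph scores of a cell into transition weights and average the unit displacements in the embedding under those weights. The sketch below is an assumption about how such a projection can work, not a description of the library's implementation: the exponential weighting, the scale parameter and the subtraction of the uniform average are illustrative choices, and the coordinates are made up.

```python
import math


def embed_velocity(i, graph_row, embedding, scale=0.1):
    """Project cell i's velocity into a 2D embedding.

    graph_row: dict mapping neighbor index j to its velocity-graph score
    embedding: list of (x, y) coordinates, one per cell
    scale:     hypothetical kernel width for turning scores into weights
    """
    weights = {j: math.exp(score / scale) for j, score in graph_row.items()}
    total = sum(weights.values())
    arrow = [0.0, 0.0]
    for j, weight in weights.items():
        dx = embedding[j][0] - embedding[i][0]
        dy = embedding[j][1] - embedding[i][1]
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        # Transition probability minus the uniform baseline, so that a cell
        # with no preferred direction gets a zero arrow.
        p = weight / total - 1 / len(weights)
        arrow[0] += p * dx / length
        arrow[1] += p * dy / length
    return arrow


# Cell 0 sits at the origin with one neighbor in each direction.
embedding = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
scores_cell0 = {1: 1.0, 2: -1.0, 3: 0.0, 4: 0.0}
arrow = embed_velocity(0, scores_cell0, embedding)
```

Here the arrow points toward the neighbor on the right, the only transition the scores support, and has no vertical component.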
[10]:
scv.tl.recover_dynamics(adata)
recovering dynamics
finished (0:12:24) --> added
'fit_pars', fitted parameters for splicing dynamics (adata.var)
[11]:
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
computing velocities
finished (0:00:04) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
finished (0:00:07) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
[12]:
scv.tl.latent_time(adata)
scv.pl.scatter(adata, color='latent_time', color_map='gnuplot', size=80, colorbar=True)
computing terminal states
identified 2 regions of root cells and 1 region of end points
finished (0:00:00) --> added
'root_cells', root cells of Markov diffusion process (adata.obs)
'end_points', end points of Markov diffusion process (adata.obs)
computing latent time
finished (0:00:01) --> added
'latent_time', shared time (adata.obs)
[13]:
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.heatmap(adata, var_names=top_genes, tkey='latent_time', n_convolve=100, col_color='clusters')
[14]:
scv.pl.scatter(adata, basis=top_genes[:10], frameon=False, ncols=5)
[15]:
scv.pl.scatter(adata, basis=['Actn4', 'Ppp3ca', 'Cpe', 'Nnat'], frameon=False)
[16]:
scv.pl.scatter(adata, x='latent_time', y=['Actn4', 'Ppp3ca', 'Cpe', 'Nnat'], frameon=False)
[17]:
scv.pl.velocity_embedding_stream(adata, basis='umap', title='', smooth=.8, min_mass=4)
computing velocity embedding
finished (0:00:00) --> added
'velocity_umap', embedded velocity vectors (adata.obsm)