Python单细胞复现2022||04-绘制Figure 1

本次学习资料来源,结合视频观看效果更佳,视频相关代码如下:

  • 视频如下:https://www.youtube.com/watch?v=uvyG9yLuNSE
  • 代码:https://github.com/mousepixels/sanbomics_scripts
  • 主代码:https://github.com/mousepixels/sanbomics_scripts/blob/main/single_cell_analysis_complete_class.ipynb

文献中的Fig1如下:


image-20221003102651856.png

如果要绘制Fig1,需要对上次笔记中的注释再进行进一步的详细注释,上次注释结果:

image-20221003102912211.png

数据读取

import scanpy as sc
import scvi
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 注意dir改成自己的路径
dir = '/Pub/Users/zhangjuan/data/GSE171524/'
adata = sc.read_h5ad(dir+'integrated_anno.h5ad')
adata

详细注释

marker采用来自数据库:PanglaoDB https://panglaodb.se/
首先是B,T/NK细胞,有:

  • CD4+ T cells:CD4
  • CD8+ T cells:CD8A, CD8B
  • NK/T cells:NCAM1
  • Plasma cells:MZB1

上皮群:

  • AT1:AGER
  • AT2:SFTPC
  • Airway epithelial:SEC14L3, CDH1

髓系:

  • Macrophages:MRC1
  • Dendritic cells:ZBTB46
  • Monocytes:APOBEC3A

基质细胞:

  • smooth muscle cells:RGS5,ACTA2
markers = adata.uns['markers']

# 看某一个基因的表达情况
markers[markers.names=="RGS5"]

#, layer = 'scvi_normalized'
# , vmax = 5
sc.pl.umap(adata, color = ["RGS5"], frameon = False, layer = 'scvi_normalized', vmax =2)
plt.savefig(dir+"03-intergration_umap_gene_RGS5.png")

先给出每个cluster编号以及需要填充的空格,在后续的操作中这个地方会填上每个注释结果:

for x in range(0,35):
    print(f'"{x}":"", ')

cell_type_sub = {"0":"CD4+ T cells",
"1":"AT1",
"2":"Macrophages",
"3":"Fibroblast",
"4":"Macrophages",
"5":"CD8+ T cells",
"6":"Macrophages",
"7":"AT2",
"8":"Endothelial",
"9":"Fibroblast",
"10":"Macrophages",
"11":"Plasma cells",
"12":"AT2",
"13":"Fibroblast",
"14":"Fibroblast",
"15":"AT2",
"16":"Airway epithelial",
"17":"Macrophages",
"18":"Airway epithelial",
"19":"AT2",
"20":"Neuronal Cell",
"21":"Airway epithelial",
"22":"B Cell",
"23":"Mast Cell",
"24":"Cycling T/NK",
"25":"Plasma cells",
"26":"DCs",
"27":"Endothelial",
"28":"smooth muscle cells",
"29":"Monocytes",
"30":"Erythroid",
"31":"B Cell",
"32":"Neuronal Cell",
"33":"Macrophages"
}

注释后:

adata.obs['cell type_sub'] = adata.obs.leiden.map(cell_type_sub)
sc.pl.umap(adata, color = ['cell type_sub'], frameon = False, legend_loc = "on data")
plt.tight_layout()
plt.savefig(dir+"04-intergration_umap_anno_sub.png", dpi=300)

# 保存
adata.write_h5ad(dir + 'integrated_anno.h5ad')

详细注释结果如下:

image-20221004003637491.png

绘制C、D图

首先对细胞进行计数:

# adata = sc.read_h5ad('integrated.h5ad')
adata.obs.Sample.unique().tolist()

# 添加一列细胞的样本来源
def map_condition(x):
    if 'cov' in x:
        return 'COVID19'
    else:
        return 'Control'

adata.obs['condition'] = adata.obs.Sample.map(map_condition)
adata.obs

结果图:

image-20221004005102198.png
# 每个样本中的细胞数统计
num_tot_cells = adata.obs.groupby(['Sample']).count()
num_tot_cells = dict(zip(num_tot_cells.index, num_tot_cells.doublet))
num_tot_cells

# {'C51ctr': 10934, 'C52ctr': 3967, 'C53ctr': 6076, 'C54ctr': 3860, 'C55ctr': 5110, 'C56ctr': 3609, 'C57ctr': 4307, 'L01cov': 2737,'L03cov': 3604, 'L04cov': 3056, 'L04covaddon': 4073, 'L05cov': 2435, 'L06cov': 5657, 'L07cov': 4217, 'L08cov': 3473, 'L09cov': 3075, 'L10cov': 2713, 'L11cov': 2531, 'L12cov': 3259, 'L13cov': 4244, 'L15cov': 3516, 'L16cov': 1600, 'L17cov': 3937, 'L18cov': 2392, 'L19cov': 2135, 'L21cov': 2961, 'L22cov': 5786}

cell_type_counts = adata.obs.groupby(['Sample', 'condition', 'cell type']).count()
cell_type_counts = cell_type_counts[cell_type_counts.sum(axis = 1) > 0].reset_index()
cell_type_counts = cell_type_counts[cell_type_counts.columns[0:4]]
cell_type_counts

#      Sample condition        cell type  doublet
# 0    C51ctr   Control           B Cell       70
# 1    C51ctr   Control      Endothelial     1153
# 2    C51ctr   Control  Epithelial Cell     3711
# 3    C51ctr   Control       Fibroblast     1497
# 4    C51ctr   Control        Mast Cell      290
# ..      ...       ...              ...      ...
# 246  L22cov   COVID19        Mast Cell        7
# 247  L22cov   COVID19     Myeloid Cell     1663
# 248  L22cov   COVID19    Neuronal Cell       83
# 249  L22cov   COVID19           T Cell     1379
# 250  L22cov   COVID19              pDC      510

# [251 rows x 4 columns]


# 计算频率
cell_type_counts['total_cells'] = cell_type_counts.Sample.map(num_tot_cells).astype(int)
cell_type_counts['frequency'] = cell_type_counts.doublet / cell_type_counts.total_cells
cell_type_counts

每种细胞类型在COVID19与control两种条件下的频率差异:

image-20221004005414903.png

绘图:

plt.figure(figsize = (10,4))
ax = sns.boxplot(data = cell_type_counts, x = 'cell type', y = 'frequency', hue = 'condition')
plt.xticks(rotation = 35, rotation_mode = 'anchor', ha = 'right')
plt.tight_layout()
plt.savefig(dir+"05-intergration_Fig1D.png", dpi=300)

结果图如下:

image-20221004005902832.png

C图如下:

sc.pl.umap(adata, color = ['cell type_sub','condition'], frameon = False, legend_loc = "on data")
plt.savefig(dir+"05-intergration_Fig1B_C.png", dpi=300)

结果图如下:

image-20221004010349956.png

下期见~

你可能感兴趣的:(Python单细胞复现2022||04-绘制Figure 1)