本次学习资料来源,结合视频观看效果更佳,视频相关代码如下:
- 视频如下:https://www.youtube.com/watch?v=uvyG9yLuNSE
- 代码:https://github.com/mousepixels/sanbomics_scripts
- 主代码:https://github.com/mousepixels/sanbomics_scripts/blob/main/single_cell_analysis_complete_class.ipynb
文献中的Fig1如下:
如果要绘制Fig1,需要对上次笔记中的注释再进行进一步的详细注释,上次注释结果:
数据读取
import scanpy as sc
import scvi
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# 注意dir改成自己的路径
dir = '/Pub/Users/zhangjuan/data/GSE171524/'
adata = sc.read_h5ad(dir+'integrated_anno.h5ad')
adata
详细注释
marker采用来自数据库:PanglaoDB https://panglaodb.se/
首先是B,T/NK细胞,有:
- CD4+ T cells:CD4
- CD8+ T cells:CD8A, CD8B
- NK/T cells:NCAM1
- Plasma cells:MZB1
上皮群:
- AT1:AGER
- AT2:SFTPC
- Airway epithelial:SEC14L3, CDH1
髓系:
- Macrophages:MRC1
- Dendritic cells:ZBTB46
- Monocytes:APOBEC3A
基质细胞:
- smooth muscle cells:RGS5,ACTA2
markers = adata.uns['markers']
# 看某一个基因的表达情况
markers[markers.names=="RGS5"]
#, layer = 'scvi_normalized'
# , vmax = 5
sc.pl.umap(adata, color = ["RGS5"], frameon = False, layer = 'scvi_normalized', vmax =2)
plt.savefig(dir+"03-intergration_umap_gene_RGS5.png")
先给出每个cluster编号以及需要填充的空格,在后续的操作中这个地方会填上每个注释结果:
for x in range(0,35):
print(f'"{x}":"", ')
cell_type_sub = {"0":"CD4+ T cells",
"1":"AT1",
"2":"Macrophages",
"3":"Fibroblast",
"4":"Macrophages",
"5":"CD8+ T cells",
"6":"Macrophages",
"7":"AT2",
"8":"Endothelial",
"9":"Fibroblast",
"10":"Macrophages",
"11":"Plasma cells",
"12":"AT2",
"13":"Fibroblast",
"14":"Fibroblast",
"15":"AT2",
"16":"Airway epithelial",
"17":"Macrophages",
"18":"Airway epithelial",
"19":"AT2",
"20":"Neuronal Cell",
"21":"Airway epithelial",
"22":"B Cell",
"23":"Mast Cell",
"24":"Cycling T/NK",
"25":"Plasma cells",
"26":"DCs",
"27":"Endothelial",
"28":"smooth muscle cells",
"29":"Monocytes",
"30":"Erythroid",
"31":"B Cell",
"32":"Neuronal Cell",
"33":"Macrophages"
}
注释后:
adata.obs['cell type_sub'] = adata.obs.leiden.map(cell_type_sub)
sc.pl.umap(adata, color = ['cell type_sub'], frameon = False, legend_loc = "on data")
plt.tight_layout()
plt.savefig(dir+"04-intergration_umap_anno_sub.png", dpi=300)
# 保存
adata.write_h5ad(dir + 'integrated_anno.h5ad')
详细注释结果如下:
绘制C、D图
首先对细胞进行计数:
# adata = sc.read_h5ad('integrated.h5ad')
adata.obs.Sample.unique().tolist()
# 添加一列细胞的样本来源
def map_condition(x):
if 'cov' in x:
return 'COVID19'
else:
return 'Control'
adata.obs['condition'] = adata.obs.Sample.map(map_condition)
adata.obs
结果图:
# 每个样本中的细胞数统计
num_tot_cells = adata.obs.groupby(['Sample']).count()
num_tot_cells = dict(zip(num_tot_cells.index, num_tot_cells.doublet))
num_tot_cells
# {'C51ctr': 10934, 'C52ctr': 3967, 'C53ctr': 6076, 'C54ctr': 3860, 'C55ctr': 5110, 'C56ctr': 3609, 'C57ctr': 4307, 'L01cov': 2737,'L03cov': 3604, 'L04cov': 3056, 'L04covaddon': 4073, 'L05cov': 2435, 'L06cov': 5657, 'L07cov': 4217, 'L08cov': 3473, 'L09cov': 3075, 'L10cov': 2713, 'L11cov': 2531, 'L12cov': 3259, 'L13cov': 4244, 'L15cov': 3516, 'L16cov': 1600, 'L17cov': 3937, 'L18cov': 2392, 'L19cov': 2135, 'L21cov': 2961, 'L22cov': 5786}
cell_type_counts = adata.obs.groupby(['Sample', 'condition', 'cell type']).count()
cell_type_counts = cell_type_counts[cell_type_counts.sum(axis = 1) > 0].reset_index()
cell_type_counts = cell_type_counts[cell_type_counts.columns[0:4]]
cell_type_counts
# Sample condition cell type doublet
# 0 C51ctr Control B Cell 70
# 1 C51ctr Control Endothelial 1153
# 2 C51ctr Control Epithelial Cell 3711
# 3 C51ctr Control Fibroblast 1497
# 4 C51ctr Control Mast Cell 290
# .. ... ... ... ...
# 246 L22cov COVID19 Mast Cell 7
# 247 L22cov COVID19 Myeloid Cell 1663
# 248 L22cov COVID19 Neuronal Cell 83
# 249 L22cov COVID19 T Cell 1379
# 250 L22cov COVID19 pDC 510
# [251 rows x 4 columns]
# 计算频率
cell_type_counts['total_cells'] = cell_type_counts.Sample.map(num_tot_cells).astype(int)
cell_type_counts['frequency'] = cell_type_counts.doublet / cell_type_counts.total_cells
cell_type_counts
每种细胞类型在COVID19与control两种条件下的频率差异:
绘图:
plt.figure(figsize = (10,4))
ax = sns.boxplot(data = cell_type_counts, x = 'cell type', y = 'frequency', hue = 'condition')
plt.xticks(rotation = 35, rotation_mode = 'anchor', ha = 'right')
plt.tight_layout()
plt.savefig(dir+"05-intergration_Fig1D.png", dpi=300)
结果图如下:
C图如下:
sc.pl.umap(adata, color = ['cell type_sub','condition'], frameon = False, legend_loc = "on data")
plt.savefig(dir+"05-intergration_Fig1B_C.png", dpi=300)
结果图如下:
下期见~