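ImmuCellAI (Immune Cell Abundance Identifier) is a tool that estimates the abundance of 24 immune cell types from gene expression datasets (both RNA-Seq and microarray data). The 24 types comprise 18 T-cell subtypes and 6 other immune cell types: B cells, NK cells, monocytes, macrophages, neutrophils, and DC cells.
ImmuCellAI can also be used to estimate differences in immune cell infiltration between groups and to predict patient response to immune checkpoint blockade therapy. Both human and mouse versions are available.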
Human version: ImmuCellAI (Immune Cell Abundance Identifier)
Paper: ImmuCellAI: A Unique Method for Comprehensive T-Cell Subsets Abundance Prediction and its Application in Cancer Immunotherapy
ImmuCellAI-mouse (Immune Cell Abundance Identifier for mouse) is the mouse version of ImmuCellAI (Advanced Science 2020, ESI highly cited paper). It estimates the abundance of 36 immune cell types from gene expression profiles obtained by RNA-Seq or microarray.
Mouse version: ImmuCellAI-mouse (Immune Cell Abundance Identifier for mouse)
Paper: ImmuCellAI-mouse: a tool for comprehensive prediction of mouse immune cell abundance and immune microenvironment depiction
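The online analysis is very simple; for the detailed steps, see here.
In June this year, the ImmuCellAI R packages became available on GitHub (human/mouse). Here we use the mouse version as an example to briefly show how the R package is used, along with some bugs you may run into and how to work around them.
Points worth noting:
# First, install the package from GitHub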
install.packages("devtools")
library(devtools)
install_github("lydiaMyr/ImmuCellAI-mouse@main")
# If the "/bin/gtar: not found" error occurs,
# run export TAR="/bin/tar" in the shell before installation.
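If you would rather not leave the R session, the same environment variable can be set from within R before calling install_github(). This is a minimal sketch, assuming tar really lives at /bin/tar on your machine (Sys.which("tar") will tell you); it only affects the current session.

```r
# Point R at the system tar for this session only.
# Assumption: tar is installed at /bin/tar; adjust if Sys.which("tar") differs.
Sys.setenv(TAR = "/bin/tar")

# Confirm the value that installation helpers will see.
Sys.getenv("TAR")
```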
> library(ImmuCellAImouse)
# List the built-in datasets
> data(package="ImmuCellAImouse")
ImmuCellAI_mouse_example
l1_cell_correction_matrix_new
l1_marker
l2_cell_correction_matrix_new
l2_marker
l3_cell_correction_matrix_new
l3_marker
marker_exp
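# Load the example dataset
> data(ImmuCellAI_mouse_example,package = "ImmuCellAImouse")
> # Inspect the dataset: compared with an ordinary expression matrix it has one extra row of group labels. This row is optional.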
> head(ImmuCellAI_mouse_example)
ID GSM4285758 GSM4285759 GSM4285760 GSM4285761 GSM4285762 GSM4285763
1 Group group1 group1 group1 group1 group1 group1
2 0610005C13Rik 0.626444546 1.451541323 0 0.432976388 0.327233472 0
3 0610007P14Rik 12.42246496 14.55469492 14.9259117 8.440200134 12.46177116 13.53301625
4 0610009B22Rik 33.06310717 29.40184679 32.50339237 50.14189624 51.57312435 31.7714277
5 0610009E02Rik 0 0 1.609775505 2.396453283 1.114097077 0
6 0610009L18Rik 8.972534232 0.809856006 7.525532084 4.337161921 0 9.164682133
GSM4285764 GSM4285770 GSM4285771 GSM4285772
1 group1 group2 group2 group2
2 0 0 0 0
3 9.566787233 21.86674662 17.19127214 27.46170734
4 20.70347721 33.18211055 55.01434465 17.68994492
5 0 0.907760087 0 0
6 9.215094219 8.652294423 30.64033231 26.63737635
> # Set the row names
> rownames(ImmuCellAI_mouse_example) <- ImmuCellAI_mouse_example[,1]
> ImmuCellAI_mouse_example <- ImmuCellAI_mouse_example[,-1]
> head(ImmuCellAI_mouse_example)
GSM4285758 GSM4285759 GSM4285760 GSM4285761 GSM4285762 GSM4285763
Group group1 group1 group1 group1 group1 group1
0610005C13Rik 0.626444546 1.451541323 0 0.432976388 0.327233472 0
0610007P14Rik 12.42246496 14.55469492 14.9259117 8.440200134 12.46177116 13.53301625
0610009B22Rik 33.06310717 29.40184679 32.50339237 50.14189624 51.57312435 31.7714277
0610009E02Rik 0 0 1.609775505 2.396453283 1.114097077 0
0610009L18Rik 8.972534232 0.809856006 7.525532084 4.337161921 0 9.164682133
GSM4285764 GSM4285770 GSM4285771 GSM4285772
Group group1 group2 group2 group2
0610005C13Rik 0 0 0 0
0610007P14Rik 9.566787233 21.86674662 17.19127214 27.46170734
0610009B22Rik 20.70347721 33.18211055 55.01434465 17.68994492
0610009E02Rik 0 0.907760087 0 0
0610009L18Rik 9.215094219 8.652294423 30.64033231 26.63737635
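> # Run the analysis function
> test <- ImmuCellAI_mouse(sample = ImmuCellAI_mouse_example,
+ data_type = "rnaseq", # type of the input data: "rnaseq" or "microarray"
+ group_tag = 1, # whether the data contains a group row; use 0 if it does not
+ customer=FALSE) # whether you supply your own reference file ("1" yes, "0" no; FALSE here); usually not needed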
Estimating ssGSEA scores for 7 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 10 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 10 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 10 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 10 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 10 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 7 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 20 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 20 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 9 gene sets.
|==========================================================| 100%
Estimating ssGSEA scores for 9 gene sets.
|==========================================================| 100%
There were 18 warnings (use warnings() to see them)
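> # Inspect the results
> names(test) # two results are returned: "abundance" holds the abundance estimates,
> # "group_result" holds the between-group comparison; if group_tag = 0, this element is NULL
[1] "abundance" "group_result"
> # Abundance estimates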
> head(test$abundance)
B_cell Dendritic_cells Granulocytes Macrophage Monocytes NK T_cell CD4_T_cell CD8_T_cell NKT Tgd
GSM4285758 0.0537 0.0206 0.1798 0.2968 0.1693 0.1165 0.1632 0.0549 0.0767 0.0295 0.0022
GSM4285759 0.0516 0.1143 0.1426 0.2846 0.2007 0.1072 0.0991 0.0000 0.0631 0.0029 0.0330
GSM4285760 0.0776 0.0592 0.1338 0.2834 0.1969 0.1656 0.0835 0.0375 0.0218 0.0148 0.0093
GSM4285761 0.1253 0.0152 0.1369 0.3335 0.1753 0.1061 0.1076 0.0019 0.0545 0.0318 0.0193
GSM4285762 0.0892 0.0759 0.1686 0.3036 0.1252 0.1695 0.0680 0.0282 0.0077 0.0067 0.0254
GSM4285763 0.1168 0.0107 0.1964 0.3320 0.1619 0.0873 0.0949 0.0294 0.0000 0.0000 0.0655
B1_cell Follicular_B Germinal_center_B Marginal_Zone_B Memory_B Plasma_cell cDC1 cDC2 MoDC pDC
GSM4285758 0.0096 0.0239 0.0000 0.0000 0.0038 0.0164 0.0063 0.0030 0.0070 0.0043
GSM4285759 0.0000 0.0080 0.0000 0.0104 0.0122 0.0210 0.0368 0.0272 0.0283 0.0220
GSM4285760 0.0215 0.0071 0.0000 0.0037 0.0215 0.0238 0.0174 0.0138 0.0180 0.0099
GSM4285761 0.0000 0.0331 0.0000 0.0482 0.0220 0.0220 0.0045 0.0039 0.0034 0.0034
GSM4285762 0.0107 0.0128 0.0067 0.0146 0.0135 0.0309 0.0261 0.0173 0.0208 0.0117
GSM4285763 0.0266 0.0145 0.0000 0.0252 0.0180 0.0326 0.0032 0.0021 0.0036 0.0018
Basophil Eosinophil mast_cell Neutrophils M1_macrophage M2_macrophage CD4_Tm Naive_CD4_T T_helper_cell Treg
GSM4285758 0.0310 0.0539 0.0462 0.0487 0.1939 0.1029 0.0035 0.0406 0.0108 0.0000
GSM4285759 0.0130 0.0409 0.0407 0.0480 0.2141 0.0705 0.0000 0.0000 0.0000 0.0000
GSM4285760 0.0085 0.0496 0.0339 0.0419 0.1906 0.0928 0.0189 0.0126 0.0039 0.0022
GSM4285761 0.0161 0.0495 0.0300 0.0413 0.2710 0.0625 0.0003 0.0010 0.0004 0.0003
GSM4285762 0.0235 0.0526 0.0441 0.0483 0.1897 0.1138 0.0070 0.0201 0.0000 0.0010
GSM4285763 0.0101 0.0656 0.0538 0.0669 0.2230 0.1090 0.0102 0.0160 0.0016 0.0016
CD8_Tc CD8_Tcm CD8_Tem CD8_Tex Naive_CD8_T Infiltration_score
GSM4285758 0.0108 0.0175 0.0143 0.0122 0.0219 1.237
GSM4285759 0.0135 0.0127 0.0115 0.0098 0.0156 1.661
GSM4285760 0.0041 0.0063 0.0031 0.0011 0.0072 1.188
GSM4285761 0.0127 0.0114 0.0074 0.0090 0.0140 1.284
GSM4285762 0.0020 0.0019 0.0009 0.0005 0.0024 1.418
GSM4285763 0.0000 0.0000 0.0000 0.0000 0.0000 1.096
> # Group comparison
> head(test$group_result)
B_cell Dendritic_cells Granulocytes Macrophage Monocytes NK T_cell CD4_T_cell CD8_T_cell NKT Tgd B1_cell
group1 0.0776 0.0592 0.1453 0.2968 0.1753 0.1165 0.0949 0.0282 0.0545 0.0116 0.0193 0.0096
group2 0.2330 0.1938 0.0066 0.0653 0.2050 0.2682 0.0293 0.0003 0.0068 0.0050 0.0126 0.0000
p value 0.0200 0.0200 0.0200 0.0200 0.2700 0.0200 0.0200 0.2500 0.1800 0.5200 0.6700 0.6300
Follicular_B Germinal_center_B Marginal_Zone_B Memory_B Plasma_cell cDC1 cDC2 MoDC pDC Basophil
group1 0.0128 0.0000 0.0104 0.0135 0.0220 0.0174 0.0138 0.0180 0.0099 0.0130
group2 0.0670 0.0271 0.0168 0.0848 0.0323 0.0579 0.0449 0.0378 0.0555 0.0002
p value 0.0200 0.1600 0.6700 0.0200 0.5200 0.0200 0.0200 0.1200 0.0200 0.0200
Eosinophil mast_cell Neutrophils M1_macrophage M2_macrophage CD4_Tm Naive_CD4_T T_helper_cell Treg CD8_Tc
group1 0.0496 0.0407 0.0483 0.2141 0.0928 0.0035 0.0126 0.0004 0.0003 0.0108
group2 0.0033 0.0023 0.0008 0.0497 0.0156 0.0002 0.0001 0.0001 0.0000 0.0018
p value 0.0200 0.0200 0.0200 0.0200 0.0200 0.3600 0.2100 0.9100 0.4800 0.1800
CD8_Tcm CD8_Tem CD8_Tex Naive_CD8_T Infiltration_score
group1 0.0090 0.0074 0.0068 0.0140 1.255
group2 0.0019 0.0003 0.0006 0.0019 1.647
p value 0.2100 0.1100 0.2700 0.1700 0.020
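To pull out the cell types that differ between groups, you can filter on the "p value" row. The sketch below rebuilds a four-column excerpt of the table above by hand (the numbers are copied from that output) and assumes group_result behaves like a numeric matrix with a row named "p value"; the 0.05 cutoff is an illustrative choice, not something the package prescribes.

```r
# Excerpt of the group_result output shown above (four cell types only).
group_result <- rbind(
  group1    = c(B_cell = 0.0776, Monocytes = 0.1753, NK = 0.1165, CD4_T_cell = 0.0282),
  group2    = c(B_cell = 0.2330, Monocytes = 0.2050, NK = 0.2682, CD4_T_cell = 0.0003),
  `p value` = c(B_cell = 0.0200, Monocytes = 0.2700, NK = 0.0200, CD4_T_cell = 0.2500)
)

alpha <- 0.05  # illustrative cutoff, not taken from the package

# Names of the cell types whose p value falls below the cutoff.
significant <- colnames(group_result)[group_result["p value", ] < alpha]

# Keep only those columns; drop = FALSE preserves the matrix shape.
group_result[, significant, drop = FALSE]
```

With the real object you would replace the hand-built matrix with test$group_result; if its values turn out to be stored as text, convert them with as.numeric() first.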
Once you have the results, you can draw heatmaps, box plots, or whatever else suits your needs; plotting is not covered here.
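As a starting point, both plot types can be produced with base R alone. This is only a sketch on a toy matrix of random numbers shaped like test$abundance (samples in rows, cell types in columns); the sample names, the three cell types, and the 7/3 group split are placeholders that mirror the example dataset.

```r
set.seed(1)

# Toy abundance matrix: 10 samples x 3 cell types, random values for illustration only.
abundance <- matrix(runif(30, 0, 0.3), nrow = 10,
                    dimnames = list(paste0("sample", 1:10),
                                    c("B_cell", "Macrophage", "NK")))

# Group labels in sample order (7 samples in group1, 3 in group2).
group <- factor(rep(c("group1", "group2"), times = c(7, 3)))

# Box plot of a single cell type, split by group.
boxplot(abundance[, "B_cell"] ~ group, xlab = "Group", ylab = "B_cell abundance")

# Heatmap with cell types as rows and samples as columns, scaled per cell type;
# Colv = NA keeps the samples in their original order.
heatmap(t(abundance), scale = "row", Colv = NA)
```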
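> # Bug 1: running the function on the freshly loaded example data, without setting the row names first, fails
> data(ImmuCellAI_mouse_example,package = "ImmuCellAImouse")
> test <- ImmuCellAI_mouse(sample = ImmuCellAI_mouse_example,
+ data_type = "rnaseq", # type of the input data: "rnaseq" or "microarray"
+ group_tag = 1, # whether the data contains a group row; use 0 if it does not
+ customer=FALSE) # whether you supply your own reference file ("1" yes, "0" no; FALSE here); usually not needed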
Estimating ssGSEA scores for 7 gene sets.
|===================================================================| 100%
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In apply(sample, 2, as.numeric) : NAs introduced by coercion
> # Setting the row names in the standard format, as shown earlier, resolves this
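The "NAs introduced by coercion" warning suggests that the gene ID column was still part of the data when the function converted it to numbers. The sketch below wraps the two row-name lines from earlier into a small check you can run before the analysis; prepare_input() is a hypothetical helper written for this post, not part of ImmuCellAImouse, and the toy data frame only imitates the layout of the example dataset.

```r
# Hypothetical helper (not part of ImmuCellAImouse): move the ID column into the
# row names so the remaining columns hold only group labels and numbers.
prepare_input <- function(df, id_col = 1) {
  ids <- as.character(df[[id_col]])
  if (anyDuplicated(ids)) stop("duplicated gene IDs; row names must be unique")
  out <- df[, -id_col, drop = FALSE]
  rownames(out) <- ids
  out
}

# Toy data in the same layout as the example dataset: a Group row, then genes.
raw <- data.frame(
  ID = c("Group", "GeneA", "GeneB"),
  S1 = c("group1", "0.5", "12.4"),
  S2 = c("group2", "0", "21.9"),
  stringsAsFactors = FALSE
)

ready <- prepare_input(raw)
rownames(ready)

# Check that the expression rows (everything except the Group row) convert
# to numbers without producing NA.
anyNA(as.numeric(unlist(ready[-1, ])))
```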
> head(colnames(mat)) # Bug 2: the sample names contain special characters
[1] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_001_CD45"
[2] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_001_CD8"
[3] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_001_TC"
[4] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_002_CD45"
[5] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_002_CD8"
[6] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_002_TC"
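> test <- ImmuCellAI_mouse(sample = mat,
+ data_type = "rnaseq", # type of the input data: "rnaseq" or "microarray"
+ group_tag = 0, # whether the data contains a group row; use 0 if it does not
+ customer=FALSE) # whether you supply your own reference file ("1" yes, "0" no; FALSE here); usually not needed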
Estimating ssGSEA scores for 7 gene sets.
|========================================================| 100%
Error in .mapGeneSetsToFeatures(gset.idx.list, rownames(expr)) :
No identifiers in the gene sets could be matched to the identifiers in the expression data.
In addition: Warning message:
In .filterFeatures(expr, method) :
1152 genes with constant expression values throuhgout the samples.
Called from: .mapGeneSetsToFeatures(gset.idx.list, rownames(expr))
Browse[1]> Q
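> # This happens because grep() in the package source interprets the special characters in the sample names as regular-expression syntax
colnames(infil_exp) <- "Ratio"
for (sam in colnames(immune_deviation_sample)) {
infil_marker_cell[[sam]] <- row.names(infil_exp)[grep(sam,
row.names(infil_exp))]
} # So the argument fixed = TRUE should be added here; when TRUE, pattern is a string to be matched as is
# See ?grep for details on grep()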
colnames(infil_exp) <- "Ratio"
for (sam in colnames(immune_deviation_sample)) {
infil_marker_cell[[sam]] <- row.names(infil_exp)[grep(sam,
row.names(infil_exp),fixed = TRUE)]
}
# After making this change, reload "ImmuCellAI_mouse" (treating it as a user-defined function from here on)
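The difference between the two grep() calls is easy to reproduce outside the package. The sketch below uses a shortened, made-up sample name in the same style as the ones above: with the default settings the parentheses and "+" are read as regular-expression operators, so the name does not even match itself, while fixed = TRUE compares it character by character.

```r
# Shortened, made-up sample name in the same style as the real ones.
sam  <- "SAG-CD8(AB+R650,1:50)_001_CD45"
rows <- c(sam, "SAG-CD8(AB+R650,1:50)_001_TC")

# Default: the name is parsed as a regular expression, so "(", ")" and "+"
# act as metacharacters and the literal name is not found.
regex_hits <- grep(sam, rows)

# fixed = TRUE: the name is matched as a literal string.
fixed_hits <- grep(sam, rows, fixed = TRUE)

length(regex_hits)  # number of matches with regex interpretation
fixed_hits          # index of the match with literal interpretation

# Possible alternative without editing the package (an assumption, not tested
# against ImmuCellAImouse): rename the columns to syntactic names beforehand.
make.names(sam)
```

Renaming the columns avoids patching the source, but you then need to map the cleaned names back to the original samples yourself.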