ImmuCellAI | Learning the R package for immune infiltration estimation

ImmuCellAI

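ImmuCellAI (Immune Cell Abundance Identifier) is a tool that estimates the abundance of 24 immune cell types from gene expression datasets, including both RNA-Seq and microarray data. The 24 cell types are 18 T-cell subsets plus 6 other immune cell types: B cells, NK cells, monocytes, macrophages, neutrophils and dendritic cells (DCs).
ImmuCellAI can also be used to estimate differences in immune cell infiltration between groups of samples and to predict patient response to immune checkpoint blockade therapy. Both human and mouse versions are available.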
Human version: ImmuCellAI (Immune Cell Abundance Identifier)
Paper: ImmuCellAI: A Unique Method for Comprehensive T-Cell Subsets Abundance Prediction and its Application in Cancer Immunotherapy
Mouse version: ImmuCellAI-mouse (Immune Cell Abundance Identifier for mouse)
ImmuCellAI-mouse is the mouse version of ImmuCellAI (Advanced Science 2020, an ESI highly cited paper). It estimates the abundance of 36 immune cell types from gene expression profiles obtained by RNA-Seq or microarray.
Paper: ImmuCellAI-mouse: a tool for comprehensive prediction of mouse immune cell abundance and immune microenvironment depiction

Data analysis

Online analysis

The online analysis is very straightforward; the detailed steps can be found here.

Analysis with the R package

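In June this year, the ImmuCellAI R packages (human/mouse) became available on GitHub. Here we use the mouse version as an example to briefly introduce how to use the package, together with some bugs you may run into and how to fix them.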
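Note:

  • The algorithm supports both RNA-seq and microarray data; the data type is set through a parameter (data_type).
  • The input data should be log2-transformed. Unlike CIBERSORT, ImmuCellAI does not check whether the data have been log2-transformed.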
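Because ImmuCellAI does not check the scale of its input, it may be worth checking it yourself before running the analysis. The sketch below borrows the quantile-based rule of thumb used in the R scripts that GEO2R generates; `looks_untransformed()` and `to_log2_if_needed()` are hypothetical helpers, not part of ImmuCellAImouse, and the `log2(x + 1)` pseudo-count is an assumption of this sketch rather than what GEO2R does.

```r
# Hypothetical helper: guess whether a numeric expression matrix is still on
# the raw (un-logged) scale, using the quantile rule from GEO2R-generated
# scripts. This is a rule of thumb, not part of ImmuCellAImouse.
looks_untransformed <- function(expr) {
  qx <- as.numeric(quantile(expr, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) ||
    (qx[6] - qx[1] > 50 && qx[2] > 0) ||
    (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
}

# Apply log2(x + 1) only when the data look un-logged
# (the +1 pseudo-count is an assumption that keeps zeros finite)
to_log2_if_needed <- function(expr) {
  if (looks_untransformed(expr)) log2(expr + 1) else expr
}
```

Call it on the numeric part of your matrix only (without the Group row), for example `looks_untransformed(as.matrix(expr))`; a heuristic like this can misjudge unusual data, so treat the result as a hint.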
# First install the package from GitHub
install.packages("devtools")
library(devtools)
install_github("lydiaMyr/ImmuCellAI-mouse@main")
# If the error "/bin/gtar: not found" occurs, run the shell command
# export TAR="/bin/tar" before installation (or Sys.setenv(TAR = "/bin/tar") inside R).
> library(ImmuCellAImouse)
# List the built-in datasets
> data(package="ImmuCellAImouse")

ImmuCellAI_mouse_example
l1_cell_correction_matrix_new
l1_marker     
l2_cell_correction_matrix_new           
l2_marker     
l3_cell_correction_matrix_new           
l3_marker     
marker_exp    
# Load the example dataset
> data(ImmuCellAI_mouse_example,package = "ImmuCellAImouse")
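> # Inspect the dataset: compared with an ordinary expression matrix it has an extra row of group labels (Group). This row is optional.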
> head(ImmuCellAI_mouse_example)
             ID  GSM4285758  GSM4285759  GSM4285760  GSM4285761  GSM4285762  GSM4285763
1         Group      group1      group1      group1      group1      group1      group1
2 0610005C13Rik 0.626444546 1.451541323           0 0.432976388 0.327233472           0
3 0610007P14Rik 12.42246496 14.55469492  14.9259117 8.440200134 12.46177116 13.53301625
4 0610009B22Rik 33.06310717 29.40184679 32.50339237 50.14189624 51.57312435  31.7714277
5 0610009E02Rik           0           0 1.609775505 2.396453283 1.114097077           0
6 0610009L18Rik 8.972534232 0.809856006 7.525532084 4.337161921           0 9.164682133
   GSM4285764  GSM4285770  GSM4285771  GSM4285772
1      group1      group2      group2      group2
2           0           0           0           0
3 9.566787233 21.86674662 17.19127214 27.46170734
4 20.70347721 33.18211055 55.01434465 17.68994492
5           0 0.907760087           0           0
6 9.215094219 8.652294423 30.64033231 26.63737635
> # Use the gene IDs in the first column as row names
> rownames(ImmuCellAI_mouse_example) <- ImmuCellAI_mouse_example[,1]
> ImmuCellAI_mouse_example <- ImmuCellAI_mouse_example[,-1]
> head(ImmuCellAI_mouse_example)
               GSM4285758  GSM4285759  GSM4285760  GSM4285761  GSM4285762  GSM4285763
Group              group1      group1      group1      group1      group1      group1
0610005C13Rik 0.626444546 1.451541323           0 0.432976388 0.327233472           0
0610007P14Rik 12.42246496 14.55469492  14.9259117 8.440200134 12.46177116 13.53301625
0610009B22Rik 33.06310717 29.40184679 32.50339237 50.14189624 51.57312435  31.7714277
0610009E02Rik           0           0 1.609775505 2.396453283 1.114097077           0
0610009L18Rik 8.972534232 0.809856006 7.525532084 4.337161921           0 9.164682133
               GSM4285764  GSM4285770  GSM4285771  GSM4285772
Group              group1      group2      group2      group2
0610005C13Rik           0           0           0           0
0610007P14Rik 9.566787233 21.86674662 17.19127214 27.46170734
0610009B22Rik 20.70347721 33.18211055 55.01434465 17.68994492
0610009E02Rik           0 0.907760087           0           0
0610009L18Rik 9.215094219 8.652294423 30.64033231 26.63737635
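The two `rownames` lines above work for the example data, but `rownames<-` refuses duplicated names, which are common in your own tables (for example after mapping microarray probes to gene symbols). A minimal sketch, assuming the first column holds gene IDs and that averaging duplicated genes is acceptable for your data (keeping the most highly expressed probe is an equally common choice); `set_gene_rownames()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of ImmuCellAImouse): move the gene-ID
# column into the row names, averaging rows that share a gene symbol.
# An optional "Group" row, as in ImmuCellAI_mouse_example, stays on top.
set_gene_rownames <- function(df, id_col = 1, group_label = "Group") {
  ids <- as.character(df[[id_col]])
  values <- df[, -id_col, drop = FALSE]
  is_group <- ids == group_label

  # Expression rows as a numeric matrix (columns may have been read
  # as character because of the Group row)
  sub <- values[!is_group, , drop = FALSE]
  expr <- matrix(vapply(sub, function(x) as.numeric(as.character(x)),
                        numeric(nrow(sub))),
                 nrow = nrow(sub), dimnames = list(NULL, names(sub)))

  # Sum duplicated genes, then divide by their counts to get the mean
  genes <- ids[!is_group]
  sums <- rowsum(expr, group = genes)
  counts <- as.vector(table(genes)[rownames(sums)])
  out <- as.data.frame(sums / counts)

  # Put the Group row back on top, if there was one
  if (any(is_group)) {
    group_row <- values[is_group, , drop = FALSE]
    rownames(group_row) <- group_label
    out <- rbind(group_row, out)
  }
  out
}
```

Applied to the freshly loaded example, `set_gene_rownames(ImmuCellAI_mouse_example)` should give the same layout as the two lines above, with gene IDs as row names and the Group row first.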
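> # Run the analysis
> test <- ImmuCellAI_mouse(sample = ImmuCellAI_mouse_example,
+                          data_type = "rnaseq", # input data type: "rnaseq" or "microarray"
+                          group_tag = 1,        # whether group information is present; use 0 if not
+                          customer = FALSE)     # whether you supply your own reference file (TRUE/1 = yes, FALSE/0 = no); usually not needed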
Estimating ssGSEA scores for 7 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 10 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 10 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 10 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 10 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 10 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 7 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 20 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 20 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 9 gene sets.
  |==========================================================| 100%

Estimating ssGSEA scores for 9 gene sets.
  |==========================================================| 100%

There were 18 warnings (use warnings() to see them)
> # Inspect the results
> names(test) # two results are returned: "abundance" holds the abundance estimates,
> # "group_result" holds the group comparison; if group_tag = 0, this element is NULL
[1] "abundance"    "group_result"  
> # Abundance estimates
> head(test$abundance)
           B_cell Dendritic_cells Granulocytes Macrophage Monocytes     NK T_cell CD4_T_cell CD8_T_cell    NKT    Tgd
GSM4285758 0.0537          0.0206       0.1798     0.2968    0.1693 0.1165 0.1632     0.0549     0.0767 0.0295 0.0022
GSM4285759 0.0516          0.1143       0.1426     0.2846    0.2007 0.1072 0.0991     0.0000     0.0631 0.0029 0.0330
GSM4285760 0.0776          0.0592       0.1338     0.2834    0.1969 0.1656 0.0835     0.0375     0.0218 0.0148 0.0093
GSM4285761 0.1253          0.0152       0.1369     0.3335    0.1753 0.1061 0.1076     0.0019     0.0545 0.0318 0.0193
GSM4285762 0.0892          0.0759       0.1686     0.3036    0.1252 0.1695 0.0680     0.0282     0.0077 0.0067 0.0254
GSM4285763 0.1168          0.0107       0.1964     0.3320    0.1619 0.0873 0.0949     0.0294     0.0000 0.0000 0.0655
           B1_cell Follicular_B Germinal_center_B Marginal_Zone_B Memory_B Plasma_cell   cDC1   cDC2   MoDC    pDC
GSM4285758  0.0096       0.0239            0.0000          0.0000   0.0038      0.0164 0.0063 0.0030 0.0070 0.0043
GSM4285759  0.0000       0.0080            0.0000          0.0104   0.0122      0.0210 0.0368 0.0272 0.0283 0.0220
GSM4285760  0.0215       0.0071            0.0000          0.0037   0.0215      0.0238 0.0174 0.0138 0.0180 0.0099
GSM4285761  0.0000       0.0331            0.0000          0.0482   0.0220      0.0220 0.0045 0.0039 0.0034 0.0034
GSM4285762  0.0107       0.0128            0.0067          0.0146   0.0135      0.0309 0.0261 0.0173 0.0208 0.0117
GSM4285763  0.0266       0.0145            0.0000          0.0252   0.0180      0.0326 0.0032 0.0021 0.0036 0.0018
           Basophil Eosinophil mast_cell Neutrophils M1_macrophage M2_macrophage CD4_Tm Naive_CD4_T T_helper_cell   Treg
GSM4285758   0.0310     0.0539    0.0462      0.0487        0.1939        0.1029 0.0035      0.0406        0.0108 0.0000
GSM4285759   0.0130     0.0409    0.0407      0.0480        0.2141        0.0705 0.0000      0.0000        0.0000 0.0000
GSM4285760   0.0085     0.0496    0.0339      0.0419        0.1906        0.0928 0.0189      0.0126        0.0039 0.0022
GSM4285761   0.0161     0.0495    0.0300      0.0413        0.2710        0.0625 0.0003      0.0010        0.0004 0.0003
GSM4285762   0.0235     0.0526    0.0441      0.0483        0.1897        0.1138 0.0070      0.0201        0.0000 0.0010
GSM4285763   0.0101     0.0656    0.0538      0.0669        0.2230        0.1090 0.0102      0.0160        0.0016 0.0016
           CD8_Tc CD8_Tcm CD8_Tem CD8_Tex Naive_CD8_T Infiltration_score
GSM4285758 0.0108  0.0175  0.0143  0.0122      0.0219              1.237
GSM4285759 0.0135  0.0127  0.0115  0.0098      0.0156              1.661
GSM4285760 0.0041  0.0063  0.0031  0.0011      0.0072              1.188
GSM4285761 0.0127  0.0114  0.0074  0.0090      0.0140              1.284
GSM4285762 0.0020  0.0019  0.0009  0.0005      0.0024              1.418
GSM4285763 0.0000  0.0000  0.0000  0.0000      0.0000              1.096
> # Group comparison
> head(test$group_result)
        B_cell Dendritic_cells Granulocytes Macrophage Monocytes     NK T_cell CD4_T_cell CD8_T_cell    NKT    Tgd B1_cell
group1  0.0776          0.0592       0.1453     0.2968    0.1753 0.1165 0.0949     0.0282     0.0545 0.0116 0.0193  0.0096
group2  0.2330          0.1938       0.0066     0.0653    0.2050 0.2682 0.0293     0.0003     0.0068 0.0050 0.0126  0.0000
p value 0.0200          0.0200       0.0200     0.0200    0.2700 0.0200 0.0200     0.2500     0.1800 0.5200 0.6700  0.6300
        Follicular_B Germinal_center_B Marginal_Zone_B Memory_B Plasma_cell   cDC1   cDC2   MoDC    pDC Basophil
group1        0.0128            0.0000          0.0104   0.0135      0.0220 0.0174 0.0138 0.0180 0.0099   0.0130
group2        0.0670            0.0271          0.0168   0.0848      0.0323 0.0579 0.0449 0.0378 0.0555   0.0002
p value       0.0200            0.1600          0.6700   0.0200      0.5200 0.0200 0.0200 0.1200 0.0200   0.0200
        Eosinophil mast_cell Neutrophils M1_macrophage M2_macrophage CD4_Tm Naive_CD4_T T_helper_cell   Treg CD8_Tc
group1      0.0496    0.0407      0.0483        0.2141        0.0928 0.0035      0.0126        0.0004 0.0003 0.0108
group2      0.0033    0.0023      0.0008        0.0497        0.0156 0.0002      0.0001        0.0001 0.0000 0.0018
p value     0.0200    0.0200      0.0200        0.0200        0.0200 0.3600      0.2100        0.9100 0.4800 0.1800
        CD8_Tcm CD8_Tem CD8_Tex Naive_CD8_T Infiltration_score
group1   0.0090  0.0074  0.0068      0.0140              1.255
group2   0.0019  0.0003  0.0006      0.0019              1.647
p value  0.2100  0.1100  0.2700      0.1700              0.020               
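`group_result` holds one row per group plus a `p value` row, so picking out the cell types that differ between the groups takes only a few lines. A base-R sketch, assuming `group_result` is a matrix or data frame with row names exactly as printed above; `significant_cells()` is a hypothetical helper, the 0.05 cutoff is arbitrary, and no multiple-testing correction is applied. The four columns in the example are copied from the output above.

```r
# Hypothetical helper: return the cell types whose p value in a
# group_result-like table falls below the cutoff
significant_cells <- function(group_result, cutoff = 0.05, p_row = "p value") {
  # unlist() lets this work for both a matrix and a one-row data frame
  p <- as.numeric(unlist(group_result[p_row, ]))
  names(p) <- colnames(group_result)
  names(p)[!is.na(p) & p < cutoff]
}

# Four columns copied from the group_result printed above
gr <- matrix(c(0.0776, 0.2330, 0.02,
               0.1753, 0.2050, 0.27,
               0.0949, 0.0293, 0.02,
               0.0116, 0.0050, 0.52),
             nrow = 3,
             dimnames = list(c("group1", "group2", "p value"),
                             c("B_cell", "Monocytes", "T_cell", "NKT")))
significant_cells(gr)  # "B_cell" "T_cell"
```

On the real output this would be `significant_cells(test$group_result)`.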

Once you have the results, you can draw heatmaps, boxplots or whatever else suits your needs. That is beyond the scope of this post.

Possible bugs

  • If you load the example data directly without setting the row names, an error occurs. For example:
> data(ImmuCellAI_mouse_example,package = "ImmuCellAImouse")
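> test <- ImmuCellAI_mouse(sample = ImmuCellAI_mouse_example,
+                          data_type = "rnaseq", # input data type: "rnaseq" or "microarray"
+                          group_tag = 1,        # whether group information is present; use 0 if not
+                          customer = FALSE)     # whether you supply your own reference file (TRUE/1 = yes, FALSE/0 = no); usually not needed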
Estimating ssGSEA scores for 7 gene sets.
  |===================================================================| 100%

Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In apply(sample, 2, as.numeric) : NAs introduced by coercion
> # Setting the row names in the standard format (gene IDs as row names) fixes this
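The coercion warning hints at what goes wrong: without row names, the gene-ID column `ID` is still part of `sample`, and the `apply(sample, 2, as.numeric)` call named in the warning turns it into NA. The base-R sketch below reproduces only that warning, on two rows copied from the example data; it does not attempt to trace the package's internal `dimnames` error.

```r
# Two rows laid out like ImmuCellAI_mouse_example before the row names are set
df <- data.frame(ID = c("0610005C13Rik", "0610007P14Rik"),
                 GSM4285758 = c(0.626444546, 12.42246496),
                 GSM4285759 = c(1.451541323, 14.55469492))

# apply() turns the data frame into a character matrix first, so the
# gene-ID column cannot be converted and becomes NA (with a warning)
m <- suppressWarnings(apply(df, 2, as.numeric))
m[, "ID"]      # NA NA

# With the IDs moved into the row names, every column converts cleanly
rownames(df) <- df$ID
df$ID <- NULL
m2 <- apply(df, 2, as.numeric)
anyNA(m2)      # FALSE
```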
  • Special characters in sample names also cause a matching error, for example:
> head(colnames(mat)) # the sample names contain special characters
[1] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_001_CD45"
[2] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_001_CD8" 
[3] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_001_TC"  
[4] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_002_CD45"
[5] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_002_CD8" 
[6] "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_002_TC"  
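> test <- ImmuCellAI_mouse(sample = mat,
+                          data_type = "rnaseq", # input data type: "rnaseq" or "microarray"
+                          group_tag = 0,        # whether group information is present; use 0 if not
+                          customer = FALSE)     # whether you supply your own reference file (TRUE/1 = yes, FALSE/0 = no); usually not needed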
Estimating ssGSEA scores for 7 gene sets.
  |========================================================| 100%

Error in .mapGeneSetsToFeatures(gset.idx.list, rownames(expr)) : 
  No identifiers in the gene sets could be matched to the identifiers in the expression data.
In addition: Warning message:
In .filterFeatures(expr, method) :
  1152 genes with constant expression values throuhgout the samples.
Called from: .mapGeneSetsToFeatures(gset.idx.list, rownames(expr))
Browse[1]> Q
> # This is caused by grep() in the package source interpreting the special characters in the sample names as regular-expression syntax:
colnames(infil_exp) <- "Ratio"
for (sam in colnames(immune_deviation_sample)) {
  infil_marker_cell[[sam]] <- row.names(infil_exp)[grep(sam, row.names(infil_exp))]
}
# So add the argument fixed = TRUE here: when fixed = TRUE, pattern is a string to be matched as is.
# See ?grep for details.
colnames(infil_exp) <- "Ratio"
for (sam in colnames(immune_deviation_sample)) {
  infil_marker_cell[[sam]] <- row.names(infil_exp)[grep(sam, row.names(infil_exp), fixed = TRUE)]
}
# After the edit, re-define ImmuCellAI_mouse in your session (i.e., treat it as a custom function) and run it again.
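The regex behaviour described above can be seen with one of the sample names from the failing example. The row names in `rn` are hypothetical stand-ins for `row.names(infil_exp)`, whose exact format is not shown in this post. The last part sketches an alternative workaround (a suggestion, not something from the package authors): give the samples temporary names without special characters and map them back afterwards, assuming the result rows are named by sample as in `test$abundance` above.

```r
# Sample name copied from the failing example above
sam <- "AABBCCD/AABBCCD/DDFFEE-CD8(AB+R650,1:50);MLH1(AB+R555,1:50)KSGEASDFAEG-SAG_001_CD45"
# Hypothetical row names: one contains the sample name, one does not
rn <- c(paste0(sam, "_B_cell"), "GSM4285758_B_cell")

# Treated as a regular expression, "(", ")" and "+" are metacharacters,
# so the sample name no longer matches its own text
grep(sam, rn)                # integer(0)

# fixed = TRUE matches the pattern as a literal string
grep(sam, rn, fixed = TRUE)  # 1

# Alternative that avoids editing the package: give the samples short,
# metacharacter-free names for the run and map them back afterwards
original <- c(sam, "GSM4285758")
safe <- paste0("S", formatC(seq_along(original),
                            width = nchar(length(original)), flag = "0"))
lookup <- setNames(original, safe)
# colnames(mat) <- safe
# test <- ImmuCellAI_mouse(sample = mat, data_type = "rnaseq", group_tag = 0, customer = FALSE)
# rownames(test$abundance) <- lookup[rownames(test$abundance)]
```

If a re-defined copy of `ImmuCellAI_mouse` fails because it calls non-exported helpers from the package (an assumption; this depends on how the package is written), `environment(ImmuCellAI_mouse) <- asNamespace("ImmuCellAImouse")` is the usual R way to let the copy find them.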
