R绘图|pheatmap热图绘制——高阶篇

绘图场景:在高通量检测数据中,我们在数据展示上往往都喜欢将感兴趣的功能或者通路基因绘制成热图形式,能够让读者一目了然的发现某功能是增强了?还是减弱了?在实际过程中,表型实验上已经证明了炎症反应的走向,从初期增强到后期减弱。但在对应的高通量数据中却不是如我们所想,趋势很凌乱,很难说炎症这个过程的增强减弱。因此在拿到数据后,需要有选择性的筛选和展示。今天就以项目为例,记录整个绘制过程。

1. 对促炎数据做聚类分析并提取所需信息

首先清除环境,安装并加载所需要的R包

rm(list = ls()) #清除环境内存
#install.packages("pheatmap")  #安装pheatmap包
#install.packages("readxl")  #安装readxl包
#install.packages("ggplot2") #安装ggplot2包
#install.packages(c('officer', 'rvg', 'flextable', 'stargazer', 'broom' ))#export的依赖包
#下载export,链接https://cran.r-project.org/src/contrib/Archive/export/export_0.2.2.tar.gz
#install.packages("D:/R-3.6.2/library/export_0.2.2.tar.gz", repos = NULL)

library(ggplot2)
library(pheatmap) #加载pheatmap包
library(readxl)  #加载readxl包
library(export) #用于输出不同格式的结果

读入促炎数据并对数据做简单处理

data_pos<-read_excel("demo3.xlsx",1)    #读入excel数据
data_pos<-as.data.frame(data_pos) # 将data转换为data.frame格式
rownames(data_pos)<-data_pos[,1]  #将第一列数据设置为行名
data_pos<-data_pos[,-1] #去除重复的第一列
colnames(data_pos) = gsub("_expression","",colnames(data_pos))
head(data_pos)
##               12h       24h        3d       7d
## Il1rl1  1.0475045 0.9948235 1.0268230 1.010622
## Tnfsf4  0.9327795 1.0742431 0.9676329 1.010017
## Tnfsf18 0.8085097 0.8327313 1.0099118 1.020472
## Hspd1   0.9905324 0.9932249 0.9839949 1.002263
## Pla2g4a 1.0225494 0.9872852 1.0146842 1.002581
## Xcl1    0.9028666 0.9687647 0.9989017 1.065237

绘图查看基因聚类情况

A<-pheatmap(data_pos,fontsize_col=12,fontsize_row=8,
         show_rownames=F,show_colnames = T,
         cluster_row=T,cluster_col=F,scale="row",
         treeheight_row=25,treeheight_col=25,
         border_color=NA,main="Heatmap",
         color = colorRampPalette(c("blue", "white", "red"))(50),
         cutree_rows=3)  # 画图
image
ggsave(A,filename = "demo3_pos.pdf",width=3,height=5)
ggsave(A,filename = "demo3_pos.png",width=3,height=5)

从上图发现,基因按照表达模式可分为3大类,其中1和2类是想要的数据。
那如何提取这部分数据呢?

首先将表达数据与聚类分类信息合并

clus_a=cutree(A$tree_row,3) #获取按照3块,得到聚类模块信息
table(clus_a)# 每个聚类模块的数量
## clus_a
##  1  2  3
## 12 17 31
clus_a<-as.data.frame(clus_a)#转化为data.frame格式
data_pos= cbind(data_pos, clus_a)# 按列合并数据
data_pos #查看合并后数据
##                12h       24h        3d        7d clus_a
## Il1rl1   1.0475045 0.9948235 1.0268230 1.0106219      1
## Tnfsf4   0.9327795 1.0742431 0.9676329 1.0100173      2
## Tnfsf18  0.8085097 0.8327313 1.0099118 1.0204721      3
## Hspd1    0.9905324 0.9932249 0.9839949 1.0022628      3
## Pla2g4a  1.0225494 0.9872852 1.0146842 1.0025812      1
## Xcl1     0.9028666 0.9687647 0.9989017 1.0652369      3
## Tgm2     0.9719927 0.9908516 0.9937439 1.0165441      3
## S100a8   1.1048567 0.9885955 1.1913848 0.9661311      1
## Ctss     1.0075379 0.9725514 1.0238447 1.0097870      3
## Adora3   0.8768417 0.9808549 0.9766739 1.0384855      3
## Fabp4    1.0056085 1.0287909 0.9618795 0.9613461      2
## Tlr2     0.9925440 1.0575861 0.9961778 0.9766016      2
## S100a9   0.9364502 0.9293324 0.9946467 1.0106743      3
## Tlr4     1.0503438 1.0367557 1.0205949 0.9283988      1
## Zp3      0.8793431 0.9968576 1.0090100 1.0340921      3
## Clock    1.0138973 1.0231884 1.0113094 0.9766799      2
## Trpv4    0.9513371 0.9818378 1.0161012 1.0326306      3
## Ccl26    0.9786242 0.9316714 1.0050418 1.0356912      3
## Ccl24    0.9361875 0.9389519 1.0271715 1.0666504      3
## Serpine1 0.9992854 1.0101892 0.9987391 0.9768985      2
## Il17ra   1.0549646 1.0201907 0.9829150 0.9061829      1
## Tnfrsf1a 1.0196453 1.0356289 0.9909272 0.9341439      2
## Pde2a    0.9950061 1.0437024 1.0036394 0.9912396      2
## Gprc5b   0.9359255 0.9440990 1.0133725 1.0411849      3
## Adam8    1.0053608 1.0149603 1.0010803 1.0082890      2
## Cx3cl1   0.9103365 0.9611052 0.9865766 1.0385095      3
## Tlr3     1.0712287 0.9860523 1.0268784 0.9320602      1
## Ednra    0.9760352 1.0376250 0.9664353 1.0616230      3
## Ldlr     0.9951658 1.0412401 1.0044728 0.9757518      2
## Ets1     0.9649667 0.9932523 0.9633367 1.0178563      3
## Tlr9     0.9845119 1.0433439 1.0179088 0.9331340      2
## Hyal2    0.9805355 0.9952188 0.9537844 1.0180491      3
## Ccr2     1.0781580 0.9166186 0.9737452 0.9468374      1
## Ccr5     1.0379514 1.0791485 0.9571004 0.9994896      2
## Cdk19    1.0834689 0.9774149 0.9731305 1.0011084      1
## Egfr     0.9594616 0.9865012 0.9968854 1.0231812      3
## Ccl2     0.9410630 0.9234708 0.9172620 1.0855231      3
## Ccl7     0.9377025 0.9959727 0.9830637 0.9999279      3
## Ccl11    0.9395817 1.0046657 0.9952031 1.1051408      3
## Ccl12    1.0210894 0.9428178 0.9676126 1.0958590      3
## Ccl4     0.9173600 1.0137382 0.9935696 0.8478784      2
## Stat5a   0.9984534 1.0020705 0.9927384 1.0171578      3
## Ace      0.9411769 0.9791180 0.9885292 1.0293273      3
## Tnip1    0.9967821 1.0371306 0.9868051 0.9751483      2
## Ccl1     0.9490765 1.0061028 1.0299452 1.0591937      3
## Ccl9     1.0748156 1.0255019 1.0143933 1.0110047      1
## Ccl6     0.9836699 1.0019130 0.9749013 1.0098487      3
## Ccl3     1.0027344 1.0298867 1.0158420 0.9723747      2
## Stat5b   1.0067589 1.0196183 0.9994107 0.9936060      2
## Prkca    1.0188645 0.9967514 0.9482353 0.9719247      1
## Itga2    0.9682602 0.9558409 0.9855012 1.0177017      3
## Wnt5a    0.9772947 0.9830877 0.9540308 1.0548477      3
## Il17rb   0.9341813 0.9980961 0.9743833 1.0585646      3
## Ptger4   0.9765908 1.0148698 1.0042148 0.9546091      2
## Cd47     0.9976905 0.9901815 1.0223011 1.0073487      3
## Ager     0.9438379 1.0014070 0.9860253 1.0298349      3
## Tnf      1.0131602 1.0011471 1.0135716 0.9635546      1
## Gpsm3    0.9960939 1.0271684 0.9634463 0.9898771      2
## Jak2     1.0512738 0.9872834 0.9996479 0.9795569      1
## Il33     1.0075621 0.9593182 0.9695374 1.0968678      3

按照聚类顺序将聚类1和2的数据提取出来

order_row = A$tree_row$order  #记录热图的行排序
data_pos = data_pos[order_row,]   # 按照热图的顺序,重新排原始数据
data_pos = data.frame(rownames(data_pos),data_pos,check.names =T)  # 将行名加到表格数据中
data_pos =data_pos[(data_pos$clus %in% c("1","2")),]
#或者data_pos =data_pos[(data_pos$clus < 3),]
data_pos #查看数据
##          rownames.data_pos.      X12h      X24h       X3d       X7d clus_a
## Prkca                 Prkca 1.0188645 0.9967514 0.9482353 0.9719247      1
## Ccr2                   Ccr2 1.0781580 0.9166186 0.9737452 0.9468374      1
## Jak2                   Jak2 1.0512738 0.9872834 0.9996479 0.9795569      1
## Cdk19                 Cdk19 1.0834689 0.9774149 0.9731305 1.0011084      1
## Ccl9                   Ccl9 1.0748156 1.0255019 1.0143933 1.0110047      1
## Tlr4                   Tlr4 1.0503438 1.0367557 1.0205949 0.9283988      1
## Il17ra               Il17ra 1.0549646 1.0201907 0.9829150 0.9061829      1
## Tlr3                   Tlr3 1.0712287 0.9860523 1.0268784 0.9320602      1
## Tnf                     Tnf 1.0131602 1.0011471 1.0135716 0.9635546      1
## S100a8               S100a8 1.1048567 0.9885955 1.1913848 0.9661311      1
## Il1rl1               Il1rl1 1.0475045 0.9948235 1.0268230 1.0106219      1
## Pla2g4a             Pla2g4a 1.0225494 0.9872852 1.0146842 1.0025812      1
## Tnfsf4               Tnfsf4 0.9327795 1.0742431 0.9676329 1.0100173      2
## Adam8                 Adam8 1.0053608 1.0149603 1.0010803 1.0082890      2
## Ccr5                   Ccr5 1.0379514 1.0791485 0.9571004 0.9994896      2
## Gpsm3                 Gpsm3 0.9960939 1.0271684 0.9634463 0.9898771      2
## Ccl4                   Ccl4 0.9173600 1.0137382 0.9935696 0.8478784      2
## Ptger4               Ptger4 0.9765908 1.0148698 1.0042148 0.9546091      2
## Tlr9                   Tlr9 0.9845119 1.0433439 1.0179088 0.9331340      2
## Ccl3                   Ccl3 1.0027344 1.0298867 1.0158420 0.9723747      2
## Ldlr                   Ldlr 0.9951658 1.0412401 1.0044728 0.9757518      2
## Tlr2                   Tlr2 0.9925440 1.0575861 0.9961778 0.9766016      2
## Pde2a                 Pde2a 0.9950061 1.0437024 1.0036394 0.9912396      2
## Tnfrsf1a           Tnfrsf1a 1.0196453 1.0356289 0.9909272 0.9341439      2
## Clock                 Clock 1.0138973 1.0231884 1.0113094 0.9766799      2
## Serpine1           Serpine1 0.9992854 1.0101892 0.9987391 0.9768985      2
## Fabp4                 Fabp4 1.0056085 1.0287909 0.9618795 0.9613461      2
## Tnip1                 Tnip1 0.9967821 1.0371306 0.9868051 0.9751483      2
## Stat5b               Stat5b 1.0067589 1.0196183 0.9994107 0.9936060      2
data_pos_reorder=data_pos[,-6]
write.table(data_pos_reorder,file="data_pos_reorder.txt",row.names=FALSE,quote = FALSE,sep='\t')  #输出结果,按照热图中的顺序

2. 对抑炎数据做聚类分析并提取所需信息

读入抑炎数据并对数据做简单处理

data_neg<-read_excel("demo3.xlsx",2)    #读入excel数据
data_neg<-as.data.frame(data_neg) # 将data转换为data.frame格式
rownames(data_neg)<-data_neg[,1]  #将第一列数据设置为行名
data_neg<-data_neg[,-1] #去除重复的第一列
colnames(data_neg) = gsub("_expression","",colnames(data_neg))
head(data_neg)
##               12h       24h        3d       7d
## Il10    0.8684374 0.9661305 0.9805500 1.053469
## Adora1  0.9303991 0.9886013 0.9663481 1.056422
## Il2ra   0.9800390 0.9831342 0.9467422 1.011937
## Tnfaip6 0.9267055 1.0009174 1.0224794 1.063077
## Tyro3   0.9413618 0.9496315 0.9760268 1.060393
## Gata3   0.9527483 0.9656051 0.9631235 1.045775

绘图查看基因聚类情况

B<-pheatmap(data_neg,fontsize_col=12,fontsize_row=8,
            show_rownames=F,show_colnames = T,
            cluster_row=T,cluster_col=F,scale="row",
            treeheight_row=25,treeheight_col=25,
            border_color=NA,main="Heatmap",
            color = colorRampPalette(c("blue", "white", "red"))(50),
            cutree_rows=2)  # 画图
image.gif
ggsave(B,filename = "demo3_neg.pdf",width=3,height=8)
ggsave(B,filename = "demo3_neg.png",width=3,height=8)

从上图发现,基因按照表达模式可分为2大类,其中1类是想要的数据。
那如何提取这部分数据呢?

首先将表达数据与聚类分类信息合并

clus_b=cutree(B$tree_row,2) #获取按照3块,得到聚类模块信息
table(clus_b)# 每个聚类模块的数量
## clus_b
##  1  2
## 52 30
clus_b<-as.data.frame(clus_b)
data_neg= cbind(data_neg, clus_b)
head(data_pos) #查看合并后数据
##       rownames.data_pos.     X12h      X24h       X3d       X7d clus_a
## Prkca              Prkca 1.018865 0.9967514 0.9482353 0.9719247      1
## Ccr2                Ccr2 1.078158 0.9166186 0.9737452 0.9468374      1
## Jak2                Jak2 1.051274 0.9872834 0.9996479 0.9795569      1
## Cdk19              Cdk19 1.083469 0.9774149 0.9731305 1.0011084      1
## Ccl9                Ccl9 1.074816 1.0255019 1.0143933 1.0110047      1
## Tlr4                Tlr4 1.050344 1.0367557 1.0205949 0.9283988      1

按照聚类顺序将聚类1和2的数据提取出来

order_row = B$tree_row$order  #记录热图的行排序
data_neg = data_neg[order_row,]   # 按照热图的顺序,重新排原始数据
data_neg = data.frame(rownames(data_neg),data_neg,check.names =T)  # 将行名加到表格数据中
table(data_neg$clus)
##
##  1  2
## 52 30
data_neg =data_neg[(data_neg$clus %in% c("1")),]
#或者data_neg =data_neg[(data_neg$clus < 2),]
head(data_neg)
##        rownames.data_neg.      X12h      X24h       X3d       X7d clus_b
## Chrna7             Chrna7 0.9867821 0.9545026 0.9598899 0.9981845      1
## Aoah                 Aoah 1.0066938 0.9492302 0.9263789 1.0097112      1
## Pbk                   Pbk 1.0071100 0.9945398 1.0009174 1.0369612      1
## Il2                   Il2 0.9414986 0.9277887 0.9402898 1.1531456      1
## Apoe                 Apoe 0.9638917 0.9554340 0.9667258 1.0191987      1
## Apoa1               Apoa1 0.9878721 0.9745789 0.9888851 1.0551538      1
data_neg_reorder=data_neg[,-6]
write.table(data_neg_reorder,file="data_neg_reorder.txt",row.names=FALSE,quote = FALSE,sep='\t')  #输出结果,按照热图中的顺序

3. 限定pos组数据的颜色,统一颜色标尺

首席清除环境,加载R包并读入上游筛选出的促炎数据

rm(list = ls()) #清除环境内存
library(pheatmap) #加载pheatmap包
library(readxl)  #加载readxl包
data=read.table('data_pos_reorder.txt',sep = '\t',stringsAsFactors = F,header = T)
data<-as.data.frame(data) # 将data转换为data.frame格式
rownames(data)<-data[,1]  #将第一列数据设置为行名
data<-data[,-1] #去除重复的第一列
colnames(data) = gsub("X","",colnames(data))

限定颜范围

ys <- c(seq(-1,-0.01,by=0.01),seq(0,1,by=0.01))#设定颜色范围数

绘图(促炎热图)

p1=pheatmap(data,fontsize_col=8,fontsize_row=7,
         show_rownames=T,show_colnames = T,
         cluster_row=T,cluster_col=F,scale="row",
         treeheight_row=25,treeheight_col=25,
         border_color=NA,main="Heatmap",
         color = c(colorRampPalette(colors = c("blue","white"))(length(ys)/2),colorRampPalette(colors = c("white","red"))(length(ys)/2)),
         legend_breaks=seq(-1,1,0.5),
         breaks=ys)
image
ggsave(p1,filename = "demo3_1.pdf",width=3,height=5)
ggsave(p1,filename = "demo3_1.png",width=3,height=5)

4. 限定neg组数据的颜色,统一颜色标尺

首席清除环境,加载R包并读入上游筛选出的促炎数据

rm(list = ls()) #清除环境内存
library(pheatmap) #加载pheatmap包
library(readxl)  #加载readxl包
data=read.table('data_neg_reorder.txt',sep = '\t',stringsAsFactors = F,header = T)
data<-as.data.frame(data) # 将data转换为data.frame格式
rownames(data)<-data[,1]  #将第一列数据设置为行名
data<-data[,-1] #去除重复的第一列
colnames(data) = gsub("X","",colnames(data))

限定颜范围

ys <- c(seq(-1,-0.01,by=0.01),seq(0,1,by=0.01))#设定颜色范围数

绘图(抑炎热图)

p2=pheatmap(data,fontsize_col=8,fontsize_row=7,
         show_rownames=T,show_colnames = T,
         cluster_row=T,cluster_col=F,scale="row",
         treeheight_row=25,treeheight_col=25,
         border_color=NA,main="Heatmap",
         color = c(colorRampPalette(colors = c("blue","white"))(length(ys)/2),colorRampPalette(colors = c("white","red"))(length(ys)/2)),
         legend_breaks=seq(-1,1,0.5),
         breaks=ys)
image
ggsave(p2,filename = "demo3_2.pdf",width=3,height=8)
ggsave(p2,filename = "demo3_2.png",width=3,height=8)

5. 合并两张图片

最后可以用AI组合两个结果,也许R包也可以,还没有学。。。成图如下:整体体现出促炎反应随着时间减弱,抑炎随着时间增强,两个功能呈现此消彼长的趋势,符合实验表型。

image.png

显示运行环境

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936
## [2] LC_CTYPE=Chinese (Simplified)_China.936
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.936
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] export_0.2.2    readxl_1.3.1    pheatmap_1.0.12 ggplot2_3.2.1
##
## loaded via a namespace (and not attached):
##  [1] rgl_0.100.50            Rcpp_1.0.3              lattice_0.20-38
##  [4] tidyr_1.0.2             assertthat_0.2.1        digest_0.6.24
##  [7] mime_0.9                R6_2.4.1                cellranger_1.1.0
## [10] backports_1.1.5         evaluate_0.14           pillar_1.4.3
## [13] gdtools_0.2.1           rlang_0.4.4             lazyeval_0.2.2
## [16] uuid_0.1-4              miniUI_0.1.1.1          data.table_1.12.8
## [19] flextable_0.5.9         rmarkdown_2.1           webshot_0.5.2
## [22] stringr_1.4.0           htmlwidgets_1.5.1       munsell_0.5.0
## [25] shiny_1.4.0             broom_0.5.5             compiler_3.6.2
## [28] httpuv_1.5.2            xfun_0.12               pkgconfig_2.0.3
## [31] systemfonts_0.1.1       base64enc_0.1-3         rvg_0.2.4
## [34] htmltools_0.4.0         tidyselect_1.0.0        tibble_2.1.3
## [37] crayon_1.3.4            dplyr_0.8.4             withr_2.1.2
## [40] later_1.0.0             grid_3.6.2              nlme_3.1-142
## [43] jsonlite_1.6.1          xtable_1.8-4            gtable_0.3.0
## [46] lifecycle_0.1.0         magrittr_1.5            scales_1.1.0
## [49] zip_2.0.4               stringi_1.4.6           promises_1.1.0
## [52] xml2_1.2.2              vctrs_0.2.3             generics_0.0.2
## [55] openxlsx_4.1.4          stargazer_5.2.2         RColorBrewer_1.1-2
## [58] tools_3.6.2             manipulateWidget_0.10.1 glue_1.3.1
## [61] officer_0.3.8           purrr_0.3.3             crosstalk_1.0.0
## [64] fastmap_1.0.1           yaml_2.2.1              colorspace_1.4-1
## [67] knitr_1.28

往期回顾
R绘图|ggplot2散点图的绘制
R绘图|pheatmap热图绘制——基础篇
R绘图|pheatmap热图绘制——中阶篇

今天的内容就到这里~~,更多内容可关注公共号“YJY技能修炼”~~

你可能感兴趣的:(R绘图|pheatmap热图绘制——高阶篇)