Paper
A genome-scale TF-DNA interaction network of transcriptional regulation of Arabidopsis primary and specialized metabolism
https://www.embopress.org/doi/full/10.15252/msb.202110625
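The paper provides the data and code for the four bar charts in Figure 1. Today's post walks through the code for those bar charts and shows how to combine several ggplot2 plots into one figure with the R package patchwork.
Download link for the bar chart data and code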
https://github.com/melletang/ccp_y1h
First, read in the data
library(tidyverse)
library(readxl)
network <- readxl::read_excel("MSB-2021-10625-DatasetEV3-Network.xls",
sheet = "CC_Y1H_network")
Code for tidying the data
binding_summary <- network %>% select(Promoter_AGI, Target_Pathway) %>% unique() %>% group_by(Target_Pathway) %>%
tally() %>% rename(num_gene = n)
binding_summary <- left_join(binding_summary,
network %>% select(TF_AGI, Target_Pathway) %>%
unique() %>% group_by(Target_Pathway) %>% tally() %>%
rename(num_tf = n))
binding_summary <- left_join(binding_summary,
network %>% select(TF_AGI, Promoter_AGI, Target_Pathway) %>%
unique() %>% group_by(Target_Pathway) %>% tally() %>%
rename(num_int = n))
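Each of the three tallies joined above counts distinct values per pathway. As a minimal sketch of the same idea, the block below builds a small made-up table (`toy_network` and its values are hypothetical, not taken from the paper's dataset) and computes all three counts in a single `summarise()` call with `n_distinct()`. This is an alternative formulation for illustration, not the code from the paper's repository.

```r
library(dplyr)

# Hypothetical toy data using the same three column names as the network sheet
toy_network <- data.frame(
  TF_AGI         = c("TF1", "TF1", "TF2", "TF2", "TF3"),
  Promoter_AGI   = c("G1",  "G2",  "G1",  "G3",  "G3"),
  Target_Pathway = c("P1",  "P1",  "P1",  "P2",  "P2"),
  stringsAsFactors = FALSE
)

toy_summary <- toy_network %>%
  group_by(Target_Pathway) %>%
  summarise(
    num_gene = n_distinct(Promoter_AGI),          # distinct promoters per pathway
    num_tf   = n_distinct(TF_AGI),                # distinct TFs per pathway
    num_int  = n_distinct(TF_AGI, Promoter_AGI),  # distinct TF-promoter pairs
    .groups  = "drop"
  )

toy_summary
```

For this toy table, pathway P1 ends up with 2 genes, 2 TFs and 3 interactions, and P2 with 1 gene, 2 TFs and 2 interactions.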
Here we meet a new function, tally(). It comes from the dplyr package and counts the number of rows in each group. A quick demonstration with the iris dataset:
iris %>% group_by(Species) %>% tally()
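As a small sketch of how tally() relates to other dplyr verbs, the block below compares it with count(), which does the grouping and the tallying in one step. The variable names `by_tally` and `by_count` are only for illustration.

```r
library(dplyr)

# group_by() + tally() counts the rows in each group
by_tally <- iris %>% group_by(Species) %>% tally()

# count() is shorthand for the same operation
by_count <- iris %>% count(Species)

# Each of the three species has 50 rows in iris
by_tally$n
by_count$n
```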
Next comes the code for the four bar charts
library(ggplot2)
# Panel B: genes per pathway, bars ordered by num_gene; font sizes are numeric, not strings
panel_b <- ggplot(binding_summary, aes(reorder(Target_Pathway, num_gene), num_gene)) +
  geom_bar(stat = "identity", fill = "black") + coord_flip() + theme_bw() +
  ylab("Number of genes") + xlab("Pathway") +
  theme(
    axis.text = element_text(color = "black", size = 10),
    axis.title = element_text(color = "black", size = 10)
  )
panel_b
# Panel C: TFs per pathway, kept in the same pathway order as panel B (ordered by num_gene)
panel_c <- ggplot(binding_summary, aes(reorder(Target_Pathway, num_gene), num_tf)) +
  geom_bar(stat = "identity", fill = "black") + coord_flip() + theme_bw() +
  ylab("Number of TFs") + xlab("Pathway") +
  theme(
    axis.text = element_text(color = "black", size = 10),
    axis.title = element_text(color = "black", size = 10),
    plot.margin = unit(c(0, 0.5, 0, 0), "cm")
  )
panel_c
# Panel D: TF-promoter interactions per pathway, same pathway order as panel B
panel_d <- ggplot(binding_summary, aes(reorder(Target_Pathway, num_gene), num_int)) +
  geom_bar(stat = "identity", fill = "black") + coord_flip() + theme_bw() +
  ylab("Number of interactions") + xlab("Pathway") +
  theme(
    axis.text = element_text(color = "black", size = 10),
    axis.title = element_text(color = "black", size = 10)
  )
panel_d
# Count the distinct pathways bound by each TF
num_path <- network %>% select(TF_AGI, Target_Pathway) %>% unique() %>% group_by(TF_AGI) %>% tally()
# num_path already has a column n, so this second tally() stores its counts in nn
numpathbar <- num_path %>% group_by(n) %>% tally()
# Panel E: number of TFs (nn) that bind a given number of pathways (n)
panel_e <- ggplot(numpathbar, aes(n, nn)) +
  geom_bar(stat = "identity", fill = "black") + theme_bw() +
  ylab("Number of TFs") + xlab("Number of Pathways") +
  theme(
    axis.text = element_text(color = "black", size = 10),
    axis.title = element_text(color = "black", size = 10)
  ) + scale_x_continuous(breaks = seq(0, 12, 1))
panel_e
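Panels B, C and D differ only in the column mapped to bar length and in the axis label. As a hedged sketch, the repeated code could be wrapped in a small helper. The function `pathway_bar()` and the `toy_summary` table below are hypothetical and written for this post; they are not part of the paper's code. The sketch uses `geom_col()`, which draws the same bars as `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical toy table; in the real analysis this would be binding_summary
toy_summary <- data.frame(
  Target_Pathway = c("P1", "P2", "P3"),
  num_gene = c(5, 12, 8),
  num_tf   = c(20, 15, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: horizontal bar chart with pathways always ordered by num_gene
pathway_bar <- function(data, y_col, y_label) {
  ggplot(data, aes(reorder(Target_Pathway, num_gene), .data[[y_col]])) +
    geom_col(fill = "black") +
    coord_flip() +
    theme_bw() +
    labs(x = "Pathway", y = y_label) +
    theme(
      axis.text  = element_text(color = "black", size = 10),
      axis.title = element_text(color = "black", size = 10)
    )
}

toy_b <- pathway_bar(toy_summary, "num_gene", "Number of genes")
toy_c <- pathway_bar(toy_summary, "num_tf", "Number of TFs")

# reorder() sorts the factor levels by ascending num_gene
toy_levels <- levels(reorder(toy_summary$Target_Pathway, toy_summary$num_gene))
toy_levels  # "P1" "P3" "P2"
```

Ordering every panel by the same column keeps the pathways in the same rows across panels, which makes the stacked charts easier to compare.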
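Finally, assemble the figure
Panel A was most likely drawn in PowerPoint. My approach here is to make an empty ggplot2 plot that holds its place, export the assembled figure to PowerPoint, and then draw panel A in PowerPoint.
First, make the empty plot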
pA <- ggplot() + theme_void()
Code for assembling the panels
library(patchwork)
(pA + (panel_b/panel_c))/(panel_d+panel_e)
Add the A to E tag labels
library(patchwork)
(pA + (panel_b/panel_c))/(panel_d+panel_e)+
plot_layout(heights =c(2,1) )+
plot_annotation(tag_levels = "A")
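To make the layout operators easier to follow, here is a minimal sketch of the same arrangement built from throwaway plots of the built-in mtcars data. The plot names (`p1` to `p4`, `blank`, `top_row`, `bottom_row`, `toy_layout`) are hypothetical and stand in for the real panels.

```r
library(ggplot2)
library(patchwork)

# Four throwaway plots plus an empty placeholder for the panel drawn later
p1 <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p2 <- ggplot(mtcars, aes(mpg, hp)) + geom_point()
p3 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
p4 <- ggplot(mtcars, aes(factor(gear))) + geom_bar()
blank <- ggplot() + theme_void()

# `+` places plots next to each other, `/` stacks them vertically
top_row    <- blank + (p1 / p2)   # placeholder on the left, two stacked plots on the right
bottom_row <- p3 + p4             # two plots side by side

toy_layout <- (top_row / bottom_row) +
  plot_layout(heights = c(2, 1)) +     # top row twice as tall as the bottom row
  plot_annotation(tag_levels = "A")    # tag the panels A, B, C, ...

toy_layout
```

Because `/` binds more tightly than `+` in R, the parentheses around each row matter: without them the stacking would be applied before the side-by-side placement.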
Export to PowerPoint
library(patchwork)
(pA + (panel_b/panel_c))/(panel_d+panel_e)+
plot_layout(heights =c(2,1) )+
plot_annotation(tag_levels = "A") -> x
library(export)
export::graph2ppt(x=x,file="figure1.ppt",
width=10,
height=10,
aspectr=3/2)
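You are welcome to follow my WeChat public account
小明的数据分析笔记本
The account mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics and population genetics papers about horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.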