R语言：clusterProfiler进行GO富集分析和Gene

一、读取文件，ID转换

1.读取文件

library(clusterProfiler)
library(org.Hs.eg.db)
#读取文件，原始文件中使用空格分割的
go_ythdf2 <- read.table("./goList/RIP YTHDF2 RNAseq.txt",sep=" ")
go_ythdf2 <- t(go_ythdf2)

2.ID转换，ENTREZID是进行GO分析最好的ID类型（针对clusterProfiler）

ID转换用到的是bitr()函数，bitr()的使用方法：

bitr(geneID, fromType, toType, OrgDb, drop = TRUE)

geneID：一个含有gene_name的矢量
orgDb：人类的注释包是 org.Hs.eg.db
fromType：输入的gene_name的类型
toType：需要转换成的gene_name的类型，可以是多种类型，用大写character类型向量表示

3.gene_name的类型查看

org.Hs.eg.db包含有多种gene_name的类型

#查看org.Hs.eg.db中可以被选择/使用的类型
keytypes(org.Hs.eg.db) 

[1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"    
[7] "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"     "GO"           "GOALL"       
[13] "IPI"          "MAP"          "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"        
[19] "PFAM"         "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
[25] "UNIGENE"      "UNIPROT"

#ID转换，把gene_symbol转换成ENTREZID。
go_ythdf2_id_trance <- bitr(go_ythdf2,fromType = "SYMBOL",toType = "ENTREZID",OrgDb = "org.Hs.eg.db",drop = T)

4.org.Hs.eg.db的相关函数

keytypes()：keytypes(x)，查看注释包中可以使用的类型
columns()：类似于keytypes()，针对org.Hs.eg.db两个函数返回值一致
select()：select(x, keys, columns, keytype, ...) eg.

#类似于bitr()的注释方式
select(org.Hs.eg.db, keys=geneids, columns=c("SYMBOL", "GENENAME"，"GO"), keytype="ENSEMBL")

二、GO注释

函数enrichGO()进行GO富集分析，enrichGO()的使用方法：

enrichGO(gene, OrgDb, keyType = "ENTREZID", ont = "MF", pvalueCutoff = 0.05, 
           pAdjustMethod = "BH", universe, qvalueCutoff = 0.2, minGSSize = 10, 
           maxGSSize = 500, readable = FALSE, pool = FALSE)
           
Arguments:
  gene                 a vector of entrez gene id.
  OrgDb                OrgDb
  keyType              keytype of input gene
  ont                  One of "MF", "BP", and "CC" subontologies or 'ALL'.
  pvalueCutoff         Cutoff value of pvalue.
  pAdjustMethod        one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
  universe             background genes
  qvalueCutoff         qvalue cutoff
  minGSSize            minimal size of genes annotated by Ontology term for testing.
  maxGSSize            maximal size of genes annotated for testing
  readable             whether mapping gene ID to gene Name
  pool                 If ont=’ALL’, whether pool 3 GO sub-ontologies

举例：

result_go_ythdf2 <- enrichGO(go_ythdf2_id_trance$ENTREZID,OrgDb = "org.Hs.eg.db",
                             keyType = "ENTREZID",ont = "ALL",readable = T)

结果处理

画图

dotplot(result_go_ythdf2,showCategory=50)
barplot(result_go_ythdf2,showCategory=20)

保存

df_go_ythdf2 <- as.data.frame(result_go_ythdf2)
write.table(df_go_ythdf2,"df_go_ythdf2.xls",quote = F,sep = "\t")

三、整体代码

library("clusterProfiler")
library("org.Hs.eg.db")

#读取数据
go_ythdf2 <- read.table("./goList/RIP YTHDF2 RNAseq.txt",sep=" ")
go_ythdf2 <- t(go_ythdf2)

#ID转换
go_ythdf2_id_trance <- bitr(go_ythdf2,fromType = "SYMBOL",toType = "ENTREZID",
                            OrgDb = "org.Hs.eg.db",drop = T)

#GO分析
result_go_ythdf2 <- enrichGO(go_ythdf2_id_trance$ENTREZID,OrgDb = "org.Hs.eg.db",
                             keyType = "ENTREZID",ont = "ALL",readable = T)

#绘图
dotplot(result_go_ythdf2,showCategory=50)
barplot(result_go_ythdf2,showCategory=20)

#保存
df_go_ythdf2 <- as.data.frame(result_go_ythdf2)
write.table(df_go_ythdf2,"df_go_ythdf2.xls",quote = F,sep = "\t")

我想建立并管理一个高质量的生信&统计相关的微信讨论群，如果你想参与讨论，可以添加微信：veryqun 。我会拉你进群，当然有问题也可以微信咨询我。

R语言：clusterProfiler进行GO富集分析和Gene_ID转换