Quickly looking up KEGG pathways and their genes

Goals

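1. Quickly look up the description for a pathway ID, and find the ID of a given pathway.
2. Extract all genes in one or several pathways for separate downstream analysis.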

For KEGG enrichment analysis, clusterProfiler does the job: https://bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html

To view KEGG pathway maps, use the R package pathview. Its function help pages explain the usage.

1. Mapping pathway IDs to descriptions

1.1 Searching the website

If you don't need batch lookups, searching the KEGG website directly is simplest: https://www.genome.jp/kegg/kegg2.html

1.2 Using msigdbr

To get the complete mapping, use msigdbr, which was covered in an earlier post: https://www.jianshu.com/p/0098baf2df46

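MSigDB already includes a KEGG collection with everything needed here: pathway IDs, descriptions and genes. Note that it is a fixed subset of KEGG pathways, not the full, current KEGG database.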

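library(msigdbr)
library(dplyr)  # provides %>% and select()
# KEGG gene sets from MSigDB (C2 collection, CP:KEGG subcollection).
# Newer msigdbr releases rename these arguments/labels (e.g. collection/subcollection,
# "CP:KEGG_LEGACY"); check msigdbr_collections() if this returns no rows.
KEGG_df = msigdbr(species = "Homo sapiens", category = "C2", subcategory = "CP:KEGG") %>%
  dplyr::select(gs_exact_source, gene_symbol, gs_description)
head(KEGG_df)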
## # A tibble: 6 x 3
##   gs_exact_source gene_symbol gs_description  
##   <chr>           <chr>       <chr>
## 1 hsa02010        ABCA1       ABC transporters
## 2 hsa02010        ABCA10      ABC transporters
## 3 hsa02010        ABCA12      ABC transporters
## 4 hsa02010        ABCA13      ABC transporters
## 5 hsa02010        ABCA2       ABC transporters
## 6 hsa02010        ABCA3       ABC transporters
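# Named list: pathway ID -> vector of gene symbols
kegg1 = split(KEGG_df$gene_symbol, KEGG_df$gs_exact_source)
# Preview the first 6 genes of the first 6 pathways
lapply(kegg1[1:6], head)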
## $hsa00010
## [1] "ACSS1" "ACSS2" "ADH1A" "ADH1B" "ADH1C" "ADH4" 
## 
## $hsa00020
## [1] "ACLY" "ACO1" "ACO2" "CS"   "DLAT" "DLD" 
## 
## $hsa00030
## [1] "ALDOA" "ALDOB" "ALDOC" "DERA"  "FBP1"  "FBP2" 
## 
## $hsa00040
## [1] "AKR1B1" "CRYL1"  "DCXR"   "DHDH"   "GUSB"   "RPE"   
## 
## $hsa00051
## [1] "AKR1B1"  "AKR1B10" "ALDOA"   "ALDOB"   "ALDOC"   "FBP1"   
## 
## $hsa00052
## [1] "AKR1B1"  "B4GALT1" "B4GALT2" "G6PC"    "G6PC2"   "GAA"
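The code above groups genes by pathway, but goal 1 was the ID-to-description table itself. A minimal sketch of how to get it, assuming a data frame with the same three columns as `KEGG_df`. The `toy_kegg_df` rows below are hand-made for illustration (IDs and symbols taken from the outputs in this post), so substitute the real `KEGG_df` in practice.

```r
# Hand-made stand-in for KEGG_df (same column names as the msigdbr output);
# replace toy_kegg_df with the real KEGG_df in practice.
toy_kegg_df <- data.frame(
  gs_exact_source = c("hsa02010", "hsa02010", "hsa03030", "hsa03030", "hsa00010"),
  gene_symbol     = c("ABCA1", "ABCA10", "DNA2", "FEN1", "ACSS1"),
  gs_description  = c("ABC transporters", "ABC transporters",
                      "DNA replication", "DNA replication",
                      "Glycolysis / Gluconeogenesis"),
  stringsAsFactors = FALSE
)

# One row per pathway: keep only ID and description, then drop duplicates
id2desc_df <- unique(toy_kegg_df[, c("gs_exact_source", "gs_description")])

# Named vector: names are pathway IDs, values are descriptions
id2desc <- setNames(id2desc_df$gs_description, id2desc_df$gs_exact_source)

# ID -> description
id2desc[["hsa03030"]]   # "DNA replication"

# Keyword in description -> ID (case-insensitive search)
names(id2desc)[grepl("replication", id2desc, ignore.case = TRUE)]   # "hsa03030"
```

With the real data, the same named vector can label the gene list as well: `id2desc[names(kegg1)]` gives the description of every pathway in `kegg1`.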

2. Mapping pathway IDs to genes

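This mapping is available in the org.Hs.eg.db package (note that its KEGG data is an old snapshot that is no longer updated):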

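library(clusterProfiler)  # for bitr() below
library(org.Hs.eg.db)
# PATH2EG maps KEGG pathway IDs to Entrez gene IDs
kegg <- org.Hs.egPATH2EG
mapped <- mappedkeys(kegg)       # pathway IDs that have at least one gene
kegg2 <- as.list(kegg[mapped])   # named list: pathway ID -> Entrez IDs
lapply(kegg2[1:6], head)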
## $`04610`
## [1] "2"   "462" "623" "624" "629" "710"
## 
## $`00232`
## [1] "9"    "10"   "1544" "1548" "1549" "1553"
## 
## $`00983`
## [1] "9"    "10"   "978"  "1066" "1548" "1549"
## 
## $`01100`
## [1] "9"  "10" "15" "18" "28" "30"
## 
## $`00380`
## [1] "15"  "26"  "38"  "39"  "217" "219"
## 
## $`00970`
## [1] "16"   "833"  "1615" "2058" "2193" "2617"

Looks like gibberish? The names of this list are pathway IDs with the "hsa" prefix omitted, and the elements are Entrez gene IDs.
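Because the list names drop the `hsa` prefix, looking up a full KEGG ID such as `"hsa03030"` finds nothing. A small sketch that accepts either form; `get_pathway_genes()` and `toy_kegg2` are hypothetical names introduced here, and the toy list only copies the first few Entrez IDs printed above.

```r
# Toy stand-in for kegg2 (first few Entrez IDs copied from the output above);
# with the real data, pass kegg2 itself instead of toy_kegg2.
toy_kegg2 <- list(
  `04610` = c("2", "462", "623", "624", "629", "710"),
  `00232` = c("9", "10", "1544", "1548", "1549", "1553")
)

# Hypothetical helper: accepts "hsa00232" or "00232" and returns the Entrez IDs
get_pathway_genes <- function(path_list, id) {
  key <- sub("^hsa", "", id)  # names in org.Hs.egPATH2EG omit the "hsa" prefix
  path_list[[key]]            # NULL if the pathway is not in the list
}

get_pathway_genes(toy_kegg2, "hsa00232")

# Full KEGG-style IDs for all list names, e.g. to match msigdbr's gs_exact_source
paste0("hsa", names(toy_kegg2))
```

Stripping the prefix inside the helper leaves the original list untouched, so code that indexes it with `"03030"` keeps working.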

For example, extract the genes in hsa03030 and convert them to gene symbols:

genes = kegg2[["03030"]]   # Entrez IDs of the genes in hsa03030
length(genes)
## [1] 36
# Convert the Entrez IDs to gene symbols with bitr()
genes = bitr(genes,
             fromType = "ENTREZID",
             toType = "SYMBOL",
             OrgDb = "org.Hs.eg.db")$SYMBOL
head(genes)
## [1] "DNA2" "FEN1" "LIG1" "MCM2" "MCM3" "MCM4"
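Goal 2 also mentions several pathways. One way to combine them, sketched on a toy list (hypothetical `toy_kegg2`, with truncated Entrez IDs copied from the output above): take the union for genes in any selected pathway, or the intersection for genes shared by all of them. The resulting Entrez IDs can then go through `bitr()` just like the single-pathway example.

```r
# Toy stand-in for kegg2 (Entrez IDs copied from the output above, truncated);
# with the real data, use kegg2 instead of toy_kegg2.
toy_kegg2 <- list(
  `00232` = c("9", "10", "1544", "1548", "1549", "1553"),
  `00983` = c("9", "10", "978", "1066", "1548", "1549")
)

wanted <- c("00232", "00983")

# Genes in any of the selected pathways (union, duplicates removed)
genes_any <- unique(unlist(toy_kegg2[wanted], use.names = FALSE))

# Genes shared by all selected pathways (intersection, in the order of the first pathway)
genes_all <- Reduce(intersect, toy_kegg2[wanted])

genes_any   # "9" "10" "1544" "1548" "1549" "1553" "978" "1066"
genes_all   # "9" "10" "1548" "1549"
```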
