Scraping HMDB with R to Retrieve the Metabolic Pathways of Key Metabolites

HMDB is a metabolite database commonly used in metabolomics

KEGG is admittedly used more often; HMDB is covered here first.

Once data analysis has identified the key metabolites, their metabolic pathways need to be analyzed for enrichment
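Pathway enrichment for a list of key metabolites is commonly done as an over-representation analysis: for each pathway, a hypergeometric test asks whether more key metabolites fall into it than chance alone would predict. Below is a minimal base-R sketch of that test for a single pathway. The function name `ora_pvalue` and all counts are hypothetical, and the choice of background set (all detected metabolites, or all metabolites in the database) is an assumption the analyst has to make.

```r
# Hypothetical helper: over-representation p-value for one pathway.
#   k: key metabolites that fall in the pathway
#   n: total number of key metabolites
#   K: background metabolites annotated to the pathway
#   N: total number of background metabolites
ora_pvalue <- function(k, n, K, N) {
  # P(X >= k) for X ~ Hypergeometric(K "in pathway", N - K "not in pathway", n draws)
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Illustrative numbers only: 5 of 20 key metabolites hit a pathway
# that contains 40 of 1000 background metabolites.
p <- ora_pvalue(k = 5, n = 20, K = 40, N = 1000)
p
```

When many pathways are tested at once, the resulting p-values are usually corrected for multiple testing, for example with `p.adjust(p_values, method = "BH")`.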

The R packages RCurl and XML can be used to fetch HMDB pathway data automatically.
Code

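library(XML); library(RCurl)  # load the packages (install them first if needed)
pathways <- function(id){  # id: an HMDB metabolite ID, e.g. HMDB0000190 for lactate (https://hmdb.ca/metabolites/HMDB0000190)
	url <- paste('https://hmdb.ca/metabolites/', id, '.xml', sep = '')  # URL of the XML version of this metabolite's HMDB page
	wp <- getURL(url)                              # download the content; can be slow, depending on network speed
	root <- xmlRoot(xmlParse(wp, asText = TRUE))   # parse the downloaded text and get the root node
	paths <- xmlChildren(root[[25]][[4]])          # pathway entries: 4th child of the root's 25th child (depends on HMDB's current XML layout)
	pathways <- lapply(paths, function(x) xmlValue(x[[1]][[1]]))  # text of each entry's first child (the pathway name); returns a list
	return(pathways)
}

pathways("HMDB0000190")  # query with lactate's HMDB ID; the quotes are required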
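The indices `root[[25]][[4]]` only work as long as HMDB keeps the same element order in its XML; if a field is added or removed upstream, the function can silently return the wrong nodes. A possible position-independent variant selects the pathway names by element name with XPath via `getNodeSet()` from the XML package. It assumes the layout `biological_properties/pathways/pathway/name`, which is what the original indices appear to point at; check this against a live file before relying on it. The function name `pathway_names_from_xml` and the mock XML string are illustrative and are not real HMDB output.

```r
library(XML)

# Hypothetical helper: extract pathway names from HMDB metabolite XML text.
# Assumes the element path biological_properties/pathways/pathway/name;
# local-name() lets the XPath match with or without an XML namespace.
pathway_names_from_xml <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  xpath <- paste0(
    "//*[local-name()='biological_properties']",
    "/*[local-name()='pathways']",
    "/*[local-name()='pathway']",
    "/*[local-name()='name']"
  )
  nodes <- getNodeSet(doc, xpath)
  # vapply always yields a character vector, even when no pathway is found
  vapply(nodes, xmlValue, character(1))
}

# Simplified, hand-written mock of the relevant part of an HMDB record
mock <- paste0(
  "<metabolite><accession>HMDB0000190</accession>",
  "<biological_properties><pathways>",
  "<pathway><name>Gluconeogenesis</name></pathway>",
  "<pathway><name>Pyruvate Metabolism</name></pathway>",
  "</pathways></biological_properties></metabolite>"
)
pathway_names_from_xml(mock)

# Against the live site (network access required):
# pathway_names_from_xml(RCurl::getURL("https://hmdb.ca/metabolites/HMDB0000190.xml"))
```

Returning a plain character vector rather than a named list also makes the result easier to combine across several metabolites.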

Result

> pathways("HMDB0000190")
$pathway
[1] "Fructose-1,6-diphosphatase deficiency"

$pathway
[1] "Gluconeogenesis"

$pathway
[1] "Glutaminolysis and Cancer"

$pathway
[1] "Glycogen Storage Disease Type 1A (GSD1A) or Von Gierke Disease"

$pathway
[1] "Glycogenosis, Type IA. Von gierke disease"

$pathway
[1] "Glycogenosis, Type IB"

$pathway
[1] "Glycogenosis, Type IC"

$pathway
[1] "Leigh Syndrome"

$pathway
[1] "Phosphoenolpyruvate carboxykinase deficiency 1 (PEPCK1)"

$pathway
[1] "Primary hyperoxaluria II, PH2"

$pathway
[1] "Pyruvate Decarboxylase E1 Component Deficiency (PDHE1 Deficiency)"

$pathway
[1] "Pyruvate Dehydrogenase Complex Deficiency"

$pathway
[1] "Pyruvate kinase deficiency"

$pathway
[1] "Pyruvate Metabolism"

$pathway
[1] "Triosephosphate isomerase"

$pathway
[1] "Warburg Effect"

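Lactate is mainly associated with anaerobic glycolysis, gluconeogenesis, pyruvate metabolism, and the Warburg effect in tumors, among other pathways.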
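`pathways()` returns a list whose elements are all named `pathway`, which is awkward for downstream work. With several key metabolites, a natural next step is to flatten each result and count how many metabolites map to each pathway; these per-pathway hit counts are what an over-representation test takes as input. Here is a minimal sketch, assuming `pathways()` from above is defined. `count_pathways`, its `fetch` argument (which lets a stub replace the network call), and its `pause` argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper: for a set of HMDB IDs, count how many IDs map to each pathway.
#   fetch: any function taking one ID and returning pathway names (list or vector);
#          defaults to pathways() defined earlier, which queries hmdb.ca
#   pause: seconds to wait before each request, to go easy on the server
count_pathways <- function(ids, fetch = pathways, pause = 0) {
  per_id <- lapply(ids, function(id) {
    Sys.sleep(pause)
    # flatten the list and drop duplicates so each ID counts once per pathway
    unique(unlist(fetch(id), use.names = FALSE))
  })
  counts <- sort(table(as.character(unlist(per_id))), decreasing = TRUE)
  # return a plain named integer vector, most frequent pathway first
  setNames(as.integer(counts), names(counts))
}

# Example (network access required):
# count_pathways(c("HMDB0000190"), pause = 1)
```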
