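There are some write-ups online about using the GOATOOLS package, but they tend to skip steps, which is hard on students who are just starting out in bioinformatics, so I have posted some worked examples here for reference.
The key point is to treat a GO term as a data type and to look up how to use it in the obo_parser.py file. My examples are very simple; I hope they help.
Source of the exercises: https://link.springer.com/content/pdf/10.1007%2F978-1-4939-3743-1.pdf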
Here we will use the Python package - GOATOOLS to query the GO. This package can read the GO structure stored in OBO format, which is available from the GO website. After loading this file, it is convenient to traverse the GO hierarchy, search for particular GO terms, and find out which other terms they are related to and how.
You can install goatools through the pip tool. The GOATOOLS package contains the function obo_parser.GODag() to load the GO file. Each GO term in the resulting object is an instance of the GOTerm class, which contains many useful attributes, including:
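• GOTerm.name: textual definition;
• GOTerm.namespace: the ontology the term belongs to (i.e., MF, BP, CC);
• GOTerm.parents: list of parent terms;
• GOTerm.children: list of children terms;
• GOTerm.level: shortest distance to the root node.
Exercise A:
• Download the GO file (go-basic.obo), and load it using the function obo_parser.GODag() from the package GOATOOLS.
• What is the name of the GO term GO:0048527?
• What are the immediate parent(s) of the term GO:0048527?
• What are the immediate children of the term GO:0048527?
• Recursively find all the parent and child terms of the term GO:0048527. Hint: use your solutions to the previous two questions, with a recursive loop.
• How many GO terms have the word “growth” in their name?
• What is the lowest common ancestor term of GO:0048527 and GO:0097178?
• Which GO terms regulate GO:0007124 (pseudohyphal growth)? Hint: load the relationship tags and look for terms which define regulation.
from goatools import obo_parser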

# Look up GO:0048527 in the ontology
term1 = obo_parser.GODag('D:/edit/biology informatics/go-basic.obo').query_term('GO:0048527')
print("The the immediate parent(s):")
print(term1.parents)
print("The the immediate children:")
print(term1.children)
print("all the parent and child terms:")
print(term1.get_all_parents())
print(term1.get_all_children())

# Scan every record in the OBO file for names containing "growth"
reader = obo_parser.OBOReader('D:/edit/biology informatics/go-basic.obo')
x = []
i = 0
for rec in reader:
    if "growth" in rec.name:
        x.append(rec.id)
        i = i + 1
print("GO terms have the word “growth”")
print(x)
print("The number of GO terms have the word “growth”")
print(i)

# Ancestors shared by GO:0048527 and GO:0097178 (set intersection)
parent1 = term1.get_all_parents()
term2 = obo_parser.GODag('D:/edit/biology informatics/go-basic.obo', optional_attrs = "relationship").query_term('GO:0097178')
parent2 = term2.get_all_parents()
sameparents = parent1 & parent2
print("sameparent")
print(sameparents)

# Read the file again with relationship tags and look for terms whose
# only relationship is "regulates GO:0007124" (exact match on the dict)
term = obo_parser.OBOReader('D:/edit/biology informatics/go-basic.obo', optional_attrs = "relationship")
print("regulate GO:0007124")
rela1 = {'regulates': {'GO:0007124'}}
for rec1 in term:
    if rela1 == rec1.relationship:
        print(rec1)
The output is as follows:
The the immediate parent(s):
{GOTerm('GO:0048528'):
id:GO:0048528
name:post-embryonic root development
namespace:biological_process
_parents: 2 items
GO:0048364
GO:0090696
parents: 2 items
GO:0090696 level-03 depth-04 post-embryonic plant organ development [biological_process]
GO:0048364 level-04 depth-04 root development [biological_process]
children: 1 items
GO:0048527 level-05 depth-06 lateral root development [biological_process]
level:4
depth:5
is_obsolete:False
alt_ids: 0 items}
The the immediate children:
set()
all the parent and child terms:
{'GO:0009791', 'GO:0048364', 'GO:0090696', 'GO:0048528', 'GO:0032502', 'GO:0099402', 'GO:0008150', 'GO:0048856', 'GO:0032501'}
set()
GO terms have the word “growth”
['GO:0000190', 'GO:0000191', 'GO:0000192', 'GO:0000193', 'GO:0000194', 'GO:0000195', 'GO:0000903', 'GO:0001402', 'GO:0001403', 'GO:0001404', 'GO:0001544', 'GO:0001545', 'GO:0001546', 'GO:0001547', 'GO:0001555', 'GO:0001557', 'GO:0001558', 'GO:0001559', 'GO:0001560', 'GO:0001571', 'GO:0001616', 'GO:0001617', 'GO:0001832', 'GO:0003135', 'GO:0003141', 'GO:0003241', 'GO:0003243', 'GO:0003244', 'GO:0003245', 'GO:0003246', 'GO:0003247', 'GO:0003248', 'GO:0003268', 'GO:0003302', 'GO:0003416', 'GO:0003417', 'GO:0003418', 'GO:0003419', 'GO:0003420', 'GO:0003421', 'GO:0003422', 'GO:0003423', 'GO:0003424', 'GO:0003425', 'GO:0003426', 'GO:0003427', 'GO:0003428', 'GO:0003429', 'GO:0003430', 'GO:0003431', 'GO:0003432', 'GO:0003434', 'GO:0003435', 'GO:0003436', 'GO:0003437', 'GO:0004903', 'GO:0005006', 'GO:0005007', 'GO:0005008', 'GO:0005010', 'GO:0005017', 'GO:0005018', 'GO:0005019', 'GO:0005021', 'GO:0005024', 'GO:0005025', 'GO:0005026', 'GO:0005072', 'GO:0005104', 'GO:0005105', 'GO:0005111', 'GO:0005114', 'GO:0005131', 'GO:0005154', 'GO:0005155', 'GO:0005156', 'GO:0005159', 'GO:0005160', 'GO:0005161', 'GO:0005163', 'GO:0005171', 'GO:0005172', 'GO:0005520', 'GO:0007117', 'GO:0007118', 'GO:0007119', 'GO:0007124', 'GO:0007125', 'GO:0007150', 'GO:0007173', 'GO:0007174', 'GO:0007175', 'GO:0007176', 'GO:0007179', 'GO:0007180', 'GO:0007181', 'GO:0007285', 'GO:0007295', 'GO:0007426', 'GO:0007446', 'GO:0008083', 'GO:0008084', 'GO:0008259', 'GO:0008543', 'GO:0008582', 'GO:0009825', 'GO:0009826', 'GO:0009831', 'GO:0009860', 'GO:0009932', 'GO:0010075', 'GO:0010080', 'GO:0010081', 'GO:0010082', 'GO:0010083', 'GO:0010448', 'GO:0010449', 'GO:0010450', 'GO:0010451', 'GO:0010465', 'GO:0010568', 'GO:0010570', 'GO:0010573', 'GO:0010574', 'GO:0010575', 'GO:0010640', 'GO:0010641', 'GO:0010642', 'GO:0014815', 'GO:0014843', 'GO:0015058', 'GO:0016049', 'GO:0016520', 'GO:0016608', 'GO:0016942', 'GO:0017015', 'GO:0017052', 'GO:0017134', 'GO:0019838', 'GO:0021811', 'GO:0021875', 'GO:0021899', 
'GO:0021907', 'GO:0022003', 'GO:0022026', 'GO:0030252', 'GO:0030307', 'GO:0030308', 'GO:0030353', 'GO:0030372', 'GO:0030373', 'GO:0030426', 'GO:0030427', 'GO:0030447', 'GO:0030448', 'GO:0030511', 'GO:0030512', 'GO:0030616', 'GO:0030617', 'GO:0030618', 'GO:0030715', 'GO:0030947', 'GO:0030948', 'GO:0030949', 'GO:0031384', 'GO:0031385', 'GO:0031770', 'GO:0031994', 'GO:0031995', 'GO:0032455', 'GO:0032584', 'GO:0032601', 'GO:0032605', 'GO:0032643', 'GO:0032646', 'GO:0032683', 'GO:0032686', 'GO:0032723', 'GO:0032726', 'GO:0032902', 'GO:0032903', 'GO:0032904', 'GO:0032905', 'GO:0032906', 'GO:0032907', 'GO:0032908', 'GO:0032909', 'GO:0032910', 'GO:0032911', 'GO:0032912', 'GO:0032913', 'GO:0032914', 'GO:0032915', 'GO:0032916', 'GO:0033665', 'GO:0033666', 'GO:0033667', 'GO:0034713', 'GO:0034714', 'GO:0035001', 'GO:0035264', 'GO:0035265', 'GO:0035266', 'GO:0035318', 'GO:0035463', 'GO:0035464', 'GO:0035465', 'GO:0035559', 'GO:0035602', 'GO:0035603', 'GO:0035604', 'GO:0035607', 'GO:0035625', 'GO:0035728', 'GO:0035729', 'GO:0035766', 'GO:0035768', 'GO:0035790', 'GO:0035791', 'GO:0035793', 'GO:0035842', 'GO:0035924', 'GO:0035980', 'GO:0036095', 'GO:0036119', 'GO:0036120', 'GO:0036165', 'GO:0036168', 'GO:0036170', 'GO:0036171', 'GO:0036177', 'GO:0036178', 'GO:0036180', 'GO:0036187', 'GO:0036267', 'GO:0036323', 'GO:0036324', 'GO:0036325', 'GO:0036332', 'GO:0036363', 'GO:0036364', 'GO:0036365', 'GO:0036366', 'GO:0036454', 'GO:0036458', 'GO:0038004', 'GO:0038005', 'GO:0038029', 'GO:0038033', 'GO:0038044', 'GO:0038045', 'GO:0038084', 'GO:0038085', 'GO:0038086', 'GO:0038087', 'GO:0038088', 'GO:0038089', 'GO:0038090', 'GO:0038091', 'GO:0038167', 'GO:0038168', 'GO:0038180', 'GO:0040007', 'GO:0040008', 'GO:0040009', 'GO:0040010', 'GO:0040014', 'GO:0040015', 'GO:0040018', 'GO:0040036', 'GO:0040037', 'GO:0042057', 'GO:0042058', 'GO:0042059', 'GO:0042065', 'GO:0042066', 'GO:0042547', 'GO:0042567', 'GO:0042568', 'GO:0042702', 'GO:0042814', 'GO:0042815', 'GO:0043183', 'GO:0043184', 
'GO:0043185', 'GO:0043567', 'GO:0043568', 'GO:0043569', 'GO:0043929', 'GO:0043930', 'GO:0044110', 'GO:0044112', 'GO:0044116', 'GO:0044117', 'GO:0044119', 'GO:0044121', 'GO:0044123', 'GO:0044125', 'GO:0044126', 'GO:0044128', 'GO:0044130', 'GO:0044133', 'GO:0044135', 'GO:0044137', 'GO:0044139', 'GO:0044140', 'GO:0044142', 'GO:0044144', 'GO:0044146', 'GO:0044148', 'GO:0044151', 'GO:0044153', 'GO:0044180', 'GO:0044181', 'GO:0044182', 'GO:0044294', 'GO:0044295', 'GO:0044344', 'GO:0044408', 'GO:0044412', 'GO:0045189', 'GO:0045311', 'GO:0045420', 'GO:0045421', 'GO:0045422', 'GO:0045570', 'GO:0045571', 'GO:0045572', 'GO:0045741', 'GO:0045742', 'GO:0045743', 'GO:0045886', 'GO:0045887', 'GO:0045926', 'GO:0045927', 'GO:0045967', 'GO:0046620', 'GO:0046621', 'GO:0046622', 'GO:0048008', 'GO:0048009', 'GO:0048010', 'GO:0048012', 'GO:0048175', 'GO:0048176', 'GO:0048177', 'GO:0048178', 'GO:0048406', 'GO:0048407', 'GO:0048408', 'GO:0048588', 'GO:0048589', 'GO:0048630', 'GO:0048631', 'GO:0048632', 'GO:0048633', 'GO:0048638', 'GO:0048639', 'GO:0048640', 'GO:0048689', 'GO:0048768', 'GO:0050431', 'GO:0051124', 'GO:0051210', 'GO:0051211', 'GO:0051394', 'GO:0051395', 'GO:0051396', 'GO:0051510', 'GO:0051511', 'GO:0051512', 'GO:0051513', 'GO:0051514', 'GO:0051515', 'GO:0051516', 'GO:0051517', 'GO:0051518', 'GO:0051519', 'GO:0051520', 'GO:0051521', 'GO:0051522', 'GO:0051523', 'GO:0051524', 'GO:0051819', 'GO:0051827', 'GO:0051831', 'GO:0051853', 'GO:0051854', 'GO:0051857', 'GO:0052019', 'GO:0052024', 'GO:0052108', 'GO:0052171', 'GO:0052184', 'GO:0052186', 'GO:0052512', 'GO:0052513', 'GO:0055017', 'GO:0055021', 'GO:0055022', 'GO:0055023', 'GO:0060123', 'GO:0060124', 'GO:0060125', 'GO:0060243', 'GO:0060258', 'GO:0060396', 'GO:0060397', 'GO:0060398', 'GO:0060399', 'GO:0060400', 'GO:0060416', 'GO:0060419', 'GO:0060420', 'GO:0060421', 'GO:0060437', 'GO:0060447', 'GO:0060499', 'GO:0060507', 'GO:0060560', 'GO:0060595', 'GO:0060682', 'GO:0060724', 'GO:0060726', 'GO:0060727', 'GO:0060728', 
'GO:0060736', 'GO:0060737', 'GO:0060763', 'GO:0060787', 'GO:0060797', 'GO:0060798', 'GO:0060799', 'GO:0060801', 'GO:0060822', 'GO:0060825', 'GO:0060826', 'GO:0060835', 'GO:0060851', 'GO:0060878', 'GO:0061033', 'GO:0061049', 'GO:0061050', 'GO:0061051', 'GO:0061052', 'GO:0061112', 'GO:0061117', 'GO:0061313', 'GO:0061335', 'GO:0061387', 'GO:0061388', 'GO:0061389', 'GO:0061390', 'GO:0061391', 'GO:0061850', 'GO:0061913', 'GO:0061914', 'GO:0061916', 'GO:0061917', 'GO:0062031', 'GO:0070018', 'GO:0070019', 'GO:0070020', 'GO:0070021', 'GO:0070022', 'GO:0070123', 'GO:0070186', 'GO:0070195', 'GO:0070783', 'GO:0070784', 'GO:0070785', 'GO:0070786', 'GO:0070848', 'GO:0070849', 'GO:0070851', 'GO:0071363', 'GO:0071364', 'GO:0071378', 'GO:0071559', 'GO:0071560', 'GO:0071604', 'GO:0071634', 'GO:0071635', 'GO:0071636', 'GO:0071774', 'GO:0072690', 'GO:0075013', 'GO:0075014', 'GO:0075065', 'GO:0075066', 'GO:0075067', 'GO:0075068', 'GO:0075305', 'GO:0075309', 'GO:0075337', 'GO:0075338', 'GO:0075339', 'GO:0075340', 'GO:0080034', 'GO:0080092', 'GO:0080112', 'GO:0080113', 'GO:0080117', 'GO:0080186', 'GO:0080189', 'GO:0080190', 'GO:0090010', 'GO:0090012', 'GO:0090013', 'GO:0090033', 'GO:0090080', 'GO:0090214', 'GO:0090243', 'GO:0090269', 'GO:0090270', 'GO:0090271', 'GO:0090272', 'GO:0090287', 'GO:0090288', 'GO:0090360', 'GO:0090361', 'GO:0090362', 'GO:0090667', 'GO:0090668', 'GO:0090723', 'GO:0090724', 'GO:0090725', 'GO:0097076', 'GO:0097317', 'GO:0097318', 'GO:0097321', 'GO:0098867', 'GO:0098868', 'GO:0099126', 'GO:0100040', 'GO:0100041', 'GO:0100042', 'GO:0100064', 'GO:1900238', 'GO:1900428', 'GO:1900429', 'GO:1900430', 'GO:1900431', 'GO:1900432', 'GO:1900433', 'GO:1900434', 'GO:1900435', 'GO:1900436', 'GO:1900437', 'GO:1900438', 'GO:1900439', 'GO:1900440', 'GO:1900441', 'GO:1900442', 'GO:1900443', 'GO:1900444', 'GO:1900445', 'GO:1900456', 'GO:1900460', 'GO:1900461', 'GO:1900462', 'GO:1900741', 'GO:1900742', 'GO:1900743', 'GO:1900746', 'GO:1900747', 'GO:1900748', 'GO:1901048', 
'GO:1901388', 'GO:1901389', 'GO:1901390', 'GO:1901392', 'GO:1901393', 'GO:1901394', 'GO:1901395', 'GO:1901396', 'GO:1901397', 'GO:1901398', 'GO:1901399', 'GO:1901400', 'GO:1902178', 'GO:1902202', 'GO:1902203', 'GO:1902204', 'GO:1902352', 'GO:1902547', 'GO:1902548', 'GO:1902727', 'GO:1902728', 'GO:1902733', 'GO:1903547', 'GO:1903548', 'GO:1903549', 'GO:1903844', 'GO:1903845', 'GO:1903846', 'GO:1904046', 'GO:1904740', 'GO:1904741', 'GO:1904847', 'GO:1904848', 'GO:1904849', 'GO:1904857', 'GO:1904858', 'GO:1904859', 'GO:1905251', 'GO:1905252', 'GO:1905253', 'GO:1905254', 'GO:1905282', 'GO:1905283', 'GO:1905284', 'GO:1905313', 'GO:1905427', 'GO:1905613', 'GO:1905614', 'GO:1905615', 'GO:1905942', 'GO:1905943', 'GO:1905944', 'GO:1990089', 'GO:1990090', 'GO:1990265', 'GO:1990270', 'GO:1990314', 'GO:1990418', 'GO:1990761', 'GO:1990812', 'GO:1990835', 'GO:1990864', 'GO:2000217', 'GO:2000218', 'GO:2000219', 'GO:2000220', 'GO:2000221', 'GO:2000222', 'GO:2000313', 'GO:2000314', 'GO:2000315', 'GO:2000387', 'GO:2000388', 'GO:2000544', 'GO:2000545', 'GO:2000546', 'GO:2000583', 'GO:2000584', 'GO:2000585', 'GO:2000586', 'GO:2000587', 'GO:2000588', 'GO:2000603', 'GO:2000604', 'GO:2000605', 'GO:2000699', 'GO:2000702', 'GO:2000703', 'GO:2000704', 'GO:2001112', 'GO:2001113', 'GO:2001114', 'GO:2001201', 'GO:2001202', 'GO:2001203']
The number of GO terms have the word “growth”
663
optional_attrs(relationship)
sameparent
{'GO:0008150'}
regulate GO:0007124
GO:2000220 regulation of pseudohyphal growth [biological_process]
Answers:
• What is the name of the GO term GO:0048527?
lateral root development
• What are the immediate parent(s) of the term GO:0048527?
GO:0048528 (post-embryonic root development)
• What are the immediate children of the term GO:0048527?
None
• Recursively find all the parent and child terms of the term GO:0048527. Hint: use your solutions to the previous two questions, with a recursive loop.
All the parent terms are {'GO:0009791', 'GO:0048364', 'GO:0090696', 'GO:0048528', 'GO:0032502', 'GO:0099402', 'GO:0008150', 'GO:0048856', 'GO:0032501'}, and there are no child terms.
• How many GO terms have the word “growth” in their name?
663
• What is the lowest common ancestor term of GO:0048527 and GO:0097178?
GO:0008150
• Which GO terms regulate GO:0007124 (pseudohyphal growth)? Hint: load the relationship tags and look for terms which define regulation.
GO:2000220
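The hint about a recursive loop can be illustrated without GOATOOLS. The following is only a minimal sketch on a hand-made hierarchy: the term names ("top", "A", "leaf", and so on) and the `toy_parents` map are hypothetical and are not real GO data, and `all_ancestors` is an illustrative helper, not part of the goatools API. It shows the idea behind collecting every ancestor of a term from its direct parents.

```python
def all_ancestors(term, parents):
    """Return the set of every ancestor of `term`, found recursively."""
    found = set()
    for parent in parents.get(term, set()):
        found.add(parent)
        # The ancestors of a parent are also ancestors of `term`
        found |= all_ancestors(parent, parents)
    return found


# Hypothetical toy hierarchy (child -> set of direct parents), not real GO data
toy_parents = {
    "top": set(),
    "A": {"top"},
    "B": {"A"},
    "C": {"B"},
    "D": {"A"},
    "leaf": {"C", "D"},
}

print(sorted(all_ancestors("leaf", toy_parents)))
```

The same function applied to a child map (parent -> set of direct children) would collect all descendants instead.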
The GOATOOLS package also includes functions to visualize the GO graphs. For instance, it is possible to depict the location of a particular GO term in the ontology using the method GOTerm.draw_lineage().
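Exercise B:
• Plot the lineage of the GO term GO:0097190. From the figure, what is the name of this term?
• What is the most specific term that is in the parent terms of both GO:0097191 (extrinsic apoptotic signaling pathway) and GO:0038034 (signal transduction in absence of ligand)?
Exercise B requires installing several packages and programs, which is a hassle. I could not install pygraphviz directly from PyCharm, so here is an installation method for you:
"Commemorating pygraphviz, which finally installed after a whole day of trying"
Code: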
from goatools import obo_parser

# Load the ontology once, including relationship tags, and reuse it for every query
g = obo_parser.GODag('D:/edit/biology informatics/go-basic.obo', optional_attrs = "relationship")

# Draw the lineage of GO:0097190 (needs pygraphviz)
rec = g.query_term('GO:0097190')
g.draw_lineage([rec])

# Ancestors shared by GO:0097191 and GO:0038034 (set intersection)
term1 = g.query_term('GO:0097191')
term2 = g.query_term('GO:0038034')
parent1 = term1.get_all_parents()
parent2 = term2.get_all_parents()
sameparents = parent1 & parent2
print("sameparent")
print(sameparents)
This generates an image; try it yourself.
Answer:
The most specific term that is in the parent terms of both GO:0097191 (extrinsic apoptotic signalling pathway) and GO:0038034 (signal transduction in absence of ligand) is GO:0007165.
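The set intersection in the code gives every shared ancestor, so one more step is needed to pick the most specific one. A minimal sketch of that step follows, assuming "most specific" means the common ancestor that lies deepest in the hierarchy. The toy term names and the helper functions are hypothetical and are not real GO data or goatools functions; with a loaded GODag, the depth attribute printed for each GOTerm could presumably play the role of `depth` here.

```python
def all_ancestors(term, parents):
    """Return the set of every ancestor of `term`, found recursively."""
    found = set()
    for parent in parents.get(term, set()):
        found.add(parent)
        found |= all_ancestors(parent, parents)
    return found


def depth(term, parents):
    """Length of the longest path from `term` up to a term without parents."""
    direct = parents.get(term, set())
    if not direct:
        return 0
    return 1 + max(depth(p, parents) for p in direct)


def most_specific_common_ancestor(term1, term2, parents):
    """Pick the deepest term shared by both ancestor sets (None if no overlap)."""
    common = all_ancestors(term1, parents) & all_ancestors(term2, parents)
    if not common:
        return None
    # If several common ancestors share the maximum depth, one of them is returned
    return max(common, key=lambda t: depth(t, parents))


# Hypothetical toy hierarchy (child -> set of direct parents), not real GO data
toy_parents = {
    "top": set(),
    "signaling": {"top"},
    "transduction": {"signaling"},
    "pathway_a": {"transduction"},
    "pathway_b": {"transduction"},
}

print(most_specific_common_ancestor("pathway_a", "pathway_b", toy_parents))
```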
An alternative to GOATOOLS and OBO files is to retrieve information relating to a specific term from a web service. One such service is the EMBL-EBI QuickGO, which can provide descriptive information about GO terms in OBO-XML format via the following URL:
http://www.ebi.ac.uk/QuickGO/GTerm?id=&format=oboxml
In the code below, the function get_oboxml(go_id) uses the urllib package to request the OBO-XML and returns it parsed into a dict; it is then called for GO:0048527 (lateral root development). Hint: print out the dictionary returned by the function or create a visualization using the Python package visualisedictionary to study the structure.
Without further ado, here is the code:
# Python 2/3 compatibility shim from the `future` package
import future.standard_library
future.standard_library.install_aliases()
from urllib.request import urlopen
import xmltodict


def get_oboxml(go_id):
    """
    This function retrieves the OBO-XML for a
    given Gene Ontology term, using EMBL-EBI's
    QuickGO browser.
    Input: go_id - a valid Gene Ontology ID,
    e.g. GO:0048527.
    """
    quickgo_url = "http://ebi.ac.uk/QuickGO/GTerm?id="+go_id+"&format=oboxml"
    oboxml = urlopen(quickgo_url)
    # Check the response
    if(oboxml.getcode() == 200):
        obodict = xmltodict.parse(oboxml.read())
        return obodict
    else:
        raise ValueError("Couldn't receive OBOXML from QuickGO. Check URL and try again.")


# Print the returned dictionary so its structure can be studied
print(get_oboxml('GO:0048527'))
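To study the structure of a nested dictionary such as the one returned above, it can help to list the path to every key. This is only a rough sketch: `key_paths` is an illustrative helper, and the `sample` dictionary is written by hand for the demonstration; it is an assumption and does not reproduce the actual layout of a QuickGO response.

```python
def key_paths(node, prefix=""):
    """Yield an 'a/b/c' style path for every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        # Descend into list items, marking the position in the path
        for index, item in enumerate(node):
            yield from key_paths(item, f"{prefix}[{index}]")


# Hand-written sample for illustration only (not a real QuickGO response)
sample = {
    "obo": {
        "term": {
            "id": "GO:0048527",
            "name": "lateral root development",
            "is_a": ["GO:0048528"],
        }
    }
}

for path in key_paths(sample):
    print(path)
```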
In this exercise, we will learn how to parse a GAF file (GO Annotation File) downloaded from the UniProt-GOA database using an iterator from the BioPython package (Bio.UniProt.GOA.gafiterator):
from Bio.UniProt.GOA import gafiterator
import gzip

fname = "gene_association.goa_arabidopsis.gz"
with gzip.open(fname, "rt") as fp:
    for annotation in gafiterator(fp):
        print(annotation['DB_Object_ID'])
A GAF file is a tab-delimited file containing 17 fields including:
• DB: the protein database;
• DB_Object_ID: protein ID;
• Qualifier: annotation qualifier (such as NOT);
• GO_ID: GO term;
• Evidence: evidence code.
from Bio.UniProt.GOA import gafiterator
import gzip

total = 0       # total number of annotations
not_count = 0   # annotations carrying a NOT qualifier
fname = "D:/edit/biology informatics/untitled1/goa_arabidopsis.gaf.gz"
# Text mode ("rt") so that gafiterator receives str lines under Python 3
with gzip.open(fname, "rt") as fp:
    for annotation in gafiterator(fp):
        total += 1
        # Only the Qualifier field can mark an annotation as negated
        if 'NOT' in annotation['Qualifier']:
            not_count += 1
print(total, not_count)
168540
GO:0048527 (lateral root development)?
1044
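from Bio.UniProt.GOA import gafiterator
import gzip


def mentions(value, word):
    """Return True if word occurs in a string field or in any item of a list field."""
    if isinstance(value, str):
        return word in value
    return any(word in item for item in value)


total = 0      # total number of annotations
matches = 0    # annotations that mention "growth" in any field
growth = 'growth'
list1 = []
fname = "D:/edit/biology informatics/untitled1/goa_arabidopsis.gaf.gz"
# Text mode ("rt") so that gafiterator receives str lines under Python 3
with gzip.open(fname, "rt") as fp:
    for annotation in gafiterator(fp):
        total += 1
        if any(mentions(value, growth) for value in annotation.values()):
            matches += 1
            list1.append(annotation)
print(total, matches)
for m in list1:
    print(m)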
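The counting logic in these GAF exercises can be kept separate from the file handling, which makes it easy to try on a few hand-made records first. The sketch below assumes each annotation is a dict whose Qualifier field is a list of strings, as in the records produced by gafiterator. The three records, the protein IDs "P1" to "P3", and the helper names are hypothetical and exist only for illustration.

```python
def count_annotations(annotations, predicate):
    """Return (total, matched) for an iterable of annotation dicts."""
    total = matched = 0
    for annotation in annotations:
        total += 1
        if predicate(annotation):
            matched += 1
    return total, matched


def has_not_qualifier(annotation):
    """True if the annotation is negated by a NOT qualifier."""
    return "NOT" in annotation["Qualifier"]


def proteins_with_term(annotations, go_id):
    """Return the set of distinct protein IDs annotated with `go_id`."""
    return {a["DB_Object_ID"] for a in annotations if a["GO_ID"] == go_id}


# Hypothetical records for illustration only, not taken from a real GAF file
records = [
    {"DB_Object_ID": "P1", "Qualifier": ["NOT"], "GO_ID": "GO:0048527"},
    {"DB_Object_ID": "P2", "Qualifier": [], "GO_ID": "GO:0048527"},
    {"DB_Object_ID": "P3", "Qualifier": [], "GO_ID": "GO:0007124"},
]

total, negated = count_annotations(records, has_not_qualifier)
print(total, negated, f"{100 * negated / total:.1f}%")
print(sorted(proteins_with_term(records, "GO:0048527")))
```

Because the functions accept any iterable of dicts, the list of records could be replaced by the iterator returned by gafiterator(fp) inside the with block.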