Programmatic Access to Gene Ontology


This page documents various ways to query the ontology and annotations using the GO APIs


Query GO ontology and annotations with GOlr

The ontology and GO annotations can easily be searched and retrieved via the GO Solr search engine API called GOlr.

The following is a query example to retrieve all meta data about the GO term GO:0030182:

http://golr-aux.geneontology.io/solr/select?fq=document_category:"ontology_class"&q=*:*&fq=id:"GO:0030182"&wt=json

While the following is a query example to retrieve all annotations of the TP53 gene in rats:

http://golr-aux.geneontology.io/solr/select?fq=document_category:"annotation"&q=*:*&fq=bioentity:"RGD:3889"&wt=json

The complete XML schema of GOLr is available here. This can be used as a reference to check which fields are stored and can be queried.

Note: GOlr is powering the faceted search of AmiGO.


The purpose of the BioLink Data Model is to provide a high level datamodel of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc), their properties, relationships, and ways in which they can be associated. The GO BioLink API implementation and its associated swagger documentation are available at http://api.geneontology.org/api.

In the BioLink Data Model, any GO term is referred to as “function”, hence the following query returns meta data about the GO term {id}:

/bioentity/function/{id}

Example: annotations for the apoptotic process term

http://api.geneontology.org/api/bioentity/function/GO:0006915

Note: pagination can be achieved by using the start & rows parameter. Example :

http://api.geneontology.org/api/bioentity/function/GO:0006915?start=0&rows=2

Query GO Causal Activity Models (Experimental)

GO also provides an API to query data about GO-CAMs as well as a swagger documentation to familiarize with the routes and parameters. The API is used to power the http://geneontology.org/go-cam section of this site.

Query example to retrieve all GO terms contained in the GO-CAM 59a6110e00000067:

https://api.geneontology.cloud/models/go?gocams=59a6110e00000067

Query example to retrieve all gene products used in the GO-CAM 59a6110e00000067:

https://api.geneontology.cloud/models/gp?gocams=59a6110e00000067

Query example to retrieve all PMIDs cited in the GO-CAM 59a6110e00000067:

https://api.geneontology.cloud/models/pmid?gocams=59a6110e00000067

Query example to retrieve all GO-CAMs implicating the mouse Rtl4 gene:

https://api.geneontology.cloud/gp/http%3A%2F%2Fidentifiers.org%2Fmgi%2FMGI%3A3588192/models

More information available on the swagger documentation of the API


Programmatic Download: BDBag

The following example requires both python and pip to be installed. Once this is done, you can install the BDBag cli by following those steps:

pip install bdbag
git clone https://github.com/fair-research/bdbag
python setup.py install

Then check that you pass all tests:

python setup.py test

Create a symlink from your bdbag application to your /usr/bin/ folder:

sudo ln -s ./bdbag /usr/bin/bdbag

Once the BDBag cli is installed, fetch a DOI versioned of GO dataset, either the full archive or the holey bag.

In this example, we plan on accessing single files, so the holey bag (containing only the references to our files) is sufficient. Once you have retrieved our DOI versioned of GO from one of the two links above, notice a file named fetch.txt. It describes all the files contained and accessible from this archive. Its syntax is as follow:

URL Length Filename
http://release.geneontology.org/2018-10-08/annotations/aspgd.gaf.gz 6346222 data/annotations/aspgd.gaf.gz
http://release.geneontology.org/2018-10-08/annotations/aspgd.gpad.gz 4883110 data/annotations/aspgd.gpad.gz
http://release.geneontology.org/2018-10-08/annotations/aspgd.gpi.gz 1367586 data/annotations/aspgd.gpi.gz

The full extent of possible queries over BDbags are described here.

GO to your DOI versioned of GO BDbag folder, you can now for instance retrieve the first file (aspgd.gaf.gz) with two different methods:

By the URL of the file

bdbag --resolve all --fetch-filter url==http://release.geneontology.org/2018-10-08/annotations/aspgd.gaf.gz ./

By the name of the file

bdbag --resolve all --fetch-filter filename==data/annotations/aspgd.gaf.gz ./

The file retrieved will be stored in the same folder hierarchy as described in the filename. In the previous example, the file aspgd.gaf.gz retrieved will be stored locally in data/annotations/

Notes:

  • this specific file could be accessed by using length==6346222 but there is no guaranty that this size is unique. The length filter is therefore better used to retrieve a set of files smaller than or greater than a certain threshold
  • holey BDbags are a very convenient way to retrieve only the files important to you as the holey BDbags only contain the references needed to actually retrieve the files of interest