

Here we document the public REST API that comes with PubSeq. The tests run in emacs org-babel; see the bottom of this document for how to run them inside emacs.


We built a REST API for COVID-19 PubSeq. The API source code can be found in api.py. To see if the service is up, try:

curl http://covid19.genenetwork.org/api/version
{
  "service": "PubSeq",
  "version": 0.1
}

The current API can search for an entry and fetch the metadata of a sample:

curl "http://covid19.genenetwork.org/api/search?s=MT533203.1"
    "collection": "http://covid19.genenetwork.org/resource",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence"

curl http://covid19.genenetwork.org/api/sample/MT533203.1.json
    "collection": "http://covid19.genenetwork.org/resource",
    "date": "2020-04-27",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence",
    "mapper": "minimap v. 2.17",
    "sequencer": "http://www.ebi.ac.uk/efo/EFO_0008632",
    "specimen": "http://purl.obolibrary.org/obo/NCIT_C155831"

The Python 3 equivalent of the version check is:

import requests

baseURL = "http://localhost:5067"  # for development
# baseURL = "http://covid19.genenetwork.org"  # public server
response = requests.get(baseURL + "/api/version")
response.raise_for_status()  # stop early on an HTTP error
response_body = response.json()
assert response_body["service"] == "PubSeq", "PubSeq API not found"

service : PubSeq
version : 0.1

Search for an entry

When you use the search box on PubSeq, it queries the REST endpoint for information on the search terms. For example:

collection : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126
fasta : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta
id : MT533203.1
info : http://identifiers.org/insdc/MT533203.1#sequence

where collection points to the raw uploaded data. The hash value in c= is computed on the contents of the Arvados Keep collection and effectively acts as a deduplication UUID.
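The lookup that produces a result like the one above can be sketched in Python. This is a minimal sketch, not PubSeq's own client code: the helper names search_url, search and collection_hash are hypothetical, the standard library's urllib is used in place of requests so the block has no dependencies, and it assumes that the endpoint returns JSON and that the part after c= has the form <hash>+<number>, as in the links shown above.

```python
import json
import re
import urllib.parse
import urllib.request

BASE_URL = "http://covid19.genenetwork.org"  # public server named in this document


def search_url(term, base_url=BASE_URL):
    """Build the /api/search URL for one search term (hypothetical helper)."""
    return base_url + "/api/search?" + urllib.parse.urlencode({"s": term})


def search(term, base_url=BASE_URL, timeout=10):
    """Query the search endpoint and return the decoded JSON payload."""
    with urllib.request.urlopen(search_url(term, base_url), timeout=timeout) as resp:
        return json.load(resp)


def collection_hash(collection_link):
    """Split the c=<hash>+<number> part of a collection link into (hash, number).

    Assumption: the link looks like the ones printed in this document.
    Returns None when no such part is present.
    """
    match = re.search(r"c=([0-9a-f]+)\+(\d+)", collection_link)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Calling search("MT533203.1") performs the same request as the curl command earlier in this document. Two records whose collection links yield the same collection_hash value refer to the same uploaded content.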

Fetch metadata

Using the collection link above, you can fetch the metadata in JSON as it was originally uploaded from the ShEx expression, e.g. using https://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126/

It is better, however, to use the more advanced sample metadata fetcher, because it does a bit more in terms of expansion:

collection : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126
date : 2020-04-27
fasta : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta
id : MT533203.1
info : http://identifiers.org/insdc/MT533203.1#sequence
mapper : minimap v. 2.17
sequencer : http://www.ebi.ac.uk/efo/EFO_0008632
specimen : http://purl.obolibrary.org/obo/NCIT_C155831
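A call to the sample metadata fetcher might look like the sketch below. The helper names sample_url, first_record, fetch_sample and format_record are hypothetical, and urllib from the standard library stands in for requests. Because this page does not show whether the endpoint wraps the record in a list, first_record accepts both shapes; that is an assumption, not documented behaviour.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://covid19.genenetwork.org"  # public server named in this document


def sample_url(sample_id, base_url=BASE_URL):
    """Build the /api/sample/<id>.json URL (hypothetical helper)."""
    return f"{base_url}/api/sample/{urllib.parse.quote(sample_id)}.json"


def first_record(payload):
    """Return one record from a payload that is either an object or a list of objects.

    Assumption: the response shape is not specified here, so both are handled.
    """
    if isinstance(payload, list):
        return payload[0] if payload else {}
    return payload


def fetch_sample(sample_id, base_url=BASE_URL, timeout=10):
    """Fetch the expanded metadata record for one sample."""
    with urllib.request.urlopen(sample_url(sample_id, base_url), timeout=timeout) as resp:
        return first_record(json.load(resp))


def format_record(record):
    """Render a record as sorted 'key : value' lines, like the listing above."""
    return "\n".join(f"{key} : {record[key]}" for key in sorted(record))
```

With network access, print(format_record(fetch_sample("MT533203.1"))) should print the fields the service returns for that sample in the same key : value layout.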


PubSeq provides an API, used by our EXPORT menu, that exports data in formats suitable for uploading to EBI/ENA. This is documented here.

<?xml version="1.0" encoding="UTF-8"?>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <TITLE>COVID-19 PubSeq Sample</TITLE>
      <SCIENTIFIC_NAME>Severe acute respiratory syndrome coronavirus 2</SCIENTIFIC_NAME>
        <TAG>investigation type</TAG>
        <TAG>sequencing method</TAG>
        <TAG>collection date</TAG>
        <TAG>geographic location (latitude)</TAG>
        <TAG>geographic location (longitude)</TAG>
        <TAG>geographic location (country and/or sea)</TAG>
        <TAG>geographic location (region and locality)</TAG>
        <TAG>environment (material)</TAG>
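To check which attributes an exported sample carries, the XML can be read with xml.etree.ElementTree from the Python standard library. The sketch below is an illustration only. The listing above is an excerpt, so the embedded document is a hypothetical, well-formed reduction of it: the SAMPLE_SET, SAMPLE_ATTRIBUTES, SAMPLE_ATTRIBUTE and VALUE elements and the placeholder values are assumptions, not output copied from PubSeq. The helper name sample_tags is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed reduction of the export excerpt.
# Wrapper elements and VALUE contents are assumptions for illustration.
EXAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <TITLE>COVID-19 PubSeq Sample</TITLE>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>placeholder</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>sequencing method</TAG>
        <VALUE>placeholder</VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
"""


def sample_tags(xml_text):
    """Map each SAMPLE alias to the list of TAG names found inside it."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        sample.get("alias"): [tag.text for tag in sample.iter("TAG")]
        for sample in root.iter("SAMPLE")
    }
```

Because the function searches for TAG elements at any depth below each SAMPLE, it does not depend on the exact wrapper elements assumed in the example document.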

Configure emacs to run tests

Execute a code block with C-c C-c. You may need to set:

(org-babel-do-load-languages
 'org-babel-load-languages
 '((python . t)))
(setq org-babel-python-command "python3")
(setq org-babel-eval-verbose t)

To skip confirmations you may also want to set

(setq org-confirm-babel-evaluate nil)

To see the output of the interpreter, open the Python buffer.