
PubSeq REST API

Here we document the public REST API that comes with PubSeq. The tests run in Emacs org-babel. See the bottom of this document for how to run them inside Emacs.

Introduction

We built a REST API for COVID-19 PubSeq. The API source code can be found in api.py. To check whether the service is up, try:

curl http://covid19.genenetwork.org/api/version
{
  "service": "PubSeq",
  "version": 0.1
}

The current API can search for an entry by identifier and fetch its sample metadata:

curl "http://covid19.genenetwork.org/api/search?s=MT533203.1"
[
  {
    "collection": "http://covid19.genenetwork.org/resource",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence"
  }
]

curl http://covid19.genenetwork.org/api/sample/MT533203.1.json
[
  {
    "collection": "http://covid19.genenetwork.org/resource",
    "date": "2020-04-27",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence",
    "mapper": "minimap v. 2.17",
    "sequencer": "http://www.ebi.ac.uk/efo/EFO_0008632",
    "specimen": "http://purl.obolibrary.org/obo/NCIT_C155831"
  }
]

The same version check in Python 3 is:

import requests
baseURL="http://localhost:5067" # for development
# baseURL="http://covid19.genenetwork.org"
response = requests.get(baseURL+"/api/version")
response_body = response.json()
assert response_body["service"] == "PubSeq", "PubSeq API not found"
response_body
service : PubSeq
version : 0.1
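
The endpoint URLs shown above can be assembled with the standard library, which also takes care of percent-encoding the search term. This is only a minimal sketch: the helper names `version_url`, `search_url` and `sample_url` are illustrative and are not part of api.py.

```python
from urllib.parse import quote, urlencode


def version_url(base_url):
    """URL of the version endpoint."""
    return base_url.rstrip("/") + "/api/version"


def search_url(base_url, term):
    """URL of the search endpoint; the search term is percent-encoded."""
    return base_url.rstrip("/") + "/api/search?" + urlencode({"s": term})


def sample_url(base_url, sample_id):
    """URL of the JSON sample metadata endpoint for one identifier."""
    return base_url.rstrip("/") + "/api/sample/" + quote(sample_id, safe="") + ".json"
```

With these helpers the version check could be written as `requests.get(version_url(baseURL)).json()`, and switching between the development and production servers only requires changing `baseURL`.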

Search for an entry

When you use the search box on PubSeq it queries the REST end point for information on the search items. For example

requests.get(baseURL+"/api/search?s=MT533203.1").json()
collection : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126
fasta : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta
id : MT533203.1
info : http://identifiers.org/insdc/MT533203.1#sequence

where collection points to the raw uploaded data. The hash value after c= is computed on the contents of the Arvados Keep collection and effectively acts as a deduplication uuid.
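
Because the hash identifies the uploaded contents, it can be pulled out of a collection or fasta URL to check whether two links refer to the same upload. The sketch below assumes the first path segment always has the shape seen in the example (`c=`, a hexadecimal hash, `+`, a decimal number); the function names are hypothetical and the meaning of the trailing number is not interpreted here.

```python
import re
from urllib.parse import urlparse

# Assumed shape of the first path segment, taken from the example above:
# "c=" + hexadecimal hash + "+" + decimal number.
_COLLECTION_RE = re.compile(r"^c=([0-9a-f]+)\+(\d+)$")


def collection_hash(url):
    """Return (hash, number) from a collection URL, or None if it does not match."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return None
    match = _COLLECTION_RE.match(segments[0])
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def same_collection(url_a, url_b):
    """True when both URLs carry the same collection hash and number."""
    key_a, key_b = collection_hash(url_a), collection_hash(url_b)
    return key_a is not None and key_a == key_b
```

For the search result above, the collection and fasta links yield the same key, so `same_collection` returns True for that pair.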

Fetch metadata

Using the collection link above, you can fetch the metadata in JSON as it was originally uploaded from the ShEx expression, e.g. using https://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126/

It is better, however, to use the more advanced sample metadata fetcher, because it does a bit more in terms of expansion:

requests.get(baseURL+"/api/sample/MT533203.1.json").json()
collection : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126
date : 2020-04-27
fasta : http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta
id : MT533203.1
info : http://identifiers.org/insdc/MT533203.1#sequence
mapper : minimap v. 2.17
sequencer : http://www.ebi.ac.uk/efo/EFO_0008632
specimen : http://purl.obolibrary.org/obo/NCIT_C155831
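
A sample record like the one above can be condensed for downstream use, for instance by parsing the date and keeping only the ontology term identifiers. The following is a sketch under the assumption that every record carries the `id`, `date`, `sequencer` and `specimen` fields shown here, which this document does not guarantee; `term_id` and `summarize_sample` are illustrative names.

```python
from datetime import date


def term_id(uri):
    """Return the last path component of an ontology URI, e.g. 'EFO_0008632'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def summarize_sample(record):
    """Condense one sample record into short, typed values."""
    return {
        "id": record["id"],
        "date": date.fromisoformat(record["date"]),
        "sequencer": term_id(record["sequencer"]),
        "specimen": term_id(record["specimen"]),
        "mapper": record.get("mapper"),  # treated as optional in this sketch
    }


# Record copied from the response shown above.
example = {
    "id": "MT533203.1",
    "date": "2020-04-27",
    "mapper": "minimap v. 2.17",
    "sequencer": "http://www.ebi.ac.uk/efo/EFO_0008632",
    "specimen": "http://purl.obolibrary.org/obo/NCIT_C155831",
}
summary = summarize_sample(example)
```

Since the endpoint returns a list of records, the same function could be applied to each element of the decoded JSON response.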

Fetch EBI XML

PubSeq provides an API, used by our EXPORT menu, that exports data in formats suitable for uploading to EBI/ENA. This is documented here.

requests.get(baseURL+"/api/ebi/sample-MT326090.1.xml").text
<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <TITLE>COVID-19 PubSeq Sample</TITLE>
    <SAMPLE_NAME>
      <TAXON_ID>2697049</TAXON_ID>
      <SCIENTIFIC_NAME>Severe acute respiratory syndrome coronavirus 2</SCIENTIFIC_NAME>
      <COMMON_NAME>SARS-CoV-2</COMMON_NAME>
    </SAMPLE_NAME>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>investigation type</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>sequencing method</TAG>
        <VALUE>http://purl.obolibrary.org/obo/OBI_0000759</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-03-21</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (latitude)</TAG>
        <VALUE></VALUE>
        <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (longitude)</TAG>
        <VALUE></VALUE>
        <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (country and/or sea)</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (region and locality)</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>environment (material)</TAG>
        <VALUE>http://purl.obolibrary.org/obo/NCIT_C155831</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>ENA-CHECKLIST</TAG>
        <VALUE>ERC000011</VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
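
Before uploading, it can help to see which attributes in the exported XML are still empty. The sketch below parses a sample set with the standard library and lists tags whose VALUE is blank; it assumes the element layout shown above, uses a shortened copy of that document as input, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def sample_attributes(xml_text):
    """Map each SAMPLE alias to a {tag: value} dict of its attributes."""
    root = ET.fromstring(xml_text)
    result = {}
    for sample in root.iter("SAMPLE"):
        attrs = {}
        for attr in sample.iter("SAMPLE_ATTRIBUTE"):
            tag = attr.findtext("TAG", default="").strip()
            value = (attr.findtext("VALUE") or "").strip()
            attrs[tag] = value
        result[sample.get("alias")] = attrs
    return result


def missing_attributes(attrs):
    """Return the sorted tags whose value is empty."""
    return sorted(tag for tag, value in attrs.items() if not value)


# Shortened copy of the export shown above.
example = """<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-03-21</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (latitude)</TAG>
        <VALUE></VALUE>
        <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>"""
attrs = sample_attributes(example)["MT326090.1"]
```

Here `missing_attributes(attrs)` reports the latitude tag, which has an empty VALUE in the export above.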

Configure emacs to run tests

Execute a code block with C-c C-c. You may need to set

(org-babel-do-load-languages
 'org-babel-load-languages
 '((python . t)))
(setq org-babel-python-command "python3")
(setq org-babel-eval-verbose t)

To skip confirmations you may also want to set

(setq org-confirm-babel-evaluate nil)

To see the output of the interpreter, open the Python buffer.