# C-c C-e h h   publish
# C-c !         insert date (use . for active agenda, C-u C-c ! for date+time, C-u C-c . for time)
# C-c C-t       task rotate
# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png
# C-c C-c to run test blocks
#
# This page runs tests and the HTML export doubles as documentation on
# http://covid19.genenetwork.org/apidoc

#+TITLE: PubSeq REST API
#+AUTHOR: Pjotr Prins
#+HTML_LINK_HOME: http://covid19.genenetwork.org/apidoc
# OPTIONS: section-numbers: nil, with-drawers: t

#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />

* PubSeq REST API

Here we document the public REST API that comes with PubSeq. The
examples double as tests that run in Emacs [[https://orgmode.org/worg/org-contrib/babel/languages/ob-doc-python.html][org-babel]]; see the bottom
of this page for how to run them.

** Introduction

We built a REST API for COVID-19 PubSeq. The API source code can be
found in [[https://github.com/arvados/bh20-seq-resource/tree/master/bh20simplewebuploader/api.py][api.py]]. To check whether the service is up, try

#+begin_src sh
curl http://covid19.genenetwork.org/api/version
#+end_src

#+begin_src js
{
  "service": "PubSeq",
  "version": 0.1
}
#+end_src
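
If ~curl~ is not at hand, the same check can be scripted with the
Python standard library alone. This is a minimal sketch that assumes
only the response shape shown above (a JSON object with ~service~ and
~version~); ~pubseq_version~ is a hypothetical helper, not part of
api.py.

#+begin_src python
import json
import urllib.request


def pubseq_version(base_url, timeout=10.0):
    """Return the version reported by <base_url>/api/version, or None.

    Hypothetical helper: returns None when the server is unreachable,
    the body is not valid JSON, or the payload is not from PubSeq.
    """
    try:
        with urllib.request.urlopen(base_url + "/api/version",
                                    timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # URLError, timeouts and refused connections are OSError subclasses;
        # json.JSONDecodeError is a ValueError subclass.
        return None
    if not isinstance(body, dict) or body.get("service") != "PubSeq":
        return None
    return body.get("version")
#+end_src

For example, ~pubseq_version("http://covid19.genenetwork.org")~ should
return the version number (~0.1~ in the output above) while the
service is up, and ~None~ otherwise. The timeout keeps a hanging
server from blocking the caller indefinitely.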

The current API can search for entries and fetch sample metadata, e.g.

#+begin_src sh
curl "http://covid19.genenetwork.org/api/search?s=MT533203.1"
#+end_src

#+begin_src js
[
  {
    "collection": "http://covid19.genenetwork.org/resource",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence"
  }
]
#+end_src

#+begin_src sh
curl http://covid19.genenetwork.org/api/sample/MT533203.1.json
#+end_src

#+begin_src js
[
  {
    "collection": "http://covid19.genenetwork.org/resource",
    "date": "2020-04-27",
    "fasta": "http://covid19.genenetwork.org/resource/lugli-4zz18-uovend31hdwa5ks",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence",
    "mapper": "minimap v. 2.17",
    "sequencer": "http://www.ebi.ac.uk/efo/EFO_0008632",
    "specimen": "http://purl.obolibrary.org/obo/NCIT_C155831"
  }
]
#+end_src
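
The two endpoints take the identifier differently: ~search~ expects it
as the query parameter ~s~, while ~sample~ expects it in the path with
a ~.json~ suffix. A minimal sketch of building both URLs with the
Python standard library, assuming only the URL shapes used in the
~curl~ calls above; ~search_url~ and ~sample_url~ are hypothetical
helper names.

#+begin_src python
from urllib.parse import quote, urlencode


def search_url(base_url, term):
    """URL for the search endpoint, e.g. /api/search?s=MT533203.1."""
    # urlencode escapes spaces, '&' and similar characters in the term.
    return base_url + "/api/search?" + urlencode({"s": term})


def sample_url(base_url, sample_id):
    """URL for the sample metadata endpoint, e.g. /api/sample/MT533203.1.json."""
    # safe="" also escapes '/', so the id stays a single path segment.
    return base_url + "/api/sample/" + quote(sample_id, safe="") + ".json"
#+end_src

With ~requests~, the search call can equally be written as
~requests.get(baseURL + "/api/search", params={"s": term})~, which
leaves the escaping to the library.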


The same check in Python 3, using the ~requests~ library, is

#+begin_src python :session :exports both
import requests
baseURL="http://localhost:5067" # for development
# baseURL="http://covid19.genenetwork.org"
response = requests.get(baseURL+"/api/version")
response_body = response.json()
assert response_body["service"] == "PubSeq", "PubSeq API not found"
response_body
#+end_src

#+RESULTS:
| service | : | PubSeq | version | : | 0.1 |

** Search for an entry

When you use the search box on PubSeq, it queries the REST endpoint
for information on the search term. For example:

#+begin_src python :session :exports both
requests.get(baseURL+"/api/search?s=MT533203.1").json()
#+end_src

#+RESULTS:
| collection | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126 | fasta | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta | id | : | MT533203.1 | info | : | http://identifiers.org/insdc/MT533203.1#sequence |

where ~collection~ points to the raw uploaded data. The hash value in
~c=~ is computed over the contents of the Arvados Keep [[https://doc.arvados.org/v2.0/user/tutorials/tutorial-keep-mount-gnu-linux.html][collection]], so
identical content yields the same hash and it effectively acts as a
deduplicating identifier.
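
As far as we understand Arvados, the value after ~c=~ is a /portable
data hash/: the MD5 digest of the collection's manifest text (with
permission signatures stripped), followed by ~+~ and the manifest
length in bytes. The sketch below splits such a URL into its two parts
and computes the hash of a given manifest string; ~split_pdh~ and
~portable_data_hash~ are hypothetical names, and producing a correctly
stripped manifest is left out.

#+begin_src python
import hashlib
import re

# 32 hex digits, '+', then the decimal manifest size, following "c=".
_PDH_RE = re.compile(r"c=([0-9a-f]{32})\+(\d+)")


def split_pdh(collection_url):
    """Return (md5_hex, size) from a collection URL, or None if absent."""
    m = _PDH_RE.search(collection_url)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def portable_data_hash(manifest_text):
    """MD5 of the (already stripped) manifest text, '+', its byte length."""
    data = manifest_text.encode("utf-8")
    return hashlib.md5(data).hexdigest() + "+" + str(len(data))
#+end_src

Because the hash depends only on content, uploading the same files
twice gives the same ~c=~ value, which is what makes it usable for
deduplication.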

** Fetch metadata

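Using a collection link such as the one above, you can fetch the
metadata in JSON as it was originally uploaded (structured according
to the ShEx expression), e.g. using
https://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126/

However, it is better to use the more advanced sample metadata fetcher,
because it does a bit more in terms of expansion; for example, its
result below includes ~date~, ~mapper~, ~sequencer~ and ~specimen~
fields that the search result lacks.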

#+begin_src python :session :exports both
requests.get(baseURL+"/api/sample/MT533203.1.json").json()
#+end_src

#+RESULTS:
| collection | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126 | date | : | 2020-04-27 | fasta | : | http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126/sequence.fasta | id | : | MT533203.1 | info | : | http://identifiers.org/insdc/MT533203.1#sequence | mapper | : | minimap v. 2.17 | sequencer | : | http://www.ebi.ac.uk/efo/EFO_0008632 | specimen | : | http://purl.obolibrary.org/obo/NCIT_C155831 |
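
To see what the sample fetcher adds, one can compare the keys of the
two responses. This sketch uses the search and sample results shown
above as literal data, so it runs without network access;
~extra_fields~ is a hypothetical helper name.

#+begin_src python
def extra_fields(search_record, sample_record):
    """Return the keys the sample endpoint adds on top of a search result."""
    return sorted(set(sample_record) - set(search_record))


# Both endpoints return a JSON list; these are the first elements of the
# results shown above for MT533203.1.
COLLECTION = "http://collections.lugli.arvadosapi.com/c=b16901333ea1754a1e0409bf3caf7d22+126"
search_record = {
    "collection": COLLECTION,
    "fasta": COLLECTION + "/sequence.fasta",
    "id": "MT533203.1",
    "info": "http://identifiers.org/insdc/MT533203.1#sequence",
}
sample_record = dict(
    search_record,
    date="2020-04-27",
    mapper="minimap v. 2.17",
    sequencer="http://www.ebi.ac.uk/efo/EFO_0008632",
    specimen="http://purl.obolibrary.org/obo/NCIT_C155831",
)

print(extra_fields(search_record, sample_record))
# ['date', 'mapper', 'sequencer', 'specimen']
#+end_src

Against the live service, the two records would come from the ~[0]~
element of each ~requests.get(...).json()~ call above.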



** Fetch EBI XML

PubSeq provides an API that the [[http://covid19.genenetwork.org/export][EXPORT]] menu uses to export data
in formats suitable for uploading to EBI/ENA. This is documented
[[http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part6][here]]. For example, the sample XML for MT326090.1:

#+begin_src python :session :exports both
requests.get(baseURL+"/api/ebi/sample-MT326090.1.xml").text
#+end_src

#+RESULTS:
#+begin_example
<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <TITLE>COVID-19 PubSeq Sample</TITLE>
    <SAMPLE_NAME>
      <TAXON_ID>2697049</TAXON_ID>
      <SCIENTIFIC_NAME>Severe acute respiratory syndrome coronavirus 2</SCIENTIFIC_NAME>
      <COMMON_NAME>SARS-CoV-2</COMMON_NAME>
    </SAMPLE_NAME>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>investigation type</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>sequencing method</TAG>
        <VALUE>http://purl.obolibrary.org/obo/OBI_0000759</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-03-21</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (latitude)</TAG>
        <VALUE></VALUE>
     <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (longitude)</TAG>
        <VALUE></VALUE>
     <UNITS>DD</UNITS>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
     <TAG>geographic location (country and/or sea)</TAG>
     <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (region and locality)</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>environment (material)</TAG>
        <VALUE>http://purl.obolibrary.org/obo/NCIT_C155831</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>ENA-CHECKLIST</TAG>
        <VALUE>ERC000011</VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
#+end_example
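
Several ~VALUE~ elements in this output are empty, which is worth
catching before a submission to ENA. A minimal sketch using the
standard library ~xml.etree.ElementTree~ that turns the
~SAMPLE_ATTRIBUTE~ elements into a dictionary and lists the empty
ones; ~sample_attributes~ and ~missing_values~ are hypothetical names,
and the input is an excerpt of the XML above.

#+begin_src python
import xml.etree.ElementTree as ET


def sample_attributes(xml_text):
    """Map each SAMPLE_ATTRIBUTE TAG to its VALUE text ('' when empty)."""
    root = ET.fromstring(xml_text)
    attrs = {}
    for attr in root.iter("SAMPLE_ATTRIBUTE"):
        tag = attr.findtext("TAG", default="").strip()
        attrs[tag] = attr.findtext("VALUE", default="").strip()
    return attrs


def missing_values(xml_text):
    """Tags whose VALUE is empty, in document order."""
    return [tag for tag, value in sample_attributes(xml_text).items()
            if not value]


# Excerpt of the EBI XML shown above.
EXAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-03-21</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (country and/or sea)</TAG>
        <VALUE></VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>"""

print(missing_values(EXAMPLE))
#+end_src

Against the service this would be
~missing_values(requests.get(baseURL+"/api/ebi/sample-MT326090.1.xml").text)~;
for the full output shown above it would report the investigation type
and the four geographic location fields.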

* Configure Emacs to run tests

Execute a code block with ~C-c C-c~. You may need to set:

#+begin_src elisp
  (org-babel-do-load-languages
   'org-babel-load-languages
   '((python . t)))
  (setq org-babel-python-command "python3")
  (setq org-babel-eval-verbose t)
  (setq org-confirm-babel-evaluate nil)
#+end_src

#+RESULTS:

Setting ~org-confirm-babel-evaluate~ to ~nil~, as above, skips the
confirmation prompt before each code block is evaluated.

To see the output of the interpreter, open the *Python* buffer.