# C-c C-e h h publish
# C-c ! insert date (use . for active agenda, C-u C-c ! for date+time, C-u C-c . for time)
# C-c C-t task rotate
# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png
#+TITLE: PubSeq REST API
#+AUTHOR: Pjotr Prins
#+HTML_LINK_HOME: http://covid19.genenetwork.org/apidoc
# OPTIONS: section-numbers: nil, with-drawers: t
#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />
* PubSeq REST API
Here we document the public REST API that comes with PubSeq. The tests
run in the amazing Emacs [[https://orgmode.org/worg/org-contrib/babel/languages/ob-doc-python.html][org-babel]]. See the bottom of this document
for how to run the tests inside Emacs.
** Introduction
We built a REST API for COVID-19 PubSeq. The API source code can be
found in [[https://github.com/arvados/bh20-seq-resource/tree/master/bh20simplewebuploader/api.py][api.py]]. To check whether the service is up, try:
#+begin_src sh
curl http://covid19.genenetwork.org/api/version
#+end_src
#+begin_src js
{
"service": "PubSeq",
"version": 0.1
}
#+end_src
The equivalent Python 3 code is:
#+begin_src python :session :exports both
import requests
baseURL="http://localhost:5000" # for development
# baseURL="http://covid19.genenetwork.org"
response = requests.get(baseURL+"/api/version")
response_body = response.json()
assert response_body["service"] == "PubSeq", "PubSeq API not found"
response_body
#+end_src
#+RESULTS:
| service | : | PubSeq | version | : | 0.1 |
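Scripts that should not depend on the third-party =requests= package can do the same check with the standard library. The following is a minimal sketch. The helper names =parse_version= and =fetch_version= and the 10 second timeout are our own choices and are not part of the PubSeq code base. Parsing is kept separate from the HTTP call so that it can be tried offline against the sample response shown above.

#+begin_src python :results output
import json
import urllib.request


def parse_version(body):
    """Decode a /api/version response body and check that it came from PubSeq."""
    info = json.loads(body)  # json.loads accepts both str and bytes
    if info.get("service") != "PubSeq":
        raise ValueError("PubSeq API not found: %r" % (info,))
    return info


def fetch_version(base_url, timeout=10):
    """GET <base_url>/api/version; network and HTTP errors propagate to the caller."""
    url = base_url.rstrip("/") + "/api/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_version(response.read())


# Offline check against the sample response shown above;
# call fetch_version("http://localhost:5000") against a running server.
print(parse_version(b'{"service": "PubSeq", "version": 0.1}'))
#+end_src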
** Search for an entry
When you use the search box on PubSeq, it queries the REST endpoint
for information on the search items. For example:
#+begin_src python :session :exports both
requests.get(baseURL+"/api/search?s=MT533203.1").json()
#+end_src
#+RESULTS:
| collection | : | http://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126 | fasta | : | http://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126/sequence.fasta | id | : | MT533203.1 | info | : | http://identifiers.org/insdc/MT533203.1#sequence |
where ~collection~ links to the raw uploaded data. The hash value in ~c=~ is
computed from the contents of the Arvados Keep [[https://doc.arvados.org/v2.0/user/tutorials/tutorial-keep-mount-gnu-linux.html][collection]], so identical
uploads get the same value and it effectively acts as a deduplication UUID.
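Because the ~c=~ value identifies content, two links can be compared by their hash alone. The following minimal sketch splits a collection link into its two parts. It assumes the value has the Arvados portable data hash form: a 32-character MD5 hex digest, a =+=, and a decimal byte count. As far as we understand Arvados, that count is the size of the collection manifest, not the size of the sequence data. =split_collection_url= is a hypothetical helper and not part of PubSeq.

#+begin_src python :results output
from urllib.parse import urlparse


def split_collection_url(url):
    """Return (md5_hex, size) from a .../c=<md5>+<size>/... Keep collection URL.

    Assumes the c= value is an Arvados portable data hash: a 32 character
    MD5 hex digest, a '+', and a decimal byte count.
    """
    # First path segment, e.g. "c=0015b0d65dfd2e82bb3cee4436bf2893+126"
    segment = urlparse(url).path.lstrip("/").split("/", 1)[0]
    if not segment.startswith("c="):
        raise ValueError("not a collection URL: " + url)
    digest, _, size = segment[2:].partition("+")
    if len(digest) != 32 or not size.isdigit():
        raise ValueError("unexpected portable data hash: " + segment[2:])
    return digest, int(size)


collection = "http://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126"
fasta = collection + "/sequence.fasta"
# Both links point into the same collection, so they yield the same hash
print(split_collection_url(collection) == split_collection_url(fasta))
#+end_src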
** Fetch metadata
Using the collection link above, you can fetch the metadata in JSON as it
was originally uploaded (as described by the ShEx expression), e.g. via
https://collections.lugli.arvadosapi.com/c=0015b0d65dfd2e82bb3cee4436bf2893+126/
It is better, however, to use the more advanced sample metadata fetcher,
because it does a bit more in terms of expansion:
#+begin_src python :session :exports both
requests.get(baseURL+"/api/sample/MT533203.1.json").json()
#+end_src
#+RESULTS:
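
To pull metadata for several samples, you can call the same endpoint in a loop. The following is a minimal sketch that uses only the standard library. =sample_url= and =fetch_samples= are hypothetical helper names, and the code sends one request per accession because no batch endpoint is assumed. Building the URL is kept separate so it can be checked without a running server.

#+begin_src python :results output
import json
import urllib.parse
import urllib.request


def sample_url(base_url, accession):
    """Build the /api/sample/<accession>.json URL used above."""
    # quote() keeps letters, digits and '.', so MT533203.1 passes through unchanged
    return "%s/api/sample/%s.json" % (base_url.rstrip("/"), urllib.parse.quote(accession))


def fetch_samples(base_url, accessions, timeout=30):
    """Fetch sample metadata for several accessions, one request each.

    Returns {accession: decoded JSON}; HTTP and network errors propagate.
    """
    results = {}
    for accession in accessions:
        with urllib.request.urlopen(sample_url(base_url, accession), timeout=timeout) as response:
            results[accession] = json.load(response)
    return results


print(sample_url("http://localhost:5000", "MT533203.1"))
#+end_src

Against a running server this would be called as
: fetch_samples(baseURL, ["MT533203.1", "MT326090.1"])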
** Fetch EBI XML
PubSeq provides an API, used by our [[http://covid19.genenetwork.org/export][EXPORT]] menu, that exports
data in formats suitable for uploading to EBI/ENA. This is
documented [[http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part6][here]].
#+begin_src python :session :exports both
requests.get(baseURL+"/api/ebi/sample-MT326090.1.xml").text
#+end_src
#+RESULTS:
#+begin_example
<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
<SAMPLE alias="MT326090.1" center_name="COVID-19 PubSeq">
<TITLE>COVID-19 PubSeq Sample</TITLE>
<SAMPLE_NAME>
<TAXON_ID>2697049</TAXON_ID>
<SCIENTIFIC_NAME>Severe acute respiratory syndrome coronavirus 2</SCIENTIFIC_NAME>
<COMMON_NAME>SARS-CoV-2</COMMON_NAME>
</SAMPLE_NAME>
<SAMPLE_ATTRIBUTES>
<SAMPLE_ATTRIBUTE>
<TAG>investigation type</TAG>
<VALUE></VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>sequencing method</TAG>
<VALUE></VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>collection date</TAG>
<VALUE></VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>geographic location (latitude)</TAG>
<VALUE></VALUE>
<UNITS>DD</UNITS>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>geographic location (longitude)</TAG>
<VALUE></VALUE>
<UNITS>DD</UNITS>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>geographic location (country and/or sea)</TAG>
<VALUE></VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>geographic location (region and locality)</TAG>
<VALUE></VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>environment (material)</TAG>
<VALUE></VALUE>
</SAMPLE_ATTRIBUTE>
<SAMPLE_ATTRIBUTE>
<TAG>ENA-CHECKLIST</TAG>
<VALUE>ERC000011</VALUE>
</SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES>
</SAMPLE>
</SAMPLE_SET>
#+end_example
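Most ~VALUE~ fields in the example output above are empty. Before submitting to ENA, it can help to list the attributes that still need filling in. The following minimal sketch uses Python's standard ElementTree parser and assumes the ~SAMPLE_SET~ layout shown above. =empty_attributes= is our own helper name.

#+begin_src python
import xml.etree.ElementTree as ET


def empty_attributes(xml_text):
    """Return the TAG of every SAMPLE_ATTRIBUTE whose VALUE is empty or missing."""
    if isinstance(xml_text, str):
        # Encode so the parser sees bytes that match the UTF-8 declaration
        xml_text = xml_text.encode("utf-8")
    root = ET.fromstring(xml_text)
    missing = []
    for attribute in root.iter("SAMPLE_ATTRIBUTE"):
        # findtext returns "" for <VALUE></VALUE> and the default when VALUE is absent
        if not attribute.findtext("VALUE", default="").strip():
            missing.append(attribute.findtext("TAG"))
    return missing
#+end_src

In the session above it can be applied directly to the response text:
: empty_attributes(requests.get(baseURL+"/api/ebi/sample-MT326090.1.xml").text)
For the MT326090.1 output shown, this lists every attribute except ENA-CHECKLIST.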
* Configure Emacs to run tests
Execute a code block with C-c C-c. You may need to set
#+begin_src elisp
(org-babel-do-load-languages
'org-babel-load-languages
'((python . t)))
(setq org-babel-python-command "python3")
(setq org-babel-eval-verbose t)
#+end_src
#+RESULTS:
: python3
To skip confirmations you may also want to set
: (setq org-confirm-babel-evaluate nil)
To see the output of the interpreter, open the *Python* buffer.