author: Pjotr Prins, 2020-07-19 09:11:41 +0100
committer: Pjotr Prins, 2020-07-19 09:11:41 +0100
commit: 7b2d388dbed11384c6a388a5437cca0b8f2914fd
tree: f2707c6811948b9c6adc63534ff456266508c109 /doc/blog/using-covid-19-pubseq-part1.org
parent: 0e4cb2c14b62ed4f39271c6006a99cea954fc688
Wiring up export function
Diffstat (limited to 'doc/blog/using-covid-19-pubseq-part1.org')
-rw-r--r-- doc/blog/using-covid-19-pubseq-part1.org | 22
1 files changed, 15 insertions, 7 deletions
diff --git a/doc/blog/using-covid-19-pubseq-part1.org b/doc/blog/using-covid-19-pubseq-part1.org
index 0fd5589..9c8a1c0 100644
--- a/doc/blog/using-covid-19-pubseq-part1.org
+++ b/doc/blog/using-covid-19-pubseq-part1.org
@@ -60,7 +60,6 @@ graph in triples. Soon we will at multi sequence alignments (MSA) and
more. Anyone can contribute data, tools and workflows to this
initiative!
-
* Fetch sequence data
The latest run of the pipeline can be viewed [[https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca][here]]. Each of these
@@ -162,10 +161,11 @@ select (COUNT(distinct ?dataset) as ?num)
}
#+end_src
+Run this [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=PREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0Aselect+%28COUNT%28distinct+%3Fdataset%29+as+%3Fnum%29%0D%0A%7B%0D%0A+++%3Fdataset+pubseq%3Asubmitter+%3Fid+.%0D%0A+++%3Fid+%3Fp+%3Fsubmitter%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][query]].
* Fetch submitter info and other metadata
-To get dataests with submitters we can do the above
+To get datasets with submitters we can do the above
#+begin_src sql
PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/>
@@ -176,6 +176,8 @@ select distinct ?dataset ?p ?submitter
}
#+end_src
+Run this [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=PREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0Aselect+distinct+%3Fdataset+%3Fp+%3Fsubmitter%0D%0A%7B%0D%0A+++%3Fdataset+pubseq%3Asubmitter+%3Fid+.%0D%0A+++%3Fid+%3Fp+%3Fsubmitter%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][query]].
+
Tells you one submitter is "Roychoudhury,P.;Greninger,A.;Jerome,K."
with a URL [[http://purl.obolibrary.org/obo/NCIT_C42781][predicate]] (http://purl.obolibrary.org/obo/NCIT_C42781)
explaining "The individual who is responsible for the content of a
@@ -223,6 +225,8 @@ select distinct ?sid ?sample ?p1 ?dataset ?submitter
}
#+end_src
+Run [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=%0D%0APREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0Aselect+distinct+%3Fsid+%3Fsample+%3Fp1+%3Fdataset+%3Fsubmitter%0D%0A%7B%0D%0A+++%3Fdataset+pubseq%3Asubmitter+%3Fid+.%0D%0A+++%3Fid+%3Fp+%3Fsubmitter+.%0D%0A+++FILTER%28CONTAINS%28%3Fsubmitter%2C%22Roychoudhury%22%29%29+.%0D%0A+++%3Fdataset+pubseq%3Asample+%3Fsid+.%0D%0A+++%3Fsid+%3Fp1+%3Fsample%0D%0A%7D%0D%0A&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][query]].
+
which shows pretty much [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=PREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0Aselect+distinct+%3Fsid+%3Fsample+%3Fp1+%3Fdataset+%3Fsubmitter%0D%0A%7B%0D%0A+++%3Fdataset+pubseq%3Asubmitter+%3Fid+.%0D%0A+++%3Fid+%3Fp+%3Fsubmitter+.%0D%0A+++FILTER%28CONTAINS%28%3Fsubmitter%2C%22Roychoudhury%22%29%29+.%0D%0A+++%3Fdataset+pubseq%3Asample+%3Fsid+.%0D%0A+++%3Fsid+%3Fp1+%3Fsample%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][everything known]] about their submissions in
this database. Let's focus on one sample "MT326090.1" with predicate
http://semanticscience.org/resource/SIO_000115.
@@ -237,13 +241,16 @@ select distinct ?sample ?p ?o
}
#+end_src
-This [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=PREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0APREFIX+sio%3A+%3Chttp%3A%2F%2Fsemanticscience.org%2Fresource%2F%3E%0D%0Aselect+distinct+%3Fsample+%3Fp+%3Fo%0D%0A%7B%0D%0A+++%3Fsample+sio%3ASIO_000115+%22MT326090.1%22+.%0D%0A+++%3Fsample+%3Fp+%3Fo+.%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][query]] tells us the sample was submitted "2020-03-21" and
+Run [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=%0D%0APREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0APREFIX+sio%3A+%3Chttp%3A%2F%2Fsemanticscience.org%2Fresource%2F%3E%0D%0Aselect+distinct+%3Fsample+%3Fp+%3Fo%0D%0A%7B%0D%0A+++%3Fsample+sio%3ASIO_000115+%22MT326090.1%22+.%0D%0A+++%3Fsample+%3Fp+%3Fo+.%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][query]].
+
+This query tells us the sample was submitted "2020-03-21" and
originates from http://www.wikidata.org/entity/Q30, i.e., the USA and
is a biospecimen collected from the back of the throat by swabbing.
-We can track it back to the original GenBank [[http://identifiers.org/insdc/MT326090.1#sequence][submission]].
+We can track it back to the original GenBank [[http://identifiers.org/insdc/MT326090.1#sequence][submission]] using the
+http://identifiers.org/insdc/MT326090.1 link.
We have also added country and label data to make it a bit easier
-to view/query the database.
+to view/query the database and place the sequence on the [[http://covid19.genenetwork.org/][map]].
* Fetch all sequences from Washington state
@@ -258,8 +265,8 @@ select ?seq ?sample
}
#+end_src
-which lists 300 sequences originating from Washington state! Which is almost
-half of the set coming out of GenBank.
+which lists 300 sequences originating from Washington state! Which in
+April was almost half of the set coming out of GenBank.
Likewise to list all sequences from Turkey we can find the wikidata
entity is [[https://www.wikidata.org/wiki/Q43][Q43]]:
@@ -272,6 +279,7 @@ select ?seq ?sample
}
#+end_src
+Run [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=%0D%0Aselect+%3Fseq+%3Fsample%0D%0A%7B%0D%0A++++%3Fseq+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2Fsample%3E+%3Fsample+.%0D%0A++++%3Fsample+%3Chttp%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGAZ_00000448%3E+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ43%3E%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][query]].
* Discussion
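
The "Run query" links added in this commit are plain GET requests against the SPARQL endpoint, with the query text URL-encoded into the =query= parameter. A minimal Python sketch of how such a link could be generated from the query text is shown here. The endpoint, parameter names and values are copied from the links above; the function name =build_query_url= is hypothetical, and this is not necessarily how the links in the commit were produced.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the links in the diff above.
ENDPOINT = "http://sparql.genenetwork.org/sparql/"


def build_query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL that runs `sparql` on the endpoint (hypothetical helper).

    Line endings are normalised to CRLF because the links in the diff
    encode newlines as %0D%0A.
    """
    query_text = "\r\n".join(sparql.splitlines())
    # Parameter names and values mirror those visible in the existing links.
    params = [
        ("default-graph-uri", ""),
        ("query", query_text),
        ("format", "text/html"),
        ("timeout", "0"),
        ("debug", "on"),
        ("run", " Run Query "),
    ]
    # urlencode uses quote_plus, so spaces become '+' as in the links.
    return endpoint + "?" + urlencode(params)


# The dataset count query from the blog post.
COUNT_QUERY = """PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/>
select (COUNT(distinct ?dataset) as ?num)
{
   ?dataset pubseq:submitter ?id .
   ?id ?p ?submitter
}"""

if __name__ == "__main__":
    print(build_query_url(COUNT_QUERY))
```

Decoding the generated URL with =parse_qs= gives back the original query text, which is a quick way to check a link before pasting it into the Org file.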