From bef48abab5e8596703dd825b2d920ea25314d868 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Mon, 26 Oct 2020 10:23:00 +0000
Subject: Update blog

---
 doc/blog/using-covid-19-pubseq-part1.html | 257 ++++++++++++++++++++----------
 1 file changed, 177 insertions(+), 80 deletions(-)

(limited to 'doc/blog/using-covid-19-pubseq-part1.html')

diff --git a/doc/blog/using-covid-19-pubseq-part1.html b/doc/blog/using-covid-19-pubseq-part1.html
index deeb749..454eeb5 100644
--- a/doc/blog/using-covid-19-pubseq-part1.html
+++ b/doc/blog/using-covid-19-pubseq-part1.html
@@ -3,7 +3,7 @@
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
 COVID-19 PubSeq - query metadata (part 1)
@@ -40,7 +40,7 @@
 }
 pre.src {
   position: relative;
-  overflow: visible;
+  overflow: auto;
   padding-top: 1.2em;
 }
 pre.src:before {
@@ -195,50 +195,26 @@
@@ -248,20 +224,20 @@ for the JavaScript code in this tag.

Table of Contents

-
-

1 What does this mean?

+
+

1 What does this mean?

This means that when someone uploads a SARS-CoV-2 sequence using one
@@ -313,11 +289,11 @@ initiative!

-
-

2 Fetch sequence data

+
+

2 Fetch sequence data

-The latest run of the pipeline can be viewed here. Each of these
+The latest run of the pipeline can be viewed here. Each of these
 generated files can just be downloaded for your own use and sharing!
 Data is published under a Creative Commons 4.0 attribution license
 (CC-BY-4.0). This means that, unlike some other 'public' resources,
@@ -338,8 +314,8 @@ these identifiers throughout.

-
-

3 Predicates

+
+

3 Predicates

To explore an RDF dataset, the first query we can do is open and gets
@@ -452,8 +428,8 @@ Run this
-

4 Fetch submitter info and other metadata

+
+

4 Fetch submitter info and other metadata

To get datasets with submitters we can do the above @@ -558,26 +534,94 @@ PREFIX sio: <http://semanticscience.org/resource/"> sio: <http://semantics

-Run query.
+Run this query.

This query tells us the sample was submitted "2020-03-21" and
 originates from http://www.wikidata.org/entity/Q30, i.e., the USA and
 is a biospecimen collected from the back of the throat by swabbing.
-We can track it back to the original GenBank submission using the
-http://identifiers.org/insdc/MT326090.1 link.
+We have also added country and label data to make it a bit easier to
+view/query the database and place the sequence on the map. We use
+wikidata entities for disambiguation. By using 'Q30' for the USA we
+don't have to figure out the different ways people spell the name. To
+get from the wikidata entity to a human readable form we provide a
+country name translation for convenience. For example when the
+predicate is http://purl.obolibrary.org/obo/GAZ_00000448 we can do
+

+ + + +

+Which will show the geoname spelled out as 'United States'.

-We have also added country and label data to make it a bit easier -to view/query the database and place the sequence on the map. +For this sample we can also track it back to the original GenBank +submission using the listed http://identifiers.org/insdc/MT326090.1 +link.

-
-

5 Fetch all sequences from Washington state

+ +
+

5 Fetch all sequences from Washington state

Now we know how to get at the origin we can do it the other way round
@@ -585,19 +629,72 @@ and fetch all sequences referring to Washington state

-select ?seq ?sample
+select ?date ?name ?identifier ?seq
 {
     ?seq <http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample .
-    ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223>
-}
+
+    ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223> .
+    ?sample <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C25164> ?date .
+    ?sample <http://semanticscience.org/resource/SIO_000115> ?name .
+    ?sample <http://edamontology.org/data_2091> ?identifier .
+} order by ?date
 

-which lists 300 sequences originating from Washington state! Which in
+Run query
+

+ +

+Which shows the date and links to NCBI and raw sequence data in FASTA format,
+e.g.
+

+ +
+"date"  "name"  "identifier"  "seq"
+"2020-01-15"  "MT252760.1"  "http://identifiers.org/insdc/MT252760.1#sequence"  "http://collections.lugli.arvadosapi.com/c=0164784cba5e3e39b7ba8d83fdc92649+126/sequence.fasta"
+"2020-01-15"  "MT252720.1"  "http://identifiers.org/insdc/MT252720.1#sequence"  "http://collections.lugli.arvadosapi.com/c=0387a3e47dd8a0c9ea0a4a21931f6308+126/sequence.fasta"
+(...)
+
+ + +

+The query lists 300 sequences originating from Washington state! Which in
 April was almost half of the set coming out of GenBank.
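
The Washington state query in the hunk above can also be generated for any other location. The following is a minimal sketch, not part of the commit: it reuses the predicates from the query as shown, while the function names `build_location_query` and `build_query_url` and the endpoint `https://example.org/sparql` are hypothetical placeholders. It only assumes the endpoint accepts a standard SPARQL GET request with a `query` parameter.

```python
import re
from urllib.parse import urlencode

# Predicates copied from the query in the diff above.
SAMPLE = "http://biohackathon.org/bh20-seq-schema#MainSchema/sample"
LOCATION = "http://purl.obolibrary.org/obo/GAZ_00000448"
DATE = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C25164"
NAME = "http://semanticscience.org/resource/SIO_000115"
IDENTIFIER = "http://edamontology.org/data_2091"


def build_location_query(wikidata_id: str) -> str:
    """Return the SPARQL text listing all sequences sampled at a Wikidata entity.

    Hypothetical helper: 'Q1223' reproduces the Washington state query.
    """
    # Reject anything that is not a plain Wikidata item id such as 'Q30'.
    if not re.fullmatch(r"Q[0-9]+", wikidata_id):
        raise ValueError(f"not a Wikidata entity id: {wikidata_id!r}")
    location = f"http://www.wikidata.org/entity/{wikidata_id}"
    return "\n".join([
        "select ?date ?name ?identifier ?seq",
        "{",
        f"    ?seq <{SAMPLE}> ?sample .",
        f"    ?sample <{LOCATION}> <{location}> .",
        f"    ?sample <{DATE}> ?date .",
        f"    ?sample <{NAME}> ?name .",
        f"    ?sample <{IDENTIFIER}> ?identifier .",
        "} order by ?date",
    ])


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL protocol GET URL (no request is sent)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Placeholder endpoint; substitute the real SPARQL service URL.
    print(build_query_url("https://example.org/sparql",
                          build_location_query("Q1223")))
```

Passing `Q30` instead of `Q1223` would only match samples whose location is recorded as the USA entity itself. Whether samples tagged with a state also match a country-level query depends on how the data is modelled, which this sketch does not address.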

@@ -624,8 +721,8 @@ Run
-

6 Discussion

+
+

6 Discussion

The public sequence uploader collects sequences, raw data and
@@ -636,8 +733,8 @@ referenced in publications and origins are citeable.

-
-

7 Acknowledgements

+
+

7 Acknowledgements

The overall effort was due to magnificent freely donated input by a
@@ -652,7 +749,7 @@ Garrison this initiative would not have existed!

-Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp! Modified 2020-08-26 Wed 05:02.
+Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp! Modified 2020-10-26 Mon 05:21.
-- cgit v1.2.3