Diffstat (limited to 'doc/blog')
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part1.html | 192
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part4.html | 44
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part5.html | 194
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part6.html | 393
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part6.org | 102
5 files changed, 800 insertions, 125 deletions
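The part 1 post changed in this commit fetches sequences by origin. It does this by matching the `GAZ_00000448` predicate against a Wikidata entity: `Q1223` for Washington state, and the newly added `Q43` for Turkey. The Python below is a minimal sketch of how that same query could be generated for any Wikidata ID, with the result turned into a GET URL for the SPARQL endpoint. The helper names are hypothetical. Passing the query as a `query` parameter to `http://sparql.genenetwork.org/sparql/` is an assumption based on the SPARQL 1.1 Protocol; the post itself does not document it.

```python
from urllib.parse import urlencode

# Predicates copied from the "Fetch all sequences from Washington state"
# query in the part 1 post.
SAMPLE_PRED = "http://biohackathon.org/bh20-seq-schema#MainSchema/sample"
ORIGIN_PRED = "http://purl.obolibrary.org/obo/GAZ_00000448"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"

# Assumption: the endpoint accepts the query as a GET "query" parameter
# (SPARQL 1.1 Protocol); the post only links to the endpoint.
ENDPOINT = "http://sparql.genenetwork.org/sparql/"


def sequences_by_origin_query(wikidata_id: str) -> str:
    """Hypothetical helper: build the sequences-by-origin SPARQL query.

    wikidata_id is a Wikidata entity ID such as "Q1223" (Washington
    state) or "Q43" (Turkey).
    """
    # Accept only "Q" followed by ASCII digits, so arbitrary text is
    # never spliced into the query.
    rest = wikidata_id[1:]
    if not (wikidata_id.startswith("Q") and rest.isascii() and rest.isdigit()):
        raise ValueError(f"not a Wikidata entity ID: {wikidata_id!r}")
    return (
        "select ?seq ?sample\n"
        "{\n"
        f"  ?seq <{SAMPLE_PRED}> ?sample .\n"
        f"  ?sample <{ORIGIN_PRED}> <{WIKIDATA_ENTITY}{wikidata_id}>\n"
        "}\n"
    )


def endpoint_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Hypothetical helper: URL-encode the query for an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = sequences_by_origin_query("Q43")
    print(q)
    print(endpoint_url(q))
```

The sketch makes no network request. The URL it prints could be opened in a browser or fetched with `urllib.request.urlopen`. The ID check is a design choice: it keeps the helper from building malformed or injected queries.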
diff --git a/doc/blog/using-covid-19-pubseq-part1.html b/doc/blog/using-covid-19-pubseq-part1.html index 1959fac..0e6136c 100644 --- a/doc/blog/using-covid-19-pubseq-part1.html +++ b/doc/blog/using-covid-19-pubseq-part1.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-05-29 Fri 12:06 --> +<!-- 2020-07-17 Fri 05:05 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <title>COVID-19 PubSeq (part 1)</title> @@ -248,20 +248,20 @@ for the JavaScript code in this tag. <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> -<li><a href="#org9afe6ab">1. What does this mean?</a></li> -<li><a href="#orgf4bc3d4">2. Fetch sequence data</a></li> -<li><a href="#org9d7d482">3. Predicates</a></li> -<li><a href="#orgc6046bb">4. Fetch submitter info and other metadata</a></li> -<li><a href="#orgdcb216b">5. Fetch all sequences from Washington state</a></li> -<li><a href="#org7060f51">6. Discussion</a></li> -<li><a href="#orgdc51ccc">7. Acknowledgements</a></li> +<li><a href="#org0db5db0">1. What does this mean?</a></li> +<li><a href="#orge5267fd">2. Fetch sequence data</a></li> +<li><a href="#orgfbd3adc">3. Predicates</a></li> +<li><a href="#org08e70e1">4. Fetch submitter info and other metadata</a></li> +<li><a href="#org9194557">5. Fetch all sequences from Washington state</a></li> +<li><a href="#org76317ad">6. Discussion</a></li> +<li><a href="#orgeb871a1">7. 
Acknowledgements</a></li> </ul> </div> </div> -<div id="outline-container-org9afe6ab" class="outline-2"> -<h2 id="org9afe6ab"><span class="section-number-2">1</span> What does this mean?</h2> +<div id="outline-container-org0db5db0" class="outline-2"> +<h2 id="org0db5db0"><span class="section-number-2">1</span> What does this mean?</h2> <div class="outline-text-2" id="text-1"> <p> This means that when someone uploads a SARS-CoV-2 sequence using one @@ -274,24 +274,24 @@ expressed in a <a href="https://github.com/arvados/bh20-seq-resource/blob/master type: record fields: host_species: - doc: Host species as defined in NCBITaxon, e.g. http://purl.obolibrary.org/obo/NCBITaxon_<span style="color: #8bc34a;">9606</span> for Homo sapiens + doc: Host species as defined in NCBITaxon, e.g. http://purl.obolibrary.org/obo/NCBITaxon_9606 for Homo sapiens type: string jsonldPredicate: - _id: http://www.ebi.ac.uk/efo/EFO_<span style="color: #8bc34a;">0000532</span> - _type: <span style="color: #9ccc65;">"@id"</span> - noLinkCheck: <span style="color: #8bc34a;">true</span> + _id: http://www.ebi.ac.uk/efo/EFO_0000532 + _type: "@id" + noLinkCheck: true host_sex: - doc: Sex of the host as defined in PATO, expect male <span style="color: #e91e63;">()</span> or female <span style="color: #e91e63;">()</span> + doc: Sex of the host as defined in PATO, expect male () or female () type: string? jsonldPredicate: - _id: http://purl.obolibrary.org/obo/PATO_<span style="color: #8bc34a;">0000047</span> - _type: <span style="color: #9ccc65;">"@id"</span> - noLinkCheck: <span style="color: #8bc34a;">true</span> + _id: http://purl.obolibrary.org/obo/PATO_0000047 + _type: "@id" + noLinkCheck: true host_age: - doc: Age of the host as number <span style="color: #e91e63;">(</span>e.g. <span style="color: #8bc34a;">50</span><span style="color: #e91e63;">)</span> + doc: Age of the host as number (e.g. 50) type: int? 
jsonldPredicate: - _id: http://purl.obolibrary.org/obo/PATO_<span style="color: #8bc34a;">0000011</span> + _id: http://purl.obolibrary.org/obo/PATO_0000011 </pre> </div> @@ -314,8 +314,8 @@ initiative! </div> -<div id="outline-container-orgf4bc3d4" class="outline-2"> -<h2 id="orgf4bc3d4"><span class="section-number-2">2</span> Fetch sequence data</h2> +<div id="outline-container-orge5267fd" class="outline-2"> +<h2 id="orge5267fd"><span class="section-number-2">2</span> Fetch sequence data</h2> <div class="outline-text-2" id="text-2"> <p> The latest run of the pipeline can be viewed <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">here</a>. Each of these @@ -339,8 +339,8 @@ these identifiers throughout. </div> </div> -<div id="outline-container-org9d7d482" class="outline-2"> -<h2 id="org9d7d482"><span class="section-number-2">3</span> Predicates</h2> +<div id="outline-container-orgfbd3adc" class="outline-2"> +<h2 id="orgfbd3adc"><span class="section-number-2">3</span> Predicates</h2> <div class="outline-text-2" id="text-3"> <p> To explore an RDF dataset, the first query we can do is open and gets @@ -350,10 +350,10 @@ the following in a SPARQL end point </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?p -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">select distinct ?p +{ ?o ?p ?s -<span style="color: #e91e63;">}</span> +} </pre> </div> @@ -364,10 +364,10 @@ To get a <a href="http://sparql.genenetwork.org/sparql/?default-graph-uri=&q </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?g -<span style="color: #e91e63;">{</span> - GRAPH ?g <span style="color: #2196F3;">{</span>?s ?p ?o<span style="color: #2196F3;">}</span> -<span style="color: #e91e63;">}</span> +<pre class="src src-sql">select distinct ?g +{ 
+ GRAPH ?g {?s ?p ?o} +} </pre> </div> @@ -383,10 +383,10 @@ To list all submitters, try </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?s -<span style="color: #e91e63;">{</span> - ?o <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">#MainSchema/submitter></a> ?s -<span style="color: #e91e63;">}</span> +<pre class="src src-sql">select distinct ?s +{ + ?o <http://biohackathon.org/bh20-seq-schema#MainSchema/submitter> ?s +} </pre> </div> @@ -397,11 +397,11 @@ and by </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?s -<span style="color: #e91e63;">{</span> - ?o <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">#MainSchema/submitter></a> ?id . +<pre class="src src-sql">select distinct ?s +{ + ?o <http://biohackathon.org/bh20-seq-schema#MainSchema/submitter> ?id . 
?id ?p ?s -<span style="color: #e91e63;">}</span> +} </pre> </div> @@ -415,12 +415,12 @@ To lift the full URL out of the query you can use a header like </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> -<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?dataset ?submitter -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> +select distinct ?dataset ?submitter +{ ?dataset pubseq:submitter ?id . ?id ?p ?submitter -<span style="color: #e91e63;">}</span> +} </pre> </div> @@ -438,32 +438,32 @@ Now we got this far, lets <a href="http://sparql.genenetwork.org/sparql/?default </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> -<span style="color: #fff59d;">select</span> <span style="color: #e91e63;">(</span><span style="color: #ff8A65;">COUNT</span><span style="color: #2196F3;">(</span><span style="color: #fff59d;">distinct</span> ?dataset<span style="color: #2196F3;">)</span> <span style="color: #fff59d;">as</span> ?num<span style="color: #e91e63;">)</span> -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> +select (COUNT(distinct ?dataset) as ?num) +{ ?dataset 
pubseq:submitter ?id . ?id ?p ?submitter -<span style="color: #e91e63;">}</span> +} </pre> </div> </div> </div> -<div id="outline-container-orgc6046bb" class="outline-2"> -<h2 id="orgc6046bb"><span class="section-number-2">4</span> Fetch submitter info and other metadata</h2> +<div id="outline-container-org08e70e1" class="outline-2"> +<h2 id="org08e70e1"><span class="section-number-2">4</span> Fetch submitter info and other metadata</h2> <div class="outline-text-2" id="text-4"> <p> To get dataests with submitters we can do the above </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> -<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?dataset ?p ?submitter -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> +select distinct ?dataset ?p ?submitter +{ ?dataset pubseq:submitter ?id . 
?id ?p ?submitter -<span style="color: #e91e63;">}</span> +} </pre> </div> @@ -480,13 +480,13 @@ Let's focus on one sample with </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> -<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?dataset ?submitter -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> +select distinct ?dataset ?submitter +{ ?dataset pubseq:submitter ?id . ?id ?p ?submitter . - FILTER<span style="color: #2196F3;">(</span><span style="color: #fff59d;">CONTAINS</span><span style="color: #EF6C00;">(</span>?submitter,"Roychoudhury"<span style="color: #EF6C00;">)</span><span style="color: #2196F3;">)</span> . -<span style="color: #e91e63;">}</span> + FILTER(CONTAINS(?submitter,"Roychoudhury")) . +} </pre> </div> @@ -496,12 +496,12 @@ see if we can get a sample ID by listing sample predicates </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> -<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?p -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> +select distinct ?p +{ ?dataset ?p ?o . ?dataset pubseq:submitter ?id . 
-<span style="color: #e91e63;">}</span> +} </pre> </div> @@ -513,15 +513,15 @@ Let's zoom in on those of Roychoudhury with <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> -<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?sid ?sample ?p1 ?dataset ?submitter -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> +select distinct ?sid ?sample ?p1 ?dataset ?submitter +{ ?dataset pubseq:submitter ?id . ?id ?p ?submitter . - FILTER<span style="color: #2196F3;">(</span><span style="color: #fff59d;">CONTAINS</span><span style="color: #EF6C00;">(</span>?submitter,"Roychoudhury"<span style="color: #EF6C00;">)</span><span style="color: #2196F3;">)</span> . + FILTER(CONTAINS(?submitter,"Roychoudhury")) . ?dataset pubseq:sample ?sid . ?sid ?p1 ?sample -<span style="color: #e91e63;">}</span> +} </pre> </div> @@ -532,18 +532,13 @@ this database. 
Let's focus on one sample "MT326090.1" with predicate </p> <div class="org-src-container"> -<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> -PREFIX sio: <http://semanticscience.org/resource/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> -PREFIX sio: <http://semanticscience.org/resource/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> -PREFIX sio: <http://semanticscience.org/resource/">#MainSchema/> -</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> -PREFIX sio: <http://semanticscience.org/resource/">PREFIX</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> -PREFIX sio: <http://semanticscience.org/resource/"> sio: <http://semanticscience.org/resource/></a> -<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?sample ?p ?o -<span style="color: #e91e63;">{</span> +<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> +PREFIX sio: <http://semanticscience.org/resource/> +select distinct ?sample ?p ?o +{ ?sample sio:SIO_000115 "MT326090.1" . ?sample ?p ?o . -<span style="color: #e91e63;">}</span> +} </pre> </div> @@ -561,8 +556,8 @@ to view/query the database. 
</div> </div> -<div id="outline-container-orgdcb216b" class="outline-2"> -<h2 id="orgdcb216b"><span class="section-number-2">5</span> Fetch all sequences from Washington state</h2> +<div id="outline-container-org9194557" class="outline-2"> +<h2 id="org9194557"><span class="section-number-2">5</span> Fetch all sequences from Washington state</h2> <div class="outline-text-2" id="text-5"> <p> Now we know how to get at the origin we can do it the other way round @@ -570,15 +565,11 @@ and fetch all sequences referring to Washington state </p> <div class="org-src-container"> -<pre class="src src-sql"> -<span style="color: #fff59d;">select</span> ?seq ?sample -<span style="color: #e91e63;">{</span> - ?seq <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . - ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . - ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . - ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223">#MainSchema/sample> ?sample . - ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223></a> -<span style="color: #e91e63;">}</span> +<pre class="src src-sql">select ?seq ?sample +{ + ?seq <http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223> +} </pre> </div> @@ -586,11 +577,26 @@ and fetch all sequences referring to Washington state which lists 300 sequences originating from Washington state! Which is almost half of the set coming out of GenBank. 
</p> + +<p> +Likewise to list all sequences from Turkey we can find the wikidata +entity is <a href="https://www.wikidata.org/wiki/Q43">Q43</a>: +</p> + +<div class="org-src-container"> +<pre class="src src-sql">select ?seq ?sample +{ + ?seq <http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q43> +} +</pre> </div> </div> +</div> + -<div id="outline-container-org7060f51" class="outline-2"> -<h2 id="org7060f51"><span class="section-number-2">6</span> Discussion</h2> +<div id="outline-container-org76317ad" class="outline-2"> +<h2 id="org76317ad"><span class="section-number-2">6</span> Discussion</h2> <div class="outline-text-2" id="text-6"> <p> The public sequence uploader collects sequences, raw data and @@ -601,8 +607,8 @@ referenced in publications and origins are citeable. </div> </div> -<div id="outline-container-orgdc51ccc" class="outline-2"> -<h2 id="orgdc51ccc"><span class="section-number-2">7</span> Acknowledgements</h2> +<div id="outline-container-orgeb871a1" class="outline-2"> +<h2 id="orgeb871a1"><span class="section-number-2">7</span> Acknowledgements</h2> <div class="outline-text-2" id="text-7"> <p> The overall effort was due to magnificent freely donated input by a @@ -617,7 +623,7 @@ Garrison this initiative would not have existed! </div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-05-29 Fri 12:06</small>. +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-17 Fri 05:02</small>. 
</div> </body> </html> diff --git a/doc/blog/using-covid-19-pubseq-part4.html b/doc/blog/using-covid-19-pubseq-part4.html index b5a05ca..c975c21 100644 --- a/doc/blog/using-covid-19-pubseq-part4.html +++ b/doc/blog/using-covid-19-pubseq-part4.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-07-12 Sun 06:24 --> +<!-- 2020-07-17 Fri 05:04 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <title>COVID-19 PubSeq (part 4)</title> @@ -161,6 +161,19 @@ .footdef { margin-bottom: 1em; } .figure { padding: 1em; } .figure p { text-align: center; } + .equation-container { + display: table; + text-align: center; + width: 100%; + } + .equation { + vertical-align: middle; + } + .equation-label { + display: table-cell; + text-align: right; + vertical-align: middle; + } .inlinetask { padding: 10px; border: 2px solid gray; @@ -186,7 +199,7 @@ @licstart The following is the entire license notice for the JavaScript code in this tag. -Copyright (C) 2012-2018 Free Software Foundation, Inc. +Copyright (C) 2012-2020 Free Software Foundation, Inc. The JavaScript code in this tag is free software: you can redistribute it and/or modify it under the terms of the GNU @@ -235,15 +248,16 @@ for the JavaScript code in this tag. <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> -<li><a href="#org8f8b64a">1. What does this mean?</a></li> -<li><a href="#orgcc7a403">2. Modify Workflow</a></li> +<li><a href="#orgc2ee09f">1. What does this mean?</a></li> +<li><a href="#org0d37881">2. Where can I find the workflows?</a></li> +<li><a href="#orgddb0531">3. 
Modify Workflow</a></li> </ul> </div> </div> -<div id="outline-container-org8f8b64a" class="outline-2"> -<h2 id="org8f8b64a"><span class="section-number-2">1</span> What does this mean?</h2> +<div id="outline-container-orgc2ee09f" class="outline-2"> +<h2 id="orgc2ee09f"><span class="section-number-2">1</span> What does this mean?</h2> <div class="outline-text-2" id="text-1"> <p> This means that when someone uploads a SARS-CoV-2 sequence using one @@ -253,18 +267,28 @@ which triggers a rerun of our workflows. </div> </div> - -<div id="outline-container-orgcc7a403" class="outline-2"> -<h2 id="orgcc7a403"><span class="section-number-2">2</span> Modify Workflow</h2> +<div id="outline-container-org0d37881" class="outline-2"> +<h2 id="org0d37881"><span class="section-number-2">2</span> Where can I find the workflows?</h2> <div class="outline-text-2" id="text-2"> <p> +Workflows are written in the common workflow language (CWL) and listed +on <a href="https://github.com/arvados/bh20-seq-resource/tree/master/workflows">github</a>. PubSeq being an open project these workflows can be studied +and modified! +</p> +</div> +</div> + +<div id="outline-container-orgddb0531" class="outline-2"> +<h2 id="orgddb0531"><span class="section-number-2">3</span> Modify Workflow</h2> +<div class="outline-text-2" id="text-3"> +<p> <i>Work in progress!</i> </p> </div> </div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-12 Sun 06:24</small>. +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-17 Fri 01:47</small>. 
</div> </body> </html> diff --git a/doc/blog/using-covid-19-pubseq-part5.html b/doc/blog/using-covid-19-pubseq-part5.html index 80bf559..4caa5ac 100644 --- a/doc/blog/using-covid-19-pubseq-part5.html +++ b/doc/blog/using-covid-19-pubseq-part5.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-07-12 Sun 06:24 --> +<!-- 2020-07-17 Fri 05:03 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <title>COVID-19 PubSeq (part 4)</title> @@ -161,6 +161,19 @@ .footdef { margin-bottom: 1em; } .figure { padding: 1em; } .figure p { text-align: center; } + .equation-container { + display: table; + text-align: center; + width: 100%; + } + .equation { + vertical-align: middle; + } + .equation-label { + display: table-cell; + text-align: right; + vertical-align: middle; + } .inlinetask { padding: 10px; border: 2px solid gray; @@ -186,7 +199,7 @@ @licstart The following is the entire license notice for the JavaScript code in this tag. -Copyright (C) 2012-2018 Free Software Foundation, Inc. +Copyright (C) 2012-2020 Free Software Foundation, Inc. The JavaScript code in this tag is free software: you can redistribute it and/or modify it under the terms of the GNU @@ -235,38 +248,40 @@ for the JavaScript code in this tag. <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> -<li><a href="#org871ad58">1. Modify Metadata</a></li> -<li><a href="#org07e8755">2. What is the schema?</a></li> -<li><a href="#org4857280">3. How is the website generated?</a></li> -<li><a href="#orge709ae2">4. Modifying the schema</a></li> +<li><a href="#org758b923">1. Modify Metadata</a></li> +<li><a href="#orgec32c13">2. What is the schema?</a></li> +<li><a href="#org2e487b2">3. How is the website generated?</a></li> +<li><a href="#orge4dfe84">4. 
Modifying the schema</a></li> +<li><a href="#org564a7a8">5. Adding fields to the form</a></li> +<li><a href="#org633781a">6. <span class="todo TODO">TODO</span> Testing the license fields</a></li> </ul> </div> </div> -<div id="outline-container-org871ad58" class="outline-2"> -<h2 id="org871ad58"><span class="section-number-2">1</span> Modify Metadata</h2> +<div id="outline-container-org758b923" class="outline-2"> +<h2 id="org758b923"><span class="section-number-2">1</span> Modify Metadata</h2> <div class="outline-text-2" id="text-1"> <p> The public sequence resource uses multiple data formats listed on the -<a href="./download">DOWNLOAD</a> page. One of the most exciting features is the full support +<a href="http://covid19.genenetwork.org/download">download</a> page. One of the most exciting features is the full support for RDF and semantic web/linked data ontologies. This technology allows for querying data in unprescribed ways - that is, you can formulate your own queries without dealing with a preset model of that data (so typical of CSV files and SQL tables). Examples of exploring -data are listed <a href="./blog?id=using-covid-19-pubseq-part1">here</a>. +data are listed <a href="http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part1">here</a>. </p> <p> In this BLOG we are going to look at the metadata entered on the -<a href="./">COVID-19 PubSeq</a> website (or command line client). It is important to +COVID-19 PubSeq website (or command line client). It is important to understand that anyone, including you, can change that information! 
</p> </div> </div> -<div id="outline-container-org07e8755" class="outline-2"> -<h2 id="org07e8755"><span class="section-number-2">2</span> What is the schema?</h2> +<div id="outline-container-orgec32c13" class="outline-2"> +<h2 id="orgec32c13"><span class="section-number-2">2</span> What is the schema?</h2> <div class="outline-text-2" id="text-2"> <p> The default metadata schema is listed <a href="https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/bh20seq-schema.yml">here</a>. @@ -274,8 +289,8 @@ The default metadata schema is listed <a href="https://github.com/arvados/bh20-s </div> </div> -<div id="outline-container-org4857280" class="outline-2"> -<h2 id="org4857280"><span class="section-number-2">3</span> How is the website generated?</h2> +<div id="outline-container-org2e487b2" class="outline-2"> +<h2 id="org2e487b2"><span class="section-number-2">3</span> How is the website generated?</h2> <div class="outline-text-2" id="text-3"> <p> Using the schema we use <a href="https://pypi.org/project/PyShEx/">pyshex</a> shex expressions and <a href="https://github.com/common-workflow-language/schema_salad">schema salad</a> to @@ -285,13 +300,13 @@ All from that one metadata schema. </div> </div> -<div id="outline-container-orge709ae2" class="outline-2"> -<h2 id="orge709ae2"><span class="section-number-2">4</span> Modifying the schema</h2> +<div id="outline-container-orge4dfe84" class="outline-2"> +<h2 id="orge4dfe84"><span class="section-number-2">4</span> Modifying the schema</h2> <div class="outline-text-2" id="text-4"> <p> -One of the first things we wanted to do is to add a field for the data -license. Initially we only support CC-4.0 as a license by default, but -now we want to give uploaders the option to make it an even more +One of the first things we want to do is to add a field for the data +license. Initially we only supported CC-4.0 as a license, but +we wanted to give uploaders the option to use an even more liberal CC0 license. 
The first step is to find a good ontology term for the field. Searching for `creative commons cc0 rdf' rendered this useful <a href="https://creativecommons.org/ns">page</a>. We also find an <a href="https://wiki.creativecommons.org/wiki/CC_License_Rdf_Overview">overview</a> where CC0 is represented as URI @@ -302,13 +317,148 @@ attributionName and attributionURL. </p> <p> -<i>Note: work in progress</i> +A minimal triple should be +</p> + +<pre class="example"> +id xhtml:license <http://creativecommons.org/licenses/by/4.0/> . +</pre> + + +<p> +Other suggestions are +</p> + +<pre class="example"> +id dc:title "Description" . +id cc:attributionName "Your Name" . +id cc:attributionURL <http://resource.org/id> +</pre> + + +<p> +and 'dc:source' which indicates the original source of any modified +work, specified as a URI. +The prefix 'cc:' is an abbreviation for <a href="http://creativecommons.org/ns">http://creativecommons.org/ns</a>#. +</p> + +<p> +Going back to the schema, where does it fit? Under host, sample, +virus, technology or submitter block? It could fit under sample, but +actually the license concerns the whole metadata block and sequence, +so I think we can fit under its own license tag. For example +</p> + + +<p> +id: placeholder +</p> + +<pre class="example"> +license: + license_type: http://creativecommons.org/licenses/by/4.0/ + attribution_title: "Sample ID" + attribution_name: "John doe, Joe Boe, Jonny Oe" + attribution_url: http://covid19.genenetwork.org/id + attribution_source: https://www.ncbi.nlm.nih.gov/pubmed/323088888 +</pre> + + +<p> +So, let's update the example. Notice the license info is optional - if it is missing +we just assume the default CC-4.0. +</p> + +<p> +One thing that is interesting is that in the name space <a href="https://creativecommons.org/ns">https://creativecommons.org/ns</a> there +is no mention of a title. I think it is useful, however, because we have no such field. +So, we'll add it simply as a title field. 
Now the draft schema is </p> + +<div class="org-src-container"> +<pre class="src src-js">- name: licenseSchema + type: record + fields: + license_type: + doc: License types as refined in https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf + type: string? + jsonldPredicate: + _id: https://creativecommons.org/ns#License + title: + doc: Attribution title related to license + type: string? + jsonldPredicate: + _id: http://semanticscience.org/resource/SIO_001167 + attribution_url: + doc: Attribution URL related to license + type: string? + jsonldPredicate: + _id: https://creativecommons.org/ns#Work + attribution_source: + doc: Attribution source URL + type: string? + jsonldPredicate: + _id: https://creativecommons.org/ns#Work +</pre> +</div> + +<p> +Now, we are no ontology experts, right? So, next we submit a patch to +our source tree and ask for feedback before wiring it up in the data +entry form. The pull request was submitted <a href="https://github.com/arvados/bh20-seq-resource/pull/97">here</a> and reviewed on the +gitter channel and I merged it. +</p> +</div> </div> + +<div id="outline-container-org564a7a8" class="outline-2"> +<h2 id="org564a7a8"><span class="section-number-2">5</span> Adding fields to the form</h2> +<div class="outline-text-2" id="text-5"> +<p> +To add the new fields to the form we have to modify it a little. If we +go to the upload form we need to add the license box. The schema is +loaded in <a href="https://github.com/arvados/bh20-seq-resource/blob/a0c8ebd57b875f265e8b0efec4abfaf892eb6c45/bh20simplewebuploader/main.py#L229">main.py</a> in the 'generate<sub>form</sub>' function. +</p> + +<p> +With this <a href="https://github.com/arvados/bh20-seq-resource/commit/b9691c7deae30bd6422fb7b0681572b7b6f78ae3">patch</a> the website adds the license input fields on the form. +</p> + +<p> +Finally, to make RDF output work we need to add expressions to bh20seq-shex.rdf. 
This +was done with this <a href="https://github.com/arvados/bh20-seq-resource/commit/f4ed46dae20abe5147871495ede2d6ac2b0854bc">patch</a>. In the end we decided to use the Dublin Core title, +<a href="http://purl.org/metadata/dublin_core_elements#Title">http://purl.org/metadata/dublin_core_elements#Title</a>: +</p> + +<div class="org-src-container"> +<pre class="src src-js">:licenseShape{ + cc:License xsd:string; + dc:Title xsd:string ?; + cc:attributionName xsd:string ?; + cc:attributionURL xsd:string ?; + cc:attributionSource xsd:string ?; +} +</pre> +</div> + +<p> +Note that cc:attributionSource is not actually defined in the cc namespace. +</p> + +<p> +When pushing the license info we discovered that the workflow broke because +the existing data had no licensing info. So we made the license +field optional - a missing license is assumed to be CC-BY-4.0. +</p> +</div> +</div> + +<div id="outline-container-org633781a" class="outline-2"> +<h2 id="org633781a"><span class="section-number-2">6</span> <span class="todo TODO">TODO</span> Testing the license fields</h2> </div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-12 Sun 06:24</small>. +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-16 Thu 03:27</small>.
</div> </body> </html> diff --git a/doc/blog/using-covid-19-pubseq-part6.html b/doc/blog/using-covid-19-pubseq-part6.html new file mode 100644 index 0000000..278abe8 --- /dev/null +++ b/doc/blog/using-covid-19-pubseq-part6.html @@ -0,0 +1,393 @@ +<?xml version="1.0" encoding="utf-8"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" +"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> +<head> +<!-- 2020-07-17 Fri 06:05 --> +<meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> +<meta name="viewport" content="width=device-width, initial-scale=1" /> +<title>COVID-19 PubSeq (part 6)</title> +<meta name="generator" content="Org mode" /> +<meta name="author" content="Pjotr Prins" /> +<style type="text/css"> + <!--/*--><![CDATA[/*><!--*/ + .title { text-align: center; + margin-bottom: .2em; } + .subtitle { text-align: center; + font-size: medium; + font-weight: bold; + margin-top:0; } + .todo { font-family: monospace; color: red; } + .done { font-family: monospace; color: green; } + .priority { font-family: monospace; color: orange; } + .tag { background-color: #eee; font-family: monospace; + padding: 2px; font-size: 80%; font-weight: normal; } + .timestamp { color: #bebebe; } + .timestamp-kwd { color: #5f9ea0; } + .org-right { margin-left: auto; margin-right: 0px; text-align: right; } + .org-left { margin-left: 0px; margin-right: auto; text-align: left; } + .org-center { margin-left: auto; margin-right: auto; text-align: center; } + .underline { text-decoration: underline; } + #postamble p, #preamble p { font-size: 90%; margin: .2em; } + p.verse { margin-left: 3%; } + pre { + border: 1px solid #ccc; + box-shadow: 3px 3px 3px #eee; + padding: 8pt; + font-family: monospace; + overflow: auto; + margin: 1.2em; + } + pre.src { + position: relative; + overflow: visible; + padding-top: 1.2em; + } + pre.src:before { + display: none; + position: absolute; + background-color: white; + 
top: -10px; + right: 10px; + padding: 3px; + border: 1px solid black; + } + pre.src:hover:before { display: inline;} + /* Languages per Org manual */ + pre.src-asymptote:before { content: 'Asymptote'; } + pre.src-awk:before { content: 'Awk'; } + pre.src-C:before { content: 'C'; } + /* pre.src-C++ doesn't work in CSS */ + pre.src-clojure:before { content: 'Clojure'; } + pre.src-css:before { content: 'CSS'; } + pre.src-D:before { content: 'D'; } + pre.src-ditaa:before { content: 'ditaa'; } + pre.src-dot:before { content: 'Graphviz'; } + pre.src-calc:before { content: 'Emacs Calc'; } + pre.src-emacs-lisp:before { content: 'Emacs Lisp'; } + pre.src-fortran:before { content: 'Fortran'; } + pre.src-gnuplot:before { content: 'gnuplot'; } + pre.src-haskell:before { content: 'Haskell'; } + pre.src-hledger:before { content: 'hledger'; } + pre.src-java:before { content: 'Java'; } + pre.src-js:before { content: 'Javascript'; } + pre.src-latex:before { content: 'LaTeX'; } + pre.src-ledger:before { content: 'Ledger'; } + pre.src-lisp:before { content: 'Lisp'; } + pre.src-lilypond:before { content: 'Lilypond'; } + pre.src-lua:before { content: 'Lua'; } + pre.src-matlab:before { content: 'MATLAB'; } + pre.src-mscgen:before { content: 'Mscgen'; } + pre.src-ocaml:before { content: 'Objective Caml'; } + pre.src-octave:before { content: 'Octave'; } + pre.src-org:before { content: 'Org mode'; } + pre.src-oz:before { content: 'OZ'; } + pre.src-plantuml:before { content: 'Plantuml'; } + pre.src-processing:before { content: 'Processing.js'; } + pre.src-python:before { content: 'Python'; } + pre.src-R:before { content: 'R'; } + pre.src-ruby:before { content: 'Ruby'; } + pre.src-sass:before { content: 'Sass'; } + pre.src-scheme:before { content: 'Scheme'; } + pre.src-screen:before { content: 'Gnu Screen'; } + pre.src-sed:before { content: 'Sed'; } + pre.src-sh:before { content: 'shell'; } + pre.src-sql:before { content: 'SQL'; } + pre.src-sqlite:before { content: 'SQLite'; } + /* additional 
languages in org.el's org-babel-load-languages alist */ + pre.src-forth:before { content: 'Forth'; } + pre.src-io:before { content: 'IO'; } + pre.src-J:before { content: 'J'; } + pre.src-makefile:before { content: 'Makefile'; } + pre.src-maxima:before { content: 'Maxima'; } + pre.src-perl:before { content: 'Perl'; } + pre.src-picolisp:before { content: 'Pico Lisp'; } + pre.src-scala:before { content: 'Scala'; } + pre.src-shell:before { content: 'Shell Script'; } + pre.src-ebnf2ps:before { content: 'ebfn2ps'; } + /* additional language identifiers per "defun org-babel-execute" + in ob-*.el */ + pre.src-cpp:before { content: 'C++'; } + pre.src-abc:before { content: 'ABC'; } + pre.src-coq:before { content: 'Coq'; } + pre.src-groovy:before { content: 'Groovy'; } + /* additional language identifiers from org-babel-shell-names in + ob-shell.el: ob-shell is the only babel language using a lambda to put + the execution function name together. */ + pre.src-bash:before { content: 'bash'; } + pre.src-csh:before { content: 'csh'; } + pre.src-ash:before { content: 'ash'; } + pre.src-dash:before { content: 'dash'; } + pre.src-ksh:before { content: 'ksh'; } + pre.src-mksh:before { content: 'mksh'; } + pre.src-posh:before { content: 'posh'; } + /* Additional Emacs modes also supported by the LaTeX listings package */ + pre.src-ada:before { content: 'Ada'; } + pre.src-asm:before { content: 'Assembler'; } + pre.src-caml:before { content: 'Caml'; } + pre.src-delphi:before { content: 'Delphi'; } + pre.src-html:before { content: 'HTML'; } + pre.src-idl:before { content: 'IDL'; } + pre.src-mercury:before { content: 'Mercury'; } + pre.src-metapost:before { content: 'MetaPost'; } + pre.src-modula-2:before { content: 'Modula-2'; } + pre.src-pascal:before { content: 'Pascal'; } + pre.src-ps:before { content: 'PostScript'; } + pre.src-prolog:before { content: 'Prolog'; } + pre.src-simula:before { content: 'Simula'; } + pre.src-tcl:before { content: 'tcl'; } + pre.src-tex:before { content: 
'TeX'; } + pre.src-plain-tex:before { content: 'Plain TeX'; } + pre.src-verilog:before { content: 'Verilog'; } + pre.src-vhdl:before { content: 'VHDL'; } + pre.src-xml:before { content: 'XML'; } + pre.src-nxml:before { content: 'XML'; } + /* add a generic configuration mode; LaTeX export needs an additional + (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */ + pre.src-conf:before { content: 'Configuration File'; } + + table { border-collapse:collapse; } + caption.t-above { caption-side: top; } + caption.t-bottom { caption-side: bottom; } + td, th { vertical-align:top; } + th.org-right { text-align: center; } + th.org-left { text-align: center; } + th.org-center { text-align: center; } + td.org-right { text-align: right; } + td.org-left { text-align: left; } + td.org-center { text-align: center; } + dt { font-weight: bold; } + .footpara { display: inline; } + .footdef { margin-bottom: 1em; } + .figure { padding: 1em; } + .figure p { text-align: center; } + .equation-container { + display: table; + text-align: center; + width: 100%; + } + .equation { + vertical-align: middle; + } + .equation-label { + display: table-cell; + text-align: right; + vertical-align: middle; + } + .inlinetask { + padding: 10px; + border: 2px solid gray; + margin: 10px; + background: #ffffcc; + } + #org-div-home-and-up + { text-align: right; font-size: 70%; white-space: nowrap; } + textarea { overflow-x: auto; } + .linenr { font-size: smaller } + .code-highlighted { background-color: #ffff00; } + .org-info-js_info-navigation { border-style: none; } + #org-info-js_console-label + { font-size: 10px; font-weight: bold; white-space: nowrap; } + .org-info-js_search-highlight + { background-color: #ffff00; color: #000000; font-weight: bold; } + .org-svg { width: 90%; } + /*]]>*/--> +</style> +<link rel="Blog stylesheet" type="text/css" href="blog.css" /> +<script type="text/javascript"> +/* +@licstart The following is the entire license notice for the +JavaScript code in this tag. 
+ +Copyright (C) 2012-2020 Free Software Foundation, Inc. + +The JavaScript code in this tag is free software: you can +redistribute it and/or modify it under the terms of the GNU +General Public License (GNU GPL) as published by the Free Software +Foundation, either version 3 of the License, or (at your option) +any later version. The code is distributed WITHOUT ANY WARRANTY; +without even the implied warranty of MERCHANTABILITY or FITNESS +FOR A PARTICULAR PURPOSE. See the GNU GPL for more details. + +As additional permission under GNU GPL version 3 section 7, you +may distribute non-source (e.g., minimized or compacted) forms of +that code without the copy of the GNU GPL normally required by +section 4, provided you include this license notice and a URL +through which recipients can access the Corresponding Source. + + +@licend The above is the entire license notice +for the JavaScript code in this tag. +*/ +<!--/*--><![CDATA[/*><!--*/ + function CodeHighlightOn(elem, id) + { + var target = document.getElementById(id); + if(null != target) { + elem.cacheClassElem = elem.className; + elem.cacheClassTarget = target.className; + target.className = "code-highlighted"; + elem.className = "code-highlighted"; + } + } + function CodeHighlightOff(elem, id) + { + var target = document.getElementById(id); + if(elem.cacheClassElem) + elem.className = elem.cacheClassElem; + if(elem.cacheClassTarget) + target.className = elem.cacheClassTarget; + } +/*]]>*///--> +</script> +</head> +<body> +<div id="content"> +<h1 class="title">COVID-19 PubSeq (part 6)</h1> +<div id="table-of-contents"> +<h2>Table of Contents</h2> +<div id="text-table-of-contents"> +<ul> +<li><a href="#orge6aea9e">1. Generating output for EBI</a></li> +<li><a href="#org95e5e17">2. Defining the EBI study</a></li> +<li><a href="#org9181a73">3. Define the EBI sample</a></li> +<li><a href="#orga29cad0">4. 
Define the EBI sequence</a></li> +</ul> +</div> +</div> + + +<div id="outline-container-orge6aea9e" class="outline-2"> +<h2 id="orge6aea9e"><span class="section-number-2">1</span> Generating output for EBI</h2> +<div class="outline-text-2" id="text-1"> +<p> +Would it not be great if someone uploading to PubSeq could also export samples +to, say, EBI? That is what we discuss in this section. The submission +process is somewhat laborious, so once you have submitted to PubSeq, +why not export the same data to EBI with the least amount of effort? +</p> + +<p> +COVID-19 PubSeq is a data source - both sequence data and metadata - +that can be used to push data to other repositories, such as EBI. You can +register <a href="https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html">samples programmatically</a> through a dedicated XML interface. Note +that, at this point, an assembled sequence (FASTA) can +only be submitted through the <a href="https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html">Webin-CLI</a>. Raw data (FASTQ) can go through +the XML interface. +</p> + +<p> +EBI presents sequence resources through ENA (the European Nucleotide Archive), for example +<a href="https://www.ebi.ac.uk/ena/browser/view/MT394864">Sequence: MT394864.1</a>. 
+</p> + +<p> +EBI has XML formats for +</p> + +<ul class="org-ul"> +<li>SUBMISSION</li> +<li>STUDY</li> +<li>SAMPLE</li> +<li>EXPERIMENT</li> +<li>RUN</li> +<li>ANALYSIS</li> +<li>DAC</li> +<li>POLICY</li> +<li>DATASET</li> +<li>PROJECT</li> +</ul> + +<p> +with the schemas listed <a href="ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/">here</a>. Since we are submitting sequences we +should follow the <a href="https://ena-docs.readthedocs.io/en/latest/submit/assembly.html">full genome assembly guidelines</a> and the +<a href="https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html">ENA guidelines</a>. The first step is to define the study, next the sample +and finally the sequence (assembly).
+</p> +</div> +</div> + +<div id="outline-container-org95e5e17" class="outline-2"> +<h2 id="org95e5e17"><span class="section-number-2">2</span> Defining the EBI study</h2> +<div class="outline-text-2" id="text-2"> +<p> +A study is described <a href="https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html">here</a> and looks like +</p> + +<div class="org-src-container"> +<pre class="src src-xml"><PROJECT_SET> + <PROJECT alias="COVID-19 Washington DC"> + <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE> + <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION> + <SUBMISSION_PROJECT> + <SEQUENCING_PROJECT/> + </SUBMISSION_PROJECT> + </PROJECT> +</PROJECT_SET> +</pre> +</div> + +<p> +A submission 'command' is also required. It looks like +</p> + +<div class="org-src-container"> +<pre class="src src-xml"><SUBMISSION> + <ACTIONS> + <ACTION> + <ADD/> + </ACTION> + <ACTION> + <HOLD HoldUntilDate="TODO: release date"/> + </ACTION> + </ACTIONS> +</SUBMISSION> + +</pre> +</div> + +<p> +The Webin system accepts these XML files with a command like +</p> + +<pre class="example"> +curl -u username:password -F "SUBMISSION=@submission.xml" \ + -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" +</pre> + + +<p> +as described <a href="https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl">here</a>. Note that this is the test server; for the final +submission use www.ebi.ac.uk instead of wwwdev.ebi.ac.uk. You may also +need the --insecure switch to skip certificate checking.
+</p> + +<p> +<i>work in progress (WIP)</i> +</p> +</div> +</div> + +<div id="outline-container-org9181a73" class="outline-2"> +<h2 id="org9181a73"><span class="section-number-2">3</span> Define the EBI sample</h2> +<div class="outline-text-2" id="text-3"> +<p> +<i>work in progress (WIP)</i> +</p> +</div> +</div> + +<div id="outline-container-orga29cad0" class="outline-2"> +<h2 id="orga29cad0"><span class="section-number-2">4</span> Define the EBI sequence</h2> +<div class="outline-text-2" id="text-4"> +<p> +<i>work in progress (WIP)</i> +</p> +</div> +</div> +</div> +<div id="postamble" class="status"> +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-17 Fri 06:05</small>. +</div> +</body> +</html> diff --git a/doc/blog/using-covid-19-pubseq-part6.org b/doc/blog/using-covid-19-pubseq-part6.org new file mode 100644 index 0000000..8964700 --- /dev/null +++ b/doc/blog/using-covid-19-pubseq-part6.org @@ -0,0 +1,102 @@ +#+TITLE: COVID-19 PubSeq (part 6) +#+AUTHOR: Pjotr Prins +# C-c C-e h h publish +# C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time) +# C-c C-t task rotate +# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png + +#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" /> + + +* Table of Contents :TOC:noexport: + - [[#generating-output-for-ebi][Generating output for EBI]] + - [[#defining-the-ebi-study][Defining the EBI study]] + - [[#define-the-ebi-sample][Define the EBI sample]] + - [[#define-the-ebi-sequence][Define the EBI sequence]] + +* Generating output for EBI + +Would it not be great if someone uploading to PubSeq could also export samples +to, say, EBI? That is what we discuss in this section. The submission +process is somewhat laborious, so once you have submitted to PubSeq, +why not export the same data to EBI with the least amount of effort?
+ +COVID-19 PubSeq is a data source - both sequence data and metadata - +that can be used to push data to other repositories, such as EBI. You can +register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] through a dedicated XML interface. Note +that, at this point, an assembled sequence (FASTA) can +only be submitted through the [[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html][Webin-CLI]]. Raw data (FASTQ) can go through +the XML interface. + +EBI presents sequence resources through ENA (the European Nucleotide Archive), for example +[[https://www.ebi.ac.uk/ena/browser/view/MT394864][Sequence: MT394864.1]]. + +EBI has XML formats for + +- SUBMISSION +- STUDY +- SAMPLE +- EXPERIMENT +- RUN +- ANALYSIS +- DAC +- POLICY +- DATASET +- PROJECT + +with the schemas listed [[ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/][here]]. Since we are submitting sequences we +should follow the [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly guidelines]] and the +[[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA guidelines]]. The first step is to define the study, next the sample +and finally the sequence (assembly).
+ +* Defining the EBI study + +A study is described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html][here]] and looks like + +#+BEGIN_SRC xml +<PROJECT_SET> + <PROJECT alias="COVID-19 Washington DC"> + <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE> + <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION> + <SUBMISSION_PROJECT> + <SEQUENCING_PROJECT/> + </SUBMISSION_PROJECT> + </PROJECT> +</PROJECT_SET> +#+END_SRC + +A submission 'command' is also required. It looks like + +#+BEGIN_SRC xml +<SUBMISSION> + <ACTIONS> + <ACTION> + <ADD/> + </ACTION> + <ACTION> + <HOLD HoldUntilDate="TODO: release date"/> + </ACTION> + </ACTIONS> +</SUBMISSION> + +#+END_SRC + +The Webin system accepts these XML files with a command like + +: curl -u username:password -F "SUBMISSION=@submission.xml" \ +: -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" + +as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]]. Note that this is the test server; for the final +submission use www.ebi.ac.uk instead of wwwdev.ebi.ac.uk. You may also +need the --insecure switch to skip certificate checking. + +/work in progress (WIP)/ + +* Define the EBI sample + + +/work in progress (WIP)/ + +* Define the EBI sequence + +/work in progress (WIP)/
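The two XML files and the curl call for defining the EBI study could also be generated from PubSeq metadata rather than written by hand. The Python sketch below uses only the standard library. It builds the PROJECT_SET and SUBMISSION documents and assembles the curl arguments for the test or production drop-box. The helper names (build_project_xml, build_submission_xml, curl_command) are hypothetical and not part of the PubSeq code base. The YYYY-MM-DD format for HoldUntilDate is also an assumption; check it against the ENA documentation.

```python
import shlex
import xml.etree.ElementTree as ET
from typing import List, Optional

# Test (wwwdev) and production drop-box endpoints, as given in the text.
TEST_URL = "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
PRODUCTION_URL = "https://www.ebi.ac.uk/ena/submit/drop-box/submit/"


def build_project_xml(alias: str, title: str, description: str) -> str:
    """Hypothetical helper: build the PROJECT_SET document for one study."""
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    return ET.tostring(project_set, encoding="unicode")


def build_submission_xml(hold_until: Optional[str] = None) -> str:
    """Hypothetical helper: an ADD action, plus a HOLD action when a
    release date (assumed to be YYYY-MM-DD) is given."""
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    if hold_until is not None:
        hold_action = ET.SubElement(actions, "ACTION")
        ET.SubElement(hold_action, "HOLD", HoldUntilDate=hold_until)
    return ET.tostring(submission, encoding="unicode")


def curl_command(user: str, password: str, production: bool = False) -> List[str]:
    """Argument list for the curl call; it expects submission.xml and
    project.xml in the current directory."""
    url = PRODUCTION_URL if production else TEST_URL
    return [
        "curl", "-u", f"{user}:{password}",
        "-F", "SUBMISSION=@submission.xml",
        "-F", "PROJECT=@project.xml",
        url,
    ]


if __name__ == "__main__":
    print(build_project_xml(
        "COVID-19 Washington DC",
        "Sequencing SARS-CoV-2 in the Washington DC area",
        "This study collects samples from COVID-19 patients in the Washington DC area",
    ))
    print(build_submission_xml("2020-12-31"))
    # Quote each argument so the printed command can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in curl_command("username", "password")))
```

To use it, save the two strings as project.xml and submission.xml, then run the printed command against the test server. This should mirror the manual steps. The sketch does not send the request itself or handle the server's response.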
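The license handling from part 5 can be sketched as a small conversion from the YAML license block to Turtle. That handling has three parts: an optional license block, a CC-BY-4.0 default, and validation against :licenseShape. This is a hypothetical illustration, not the bh20-seq-resource code, which maps fields through the jsonldPredicate entries in its schema. The names license_to_turtle and turtle_literal are made up for this example. The prefix URIs are the cc: and Dublin Core ones quoted in the text. Every value is written as a plain string literal because the shape types each field as xsd:string.

```python
from typing import Dict, Optional

# Prefixes as used in :licenseShape; the namespace URIs are the ones
# quoted in the text (cc: and the Dublin Core title).
PREFIXES = (
    "@prefix cc: <http://creativecommons.org/ns#> .\n"
    "@prefix dc: <http://purl.org/metadata/dublin_core_elements#> .\n"
)

# Default applied when a record has no license type.
DEFAULT_LICENSE = "http://creativecommons.org/licenses/by/4.0/"

# YAML key in the license block -> predicate from :licenseShape.
FIELD_PREDICATES = {
    "license_type": "cc:License",
    "attribution_title": "dc:Title",
    "attribution_name": "cc:attributionName",
    "attribution_url": "cc:attributionURL",
    "attribution_source": "cc:attributionSource",
}


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslash, quote, newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def license_to_turtle(subject: str, block: Optional[Dict[str, str]]) -> str:
    """Emit one triple per license field present; fall back to CC-BY-4.0."""
    fields = dict(block or {})
    fields.setdefault("license_type", DEFAULT_LICENSE)
    triples = [
        f"<{subject}> {predicate} {turtle_literal(fields[key])} ."
        for key, predicate in FIELD_PREDICATES.items()
        if key in fields
    ]
    return PREFIXES + "\n".join(triples) + "\n"


if __name__ == "__main__":
    print(license_to_turtle("http://covid19.genenetwork.org/id", {
        "attribution_title": "Sample ID",
        "attribution_name": "John Doe, Joe Boe, Jonny Oe",
    }))
```

Records without a license block produce a single cc:License triple with the default. This matches the decision to make the field optional rather than reject older data.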