field | value | date |
---|---|---|
author | Pjotr Prins | 2020-08-26 11:14:44 +0100 |
committer | Pjotr Prins | 2020-08-26 11:15:06 +0100 |
commit | 2acf6a3c466dd296966e2c2c6a7e104e4a40bf31 | |
tree | 799f46453a6d6db4a53863499b11936ad04f2235 | |
parent | 02d761902d49491f5b85c117dcb37db072be034d | |
Docs
-rw-r--r-- | bh20simplewebuploader/static/image/arvados-workflow-output.png | bin | 0 -> 81060 bytes |
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part1.html | 137 |
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part1.org | 2 |
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part2.html | 70 |
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part2.org | 27 |
5 files changed, 144 insertions, 92 deletions
diff --git a/bh20simplewebuploader/static/image/arvados-workflow-output.png b/bh20simplewebuploader/static/image/arvados-workflow-output.png Binary files differnew file mode 100644 index 0000000..e15d137 --- /dev/null +++ b/bh20simplewebuploader/static/image/arvados-workflow-output.png diff --git a/doc/blog/using-covid-19-pubseq-part1.html b/doc/blog/using-covid-19-pubseq-part1.html index 5fd86d1..deeb749 100644 --- a/doc/blog/using-covid-19-pubseq-part1.html +++ b/doc/blog/using-covid-19-pubseq-part1.html @@ -3,10 +3,10 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-07-19 Sun 02:32 --> +<!-- 2020-08-26 Wed 05:02 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> -<title>COVID-19 PubSeq (part 1)</title> +<title>COVID-19 PubSeq - query metadata (part 1)</title> <meta name="generator" content="Org mode" /> <meta name="author" content="Pjotr Prins" /> <style type="text/css"> @@ -243,25 +243,25 @@ for the JavaScript code in this tag. </head> <body> <div id="content"> -<h1 class="title">COVID-19 PubSeq (part 1)</h1> +<h1 class="title">COVID-19 PubSeq - query metadata (part 1)</h1> <div id="table-of-contents"> <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> -<li><a href="#orgb852bf7">1. What does this mean?</a></li> -<li><a href="#orge6db105">2. Fetch sequence data</a></li> -<li><a href="#orgf3b8001">3. Predicates</a></li> -<li><a href="#org11097b0">4. Fetch submitter info and other metadata</a></li> -<li><a href="#org4f8467e">5. Fetch all sequences from Washington state</a></li> -<li><a href="#orge9b18e2">6. Discussion</a></li> -<li><a href="#orga0badf8">7. Acknowledgements</a></li> +<li><a href="#orga5382ca">1. What does this mean?</a></li> +<li><a href="#orgf6c7763">2. Fetch sequence data</a></li> +<li><a href="#org228a8d5">3. 
Predicates</a></li> +<li><a href="#orgfb34172">4. Fetch submitter info and other metadata</a></li> +<li><a href="#org16f6b8d">5. Fetch all sequences from Washington state</a></li> +<li><a href="#org2a85986">6. Discussion</a></li> +<li><a href="#orgcf3645c">7. Acknowledgements</a></li> </ul> </div> </div> -<div id="outline-container-orgb852bf7" class="outline-2"> -<h2 id="orgb852bf7"><span class="section-number-2">1</span> What does this mean?</h2> +<div id="outline-container-orga5382ca" class="outline-2"> +<h2 id="orga5382ca"><span class="section-number-2">1</span> What does this mean?</h2> <div class="outline-text-2" id="text-1"> <p> This means that when someone uploads a SARS-CoV-2 sequence using one @@ -274,24 +274,24 @@ expressed in a <a href="https://github.com/arvados/bh20-seq-resource/blob/master type: record fields: host_species: - doc: Host species as defined in NCBITaxon, e.g. http://purl.obolibrary.org/obo/NCBITaxon_9606 for Homo sapiens + doc: Host species as defined in NCBITaxon, e.g. http://purl.obolibrary.org/obo/NCBITaxon_<span style="color: #8bc34a;">9606</span> for Homo sapiens type: string jsonldPredicate: - _id: http://www.ebi.ac.uk/efo/EFO_0000532 - _type: "@id" - noLinkCheck: true + _id: http://www.ebi.ac.uk/efo/EFO_<span style="color: #8bc34a;">0000532</span> + _type: <span style="color: #9ccc65;">"@id"</span> + noLinkCheck: <span style="color: #8bc34a;">true</span> host_sex: doc: Sex of the host as defined in PATO, expect male () or female () type: string? jsonldPredicate: - _id: http://purl.obolibrary.org/obo/PATO_0000047 - _type: "@id" - noLinkCheck: true + _id: http://purl.obolibrary.org/obo/PATO_<span style="color: #8bc34a;">0000047</span> + _type: <span style="color: #9ccc65;">"@id"</span> + noLinkCheck: <span style="color: #8bc34a;">true</span> host_age: - doc: Age of the host as number (e.g. 50) + doc: Age of the host as number (e.g. <span style="color: #8bc34a;">50</span>) type: int? 
jsonldPredicate: - _id: http://purl.obolibrary.org/obo/PATO_0000011 + _id: http://purl.obolibrary.org/obo/PATO_<span style="color: #8bc34a;">0000011</span> </pre> </div> @@ -313,8 +313,8 @@ initiative! </div> </div> -<div id="outline-container-orge6db105" class="outline-2"> -<h2 id="orge6db105"><span class="section-number-2">2</span> Fetch sequence data</h2> +<div id="outline-container-orgf6c7763" class="outline-2"> +<h2 id="orgf6c7763"><span class="section-number-2">2</span> Fetch sequence data</h2> <div class="outline-text-2" id="text-2"> <p> The latest run of the pipeline can be viewed <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">here</a>. Each of these @@ -338,8 +338,8 @@ these identifiers throughout. </div> </div> -<div id="outline-container-orgf3b8001" class="outline-2"> -<h2 id="orgf3b8001"><span class="section-number-2">3</span> Predicates</h2> +<div id="outline-container-org228a8d5" class="outline-2"> +<h2 id="org228a8d5"><span class="section-number-2">3</span> Predicates</h2> <div class="outline-text-2" id="text-3"> <p> To explore an RDF dataset, the first query we can do is open and gets @@ -349,7 +349,7 @@ the following in a SPARQL end point </p> <div class="org-src-container"> -<pre class="src src-sql">select distinct ?p +<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?p { ?o ?p ?s } @@ -363,7 +363,7 @@ To get a <a href="http://sparql.genenetwork.org/sparql/?default-graph-uri=&q </p> <div class="org-src-container"> -<pre class="src src-sql">select distinct ?g +<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?g { GRAPH ?g {?s ?p ?o} } @@ -382,9 +382,9 @@ To list all submitters, try </p> <div class="org-src-container"> -<pre class="src src-sql">select distinct ?s +<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> 
?s { - ?o <http://biohackathon.org/bh20-seq-schema#MainSchema/submitter> ?s + ?o <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">#MainSchema/submitter></a> ?s } </pre> </div> @@ -396,9 +396,9 @@ and by </p> <div class="org-src-container"> -<pre class="src src-sql">select distinct ?s +<pre class="src src-sql"><span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?s { - ?o <http://biohackathon.org/bh20-seq-schema#MainSchema/submitter> ?id . + ?o <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/submitter">#MainSchema/submitter></a> ?id . ?id ?p ?s } </pre> @@ -414,8 +414,8 @@ To lift the full URL out of the query you can use a header like </p> <div class="org-src-container"> -<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> -select distinct ?dataset ?submitter +<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> +<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?dataset ?submitter { ?dataset pubseq:submitter ?id . 
?id ?p ?submitter @@ -437,8 +437,8 @@ Now we got this far, lets <a href="http://sparql.genenetwork.org/sparql/?default </p> <div class="org-src-container"> -<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> -select (COUNT(distinct ?dataset) as ?num) +<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> +<span style="color: #fff59d;">select</span> (<span style="color: #ff8A65;">COUNT</span>(<span style="color: #fff59d;">distinct</span> ?dataset) <span style="color: #fff59d;">as</span> ?num) { ?dataset pubseq:submitter ?id . ?id ?p ?submitter @@ -452,16 +452,16 @@ Run this <a href="http://sparql.genenetwork.org/sparql/?default-graph-uri=&q </div> </div> -<div id="outline-container-org11097b0" class="outline-2"> -<h2 id="org11097b0"><span class="section-number-2">4</span> Fetch submitter info and other metadata</h2> +<div id="outline-container-orgfb34172" class="outline-2"> +<h2 id="orgfb34172"><span class="section-number-2">4</span> Fetch submitter info and other metadata</h2> <div class="outline-text-2" id="text-4"> <p> To get datasets with submitters we can do the above </p> <div class="org-src-container"> -<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> -select distinct ?dataset ?p ?submitter +<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> +<span 
style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?dataset ?p ?submitter { ?dataset pubseq:submitter ?id . ?id ?p ?submitter @@ -486,12 +486,12 @@ Let's focus on one sample with </p> <div class="org-src-container"> -<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> -select distinct ?dataset ?submitter +<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> +<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?dataset ?submitter { ?dataset pubseq:submitter ?id . ?id ?p ?submitter . - FILTER(CONTAINS(?submitter,"Roychoudhury")) . + FILTER(<span style="color: #fff59d;">CONTAINS</span>(?submitter,"Roychoudhury")) . } </pre> </div> @@ -502,8 +502,8 @@ see if we can get a sample ID by listing sample predicates </p> <div class="org-src-container"> -<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> -select distinct ?p +<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> +<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?p { ?dataset ?p ?o . ?dataset pubseq:submitter ?id . 
@@ -519,12 +519,12 @@ Let's zoom in on those of Roychoudhury with <div class="org-src-container"> -<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> -select distinct ?sid ?sample ?p1 ?dataset ?submitter +<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/">#MainSchema/></a> +<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?sid ?sample ?p1 ?dataset ?submitter { ?dataset pubseq:submitter ?id . ?id ?p ?submitter . - FILTER(CONTAINS(?submitter,"Roychoudhury")) . + FILTER(<span style="color: #fff59d;">CONTAINS</span>(?submitter,"Roychoudhury")) . ?dataset pubseq:sample ?sid . ?sid ?p1 ?sample } @@ -542,9 +542,14 @@ this database. 
Let's focus on one sample "MT326090.1" with predicate </p> <div class="org-src-container"> -<pre class="src src-sql">PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/> -PREFIX sio: <http://semanticscience.org/resource/> -select distinct ?sample ?p ?o +<pre class="src src-sql"><span style="color: #fff59d;">PREFIX</span> pubseq: <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> +PREFIX sio: <http://semanticscience.org/resource/"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> +PREFIX sio: <http://semanticscience.org/resource/">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> +PREFIX sio: <http://semanticscience.org/resource/">#MainSchema/> +</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> +PREFIX sio: <http://semanticscience.org/resource/">PREFIX</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/> +PREFIX sio: <http://semanticscience.org/resource/"> sio: <http://semanticscience.org/resource/></a> +<span style="color: #fff59d;">select</span> <span style="color: #fff59d;">distinct</span> ?sample ?p ?o { ?sample sio:SIO_000115 "MT326090.1" . ?sample ?p ?o . 
@@ -571,8 +576,8 @@ to view/query the database and place the sequence on the <a href="http://covid19 </div> </div> -<div id="outline-container-org4f8467e" class="outline-2"> -<h2 id="org4f8467e"><span class="section-number-2">5</span> Fetch all sequences from Washington state</h2> +<div id="outline-container-org16f6b8d" class="outline-2"> +<h2 id="org16f6b8d"><span class="section-number-2">5</span> Fetch all sequences from Washington state</h2> <div class="outline-text-2" id="text-5"> <p> Now we know how to get at the origin we can do it the other way round @@ -580,10 +585,13 @@ and fetch all sequences referring to Washington state </p> <div class="org-src-container"> -<pre class="src src-sql">select ?seq ?sample +<pre class="src src-sql"><span style="color: #fff59d;">select</span> ?seq ?sample { - ?seq <http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . - ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223> + ?seq <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223">#MainSchema/sample> ?sample . 
+ ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q1223></a> } </pre> </div> @@ -599,10 +607,13 @@ entity is <a href="https://www.wikidata.org/wiki/Q43">Q43</a>: </p> <div class="org-src-container"> -<pre class="src src-sql">select ?seq ?sample +<pre class="src src-sql"><span style="color: #fff59d;">select</span> ?seq ?sample { - ?seq <http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . - ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q43> + ?seq <a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q43"><http://biohackathon.org/bh20-seq-</a><span style="color: #fff59d;"><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q43">schema</a></span><a href="http://biohackathon.org/bh20-seq-schema#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q43">#MainSchema/sample> ?sample . + ?sample <http://purl.obolibrary.org/obo/GAZ_00000448> <http://www.wikidata.org/entity/Q43></a> } </pre> </div> @@ -613,8 +624,8 @@ Run <a href="http://sparql.genenetwork.org/sparql/?default-graph-uri=&query= </div> </div> -<div id="outline-container-orge9b18e2" class="outline-2"> -<h2 id="orge9b18e2"><span class="section-number-2">6</span> Discussion</h2> +<div id="outline-container-org2a85986" class="outline-2"> +<h2 id="org2a85986"><span class="section-number-2">6</span> Discussion</h2> <div class="outline-text-2" id="text-6"> <p> The public sequence uploader collects sequences, raw data and @@ -625,8 +636,8 @@ referenced in publications and origins are citeable. 
</div> </div> -<div id="outline-container-orga0badf8" class="outline-2"> -<h2 id="orga0badf8"><span class="section-number-2">7</span> Acknowledgements</h2> +<div id="outline-container-orgcf3645c" class="outline-2"> +<h2 id="orgcf3645c"><span class="section-number-2">7</span> Acknowledgements</h2> <div class="outline-text-2" id="text-7"> <p> The overall effort was due to magnificent freely donated input by a @@ -641,7 +652,7 @@ Garrison this initiative would not have existed! </div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-19 Sun 02:32</small>. +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-08-26 Wed 05:02</small>. </div> </body> </html> diff --git a/doc/blog/using-covid-19-pubseq-part1.org b/doc/blog/using-covid-19-pubseq-part1.org index 9c8a1c0..e41952d 100644 --- a/doc/blog/using-covid-19-pubseq-part1.org +++ b/doc/blog/using-covid-19-pubseq-part1.org @@ -1,4 +1,4 @@ -#+TITLE: COVID-19 PubSeq (part 1) +#+TITLE: COVID-19 PubSeq - query metadata (part 1) #+AUTHOR: Pjotr Prins # C-c C-e h h publish # C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . 
for time) diff --git a/doc/blog/using-covid-19-pubseq-part2.html b/doc/blog/using-covid-19-pubseq-part2.html index b124c89..567980d 100644 --- a/doc/blog/using-covid-19-pubseq-part2.html +++ b/doc/blog/using-covid-19-pubseq-part2.html @@ -3,10 +3,10 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-08-25 Tue 05:55 --> +<!-- 2020-08-26 Wed 05:01 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> -<title>COVID-19 PubSeq (part 2)</title> +<title>COVID-19 PubSeq - Arvados</title> <meta name="generator" content="Org mode" /> <meta name="author" content="Pjotr Prins" /> <style type="text/css"> @@ -247,35 +247,44 @@ for the JavaScript code in this tag. | <a accesskey="H" href="http://covid19.genenetwork.org"> HOME </a> </div><div id="content"> -<h1 class="title">COVID-19 PubSeq (part 2)</h1> +<h1 class="title">COVID-19 PubSeq - Arvados</h1> <div id="table-of-contents"> <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> -<li><a href="#orgd3ae0e5">1. Finding output of workflows</a></li> -<li><a href="#orgce95d40">2. The Arvados file interface</a></li> -<li><a href="#org95f2c67">3. The PubSeq Arvados shell</a></li> -<li><a href="#orgfba95f0">4. Wiring up CWL</a></li> -<li><a href="#orgdf910f1">5. Using the Arvados API</a></li> +<li><a href="#org6501d83">1. The Arvados Web Server</a></li> +<li><a href="#orgcb7854f">2. The Arvados file interface</a></li> +<li><a href="#orgc8c3ccd">3. The PubSeq Arvados shell</a></li> +<li><a href="#org028c1b4">4. Wiring up CWL</a></li> +<li><a href="#org7cdc8cc">5. Using the Arvados API</a></li> +<li><a href="#org5961211">6. 
Troubleshooting</a></li> </ul> </div> </div> -<div id="outline-container-orgd3ae0e5" class="outline-2"> -<h2 id="orgd3ae0e5"><span class="section-number-2">1</span> Finding output of workflows</h2> +<div id="outline-container-org6501d83" class="outline-2"> +<h2 id="org6501d83"><span class="section-number-2">1</span> The Arvados Web Server</h2> <div class="outline-text-2" id="text-1"> <p> We are using Arvados to run common workflow language (CWL) pipelines. The most recent output is on display on a <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">web page</a> (with time stamp) -and a full list is generated <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">here</a>. It is nice to start up, but for -most users we need a dedicated and themed results page. People don't -want to wade through thousands of output files! +and a full output list is generated <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">here</a>. +</p> + +<p> +Arvados has a web front which allows navigation through input and output data, +workflows and the output of analysis pipelines (here CWL workflows). 
+</p> + +<p> + <img src="static/image/arvados-workflow-output.png" /> </p> </div> </div> -<div id="outline-container-orgce95d40" class="outline-2"> -<h2 id="orgce95d40"><span class="section-number-2">2</span> The Arvados file interface</h2> + +<div id="outline-container-orgcb7854f" class="outline-2"> +<h2 id="orgcb7854f"><span class="section-number-2">2</span> The Arvados file interface</h2> <div class="outline-text-2" id="text-2"> <p> Arvados has the web server, but it also has a REST API and associated @@ -352,8 +361,8 @@ arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839d </div> </div> -<div id="outline-container-org95f2c67" class="outline-2"> -<h2 id="org95f2c67"><span class="section-number-2">3</span> The PubSeq Arvados shell</h2> +<div id="outline-container-orgc8c3ccd" class="outline-2"> +<h2 id="orgc8c3ccd"><span class="section-number-2">3</span> The PubSeq Arvados shell</h2> <div class="outline-text-2" id="text-3"> <p> When you login to Arvados (you can request permission from us) it is @@ -408,8 +417,8 @@ essentially <a href="https://github.com/arvados/bh20-seq-resource/blob/2baa88b76 </div> </div> -<div id="outline-container-orgfba95f0" class="outline-2"> -<h2 id="orgfba95f0"><span class="section-number-2">4</span> Wiring up CWL</h2> +<div id="outline-container-org028c1b4" class="outline-2"> +<h2 id="org028c1b4"><span class="section-number-2">4</span> Wiring up CWL</h2> <div class="outline-text-2" id="text-4"> <p> In above script <code>bh20-seq-analyzer</code> you can see that the <a href="https://www.commonwl.org/">Common @@ -450,8 +459,8 @@ For more see <a href="https://hpc.guix.info/blog/2019/01/creating-a-reproducible </div> </div> -<div id="outline-container-orgdf910f1" class="outline-2"> -<h2 id="orgdf910f1"><span class="section-number-2">5</span> Using the Arvados API</h2> +<div id="outline-container-org7cdc8cc" class="outline-2"> +<h2 id="org7cdc8cc"><span class="section-number-2">5</span> Using the Arvados API</h2> 
<div class="outline-text-2" id="text-5"> <p> Arvados provides a rich API for accessing internals of the Cloud @@ -466,9 +475,26 @@ get a list of <a href="https://github.com/arvados/bh20-seq-resource/blob/2baa88b </p> </div> </div> + +<div id="outline-container-org5961211" class="outline-2"> +<h2 id="org5961211"><span class="section-number-2">6</span> Troubleshooting</h2> +<div class="outline-text-2" id="text-6"> +<p> +When workflows have errors we should check the logs in Arvados. +</p> + +<p> +Go to the <a href="https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-825x3r5vcs41dus">project</a> page for 'COVID-19-BH20 Shared Project' -> 'Public +Sequence Resource'. Click on analysis runs +<a href="https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-y4k4uswcqi3ku56">https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-y4k4uswcqi3ku56</a> +and 'Subprojects'. Click one of the runs and then on 'Processes' and you'll +see what parts failed. +</p> +</div> +</div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-08-25 Tue 04:32</small>. +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-08-26 Wed 05:01</small>. </div> </body> </html> diff --git a/doc/blog/using-covid-19-pubseq-part2.org b/doc/blog/using-covid-19-pubseq-part2.org index c44b5c7..4b827f5 100644 --- a/doc/blog/using-covid-19-pubseq-part2.org +++ b/doc/blog/using-covid-19-pubseq-part2.org @@ -1,4 +1,4 @@ -#+TITLE: COVID-19 PubSeq (part 2) +#+TITLE: COVID-19 PubSeq - Arvados #+AUTHOR: Pjotr Prins # C-c C-e h h publish # C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . 
for time) @@ -9,19 +9,24 @@ #+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" /> * Table of Contents :TOC:noexport: - - [[#finding-output-of-workflows][Finding output of workflows]] + - [[#the-arvados-web-server][The Arvados Web Server]] - [[#the-arvados-file-interface][The Arvados file interface]] - [[#the-pubseq-arvados-shell][The PubSeq Arvados shell]] - [[#wiring-up-cwl][Wiring up CWL]] - [[#using-the-arvados-api][Using the Arvados API]] + - [[#troubleshooting][Troubleshooting]] -* Finding output of workflows +* The Arvados Web Server We are using Arvados to run common workflow language (CWL) pipelines. The most recent output is on display on a [[https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca][web page]] (with time stamp) -and a full list is generated [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][here]]. It is nice to start up, but for -most users we need a dedicated and themed results page. People don't -want to wade through thousands of output files! +and a full output list is generated [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][here]]. + +Arvados has a web front which allows navigation through input and output data, +workflows and the output of analysis pipelines (here CWL workflows). + +@@html: <img src="static/image/arvados-workflow-output.png" />@@ + * The Arvados file interface @@ -127,3 +132,13 @@ In above script ~bh20-seq-analyzer~ there are examples of querying the [[https://doc.arvados.org/api/index.html][Arvados API]] using the [[https://pypi.org/project/arvados-python-client/][Python Arvados client and libraries]]. For example get a list of [[https://github.com/arvados/bh20-seq-resource/blob/2baa88b766ec540bd34b96599014dd16e393af39/bh20seqanalyzer/main.py#L228][projects]] in Arvados. Main thing is to get the ~ARVADOS-API-HOST~ and ~ARVADOS-API-TOKEN~ right as is shown above. 
+ +* Troubleshooting + +When workflows have errors we should check the logs in Arvados. + +Go to the [[https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-825x3r5vcs41dus][project]] page for 'COVID-19-BH20 Shared Project' -> 'Public +Sequence Resource'. Click on analysis runs +https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-y4k4uswcqi3ku56 +and 'Subprojects'. Click one of the runs and then on 'Processes' and you'll +see what parts failed.
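Note on the part 1 queries touched by this commit: the blog post links each SPARQL query to the endpoint at http://sparql.genenetwork.org/sparql/. As a minimal sketch, the dataset count query from the post could be sent from Python roughly as follows. The function names are hypothetical and not part of the repository, and the sketch assumes the endpoint accepts a standard SPARQL protocol GET request with a `query` parameter and honours an `Accept` header asking for SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request

# Endpoint linked from the blog post.
ENDPOINT = "http://sparql.genenetwork.org/sparql/"

# The count query as it appears in part 1 of the blog post.
COUNT_QUERY = """
PREFIX pubseq: <http://biohackathon.org/bh20-seq-schema#MainSchema/>
select (COUNT(distinct ?dataset) as ?num)
{
   ?dataset pubseq:submitter ?id .
   ?id ?p ?submitter
}
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Hypothetical helper: URL-encode a SPARQL query as a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def extract_values(results, variable):
    """Hypothetical helper: collect the values bound to one variable.

    Expects the standard SPARQL JSON results layout:
    results -> bindings -> [{variable: {"value": ...}}, ...]
    """
    return [
        row[variable]["value"]
        for row in results["results"]["bindings"]
        if variable in row
    ]


def run_query(query, endpoint=ENDPOINT):
    """Hypothetical helper: send the query and decode the JSON response.

    Assumption: the server returns SPARQL JSON when asked via Accept.
    """
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, `extract_values(run_query(COUNT_QUERY), "num")` would return the count as a list with one string, provided the endpoint behaves as assumed above.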