author: lltommy, 2020-11-11 09:56:12 +0100
committer: lltommy, 2020-11-11 09:56:12 +0100
commit: d6aa323b6fc7a82e45cc1df51fc72c2d547146eb
tree: 6e8b77bde4dc34fab3fa8804906f3cb821f61dae /doc
parent: c5fe5de7e4c77bfb48b1ae2f662c2d9cc120c06e
parent: c872248e43c1c66e5fed8ef341f7b4ac21d63e6f
Merge branch 'master' of https://github.com/arvados/bh20-seq-resource
Diffstat (limited to 'doc')
-rw-r--r-- doc/INSTALL.md | 48
-rw-r--r-- doc/blog/using-covid-19-pubseq-part2.html | 127
-rw-r--r-- doc/blog/using-covid-19-pubseq-part2.org | 21
-rw-r--r-- doc/blog/using-covid-19-pubseq-part3.html | 261
-rw-r--r-- doc/blog/using-covid-19-pubseq-part3.org | 162
-rw-r--r-- doc/web/download.html | 172
-rw-r--r-- doc/web/download.org | 2
7 files changed, 479 insertions, 314 deletions
diff --git a/doc/INSTALL.md b/doc/INSTALL.md
index df825c6..0367c63 100644
--- a/doc/INSTALL.md
+++ b/doc/INSTALL.md
@@ -31,7 +31,7 @@ arvados-python-client-2.0.1 ciso8601-2.1.3 future-0.18.2 google-api-python-clien
3. Run the tool directly with
```sh
-guix environment guix --ad-hoc git python openssl python-pycurl python-magic nss-certs python-pyshex -- python3 bh20sequploader/main.py example/sequence.fasta example/maximum_metadata_example.yaml
+guix environment guix --ad-hoc git python openssl python-pycurl python-magic nss-certs python-pyshex -- python3 bh20sequploader/main.py example/maximum_metadata_example.yaml example/sequence.fasta
```
Note that python-pyshex is packaged in
@@ -44,6 +44,12 @@ repository. E.g.
env GUIX_PACKAGE_PATH=~/iwrk/opensource/guix/guix-bioinformatics/ ~/opt/guix/bin/guix environment -C guix --ad-hoc git python python-flask python-pyyaml python-pycurl python-magic nss-certs python-pyshex python-pyyaml --network openssl python-pyshex python-pyshexc minimap2 python-schema-salad python-arvados-python-client --share=/export/tmp -- env TMPDIR=/export/tmp python3 bh20sequploader/main.py --help
```
+Latest successful Guix run
+
+```sh
+env GUIX_PACKAGE_PATH=~/iwrk/opensource/guix/guix-bioinformatics/ ~/opt/guix/bin/guix environment guix --ad-hoc git python openssl python-pycurl python-magic nss-certs python-pyshex python-arvados-python-client python-schema-salad minimap2 -- python3 bh20sequploader/main.py scripts/uthsc_samples/yaml/AL_UT14.yaml scripts/uthsc_samples/yaml/AL_UT14.fa
+```
+
### Using the Web Uploader
To run the web uploader in a GNU Guix environment/container run it with something like
@@ -67,3 +73,43 @@ penguin2:~/iwrk/opensource/code/vg/bh20-seq-resource$ env GUIX_PACKAGE_PATH=~/i
```
Note: see above on GUIX_PACKAGE_PATH.
+
+## Run country semantic enrichment script
+
+ cd bh20-seq-resource/scripts/db_enrichment
+ edit input_location.csv
+ guix environment guix --ad-hoc git python nss-certs python-rdflib -- python3 country_enrichment.py
+
+## Run the tests
+
+ guix package -i python-requests python-pandas python-jinja2 python -p ~/opt/python-dev
+ . ~/opt/python-dev/etc/profile
+
+
+## Run Virtuoso-ose
+
+Guix has a package for virtuoso-ose we use
+
+ guix package -i virtuoso-ose -p ~/opt/virtuoso
+
+Create a data dir
+
+ mkdir -p /export/virtuoso/var/lib/virtuoso/db
+ chown $USER /export/virtuoso/var/lib/virtuoso/db
+
+Add an ini file
+
+ cp ~/opt/virtuoso/var/lib/virtuoso/db/virtuoso.ini .config/
+
+And run from the data dir
+
+ cd /export/virtuoso/var/lib/virtuoso/db
+ guix environment --ad-hoc virtuoso-ose -- virtuoso-t -f
+
+Visit http://localhost:8890/sparql
+
+To update the turtle files do
+
+ guix environment -C guix --ad-hoc python python-requests raptor2 curl --network -- python3 ./scripts/update_virtuoso/check_for_updates.py cache.txt dba dba
+
+where dba is the default password.
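Note on the new Virtuoso section: once the server answers at http://localhost:8890/sparql, the endpoint can also be queried from a script. The following is a minimal sketch, not part of this commit. It assumes the endpoint accepts a `query` parameter over GET and Virtuoso's `format` parameter for JSON results; the function names `build_query_url` and `run_query` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the INSTALL.md text above.
SPARQL_ENDPOINT = "http://localhost:8890/sparql"


def build_query_url(query, endpoint=SPARQL_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({
        "query": query,
        # Assumption: Virtuoso's `format` parameter selects the result type.
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"


def run_query(query, endpoint=SPARQL_ENDPOINT):
    """Send the query and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_query_url(query, endpoint)) as response:
        return json.load(response)


# Usage, once virtuoso-t is running and data has been loaded:
#   result = run_query("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")
#   print(result["results"]["bindings"])
```

Keeping URL construction separate from the network call makes the query string easy to inspect before anything is sent to the server.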
diff --git a/doc/blog/using-covid-19-pubseq-part2.html b/doc/blog/using-covid-19-pubseq-part2.html
index 567980d..eff6fcd 100644
--- a/doc/blog/using-covid-19-pubseq-part2.html
+++ b/doc/blog/using-covid-19-pubseq-part2.html
@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
-<!-- 2020-08-26 Wed 05:01 -->
+<!-- 2020-11-10 Tue 05:08 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>COVID-19 PubSeq - Arvados</title>
@@ -40,7 +40,7 @@
}
pre.src {
position: relative;
- overflow: visible;
+ overflow: auto;
padding-top: 1.2em;
}
pre.src:before {
@@ -195,50 +195,26 @@
</style>
<link rel="Blog stylesheet" type="text/css" href="blog.css" />
<script type="text/javascript">
-/*
-@licstart The following is the entire license notice for the
-JavaScript code in this tag.
-
-Copyright (C) 2012-2020 Free Software Foundation, Inc.
-
-The JavaScript code in this tag is free software: you can
-redistribute it and/or modify it under the terms of the GNU
-General Public License (GNU GPL) as published by the Free Software
-Foundation, either version 3 of the License, or (at your option)
-any later version. The code is distributed WITHOUT ANY WARRANTY;
-without even the implied warranty of MERCHANTABILITY or FITNESS
-FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
-
-As additional permission under GNU GPL version 3 section 7, you
-may distribute non-source (e.g., minimized or compacted) forms of
-that code without the copy of the GNU GPL normally required by
-section 4, provided you include this license notice and a URL
-through which recipients can access the Corresponding Source.
-
-
-@licend The above is the entire license notice
-for the JavaScript code in this tag.
-*/
+// @license magnet:?xt=urn:btih:e95b018ef3580986a04669f1b5879592219e2a7a&dn=public-domain.txt Public Domain
<!--/*--><![CDATA[/*><!--*/
- function CodeHighlightOn(elem, id)
- {
- var target = document.getElementById(id);
- if(null != target) {
- elem.cacheClassElem = elem.className;
- elem.cacheClassTarget = target.className;
- target.className = "code-highlighted";
- elem.className = "code-highlighted";
- }
- }
- function CodeHighlightOff(elem, id)
- {
- var target = document.getElementById(id);
- if(elem.cacheClassElem)
- elem.className = elem.cacheClassElem;
- if(elem.cacheClassTarget)
- target.className = elem.cacheClassTarget;
- }
-/*]]>*///-->
+ function CodeHighlightOn(elem, id)
+ {
+ var target = document.getElementById(id);
+ if(null != target) {
+ elem.classList.add("code-highlighted");
+ target.classList.add("code-highlighted");
+ }
+ }
+ function CodeHighlightOff(elem, id)
+ {
+ var target = document.getElementById(id);
+ if(null != target) {
+ elem.classList.remove("code-highlighted");
+ target.classList.remove("code-highlighted");
+ }
+ }
+ /*]]>*///-->
+// @license-end
</script>
</head>
<body>
@@ -252,18 +228,18 @@ for the JavaScript code in this tag.
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
-<li><a href="#org6501d83">1. The Arvados Web Server</a></li>
-<li><a href="#orgcb7854f">2. The Arvados file interface</a></li>
-<li><a href="#orgc8c3ccd">3. The PubSeq Arvados shell</a></li>
-<li><a href="#org028c1b4">4. Wiring up CWL</a></li>
-<li><a href="#org7cdc8cc">5. Using the Arvados API</a></li>
-<li><a href="#org5961211">6. Troubleshooting</a></li>
+<li><a href="#org10ef830">1. The Arvados Web Server</a></li>
+<li><a href="#orgb6a7a42">2. The Arvados file interface</a></li>
+<li><a href="#org0c7b94e">3. The PubSeq Arvados shell</a></li>
+<li><a href="#org756005d">4. Wiring up CWL</a></li>
+<li><a href="#orgf30b46f">5. Using the Arvados API</a></li>
+<li><a href="#org3af3122">6. Troubleshooting</a></li>
</ul>
</div>
</div>
-<div id="outline-container-org6501d83" class="outline-2">
-<h2 id="org6501d83"><span class="section-number-2">1</span> The Arvados Web Server</h2>
+<div id="outline-container-org10ef830" class="outline-2">
+<h2 id="org10ef830"><span class="section-number-2">1</span> The Arvados Web Server</h2>
<div class="outline-text-2" id="text-1">
<p>
We are using Arvados to run common workflow language (CWL) pipelines.
@@ -283,8 +259,8 @@ workflows and the output of analysis pipelines (here CWL workflows).
</div>
-<div id="outline-container-orgcb7854f" class="outline-2">
-<h2 id="orgcb7854f"><span class="section-number-2">2</span> The Arvados file interface</h2>
+<div id="outline-container-orgb6a7a42" class="outline-2">
+<h2 id="orgb6a7a42"><span class="section-number-2">2</span> The Arvados file interface</h2>
<div class="outline-text-2" id="text-2">
<p>
Arvados has the web server, but it also has a REST API and associated
@@ -361,8 +337,8 @@ arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839d
</div>
</div>
-<div id="outline-container-orgc8c3ccd" class="outline-2">
-<h2 id="orgc8c3ccd"><span class="section-number-2">3</span> The PubSeq Arvados shell</h2>
+<div id="outline-container-org0c7b94e" class="outline-2">
+<h2 id="org0c7b94e"><span class="section-number-2">3</span> The PubSeq Arvados shell</h2>
<div class="outline-text-2" id="text-3">
<p>
When you login to Arvados (you can request permission from us) it is
@@ -414,11 +390,34 @@ the git repo and starts a new run calling into
/data/pubseq/bh20-seq-resource/venv3/bin/bh20-seq-analyzer which is
essentially <a href="https://github.com/arvados/bh20-seq-resource/blob/2baa88b766ec540bd34b96599014dd16e393af39/bh20seqanalyzer/main.py#L354">monitoring</a> for uploads.
</p>
+
+<p>
+On <code>run --help</code>
+</p>
+
+<pre class="example" id="org93c3a8a">
+optional arguments:
+ -h, --help show this help message and exit
+ --uploader-project UPLOADER_PROJECT
+ --pangenome-analysis-project PANGENOME_ANALYSIS_PROJECT
+ --fastq-project FASTQ_PROJECT
+ --validated-project VALIDATED_PROJECT
+ --workflow-def-project WORKFLOW_DEF_PROJECT
+ --pangenome-workflow-uuid PANGENOME_WORKFLOW_UUID
+ --fastq-workflow-uuid FASTQ_WORKFLOW_UUID
+ --exclude-list EXCLUDE_LIST
+ --latest-result-collection LATEST_RESULT_COLLECTION
+ --kickoff
+ --no-start-analysis
+ --once
+ --print-status PRINT_STATUS
+ --revalidate
+</pre>
</div>
</div>
-<div id="outline-container-org028c1b4" class="outline-2">
-<h2 id="org028c1b4"><span class="section-number-2">4</span> Wiring up CWL</h2>
+<div id="outline-container-org756005d" class="outline-2">
+<h2 id="org756005d"><span class="section-number-2">4</span> Wiring up CWL</h2>
<div class="outline-text-2" id="text-4">
<p>
In above script <code>bh20-seq-analyzer</code> you can see that the <a href="https://www.commonwl.org/">Common
@@ -459,8 +458,8 @@ For more see <a href="https://hpc.guix.info/blog/2019/01/creating-a-reproducible
</div>
</div>
-<div id="outline-container-org7cdc8cc" class="outline-2">
-<h2 id="org7cdc8cc"><span class="section-number-2">5</span> Using the Arvados API</h2>
+<div id="outline-container-orgf30b46f" class="outline-2">
+<h2 id="orgf30b46f"><span class="section-number-2">5</span> Using the Arvados API</h2>
<div class="outline-text-2" id="text-5">
<p>
Arvados provides a rich API for accessing internals of the Cloud
@@ -476,8 +475,8 @@ get a list of <a href="https://github.com/arvados/bh20-seq-resource/blob/2baa88b
</div>
</div>
-<div id="outline-container-org5961211" class="outline-2">
-<h2 id="org5961211"><span class="section-number-2">6</span> Troubleshooting</h2>
+<div id="outline-container-org3af3122" class="outline-2">
+<h2 id="org3af3122"><span class="section-number-2">6</span> Troubleshooting</h2>
<div class="outline-text-2" id="text-6">
<p>
When workflows have errors we should check the logs in Arvados.
@@ -494,7 +493,7 @@ see what parts failed.
</div>
</div>
<div id="postamble" class="status">
-<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-08-26 Wed 05:01</small>.
+<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-11-09 Mon 01:20</small>.
</div>
</body>
</html>
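Note on the `run --help` listing added above: options shown with an upper-case placeholder take a value, while the bare ones (`--kickoff`, `--no-start-analysis`, `--once`, `--revalidate`) read as on/off switches. A rough sketch of an argument parser that would print a listing of that shape is given below. It is reconstructed from the help text alone, so defaults, types and help strings are unknown and left out; it is not the actual `bh20seqanalyzer` code.

```python
import argparse

# Options that take a value, as listed in the help output.
VALUE_OPTIONS = (
    "--uploader-project",
    "--pangenome-analysis-project",
    "--fastq-project",
    "--validated-project",
    "--workflow-def-project",
    "--pangenome-workflow-uuid",
    "--fastq-workflow-uuid",
    "--exclude-list",
    "--latest-result-collection",
    "--print-status",
)

# Options listed without a placeholder; assumed here to be boolean switches.
FLAG_OPTIONS = ("--kickoff", "--no-start-analysis", "--once", "--revalidate")


def build_parser():
    """Build a parser with the same option names as the help listing."""
    parser = argparse.ArgumentParser(prog="bh20-seq-analyzer")
    for name in VALUE_OPTIONS:
        parser.add_argument(name)
    for name in FLAG_OPTIONS:
        parser.add_argument(name, action="store_true")
    return parser


# Usage with hypothetical values:
#   args = build_parser().parse_args(["--once", "--exclude-list", "skip.txt"])
#   args.once -> True, args.exclude_list -> "skip.txt"
```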
diff --git a/doc/blog/using-covid-19-pubseq-part2.org b/doc/blog/using-covid-19-pubseq-part2.org
index 4b827f5..d7816ba 100644
--- a/doc/blog/using-covid-19-pubseq-part2.org
+++ b/doc/blog/using-covid-19-pubseq-part2.org
@@ -96,6 +96,27 @@ the git repo and starts a new run calling into
/data/pubseq/bh20-seq-resource/venv3/bin/bh20-seq-analyzer which is
essentially [[https://github.com/arvados/bh20-seq-resource/blob/2baa88b766ec540bd34b96599014dd16e393af39/bh20seqanalyzer/main.py#L354][monitoring]] for uploads.
+On ~run --help~
+
+#+begin_example
+optional arguments:
+ -h, --help show this help message and exit
+ --uploader-project UPLOADER_PROJECT
+ --pangenome-analysis-project PANGENOME_ANALYSIS_PROJECT
+ --fastq-project FASTQ_PROJECT
+ --validated-project VALIDATED_PROJECT
+ --workflow-def-project WORKFLOW_DEF_PROJECT
+ --pangenome-workflow-uuid PANGENOME_WORKFLOW_UUID
+ --fastq-workflow-uuid FASTQ_WORKFLOW_UUID
+ --exclude-list EXCLUDE_LIST
+ --latest-result-collection LATEST_RESULT_COLLECTION
+ --kickoff
+ --no-start-analysis
+ --once
+ --print-status PRINT_STATUS
+ --revalidate
+#+end_example
+
* Wiring up CWL
In above script ~bh20-seq-analyzer~ you can see that the [[https://www.commonwl.org/][Common
diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index 788c1d2..b49830b 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
-<!-- 2020-10-27 Tue 06:43 -->
+<!-- 2020-11-05 Thu 07:28 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>COVID-19 PubSeq Uploading Data (part 3)</title>
@@ -224,52 +224,66 @@
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
-<li><a href="#orga9eabf3">1. Uploading Data</a></li>
-<li><a href="#org643e745">2. Step 1: Upload sequence</a></li>
-<li><a href="#org0874b9f">3. Step 2: Add metadata</a>
+<li><a href="#org85998fd">1. Introduction</a></li>
+<li><a href="#orge783233">2. Uploading data</a></li>
+<li><a href="#orgc5810d7">3. Step 1: Upload sequence</a></li>
+<li><a href="#org5a4ae99">4. Step 2: Add metadata</a>
<ul>
-<li><a href="#orgaaa44f2">3.1. Obligatory fields</a>
+<li><a href="#orga9824de">4.1. Obligatory fields</a>
<ul>
-<li><a href="#orgf38cdbf">3.1.1. Sample ID (sample_id)</a></li>
-<li><a href="#org34b5b06">3.1.2. Collection date</a></li>
-<li><a href="#org221f1cf">3.1.3. Collection location</a></li>
-<li><a href="#org75d1dad">3.1.4. Sequencing technology</a></li>
-<li><a href="#org990e897">3.1.5. Authors</a></li>
+<li><a href="#org407fde2">4.1.1. Sample ID (sample_id)</a></li>
+<li><a href="#orgee3bb35">4.1.2. Collection date</a></li>
+<li><a href="#org123bf0c">4.1.3. Collection location</a></li>
+<li><a href="#org41b1f83">4.1.4. Sequencing technology</a></li>
+<li><a href="#org9bab62e">4.1.5. Authors</a></li>
</ul>
</li>
-<li><a href="#org959072e">3.2. Optional fields</a>
+<li><a href="#org7071af8">4.2. Optional fields</a>
<ul>
-<li><a href="#org561b754">3.2.1. Host information</a></li>
-<li><a href="#org774a993">3.2.2. Collecting institution</a></li>
-<li><a href="#orgcf096cf">3.2.3. Specimen source</a></li>
-<li><a href="#orgeac0fd8">3.2.4. Source database accession</a></li>
-<li><a href="#org3c0aebd">3.2.5. Strain name</a></li>
+<li><a href="#org2a04fdb">4.2.1. Host information</a></li>
+<li><a href="#orgc4084bc">4.2.2. Collecting institution</a></li>
+<li><a href="#orge552325">4.2.3. Specimen source</a></li>
+<li><a href="#org2577e1f">4.2.4. Source database accession</a></li>
+<li><a href="#org0305fb3">4.2.5. Strain name</a></li>
</ul>
</li>
</ul>
</li>
-<li><a href="#org9f09957">4. Step 3: Submit to COVID-19 PubSeq</a>
+<li><a href="#orgdf67705">5. Step 3: Submit to COVID-19 PubSeq</a>
<ul>
-<li><a href="#org25372da">4.1. Trouble shooting</a></li>
+<li><a href="#orgd33218c">5.1. Trouble shooting</a></li>
</ul>
</li>
-<li><a href="#org8d1b4ad">5. Step 4: Check output</a></li>
-<li><a href="#orgd86b3dc">6. Bulk sequence uploader</a>
+<li><a href="#orgbf4cd0f">6. Step 4: Check output</a></li>
+<li><a href="#orgc8d6fa4">7. Bulk sequence uploader</a>
<ul>
-<li><a href="#orgc4aa7a1">6.1. Run the uploader (CLI)</a></li>
-<li><a href="#org46687b5">6.2. Example: uploading bulk GenBank sequences</a></li>
-<li><a href="#orgbc228bc">6.3. Example: preparing metadata</a></li>
+<li><a href="#org338ebf7">7.1. Run the uploader (CLI)</a></li>
+<li><a href="#org46d5e2f">7.2. Example: uploading bulk GenBank sequences</a></li>
+<li><a href="#orgbfc3f90">7.3. Example: preparing metadata</a></li>
</ul>
</li>
</ul>
</div>
</div>
+<div id="outline-container-org85998fd" class="outline-2">
+<h2 id="org85998fd"><span class="section-number-2">1</span> Introduction</h2>
+<div class="outline-text-2" id="text-1">
+<p>
+In this document we explain how to upload data into COVID-19 PubSeq.
+This can happen through a web page, or through a command line
+script. We'll also show how to parametrize uploads by using templates.
+The procedure is much easier than with other repositories and can be
+fully automated. Once uploaded you can use our export API to prepare
+for other repositories.
+</p>
+</div>
+</div>
-<div id="outline-container-orga9eabf3" class="outline-2">
-<h2 id="orga9eabf3"><span class="section-number-2">1</span> Uploading Data</h2>
-<div class="outline-text-2" id="text-1">
+<div id="outline-container-orge783233" class="outline-2">
+<h2 id="orge783233"><span class="section-number-2">2</span> Uploading data</h2>
+<div class="outline-text-2" id="text-2">
<p>
The COVID-19 PubSeq allows you to upload your SARS-Cov-2 strains to a
public resource for global comparisons. A recompute of the pangenome
@@ -278,9 +292,9 @@ gets triggered on upload. Read the <a href="./about">ABOUT</a> page for more inf
</div>
</div>
-<div id="outline-container-org643e745" class="outline-2">
-<h2 id="org643e745"><span class="section-number-2">2</span> Step 1: Upload sequence</h2>
-<div class="outline-text-2" id="text-2">
+<div id="outline-container-orgc5810d7" class="outline-2">
+<h2 id="orgc5810d7"><span class="section-number-2">3</span> Step 1: Upload sequence</h2>
+<div class="outline-text-2" id="text-3">
<p>
To upload a sequence in the <a href="http://covid19.genenetwork.org/">web upload page</a> hit the browse button and
select the FASTA file on your local hard disk.
@@ -307,9 +321,9 @@ an improved pangenome.
</div>
</div>
-<div id="outline-container-org0874b9f" class="outline-2">
-<h2 id="org0874b9f"><span class="section-number-2">3</span> Step 2: Add metadata</h2>
-<div class="outline-text-2" id="text-3">
+<div id="outline-container-org5a4ae99" class="outline-2">
+<h2 id="org5a4ae99"><span class="section-number-2">4</span> Step 2: Add metadata</h2>
+<div class="outline-text-2" id="text-4">
<p>
The <a href="./">web upload page</a> contains fields for adding metadata. Metadata is
not only important for attribution, is also important for
@@ -334,13 +348,13 @@ the web form. Here we add some extra information.
</p>
</div>
-<div id="outline-container-orgaaa44f2" class="outline-3">
-<h3 id="orgaaa44f2"><span class="section-number-3">3.1</span> Obligatory fields</h3>
-<div class="outline-text-3" id="text-3-1">
+<div id="outline-container-orga9824de" class="outline-3">
+<h3 id="orga9824de"><span class="section-number-3">4.1</span> Obligatory fields</h3>
+<div class="outline-text-3" id="text-4-1">
</div>
-<div id="outline-container-orgf38cdbf" class="outline-4">
-<h4 id="orgf38cdbf"><span class="section-number-4">3.1.1</span> Sample ID (sample_id)</h4>
-<div class="outline-text-4" id="text-3-1-1">
+<div id="outline-container-org407fde2" class="outline-4">
+<h4 id="org407fde2"><span class="section-number-4">4.1.1</span> Sample ID (sample_id)</h4>
+<div class="outline-text-4" id="text-4-1-1">
<p>
This is a string field that defines a unique sample identifier by the
submitter. In addition to sample_id we also have host_id,
@@ -357,18 +371,18 @@ Here we add the GenBank ID MT536190.1.
</div>
</div>
-<div id="outline-container-org34b5b06" class="outline-4">
-<h4 id="org34b5b06"><span class="section-number-4">3.1.2</span> Collection date</h4>
-<div class="outline-text-4" id="text-3-1-2">
+<div id="outline-container-orgee3bb35" class="outline-4">
+<h4 id="orgee3bb35"><span class="section-number-4">4.1.2</span> Collection date</h4>
+<div class="outline-text-4" id="text-4-1-2">
<p>
Estimated collection date. The GenBank page says April 6, 2020.
</p>
</div>
</div>
-<div id="outline-container-org221f1cf" class="outline-4">
-<h4 id="org221f1cf"><span class="section-number-4">3.1.3</span> Collection location</h4>
-<div class="outline-text-4" id="text-3-1-3">
+<div id="outline-container-org123bf0c" class="outline-4">
+<h4 id="org123bf0c"><span class="section-number-4">4.1.3</span> Collection location</h4>
+<div class="outline-text-4" id="text-4-1-3">
<p>
A search on wikidata says Los Angeles is
<a href="https://www.wikidata.org/entity/Q65">https://www.wikidata.org/entity/Q65</a>
@@ -376,18 +390,18 @@ A search on wikidata says Los Angeles is
</div>
</div>
-<div id="outline-container-org75d1dad" class="outline-4">
-<h4 id="org75d1dad"><span class="section-number-4">3.1.4</span> Sequencing technology</h4>
-<div class="outline-text-4" id="text-3-1-4">
+<div id="outline-container-org41b1f83" class="outline-4">
+<h4 id="org41b1f83"><span class="section-number-4">4.1.4</span> Sequencing technology</h4>
+<div class="outline-text-4" id="text-4-1-4">
<p>
GenBank entry says Illumina, so we can fill that in
</p>
</div>
</div>
-<div id="outline-container-org990e897" class="outline-4">
-<h4 id="org990e897"><span class="section-number-4">3.1.5</span> Authors</h4>
-<div class="outline-text-4" id="text-3-1-5">
+<div id="outline-container-org9bab62e" class="outline-4">
+<h4 id="org9bab62e"><span class="section-number-4">4.1.5</span> Authors</h4>
+<div class="outline-text-4" id="text-4-1-5">
<p>
GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga
Amador,D., Yang,T., Caruso,L., Navia,W., Von Borstel,L., Hui Zhou,X.,
@@ -397,17 +411,17 @@ Freehan,A. and Garcia-Diaz,J.', so we can fill that in.
</div>
</div>
-<div id="outline-container-org959072e" class="outline-3">
-<h3 id="org959072e"><span class="section-number-3">3.2</span> Optional fields</h3>
-<div class="outline-text-3" id="text-3-2">
+<div id="outline-container-org7071af8" class="outline-3">
+<h3 id="org7071af8"><span class="section-number-3">4.2</span> Optional fields</h3>
+<div class="outline-text-3" id="text-4-2">
<p>
All other fields are optional. But let's see what we can add.
</p>
</div>
-<div id="outline-container-org561b754" class="outline-4">
-<h4 id="org561b754"><span class="section-number-4">3.2.1</span> Host information</h4>
-<div class="outline-text-4" id="text-3-2-1">
+<div id="outline-container-org2a04fdb" class="outline-4">
+<h4 id="org2a04fdb"><span class="section-number-4">4.2.1</span> Host information</h4>
+<div class="outline-text-4" id="text-4-2-1">
<p>
Sadly, not much is known about the host from GenBank. A little
sleuthing renders an interesting paper by some of the authors titled
@@ -420,27 +434,27 @@ did to the person and what the person was like (say age group).
</div>
</div>
-<div id="outline-container-org774a993" class="outline-4">
-<h4 id="org774a993"><span class="section-number-4">3.2.2</span> Collecting institution</h4>
-<div class="outline-text-4" id="text-3-2-2">
+<div id="outline-container-orgc4084bc" class="outline-4">
+<h4 id="orgc4084bc"><span class="section-number-4">4.2.2</span> Collecting institution</h4>
+<div class="outline-text-4" id="text-4-2-2">
<p>
We can fill that in.
</p>
</div>
</div>
-<div id="outline-container-orgcf096cf" class="outline-4">
-<h4 id="orgcf096cf"><span class="section-number-4">3.2.3</span> Specimen source</h4>
-<div class="outline-text-4" id="text-3-2-3">
+<div id="outline-container-orge552325" class="outline-4">
+<h4 id="orge552325"><span class="section-number-4">4.2.3</span> Specimen source</h4>
+<div class="outline-text-4" id="text-4-2-3">
<p>
We have that: nasopharyngeal swab
</p>
</div>
</div>
-<div id="outline-container-orgeac0fd8" class="outline-4">
-<h4 id="orgeac0fd8"><span class="section-number-4">3.2.4</span> Source database accession</h4>
-<div class="outline-text-4" id="text-3-2-4">
+<div id="outline-container-org2577e1f" class="outline-4">
+<h4 id="org2577e1f"><span class="section-number-4">4.2.4</span> Source database accession</h4>
+<div class="outline-text-4" id="text-4-2-4">
<p>
Genbank which is <a href="http://identifiers.org/insdc/MT536190.1#sequence">http://identifiers.org/insdc/MT536190.1#sequence</a>.
Note we plug in our own identifier MT536190.1.
@@ -448,9 +462,9 @@ Note we plug in our own identifier MT536190.1.
</div>
</div>
-<div id="outline-container-org3c0aebd" class="outline-4">
-<h4 id="org3c0aebd"><span class="section-number-4">3.2.5</span> Strain name</h4>
-<div class="outline-text-4" id="text-3-2-5">
+<div id="outline-container-org0305fb3" class="outline-4">
+<h4 id="org0305fb3"><span class="section-number-4">4.2.5</span> Strain name</h4>
+<div class="outline-text-4" id="text-4-2-5">
<p>
SARS-CoV-2/human/USA/LA-BIE-070/2020
</p>
@@ -459,9 +473,9 @@ SARS-CoV-2/human/USA/LA-BIE-070/2020
</div>
</div>
-<div id="outline-container-org9f09957" class="outline-2">
-<h2 id="org9f09957"><span class="section-number-2">4</span> Step 3: Submit to COVID-19 PubSeq</h2>
-<div class="outline-text-2" id="text-4">
+<div id="outline-container-orgdf67705" class="outline-2">
+<h2 id="orgdf67705"><span class="section-number-2">5</span> Step 3: Submit to COVID-19 PubSeq</h2>
+<div class="outline-text-2" id="text-5">
<p>
Once you have the sequence and the metadata together, hit
the 'Add to Pangenome' button. The data will be checked,
@@ -470,9 +484,9 @@ submitted and the workflows should kick in!
</div>
-<div id="outline-container-org25372da" class="outline-3">
-<h3 id="org25372da"><span class="section-number-3">4.1</span> Trouble shooting</h3>
-<div class="outline-text-3" id="text-4-1">
+<div id="outline-container-orgd33218c" class="outline-3">
+<h3 id="orgd33218c"><span class="section-number-3">5.1</span> Trouble shooting</h3>
+<div class="outline-text-3" id="text-5-1">
<p>
We got an error saying: {"stem": "<a href="http://www.wikidata.org/entity/">http://www.wikidata.org/entity/</a>",&#x2026;
which means that our location field was not formed correctly! After
@@ -485,9 +499,9 @@ submit button.
</div>
</div>
-<div id="outline-container-org8d1b4ad" class="outline-2">
-<h2 id="org8d1b4ad"><span class="section-number-2">5</span> Step 4: Check output</h2>
-<div class="outline-text-2" id="text-5">
+<div id="outline-container-orgbf4cd0f" class="outline-2">
+<h2 id="orgbf4cd0f"><span class="section-number-2">6</span> Step 4: Check output</h2>
+<div class="outline-text-2" id="text-6">
<p>
The current pipeline takes 5.5 hours to complete! Once it completes
the updated data can be checked on the <a href="./download">DOWNLOAD</a> page. After completion
@@ -497,9 +511,9 @@ in.
</div>
</div>
-<div id="outline-container-orgd86b3dc" class="outline-2">
-<h2 id="orgd86b3dc"><span class="section-number-2">6</span> Bulk sequence uploader</h2>
-<div class="outline-text-2" id="text-6">
+<div id="outline-container-orgc8d6fa4" class="outline-2">
+<h2 id="orgc8d6fa4"><span class="section-number-2">7</span> Bulk sequence uploader</h2>
+<div class="outline-text-2" id="text-7">
<p>
Above steps require a manual upload of one sequence with metadata.
What if you have a number of sequences you want to upload in bulk?
@@ -510,6 +524,39 @@ the web form and gets validated from the same <a href="https://github.com/arvado
that you need to create/generate for your samples looks like
</p>
+<p>
+A minimal example of metadata looks like
+</p>
+
+<div class="org-src-container">
+<pre class="src src-json">id: placeholder
+
+license:
+ license_type: http://creativecommons.org/licenses/by/<span style="color: #8bc34a;">4.0</span>/
+
+host:
+ host_species: http://purl.obolibrary.org/obo/NCBITaxon_<span style="color: #8bc34a;">9606</span>
+
+sample:
+ sample_id: XX
+ collection_date: <span style="color: #9ccc65;">"2020-01-01"</span>
+ collection_location: http://www.wikidata.org/entity/Q<span style="color: #8bc34a;">148</span>
+
+virus:
+ virus_species: http://purl.obolibrary.org/obo/NCBITaxon_<span style="color: #8bc34a;">2697049</span>
+
+technology:
+ sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_<span style="color: #8bc34a;">0008632</span>]
+
+submitter:
+ authors: [John Doe]
+</pre>
+</div>
+
+<p>
+a more elaborate example (note most fields are optional) may look like
+</p>
+
<div class="org-src-container">
<pre class="src src-json">id: placeholder
@@ -559,11 +606,20 @@ submitter:
additional_submitter_information: Optional free text field for additional information
</pre>
</div>
+
+<p>
+more metadata is yummy. <a href="https://yummydata.org/">Yummydata</a> is useful to a wider community. Note
+that many of the terms in above example are URIs, such as
+host_species: <a href="http://purl.obolibrary.org/obo/NCBITaxon_9606">http://purl.obolibrary.org/obo/NCBITaxon_9606</a>. We use
+web ontologies for these to make the data less ambiguous and more
+FAIR. Check out the option fields as defined in the schema. If it is not listed
+a little bit of web searching may be required or <a href="./contact">contact</a> us.
+</p>
</div>
-<div id="outline-container-orgc4aa7a1" class="outline-3">
-<h3 id="orgc4aa7a1"><span class="section-number-3">6.1</span> Run the uploader (CLI)</h3>
-<div class="outline-text-3" id="text-6-1">
+<div id="outline-container-org338ebf7" class="outline-3">
+<h3 id="org338ebf7"><span class="section-number-3">7.1</span> Run the uploader (CLI)</h3>
+<div class="outline-text-3" id="text-7-1">
<p>
Installing with pip you should be
able to run
@@ -574,7 +630,6 @@ bh20sequploader sequence.fasta metadata.yaml
</pre>
-
<p>
Alternatively the script can be installed from <a href="https://github.com/arvados/bh20-seq-resource#installation">github</a>. Run on the
command line
@@ -617,9 +672,9 @@ The web interface using this exact same script so it should just work
</div>
-<div id="outline-container-org46687b5" class="outline-3">
-<h3 id="org46687b5"><span class="section-number-3">6.2</span> Example: uploading bulk GenBank sequences</h3>
-<div class="outline-text-3" id="text-6-2">
+<div id="outline-container-org46d5e2f" class="outline-3">
+<h3 id="org46d5e2f"><span class="section-number-3">7.2</span> Example: uploading bulk GenBank sequences</h3>
+<div class="outline-text-3" id="text-7-2">
<p>
We also use above script to bulk upload GenBank sequences with a <a href="https://github.com/arvados/bh20-seq-resource/blob/master/scripts/download_genbank_data/from_genbank_to_fasta_and_yaml.py">FASTA
and YAML</a> extractor specific for GenBank. This means that the steps we
@@ -645,14 +700,15 @@ ls $<span style="color: #ffcc80;">dir_fasta_and_yaml</span>/*.yaml | <span style
</div>
-<div id="outline-container-orgbc228bc" class="outline-3">
-<h3 id="orgbc228bc"><span class="section-number-3">6.3</span> Example: preparing metadata</h3>
-<div class="outline-text-3" id="text-6-3">
+<div id="outline-container-orgbfc3f90" class="outline-3">
+<h3 id="orgbfc3f90"><span class="section-number-3">7.3</span> Example: preparing metadata</h3>
+<div class="outline-text-3" id="text-7-3">
<p>
-Usually, metadata are available in tabular format, like spreadsheets. As an example, we provide a script
-<a href="https://github.com/arvados/bh20-seq-resource/tree/master/scripts/esr_samples">esr_samples.py</a> to show you how to parse
-your metadata in YAML files ready for the upload. To execute the script, go in the ~bh20-seq-resource/scripts/esr_samples
-and execute
+Usually, metadata are available in a tabular format, such as
+spreadsheets. As an example, we provide a script <a href="https://github.com/arvados/bh20-seq-resource/tree/master/scripts/esr_samples">esr_samples.py</a> to
+show you how to parse your metadata in YAML files ready for the
+upload. To execute the script, go to the
+~bh20-seq-resource/scripts/esr_samples directory and execute
</p>
<div class="org-src-container">
@@ -661,14 +717,27 @@ and execute
</div>
<p>
-You will find the YAML files in the `yaml` folder which will be created in the same directory.
+You will find the YAML files in the `yaml` folder which will be
+created in the same directory.
+</p>
+
+<p>
+In the example we use Python pandas to read the spreadsheet into a
+tabular structure. Next we use a <a href="https://github.com/arvados/bh20-seq-resource/blob/master/scripts/esr_samples/template.yaml">template.yaml</a> file that gets filled
+in by <code>esr_samples.py</code> so we get a metadata YAML file for each sample.
+</p>
+
+<p>
+Next run the earlier CLI uploader for each YAML and FASTA combination.
+It can't be much easier than this. For ESR we uploaded a batch of 600
+sequences this way. See <a href="http://covid19.genenetwork.org/resource/20VR0995">example</a>.
</p>
</div>
</div>
</div>
</div>
<div id="postamble" class="status">
-<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-10-27 Tue 06:43</small>.
+<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-11-05 Thu 07:27</small>.
</div>
</body>
</html>
diff --git a/doc/blog/using-covid-19-pubseq-part3.org b/doc/blog/using-covid-19-pubseq-part3.org
index fb68251..d0d6c7f 100644
--- a/doc/blog/using-covid-19-pubseq-part3.org
+++ b/doc/blog/using-covid-19-pubseq-part3.org
@@ -7,10 +7,19 @@
#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />
#+OPTIONS: ^:nil
+* Introduction
+
+In this document we explain how to upload data into COVID-19 PubSeq.
+This can happen through a web page, or through a command line
+script. We'll also show how to parametrize uploads by using templates.
+The procedure is much easier than with other repositories and can be
+fully automated. Once uploaded you can use our export API to prepare
+for other repositories.
* Table of Contents :TOC:noexport:
- - [[#uploading-data][Uploading Data]]
+ - [[#introduction][Introduction]]
+ - [[#uploading-data][Uploading data]]
- [[#step-1-upload-sequence][Step 1: Upload sequence]]
- [[#step-2-add-metadata][Step 2: Add metadata]]
- [[#obligatory-fields][Obligatory fields]]
@@ -23,7 +32,7 @@
- [[#example-uploading-bulk-genbank-sequences][Example: uploading bulk GenBank sequences]]
- [[#example-preparing-metadata][Example: preparing metadata]]
-* Uploading Data
+* Uploading data
The COVID-19 PubSeq allows you to upload your SARS-CoV-2 strains to a
public resource for global comparisons. A recompute of the pangenome
@@ -165,55 +174,91 @@ file an associated metadata in [[https://github.com/arvados/bh20-seq-resource/bl
the web form and gets validated from the same [[https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/bh20seq-schema.yml][schema]] looks. The YAML
that you need to create/generate for your samples looks like
+A minimal example of metadata looks like
+
+#+begin_src yaml
+ id: placeholder
+
+ license:
+ license_type: http://creativecommons.org/licenses/by/4.0/
+
+ host:
+ host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606
+
+ sample:
+ sample_id: XX
+ collection_date: "2020-01-01"
+ collection_location: http://www.wikidata.org/entity/Q148
+
+ virus:
+ virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049
+
+ technology:
+ sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0008632]
+
+ submitter:
+ authors: [John Doe]
+#+end_src
+
+A more elaborate example (note that most fields are optional) may look like
+
#+begin_src yaml
-id: placeholder
-
-host:
- host_id: XX1
- host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606
- host_sex: http://purl.obolibrary.org/obo/PATO_0000384
- host_age: 20
- host_age_unit: http://purl.obolibrary.org/obo/UO_0000036
- host_health_status: http://purl.obolibrary.org/obo/NCIT_C25269
- host_treatment: Process in which the act is intended to modify or alter host status (Compounds)
- host_vaccination: [vaccines1,vaccine2]
- ethnicity: http://purl.obolibrary.org/obo/HANCESTRO_0010
- additional_host_information: Optional free text field for additional information
-
-sample:
- sample_id: Id of the sample as defined by the submitter
- collector_name: Name of the person that took the sample
- collecting_institution: Institute that was responsible of sampling
- specimen_source: [http://purl.obolibrary.org/obo/NCIT_C155831,http://purl.obolibrary.org/obo/NCIT_C155835]
- collection_date: "2020-01-01"
- collection_location: http://www.wikidata.org/entity/Q148
- sample_storage_conditions: frozen specimen
- source_database_accession: [http://identifiers.org/insdc/LC522350.1#sequence]
- additional_collection_information: Optional free text field for additional information
-
-virus:
- virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049
- virus_strain: SARS-CoV-2/human/CHN/HS_8/2020
-
-technology:
- sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0009173,http://www.ebi.ac.uk/efo/EFO_0009173]
- sequence_assembly_method: Protocol used for assembly
- sequencing_coverage: [70.0, 100.0]
- additional_technology_information: Optional free text field for additional information
-
-submitter:
- authors: [John Doe, Joe Boe, Jonny Oe]
- submitter_name: [John Doe]
- submitter_address: John Doe's address
- originating_lab: John Doe kitchen
- lab_address: John Doe's address
- provider_sample_id: XXX1
- submitter_sample_id: XXX2
- publication: PMID00001113
- submitter_orcid: [https://orcid.org/0000-0000-0000-0000,https://orcid.org/0000-0000-0000-0001]
- additional_submitter_information: Optional free text field for additional information
+ id: placeholder
+
+ host:
+ host_id: XX1
+ host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606
+ host_sex: http://purl.obolibrary.org/obo/PATO_0000384
+ host_age: 20
+ host_age_unit: http://purl.obolibrary.org/obo/UO_0000036
+ host_health_status: http://purl.obolibrary.org/obo/NCIT_C25269
+ host_treatment: Process in which the act is intended to modify or alter host status (Compounds)
+ host_vaccination: [vaccines1,vaccine2]
+ ethnicity: http://purl.obolibrary.org/obo/HANCESTRO_0010
+ additional_host_information: Optional free text field for additional information
+
+ sample:
+ sample_id: Id of the sample as defined by the submitter
+ collector_name: Name of the person that took the sample
+ collecting_institution: Institute that was responsible of sampling
+ specimen_source: [http://purl.obolibrary.org/obo/NCIT_C155831,http://purl.obolibrary.org/obo/NCIT_C155835]
+ collection_date: "2020-01-01"
+ collection_location: http://www.wikidata.org/entity/Q148
+ sample_storage_conditions: frozen specimen
+ source_database_accession: [http://identifiers.org/insdc/LC522350.1#sequence]
+ additional_collection_information: Optional free text field for additional information
+
+ virus:
+ virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049
+ virus_strain: SARS-CoV-2/human/CHN/HS_8/2020
+
+ technology:
+ sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0009173,http://www.ebi.ac.uk/efo/EFO_0009173]
+ sequence_assembly_method: Protocol used for assembly
+ sequencing_coverage: [70.0, 100.0]
+ additional_technology_information: Optional free text field for additional information
+
+ submitter:
+ authors: [John Doe, Joe Boe, Jonny Oe]
+ submitter_name: [John Doe]
+ submitter_address: John Doe's address
+ originating_lab: John Doe kitchen
+ lab_address: John Doe's address
+ provider_sample_id: XXX1
+ submitter_sample_id: XXX2
+ publication: PMID00001113
+ submitter_orcid: [https://orcid.org/0000-0000-0000-0000,https://orcid.org/0000-0000-0000-0001]
+ additional_submitter_information: Optional free text field for additional information
#+end_src
+More metadata is yummy when stored in RDF. [[https://yummydata.org/][Yummydata]] is useful to a wider community. Note
+that many of the terms in the above example are URIs, such as
+host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606. We use
+web ontologies for these to make the data less ambiguous and more
+FAIR. Check out the option fields as defined in the schema. If a term is not listed,
+check the [[https://github.com/arvados/bh20-seq-resource/blob/master/semantic_enrichment/labels.ttl][labels.ttl]] file. A little
+web searching may also be required, or [[./contact][contact]] us.
+
** Run the uploader (CLI)
After installing with pip you should be
@@ -221,7 +266,6 @@ able to run
: bh20sequploader sequence.fasta metadata.yaml
-
Alternatively the script can be installed from [[https://github.com/arvados/bh20-seq-resource#installation][github]]. Run on the
command line
@@ -274,13 +318,23 @@ done
** Example: preparing metadata
-Usually, metadata are available in tabular format, like spreadsheets. As an example, we provide a script
-[[https://github.com/arvados/bh20-seq-resource/tree/master/scripts/esr_samples][esr_samples.py]] to show you how to parse
-your metadata in YAML files ready for the upload. To execute the script, go in the ~bh20-seq-resource/scripts/esr_samples
-and execute
+Usually, metadata are available in a tabular format, such as
+spreadsheets. As an example, we provide a script [[https://github.com/arvados/bh20-seq-resource/tree/master/scripts/esr_samples][esr_samples.py]] to
+show you how to parse your metadata in YAML files ready for the
+upload. To execute the script, go to the
+~bh20-seq-resource/scripts/esr_samples directory and execute
#+BEGIN_SRC sh
python3 esr_samples.py
#+END_SRC
-You will find the YAML files in the `yaml` folder which will be created in the same directory.
+You will find the YAML files in the `yaml` folder which will be
+created in the same directory.
+
+In the example we use Python pandas to read the spreadsheet into a
+tabular structure. Next we use a [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/esr_samples/template.yaml][template.yaml]] file that gets filled
+in by ~esr_samples.py~ so we get a metadata YAML file for each sample.
+
+Next run the earlier CLI uploader for each YAML and FASTA combination.
+It can't be much easier than this. For ESR we uploaded a batch of 600
+sequences this way writing a few lines of Python [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/esr_samples/esr_samples.py][code]]. See [[http://covid19.genenetwork.org/resource/20VR0995][example]].
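
The metadata workflow described above (read a table, fill a template for each sample, then run the uploader for each YAML and FASTA pair) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of esr_samples.py: it uses Python's standard csv and string.Template modules instead of pandas, the inline table and the cut-down template are hypothetical, and pairing files by sample id (S1.yaml with S1.fasta) is assumed. The argument order of the command (metadata first, then FASTA) follows the INSTALL.md change in this commit; the blog text shows the reverse order, so check bh20sequploader --help before relying on it.

```python
import csv
import io
from pathlib import Path
from string import Template

# Hypothetical, cut-down template; the real template.yaml may differ.
TEMPLATE = Template("""\
id: placeholder

license:
    license_type: http://creativecommons.org/licenses/by/4.0/

host:
    host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606

sample:
    sample_id: $sample_id
    collection_date: "$collection_date"
    collection_location: $collection_location

virus:
    virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049

technology:
    sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0008632]

submitter:
    authors: [$authors]
""")

# Hypothetical stand-in for a spreadsheet exported as CSV.
TABLE = """\
sample_id,collection_date,collection_location,authors
S1,2020-03-01,http://www.wikidata.org/entity/Q148,John Doe
S2,2020-03-02,http://www.wikidata.org/entity/Q148,Jane Roe
"""


def read_rows(table_text):
    """Parse the CSV text into a list of dicts, one per sample."""
    return list(csv.DictReader(io.StringIO(table_text)))


def render_metadata(rows, out_dir):
    """Write one metadata YAML file per row and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for row in rows:
        path = out_dir / f"{row['sample_id']}.yaml"
        # substitute() raises KeyError if a template field is missing.
        path.write_text(TEMPLATE.substitute(row), encoding="utf-8")
        paths.append(path)
    return paths


def upload_command(yaml_path, fasta_dir):
    """Build the uploader command line for one YAML and FASTA pair.

    Assumption: the FASTA file shares the YAML file's base name, and the
    uploader takes the metadata file before the sequence file.
    """
    fasta = Path(fasta_dir) / (Path(yaml_path).stem + ".fasta")
    return ["bh20sequploader", str(yaml_path), str(fasta)]
```

To perform the uploads, each list returned by upload_command can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a shell string avoids quoting problems with unusual file names.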
diff --git a/doc/web/download.html b/doc/web/download.html
index 998c87b..2c8b5f7 100644
--- a/doc/web/download.html
+++ b/doc/web/download.html
@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
-<!-- 2020-08-24 Mon 03:08 -->
+<!-- 2020-11-05 Thu 05:26 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Download</title>
@@ -40,7 +40,7 @@
}
pre.src {
position: relative;
- overflow: visible;
+ overflow: auto;
padding-top: 1.2em;
}
pre.src:before {
@@ -194,50 +194,26 @@
/*]]>*/-->
</style>
<script type="text/javascript">
-/*
-@licstart The following is the entire license notice for the
-JavaScript code in this tag.
-
-Copyright (C) 2012-2020 Free Software Foundation, Inc.
-
-The JavaScript code in this tag is free software: you can
-redistribute it and/or modify it under the terms of the GNU
-General Public License (GNU GPL) as published by the Free Software
-Foundation, either version 3 of the License, or (at your option)
-any later version. The code is distributed WITHOUT ANY WARRANTY;
-without even the implied warranty of MERCHANTABILITY or FITNESS
-FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
-
-As additional permission under GNU GPL version 3 section 7, you
-may distribute non-source (e.g., minimized or compacted) forms of
-that code without the copy of the GNU GPL normally required by
-section 4, provided you include this license notice and a URL
-through which recipients can access the Corresponding Source.
-
-
-@licend The above is the entire license notice
-for the JavaScript code in this tag.
-*/
+// @license magnet:?xt=urn:btih:e95b018ef3580986a04669f1b5879592219e2a7a&dn=public-domain.txt Public Domain
<!--/*--><![CDATA[/*><!--*/
- function CodeHighlightOn(elem, id)
- {
- var target = document.getElementById(id);
- if(null != target) {
- elem.cacheClassElem = elem.className;
- elem.cacheClassTarget = target.className;
- target.className = "code-highlighted";
- elem.className = "code-highlighted";
- }
- }
- function CodeHighlightOff(elem, id)
- {
- var target = document.getElementById(id);
- if(elem.cacheClassElem)
- elem.className = elem.cacheClassElem;
- if(elem.cacheClassTarget)
- target.className = elem.cacheClassTarget;
- }
-/*]]>*///-->
+ function CodeHighlightOn(elem, id)
+ {
+ var target = document.getElementById(id);
+ if(null != target) {
+ elem.classList.add("code-highlighted");
+ target.classList.add("code-highlighted");
+ }
+ }
+ function CodeHighlightOff(elem, id)
+ {
+ var target = document.getElementById(id);
+ if(null != target) {
+ elem.classList.remove("code-highlighted");
+ target.classList.remove("code-highlighted");
+ }
+ }
+ /*]]>*///-->
+// @license-end
</script>
</head>
<body>
@@ -247,35 +223,35 @@ for the JavaScript code in this tag.
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
-<li><a href="#orgc7d7db3">1. Workflow runs</a></li>
-<li><a href="#org3b6bb40">2. FASTA files</a></li>
-<li><a href="#orgef20a06">3. Metadata</a></li>
-<li><a href="#orgaabb9da">4. Pangenome</a>
+<li><a href="#orgf894fd0">1. Workflow runs</a></li>
+<li><a href="#org51b7162">2. FASTA files</a></li>
+<li><a href="#org572ed1f">3. Metadata</a></li>
+<li><a href="#org6b6b3d3">4. Pangenome</a>
<ul>
-<li><a href="#org1171db3">4.1. Pangenome GFA format</a></li>
-<li><a href="#orgfc74439">4.2. Pangenome in ODGI format</a></li>
-<li><a href="#orge6bb923">4.3. Pangenome RDF format</a></li>
-<li><a href="#org5978f5c">4.4. Pangenome Browser format</a></li>
+<li><a href="#org9f4caf2">4.1. Pangenome GFA format</a></li>
+<li><a href="#org110286f">4.2. Pangenome in ODGI format</a></li>
+<li><a href="#orgc83e017">4.3. Pangenome RDF format</a></li>
+<li><a href="#org8b1948b">4.4. Pangenome Browser format</a></li>
</ul>
</li>
-<li><a href="#orgae23127">5. Log of workflow output</a></li>
-<li><a href="#org88613e5">6. All files</a></li>
-<li><a href="#org97e0327">7. Planned</a>
+<li><a href="#org3848ad3">5. Log of workflow output</a></li>
+<li><a href="#org3ecd561">6. All files</a></li>
+<li><a href="#orga83843e">7. Planned</a>
<ul>
-<li><a href="#org35758f9">7.1. Raw sequence data</a></li>
-<li><a href="#orgab1c848">7.2. Multiple Sequence Alignment (MSA)</a></li>
-<li><a href="#orgadb7ade">7.3. Phylogenetic tree</a></li>
-<li><a href="#org3ec62c4">7.4. Protein prediction</a></li>
+<li><a href="#orgcdeb8a1">7.1. Raw sequence data</a></li>
+<li><a href="#org25b78b5">7.2. Multiple Sequence Alignment (MSA)</a></li>
+<li><a href="#org02f524d">7.3. Phylogenetic tree</a></li>
+<li><a href="#org78366fa">7.4. Protein prediction</a></li>
</ul>
</li>
-<li><a href="#orga430457">8. Source code</a></li>
-<li><a href="#org768f91e">9. Citing PubSeq</a></li>
+<li><a href="#org3344c28">8. Source code</a></li>
+<li><a href="#orgdd54c12">9. Citing PubSeq</a></li>
</ul>
</div>
</div>
-<div id="outline-container-orgc7d7db3" class="outline-2">
-<h2 id="orgc7d7db3"><span class="section-number-2">1</span> Workflow runs</h2>
+<div id="outline-container-orgf894fd0" class="outline-2">
+<h2 id="orgf894fd0"><span class="section-number-2">1</span> Workflow runs</h2>
<div class="outline-text-2" id="text-1">
<p>
The last runs can be viewed <a href="https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-y4k4uswcqi3ku56#Subprojects">here</a>. If you click on a run you can see
@@ -286,8 +262,8 @@ is listed under <code>Data collections</code>. All current data is listed
</div>
</div>
-<div id="outline-container-org3b6bb40" class="outline-2">
-<h2 id="org3b6bb40"><span class="section-number-2">2</span> FASTA files</h2>
+<div id="outline-container-org51b7162" class="outline-2">
+<h2 id="org51b7162"><span class="section-number-2">2</span> FASTA files</h2>
<div class="outline-text-2" id="text-2">
<p>
The <b>public sequence resource</b> provides all uploaded sequences as
@@ -297,15 +273,15 @@ also provide a single file <a href="https://collections.lugli.arvadosapi.com/c=l
</div>
</div>
-<div id="outline-container-orgef20a06" class="outline-2">
-<h2 id="orgef20a06"><span class="section-number-2">3</span> Metadata</h2>
+<div id="outline-container-org572ed1f" class="outline-2">
+<h2 id="org572ed1f"><span class="section-number-2">3</span> Metadata</h2>
<div class="outline-text-2" id="text-3">
<p>
Metadata can be downloaded as <a href="https://www.w3.org/TR/turtle/">Turtle RDF</a> in <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/mergedmetadata.ttl">mergedmetadata.ttl</a>, which
can be loaded into any RDF triple-store. We provide a Virtuoso SPARQL
endpoint ourselves which can be queried from
<a href="http://sparql.genenetwork.org/sparql/">http://sparql.genenetwork.org/sparql/</a>. Query examples can be found in
-our <a href="https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org">BLOG</a>.
+the <a href="https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org">DOCS</a>.
</p>
<p>
@@ -320,8 +296,8 @@ graph can be downloaded from below Pangenome RDF format.
</div>
</div>
-<div id="outline-container-orgaabb9da" class="outline-2">
-<h2 id="orgaabb9da"><span class="section-number-2">4</span> Pangenome</h2>
+<div id="outline-container-org6b6b3d3" class="outline-2">
+<h2 id="org6b6b3d3"><span class="section-number-2">4</span> Pangenome</h2>
<div class="outline-text-2" id="text-4">
<p>
Pangenome data is made available in multiple guises. Variation graphs
@@ -329,8 +305,8 @@ Pangenome data is made available in multiple guises. Variation graphs
</p>
</div>
-<div id="outline-container-org1171db3" class="outline-3">
-<h3 id="org1171db3"><span class="section-number-3">4.1</span> Pangenome GFA format</h3>
+<div id="outline-container-org9f4caf2" class="outline-3">
+<h3 id="org9f4caf2"><span class="section-number-3">4.1</span> Pangenome GFA format</h3>
<div class="outline-text-3" id="text-4-1">
<p>
<a href="https://github.com/GFA-spec/GFA-spec">GFA</a> is a standard for graphical fragment assembly and consumed
@@ -339,8 +315,8 @@ by tools such as <a href="https://github.com/vgteam/vg">vgtools</a>.
</div>
</div>
-<div id="outline-container-orgfc74439" class="outline-3">
-<h3 id="orgfc74439"><span class="section-number-3">4.2</span> Pangenome in ODGI format</h3>
+<div id="outline-container-org110286f" class="outline-3">
+<h3 id="org110286f"><span class="section-number-3">4.2</span> Pangenome in ODGI format</h3>
<div class="outline-text-3" id="text-4-2">
<p>
<a href="https://github.com/vgteam/odgi">ODGI</a> is a format that supports an optimised dynamic genome/graph
@@ -349,8 +325,8 @@ implementation.
</div>
</div>
-<div id="outline-container-orge6bb923" class="outline-3">
-<h3 id="orge6bb923"><span class="section-number-3">4.3</span> Pangenome RDF format</h3>
+<div id="outline-container-orgc83e017" class="outline-3">
+<h3 id="orgc83e017"><span class="section-number-3">4.3</span> Pangenome RDF format</h3>
<div class="outline-text-3" id="text-4-3">
<p>
An RDF file that includes the sequences themselves in a variation
@@ -361,8 +337,8 @@ graph can be downloaded from
</div>
-<div id="outline-container-org5978f5c" class="outline-3">
-<h3 id="org5978f5c"><span class="section-number-3">4.4</span> Pangenome Browser format</h3>
+<div id="outline-container-org8b1948b" class="outline-3">
+<h3 id="org8b1948b"><span class="section-number-3">4.4</span> Pangenome Browser format</h3>
<div class="outline-text-3" id="text-4-4">
<p>
The many JSON files that are named as
@@ -373,8 +349,8 @@ Pangenome browser.
</div>
</div>
-<div id="outline-container-orgae23127" class="outline-2">
-<h2 id="orgae23127"><span class="section-number-2">5</span> Log of workflow output</h2>
+<div id="outline-container-org3848ad3" class="outline-2">
+<h2 id="org3848ad3"><span class="section-number-2">5</span> Log of workflow output</h2>
<div class="outline-text-2" id="text-5">
<p>
Included in the link below is a log file of the last workflow runs.
@@ -382,8 +358,8 @@ Including in below link is a log file of the last workflow runs.
</div>
</div>
-<div id="outline-container-org88613e5" class="outline-2">
-<h2 id="org88613e5"><span class="section-number-2">6</span> All files</h2>
+<div id="outline-container-org3ecd561" class="outline-2">
+<h2 id="org3ecd561"><span class="section-number-2">6</span> All files</h2>
<div class="outline-text-2" id="text-6">
<p>
<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/</a>
@@ -391,16 +367,16 @@ Including in below link is a log file of the last workflow runs.
</div>
</div>
-<div id="outline-container-org97e0327" class="outline-2">
-<h2 id="org97e0327"><span class="section-number-2">7</span> Planned</h2>
+<div id="outline-container-orga83843e" class="outline-2">
+<h2 id="orga83843e"><span class="section-number-2">7</span> Planned</h2>
<div class="outline-text-2" id="text-7">
<p>
We are planning to add the following output (see also
</p>
</div>
-<div id="outline-container-org35758f9" class="outline-3">
-<h3 id="org35758f9"><span class="section-number-3">7.1</span> Raw sequence data</h3>
+<div id="outline-container-orgcdeb8a1" class="outline-3">
+<h3 id="orgcdeb8a1"><span class="section-number-3">7.1</span> Raw sequence data</h3>
<div class="outline-text-3" id="text-7-1">
<p>
See <a href="https://github.com/arvados/bh20-seq-resource/issues/16">fastq tracker</a> and <a href="https://github.com/arvados/bh20-seq-resource/issues/63">BAM tracker</a>.
@@ -408,8 +384,8 @@ See <a href="https://github.com/arvados/bh20-seq-resource/issues/16">fastq track
</div>
</div>
-<div id="outline-container-orgab1c848" class="outline-3">
-<h3 id="orgab1c848"><span class="section-number-3">7.2</span> Multiple Sequence Alignment (MSA)</h3>
+<div id="outline-container-org25b78b5" class="outline-3">
+<h3 id="org25b78b5"><span class="section-number-3">7.2</span> Multiple Sequence Alignment (MSA)</h3>
<div class="outline-text-3" id="text-7-2">
<p>
See <a href="https://github.com/arvados/bh20-seq-resource/issues/11">MSA tracker</a>.
@@ -417,8 +393,8 @@ See <a href="https://github.com/arvados/bh20-seq-resource/issues/11">MSA tracker
</div>
</div>
-<div id="outline-container-orgadb7ade" class="outline-3">
-<h3 id="orgadb7ade"><span class="section-number-3">7.3</span> Phylogenetic tree</h3>
+<div id="outline-container-org02f524d" class="outline-3">
+<h3 id="org02f524d"><span class="section-number-3">7.3</span> Phylogenetic tree</h3>
<div class="outline-text-3" id="text-7-3">
<p>
See <a href="https://github.com/arvados/bh20-seq-resource/issues/43">Phylo tracker</a>.
@@ -426,8 +402,8 @@ See <a href="https://github.com/arvados/bh20-seq-resource/issues/43">Phylo track
</div>
</div>
-<div id="outline-container-org3ec62c4" class="outline-3">
-<h3 id="org3ec62c4"><span class="section-number-3">7.4</span> Protein prediction</h3>
+<div id="outline-container-org78366fa" class="outline-3">
+<h3 id="org78366fa"><span class="section-number-3">7.4</span> Protein prediction</h3>
<div class="outline-text-3" id="text-7-4">
<p>
We aim to make protein predictions available.
@@ -436,8 +412,8 @@ We aim to make protein predictions available.
</div>
</div>
-<div id="outline-container-orga430457" class="outline-2">
-<h2 id="orga430457"><span class="section-number-2">8</span> Source code</h2>
+<div id="outline-container-org3344c28" class="outline-2">
+<h2 id="org3344c28"><span class="section-number-2">8</span> Source code</h2>
<div class="outline-text-2" id="text-8">
<p>
All source code for this website and tooling is available
@@ -447,8 +423,8 @@ from
</div>
</div>
-<div id="outline-container-org768f91e" class="outline-2">
-<h2 id="org768f91e"><span class="section-number-2">9</span> Citing PubSeq</h2>
+<div id="outline-container-orgdd54c12" class="outline-2">
+<h2 id="orgdd54c12"><span class="section-number-2">9</span> Citing PubSeq</h2>
<div class="outline-text-2" id="text-9">
<p>
See the <a href="./about">FAQ</a>.
@@ -457,7 +433,7 @@ See the <a href="./about">FAQ</a>.
</div>
</div>
<div id="postamble" class="status">
-<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-08-24 Mon 03:07</small>.
+<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-11-05 Thu 05:26</small>.
</div>
</body>
</html>
diff --git a/doc/web/download.org b/doc/web/download.org
index a3f1949..44fbeb1 100644
--- a/doc/web/download.org
+++ b/doc/web/download.org
@@ -39,7 +39,7 @@ Metadata can be downloaded as [[https://www.w3.org/TR/turtle/][Turtle RDF]] as a
can be loaded into any RDF triple-store. We provide a Virtuoso SPARQL
endpoint ourselves which can be queried from
http://sparql.genenetwork.org/sparql/. Query examples can be found in
-our [[https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org][BLOG]].
+the [[https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org][DOCS]].
The Swiss Institute of Bioinformatics has included this data in
https://covid-19-sparql.expasy.org/ and made it part of [[https://www.uniprot.org/][Uniprot]].
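
As a rough illustration of querying the SPARQL endpoint mentioned above, the sketch below builds a GET request using only the Python standard library. It assumes the endpoint accepts the standard SPARQL protocol `query` parameter and returns JSON results when asked for `application/sparql-results+json`. The query is a generic placeholder that makes no assumption about the PubSeq schema, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://sparql.genenetwork.org/sparql/"

# Generic placeholder query: fetch any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query, URL-encoded."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings.

    Assumption: the server honours the Accept header and answers in the
    SPARQL JSON results format.
    """
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_query(ENDPOINT, QUERY) needs network access; each returned binding maps a variable name such as `s` to a dict holding its `type` and `value`.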