author: Pjotr Prins, 2020-05-24 11:16:47 -0500
committer: Pjotr Prins, 2020-05-24 11:16:47 -0500
commit: e4738edf99cb96214db066079adae021c25bc059
tree: 2215e5b668d86b08bde67259c976d14560f6f5f1 /doc
parent: c3bbd48601cdb4bec510db72bd2296724874f4f3
Download page
Diffstat (limited to 'doc')
-rw-r--r-- doc/web/about.org 15
-rw-r--r-- doc/web/download.html 375
-rw-r--r-- doc/web/download.org 69
3 files changed, 454 insertions, 5 deletions
diff --git a/doc/web/about.org b/doc/web/about.org
index fc9d1ff..26b675d 100644
--- a/doc/web/about.org
+++ b/doc/web/about.org
@@ -27,13 +27,15 @@ sequence comparison and protein prediction.
* Who created the public sequence resource?
The *public sequence resource* is an initiative by [[https://github.com/arvados/bh20-seq-resource/graphs/contributors][bioinformatics]] and
-ontology experts who want to create something agile and useful for
-the wider research community. The initiative started at the COVID-19
+ontology experts who want to create something agile and useful for the
+wider research community. The initiative started at the COVID-19
biohackathon in April 2020 and is ongoing. The main project drivers
are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Michael Crusoe (Common
-Workflow Language) and Thomas Liener (consultant, formerly EBI). But
-as this is a free software initiative the project represents major
-work by hundreds of software developers and ontology and data
+Workflow Language), Thomas Liener (consultant, formerly EBI) and
+Jerven Bolleman (Swiss Institute of Bioinformatics).
+
+Notably, as this is a free software initiative, the project represents
+major work by hundreds of software developers and ontology and data
wrangling experts. Thank you everyone!
* How does the public sequence resource compare to other data resources?
@@ -62,6 +64,9 @@ public resources, including GISAID.
3. There is no need to set up pipelines and/or compute clusters
4. All workflows get triggered on uploading a new sequence
4. When someone (you?) improves the software/workflows, everyone benefits
+4. Your data gets automatically integrated with the Swiss Institute of
+ Bioinformatics COVID-19 knowledge base
+ https://covid-19-sparql.expasy.org/ (Elixir Switzerland)
Finally, if you upload your data here we have workflows that output
formatted data suitable for uploading to EBI resources (and soon
diff --git a/doc/web/download.html b/doc/web/download.html
new file mode 100644
index 0000000..879e8d4
--- /dev/null
+++ b/doc/web/download.html
@@ -0,0 +1,375 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
+<head>
+<!-- 2020-05-24 Sun 11:11 -->
+<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<title>Download</title>
+<meta name="generator" content="Org mode" />
+<meta name="author" content="Pjotr Prins" />
+<style type="text/css">
+ <!--/*--><![CDATA[/*><!--*/
+ .title { text-align: center;
+ margin-bottom: .2em; }
+ .subtitle { text-align: center;
+ font-size: medium;
+ font-weight: bold;
+ margin-top:0; }
+ .todo { font-family: monospace; color: red; }
+ .done { font-family: monospace; color: green; }
+ .priority { font-family: monospace; color: orange; }
+ .tag { background-color: #eee; font-family: monospace;
+ padding: 2px; font-size: 80%; font-weight: normal; }
+ .timestamp { color: #bebebe; }
+ .timestamp-kwd { color: #5f9ea0; }
+ .org-right { margin-left: auto; margin-right: 0px; text-align: right; }
+ .org-left { margin-left: 0px; margin-right: auto; text-align: left; }
+ .org-center { margin-left: auto; margin-right: auto; text-align: center; }
+ .underline { text-decoration: underline; }
+ #postamble p, #preamble p { font-size: 90%; margin: .2em; }
+ p.verse { margin-left: 3%; }
+ pre {
+ border: 1px solid #ccc;
+ box-shadow: 3px 3px 3px #eee;
+ padding: 8pt;
+ font-family: monospace;
+ overflow: auto;
+ margin: 1.2em;
+ }
+ pre.src {
+ position: relative;
+ overflow: visible;
+ padding-top: 1.2em;
+ }
+ pre.src:before {
+ display: none;
+ position: absolute;
+ background-color: white;
+ top: -10px;
+ right: 10px;
+ padding: 3px;
+ border: 1px solid black;
+ }
+ pre.src:hover:before { display: inline;}
+ /* Languages per Org manual */
+ pre.src-asymptote:before { content: 'Asymptote'; }
+ pre.src-awk:before { content: 'Awk'; }
+ pre.src-C:before { content: 'C'; }
+ /* pre.src-C++ doesn't work in CSS */
+ pre.src-clojure:before { content: 'Clojure'; }
+ pre.src-css:before { content: 'CSS'; }
+ pre.src-D:before { content: 'D'; }
+ pre.src-ditaa:before { content: 'ditaa'; }
+ pre.src-dot:before { content: 'Graphviz'; }
+ pre.src-calc:before { content: 'Emacs Calc'; }
+ pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
+ pre.src-fortran:before { content: 'Fortran'; }
+ pre.src-gnuplot:before { content: 'gnuplot'; }
+ pre.src-haskell:before { content: 'Haskell'; }
+ pre.src-hledger:before { content: 'hledger'; }
+ pre.src-java:before { content: 'Java'; }
+ pre.src-js:before { content: 'Javascript'; }
+ pre.src-latex:before { content: 'LaTeX'; }
+ pre.src-ledger:before { content: 'Ledger'; }
+ pre.src-lisp:before { content: 'Lisp'; }
+ pre.src-lilypond:before { content: 'Lilypond'; }
+ pre.src-lua:before { content: 'Lua'; }
+ pre.src-matlab:before { content: 'MATLAB'; }
+ pre.src-mscgen:before { content: 'Mscgen'; }
+ pre.src-ocaml:before { content: 'Objective Caml'; }
+ pre.src-octave:before { content: 'Octave'; }
+ pre.src-org:before { content: 'Org mode'; }
+ pre.src-oz:before { content: 'OZ'; }
+ pre.src-plantuml:before { content: 'Plantuml'; }
+ pre.src-processing:before { content: 'Processing.js'; }
+ pre.src-python:before { content: 'Python'; }
+ pre.src-R:before { content: 'R'; }
+ pre.src-ruby:before { content: 'Ruby'; }
+ pre.src-sass:before { content: 'Sass'; }
+ pre.src-scheme:before { content: 'Scheme'; }
+ pre.src-screen:before { content: 'Gnu Screen'; }
+ pre.src-sed:before { content: 'Sed'; }
+ pre.src-sh:before { content: 'shell'; }
+ pre.src-sql:before { content: 'SQL'; }
+ pre.src-sqlite:before { content: 'SQLite'; }
+ /* additional languages in org.el's org-babel-load-languages alist */
+ pre.src-forth:before { content: 'Forth'; }
+ pre.src-io:before { content: 'IO'; }
+ pre.src-J:before { content: 'J'; }
+ pre.src-makefile:before { content: 'Makefile'; }
+ pre.src-maxima:before { content: 'Maxima'; }
+ pre.src-perl:before { content: 'Perl'; }
+ pre.src-picolisp:before { content: 'Pico Lisp'; }
+ pre.src-scala:before { content: 'Scala'; }
+ pre.src-shell:before { content: 'Shell Script'; }
+ pre.src-ebnf2ps:before { content: 'ebfn2ps'; }
+ /* additional language identifiers per "defun org-babel-execute"
+ in ob-*.el */
+ pre.src-cpp:before { content: 'C++'; }
+ pre.src-abc:before { content: 'ABC'; }
+ pre.src-coq:before { content: 'Coq'; }
+ pre.src-groovy:before { content: 'Groovy'; }
+ /* additional language identifiers from org-babel-shell-names in
+ ob-shell.el: ob-shell is the only babel language using a lambda to put
+ the execution function name together. */
+ pre.src-bash:before { content: 'bash'; }
+ pre.src-csh:before { content: 'csh'; }
+ pre.src-ash:before { content: 'ash'; }
+ pre.src-dash:before { content: 'dash'; }
+ pre.src-ksh:before { content: 'ksh'; }
+ pre.src-mksh:before { content: 'mksh'; }
+ pre.src-posh:before { content: 'posh'; }
+ /* Additional Emacs modes also supported by the LaTeX listings package */
+ pre.src-ada:before { content: 'Ada'; }
+ pre.src-asm:before { content: 'Assembler'; }
+ pre.src-caml:before { content: 'Caml'; }
+ pre.src-delphi:before { content: 'Delphi'; }
+ pre.src-html:before { content: 'HTML'; }
+ pre.src-idl:before { content: 'IDL'; }
+ pre.src-mercury:before { content: 'Mercury'; }
+ pre.src-metapost:before { content: 'MetaPost'; }
+ pre.src-modula-2:before { content: 'Modula-2'; }
+ pre.src-pascal:before { content: 'Pascal'; }
+ pre.src-ps:before { content: 'PostScript'; }
+ pre.src-prolog:before { content: 'Prolog'; }
+ pre.src-simula:before { content: 'Simula'; }
+ pre.src-tcl:before { content: 'tcl'; }
+ pre.src-tex:before { content: 'TeX'; }
+ pre.src-plain-tex:before { content: 'Plain TeX'; }
+ pre.src-verilog:before { content: 'Verilog'; }
+ pre.src-vhdl:before { content: 'VHDL'; }
+ pre.src-xml:before { content: 'XML'; }
+ pre.src-nxml:before { content: 'XML'; }
+ /* add a generic configuration mode; LaTeX export needs an additional
+ (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
+ pre.src-conf:before { content: 'Configuration File'; }
+
+ table { border-collapse:collapse; }
+ caption.t-above { caption-side: top; }
+ caption.t-bottom { caption-side: bottom; }
+ td, th { vertical-align:top; }
+ th.org-right { text-align: center; }
+ th.org-left { text-align: center; }
+ th.org-center { text-align: center; }
+ td.org-right { text-align: right; }
+ td.org-left { text-align: left; }
+ td.org-center { text-align: center; }
+ dt { font-weight: bold; }
+ .footpara { display: inline; }
+ .footdef { margin-bottom: 1em; }
+ .figure { padding: 1em; }
+ .figure p { text-align: center; }
+ .equation-container {
+ display: table;
+ text-align: center;
+ width: 100%;
+ }
+ .equation {
+ vertical-align: middle;
+ }
+ .equation-label {
+ display: table-cell;
+ text-align: right;
+ vertical-align: middle;
+ }
+ .inlinetask {
+ padding: 10px;
+ border: 2px solid gray;
+ margin: 10px;
+ background: #ffffcc;
+ }
+ #org-div-home-and-up
+ { text-align: right; font-size: 70%; white-space: nowrap; }
+ textarea { overflow-x: auto; }
+ .linenr { font-size: smaller }
+ .code-highlighted { background-color: #ffff00; }
+ .org-info-js_info-navigation { border-style: none; }
+ #org-info-js_console-label
+ { font-size: 10px; font-weight: bold; white-space: nowrap; }
+ .org-info-js_search-highlight
+ { background-color: #ffff00; color: #000000; font-weight: bold; }
+ .org-svg { width: 90%; }
+ /*]]>*/-->
+</style>
+<script type="text/javascript">
+/*
+@licstart The following is the entire license notice for the
+JavaScript code in this tag.
+
+Copyright (C) 2012-2020 Free Software Foundation, Inc.
+
+The JavaScript code in this tag is free software: you can
+redistribute it and/or modify it under the terms of the GNU
+General Public License (GNU GPL) as published by the Free Software
+Foundation, either version 3 of the License, or (at your option)
+any later version. The code is distributed WITHOUT ANY WARRANTY;
+without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
+
+As additional permission under GNU GPL version 3 section 7, you
+may distribute non-source (e.g., minimized or compacted) forms of
+that code without the copy of the GNU GPL normally required by
+section 4, provided you include this license notice and a URL
+through which recipients can access the Corresponding Source.
+
+
+@licend The above is the entire license notice
+for the JavaScript code in this tag.
+*/
+<!--/*--><![CDATA[/*><!--*/
+ function CodeHighlightOn(elem, id)
+ {
+ var target = document.getElementById(id);
+ if(null != target) {
+ elem.cacheClassElem = elem.className;
+ elem.cacheClassTarget = target.className;
+ target.className = "code-highlighted";
+ elem.className = "code-highlighted";
+ }
+ }
+ function CodeHighlightOff(elem, id)
+ {
+ var target = document.getElementById(id);
+ if(elem.cacheClassElem)
+ elem.className = elem.cacheClassElem;
+ if(elem.cacheClassTarget)
+ target.className = elem.cacheClassTarget;
+ }
+/*]]>*///-->
+</script>
+</head>
+<body>
+<div id="content">
+<h1 class="title">Download</h1>
+<div id="table-of-contents">
+<h2>Table of Contents</h2>
+<div id="text-table-of-contents">
+<ul>
+<li><a href="#orge184013">1. FASTA files</a></li>
+<li><a href="#orgc0ce14b">2. Metadata</a></li>
+<li><a href="#org52c7997">3. Pangenome</a>
+<ul>
+<li><a href="#orgba61745">3.1. Pangenome GFA format</a></li>
+<li><a href="#org6474dfc">3.2. Pangenome in ODGI format</a></li>
+<li><a href="#orge3a3726">3.3. Pangenome RDF format</a></li>
+<li><a href="#org359cc22">3.4. Pangenome Browser format</a></li>
+</ul>
+</li>
+<li><a href="#org488e901">4. Log of workflow output</a></li>
+<li><a href="#orga53b821">5. All files</a></li>
+</ul>
+</div>
+</div>
+
+<div id="outline-container-orge184013" class="outline-2">
+<h2 id="orge184013"><span class="section-number-2">1</span> FASTA files</h2>
+<div class="outline-text-2" id="text-1">
+<p>
+The <b>public sequence resource</b> provides all uploaded sequences as
+FASTA files. They can be referred to from metadata individually. We
+also provide a single file <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/relabeledSeqs_dedup.fasta">FASTA download</a>.
+</p>
+</div>
+</div>
+
+<div id="outline-container-orgc0ce14b" class="outline-2">
+<h2 id="orgc0ce14b"><span class="section-number-2">2</span> Metadata</h2>
+<div class="outline-text-2" id="text-2">
+<p>
+Metadata can be downloaded as <a href="https://www.w3.org/TR/turtle/">Turtle RDF</a> in the file <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/mergedmetadata.ttl">mergedmetadata.ttl</a>, which
+can be loaded into any RDF triple-store. We provide a Virtuoso SPARQL
+endpoint ourselves which can be queried from
+<a href="http://sparql.genenetwork.org/sparql/">http://sparql.genenetwork.org/sparql/</a>. Query examples can be found in
+our <a href="https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org">BLOG</a>.
+</p>
+
+<p>
+The Swiss Institute of Bioinformatics has included this data in
+<a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> and made it part of <a href="https://www.uniprot.org/">Uniprot</a>.
+</p>
+
+<p>
+An RDF file that includes the sequences themselves in a variation
+graph can be downloaded from the Pangenome RDF format section below.
+</p>
+</div>
+</div>
+
+<div id="outline-container-org52c7997" class="outline-2">
+<h2 id="org52c7997"><span class="section-number-2">3</span> Pangenome</h2>
+<div class="outline-text-2" id="text-3">
+<p>
+Pangenome data is made available in multiple guises. Variation graphs
+(VG) provide a succinct encoding of the sequences of many genomes.
+</p>
+</div>
+
+<div id="outline-container-orgba61745" class="outline-3">
+<h3 id="orgba61745"><span class="section-number-3">3.1</span> Pangenome GFA format</h3>
+<div class="outline-text-3" id="text-3-1">
+<p>
+<a href="https://github.com/GFA-spec/GFA-spec">GFA</a> (Graphical Fragment Assembly) is a standard graph format that is consumed
+by tools such as <a href="https://github.com/vgteam/vg">vgtools</a>.
+</p>
+</div>
+</div>
+
+<div id="outline-container-org6474dfc" class="outline-3">
+<h3 id="org6474dfc"><span class="section-number-3">3.2</span> Pangenome in ODGI format</h3>
+<div class="outline-text-3" id="text-3-2">
+<p>
+<a href="https://github.com/vgteam/odgi">ODGI</a> is a format that supports an optimized dynamic genome/graph
+implementation.
+</p>
+</div>
+</div>
+
+<div id="outline-container-orge3a3726" class="outline-3">
+<h3 id="orge3a3726"><span class="section-number-3">3.3</span> Pangenome RDF format</h3>
+<div class="outline-text-3" id="text-3-3">
+<p>
+An RDF file that includes the sequences themselves in a variation
+graph can be downloaded from
+<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">relabeledSeqs-dedup-relabeledSeqs-dedup.ttl.xz</a>.
+</p>
+</div>
+</div>
+
+
+<div id="outline-container-org359cc22" class="outline-3">
+<h3 id="org359cc22"><span class="section-number-3">3.4</span> Pangenome Browser format</h3>
+<div class="outline-text-3" id="text-3-4">
+<p>
+The many JSON files with names such as
+<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">results/1/chunk001200.bin1.schematic.json</a> are consumed by the
+Pangenome browser.
+</p>
+</div>
+</div>
+</div>
+
+<div id="outline-container-org488e901" class="outline-2">
+<h2 id="org488e901"><span class="section-number-2">4</span> Log of workflow output</h2>
+<div class="outline-text-2" id="text-4">
+<p>
+The link below includes a log file of the most recent workflow runs.
+</p>
+</div>
+</div>
+
+<div id="outline-container-orga53b821" class="outline-2">
+<h2 id="orga53b821"><span class="section-number-2">5</span> All files</h2>
+<div class="outline-text-2" id="text-5">
+<p>
+<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/</a>
+</p>
+</div>
+</div>
+</div>
+<div id="postamble" class="status">
+<hr /><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-05-24 Sun 11:11</small>.
+</div>
+</body>
+</html>
diff --git a/doc/web/download.org b/doc/web/download.org
new file mode 100644
index 0000000..498b132
--- /dev/null
+++ b/doc/web/download.org
@@ -0,0 +1,69 @@
+#+TITLE: Download
+#+AUTHOR: Pjotr Prins
+
+* Table of Contents :TOC:noexport:
+ - [[#fasta-files][FASTA files]]
+ - [[#metadata][Metadata]]
+ - [[#pangenome][Pangenome]]
+ - [[#pangenome-gfa-format][Pangenome GFA format]]
+ - [[#pangenome-in-odgi-format][Pangenome in ODGI format]]
+ - [[#pangenome-rdf-format][Pangenome RDF format]]
+ - [[#pangenome-browser-format][Pangenome Browser format]]
+ - [[#log-of-workflow-output][Log of workflow output]]
+ - [[#all-files][All files]]
+
+* FASTA files
+
+The *public sequence resource* provides all uploaded sequences as
+FASTA files. They can be referred to from metadata individually. We
+also provide a single file [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/relabeledSeqs_dedup.fasta][FASTA download]].
+
+* Metadata
+
+Metadata can be downloaded as [[https://www.w3.org/TR/turtle/][Turtle RDF]] in the file [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/mergedmetadata.ttl][mergedmetadata.ttl]], which
+can be loaded into any RDF triple-store. We provide a Virtuoso SPARQL
+endpoint ourselves which can be queried from
+http://sparql.genenetwork.org/sparql/. Query examples can be found in
+our [[https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org][BLOG]].
+
+The Swiss Institute of Bioinformatics has included this data in
+https://covid-19-sparql.expasy.org/ and made it part of [[https://www.uniprot.org/][Uniprot]].
+
+An RDF file that includes the sequences themselves in a variation
+graph can be downloaded from the Pangenome RDF format section below.
+
+* Pangenome
+
+Pangenome data is made available in multiple guises. Variation graphs
+(VG) provide a succinct encoding of the sequences of many genomes.
+
+** Pangenome GFA format
+
+[[https://github.com/GFA-spec/GFA-spec][GFA]] (Graphical Fragment Assembly) is a standard graph format that is consumed
+by tools such as [[https://github.com/vgteam/vg][vgtools]].
+
+** Pangenome in ODGI format
+
+[[https://github.com/vgteam/odgi][ODGI]] is a format that supports an optimised dynamic genome/graph
+implementation.
+
+** Pangenome RDF format
+
+An RDF file that includes the sequences themselves in a variation
+graph can be downloaded from
+[[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][relabeledSeqs-dedup-relabeledSeqs-dedup.ttl.xz]].
+
+
+** Pangenome Browser format
+
+The many JSON files with names such as
+[[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][results/1/chunk001200.bin1.schematic.json]] are consumed by the
+Pangenome browser.
+
+* Log of workflow output
+
+The link below includes a log file of the most recent workflow runs.
+
+* All files
+
+https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/
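
The Metadata section of the new download page points at a SPARQL endpoint, which can also be queried from a script. The following is a minimal sketch and not part of this commit. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter, JSON results requested through the `Accept` header). The function names `build_request`, `parse_bindings` and `run_query` are hypothetical, and the sample query is deliberately generic because the page does not describe the metadata schema.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given on the download page.
ENDPOINT = "http://sparql.genenetwork.org/sparql/"

# Generic query (an assumption for illustration): it lists a few arbitrary
# triples and relies on no particular vocabulary.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded rows (requires network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(SAMPLE_QUERY)` should return a list of dictionaries keyed by `s`, `p` and `o`, provided the endpoint is reachable and answers in the standard JSON results format. Real queries against the metadata are described in the BLOG linked from the page.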