<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- 2020-06-12 Fri 04:41 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Download</title>
<meta name="generator" content="Org mode" />
<meta name="author" content="Pjotr Prins" />
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
.title { text-align: center;
margin-bottom: .2em; }
.subtitle { text-align: center;
font-size: medium;
font-weight: bold;
margin-top:0; }
.todo { font-family: monospace; color: red; }
.done { font-family: monospace; color: green; }
.priority { font-family: monospace; color: orange; }
.tag { background-color: #eee; font-family: monospace;
padding: 2px; font-size: 80%; font-weight: normal; }
.timestamp { color: #bebebe; }
.timestamp-kwd { color: #5f9ea0; }
.org-right { margin-left: auto; margin-right: 0px; text-align: right; }
.org-left { margin-left: 0px; margin-right: auto; text-align: left; }
.org-center { margin-left: auto; margin-right: auto; text-align: center; }
.underline { text-decoration: underline; }
#postamble p, #preamble p { font-size: 90%; margin: .2em; }
p.verse { margin-left: 3%; }
pre {
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
overflow: auto;
margin: 1.2em;
}
pre.src {
position: relative;
overflow: visible;
padding-top: 1.2em;
}
pre.src:before {
display: none;
position: absolute;
background-color: white;
top: -10px;
right: 10px;
padding: 3px;
border: 1px solid black;
}
pre.src:hover:before { display: inline;}
/* Languages per Org manual */
pre.src-asymptote:before { content: 'Asymptote'; }
pre.src-awk:before { content: 'Awk'; }
pre.src-C:before { content: 'C'; }
/* pre.src-C++ doesn't work in CSS */
pre.src-clojure:before { content: 'Clojure'; }
pre.src-css:before { content: 'CSS'; }
pre.src-D:before { content: 'D'; }
pre.src-ditaa:before { content: 'ditaa'; }
pre.src-dot:before { content: 'Graphviz'; }
pre.src-calc:before { content: 'Emacs Calc'; }
pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
pre.src-fortran:before { content: 'Fortran'; }
pre.src-gnuplot:before { content: 'gnuplot'; }
pre.src-haskell:before { content: 'Haskell'; }
pre.src-hledger:before { content: 'hledger'; }
pre.src-java:before { content: 'Java'; }
pre.src-js:before { content: 'Javascript'; }
pre.src-latex:before { content: 'LaTeX'; }
pre.src-ledger:before { content: 'Ledger'; }
pre.src-lisp:before { content: 'Lisp'; }
pre.src-lilypond:before { content: 'Lilypond'; }
pre.src-lua:before { content: 'Lua'; }
pre.src-matlab:before { content: 'MATLAB'; }
pre.src-mscgen:before { content: 'Mscgen'; }
pre.src-ocaml:before { content: 'Objective Caml'; }
pre.src-octave:before { content: 'Octave'; }
pre.src-org:before { content: 'Org mode'; }
pre.src-oz:before { content: 'OZ'; }
pre.src-plantuml:before { content: 'Plantuml'; }
pre.src-processing:before { content: 'Processing.js'; }
pre.src-python:before { content: 'Python'; }
pre.src-R:before { content: 'R'; }
pre.src-ruby:before { content: 'Ruby'; }
pre.src-sass:before { content: 'Sass'; }
pre.src-scheme:before { content: 'Scheme'; }
pre.src-screen:before { content: 'Gnu Screen'; }
pre.src-sed:before { content: 'Sed'; }
pre.src-sh:before { content: 'shell'; }
pre.src-sql:before { content: 'SQL'; }
pre.src-sqlite:before { content: 'SQLite'; }
/* additional languages in org.el's org-babel-load-languages alist */
pre.src-forth:before { content: 'Forth'; }
pre.src-io:before { content: 'IO'; }
pre.src-J:before { content: 'J'; }
pre.src-makefile:before { content: 'Makefile'; }
pre.src-maxima:before { content: 'Maxima'; }
pre.src-perl:before { content: 'Perl'; }
pre.src-picolisp:before { content: 'Pico Lisp'; }
pre.src-scala:before { content: 'Scala'; }
pre.src-shell:before { content: 'Shell Script'; }
pre.src-ebnf2ps:before { content: 'ebnf2ps'; }
/* additional language identifiers per "defun org-babel-execute"
in ob-*.el */
pre.src-cpp:before { content: 'C++'; }
pre.src-abc:before { content: 'ABC'; }
pre.src-coq:before { content: 'Coq'; }
pre.src-groovy:before { content: 'Groovy'; }
/* additional language identifiers from org-babel-shell-names in
ob-shell.el: ob-shell is the only babel language using a lambda to put
the execution function name together. */
pre.src-bash:before { content: 'bash'; }
pre.src-csh:before { content: 'csh'; }
pre.src-ash:before { content: 'ash'; }
pre.src-dash:before { content: 'dash'; }
pre.src-ksh:before { content: 'ksh'; }
pre.src-mksh:before { content: 'mksh'; }
pre.src-posh:before { content: 'posh'; }
/* Additional Emacs modes also supported by the LaTeX listings package */
pre.src-ada:before { content: 'Ada'; }
pre.src-asm:before { content: 'Assembler'; }
pre.src-caml:before { content: 'Caml'; }
pre.src-delphi:before { content: 'Delphi'; }
pre.src-html:before { content: 'HTML'; }
pre.src-idl:before { content: 'IDL'; }
pre.src-mercury:before { content: 'Mercury'; }
pre.src-metapost:before { content: 'MetaPost'; }
pre.src-modula-2:before { content: 'Modula-2'; }
pre.src-pascal:before { content: 'Pascal'; }
pre.src-ps:before { content: 'PostScript'; }
pre.src-prolog:before { content: 'Prolog'; }
pre.src-simula:before { content: 'Simula'; }
pre.src-tcl:before { content: 'tcl'; }
pre.src-tex:before { content: 'TeX'; }
pre.src-plain-tex:before { content: 'Plain TeX'; }
pre.src-verilog:before { content: 'Verilog'; }
pre.src-vhdl:before { content: 'VHDL'; }
pre.src-xml:before { content: 'XML'; }
pre.src-nxml:before { content: 'XML'; }
/* add a generic configuration mode; LaTeX export needs an additional
(add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
pre.src-conf:before { content: 'Configuration File'; }
table { border-collapse:collapse; }
caption.t-above { caption-side: top; }
caption.t-bottom { caption-side: bottom; }
td, th { vertical-align:top; }
th.org-right { text-align: center; }
th.org-left { text-align: center; }
th.org-center { text-align: center; }
td.org-right { text-align: right; }
td.org-left { text-align: left; }
td.org-center { text-align: center; }
dt { font-weight: bold; }
.footpara { display: inline; }
.footdef { margin-bottom: 1em; }
.figure { padding: 1em; }
.figure p { text-align: center; }
.equation-container {
display: table;
text-align: center;
width: 100%;
}
.equation {
vertical-align: middle;
}
.equation-label {
display: table-cell;
text-align: right;
vertical-align: middle;
}
.inlinetask {
padding: 10px;
border: 2px solid gray;
margin: 10px;
background: #ffffcc;
}
#org-div-home-and-up
{ text-align: right; font-size: 70%; white-space: nowrap; }
textarea { overflow-x: auto; }
.linenr { font-size: smaller }
.code-highlighted { background-color: #ffff00; }
.org-info-js_info-navigation { border-style: none; }
#org-info-js_console-label
{ font-size: 10px; font-weight: bold; white-space: nowrap; }
.org-info-js_search-highlight
{ background-color: #ffff00; color: #000000; font-weight: bold; }
.org-svg { width: 90%; }
/*]]>*/-->
</style>
<script type="text/javascript">
/*
@licstart The following is the entire license notice for the
JavaScript code in this tag.
Copyright (C) 2012-2020 Free Software Foundation, Inc.
The JavaScript code in this tag is free software: you can
redistribute it and/or modify it under the terms of the GNU
General Public License (GNU GPL) as published by the Free Software
Foundation, either version 3 of the License, or (at your option)
any later version. The code is distributed WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
As additional permission under GNU GPL version 3 section 7, you
may distribute non-source (e.g., minimized or compacted) forms of
that code without the copy of the GNU GPL normally required by
section 4, provided you include this license notice and a URL
through which recipients can access the Corresponding Source.
@licend The above is the entire license notice
for the JavaScript code in this tag.
*/
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.cacheClassElem = elem.className;
elem.cacheClassTarget = target.className;
target.className = "code-highlighted";
elem.className = "code-highlighted";
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(elem.cacheClassElem)
elem.className = elem.cacheClassElem;
if(elem.cacheClassTarget)
target.className = elem.cacheClassTarget;
}
/*]]>*///-->
</script>
</head>
<body>
<div id="content">
<h1 class="title">Download</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#orgcd3d82f">1. Workflow runs</a></li>
<li><a href="#org9aff936">2. FASTA files</a></li>
<li><a href="#orgc3e953c">3. Metadata</a></li>
<li><a href="#orgc9c55c4">4. Pangenome</a>
<ul>
<li><a href="#org82c3ce9">4.1. Pangenome GFA format</a></li>
<li><a href="#orgf63e9f7">4.2. Pangenome in ODGI format</a></li>
<li><a href="#org8faf32f">4.3. Pangenome RDF format</a></li>
<li><a href="#org0c452f6">4.4. Pangenome Browser format</a></li>
</ul>
</li>
<li><a href="#org4707094">5. Log of workflow output</a></li>
<li><a href="#orgd4d8f91">6. All files</a></li>
<li><a href="#org237b3cf">7. Planned</a>
<ul>
<li><a href="#org66e03ac">7.1. Raw sequence data</a></li>
<li><a href="#orgdfae1b9">7.2. Multiple Sequence Alignment (MSA)</a></li>
<li><a href="#orgaedc43e">7.3. Phylogenetic tree</a></li>
<li><a href="#org19a6a11">7.4. Protein prediction</a></li>
</ul>
</li>
<li><a href="#org49778b7">8. Source code</a></li>
</ul>
</div>
</div>
<div id="outline-container-orgcd3d82f" class="outline-2">
<h2 id="orgcd3d82f"><span class="section-number-2">1</span> Workflow runs</h2>
<div class="outline-text-2" id="text-1">
<p>
The latest runs can be viewed <a href="https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-y4k4uswcqi3ku56#Subprojects">here</a>. Clicking on a run shows
the workflows that ran under <code>Processes</code>. Output, including intermediate output,
is listed under <code>Data collections</code>. All current data is listed
<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">here</a>. Note that a run takes time to complete and to appear in these listings.
</p>
</div>
</div>
<div id="outline-container-org9aff936" class="outline-2">
<h2 id="org9aff936"><span class="section-number-2">2</span> FASTA files</h2>
<div class="outline-text-2" id="text-2">
<p>
The <b>public sequence resource</b> provides all uploaded sequences as
FASTA files. Each file can be referenced individually from the metadata. We
also provide a single-file <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/relabeledSeqs_dedup.fasta">FASTA download</a>.
</p>
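<p>
As a minimal sketch, assuming the single-file download has been saved
locally and follows the usual FASTA layout (a header line starting
with <code>&gt;</code> followed by one or more sequence lines), the
records can be read with plain Python. The function name
<code>read_fasta</code> and the example sequences are hypothetical and
only serve as illustration.
</p>
<div class="org-src-container">
<pre class="src src-python">
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record once the stream is exhausted.
    if header is not None:
        yield header, "".join(chunks)

# Tiny in-memory example; these sequences are made up for illustration.
example = io.StringIO(">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(example))
</pre>
</div>
<p>
To read the downloaded file instead, open it with
<code>open("relabeledSeqs_dedup.fasta")</code> and pass the handle to
<code>read_fasta</code>. Because the function is a generator, the whole
file does not have to fit in memory at once.
</p>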
</div>
</div>
<div id="outline-container-orgc3e953c" class="outline-2">
<h2 id="orgc3e953c"><span class="section-number-2">3</span> Metadata</h2>
<div class="outline-text-2" id="text-3">
<p>
Metadata can be downloaded in <a href="https://www.w3.org/TR/turtle/">Turtle RDF</a> format as <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/mergedmetadata.ttl">mergedmetadata.ttl</a>, which
can be loaded into any RDF triple store. We also provide a Virtuoso SPARQL
endpoint that can be queried at
<a href="http://sparql.genenetwork.org/sparql/">http://sparql.genenetwork.org/sparql/</a>. Query examples can be found in
our <a href="https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org">blog</a>.
</p>
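<p>
The endpoint can also be queried from a script. The following is a
minimal sketch, assuming the endpoint accepts standard SPARQL protocol
GET requests and can return JSON results. The helper names
<code>build_request</code> and <code>run_query</code> are hypothetical,
and the query is deliberately generic: it lists ten arbitrary triples
and assumes nothing about the vocabulary used in the metadata.
</p>
<div class="org-src-container">
<pre class="src src-python">
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://sparql.genenetwork.org/sparql/"

# Generic query: list ten arbitrary triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})

def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return json.load(response)["results"]["bindings"]

# Building the request does not touch the network.
request = build_request(QUERY)
</pre>
</div>
<p>
Calling <code>run_query(QUERY)</code> sends the request. In the
standard SPARQL JSON results format each binding is a dictionary keyed
by variable name, so <code>row["s"]["value"]</code> would give the
subject of a returned triple.
</p>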
<p>
The Swiss Institute of Bioinformatics has included this data in
<a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> and made it part of <a href="https://www.uniprot.org/">UniProt</a>.
</p>
<p>
An RDF file that includes the sequences themselves in a variation
graph can be downloaded from the <a href="#org8faf32f">Pangenome RDF format</a> section below.
</p>
</div>
</div>
<div id="outline-container-orgc9c55c4" class="outline-2">
<h2 id="orgc9c55c4"><span class="section-number-2">4</span> Pangenome</h2>
<div class="outline-text-2" id="text-4">
<p>
Pangenome data is made available in multiple guises. Variation graphs
(VG) provide a succinct encoding of the sequences of many genomes.
</p>
</div>
<div id="outline-container-org82c3ce9" class="outline-3">
<h3 id="org82c3ce9"><span class="section-number-3">4.1</span> Pangenome GFA format</h3>
<div class="outline-text-3" id="text-4-1">
<p>
<a href="https://github.com/GFA-spec/GFA-spec">GFA</a> (Graphical Fragment Assembly) is a standard format that is consumed
by tools such as <a href="https://github.com/vgteam/vg">vg</a>.
</p>
</div>
</div>
<div id="outline-container-orgf63e9f7" class="outline-3">
<h3 id="orgf63e9f7"><span class="section-number-3">4.2</span> Pangenome in ODGI format</h3>
<div class="outline-text-3" id="text-4-2">
<p>
<a href="https://github.com/vgteam/odgi">ODGI</a> is a format that supports an optimised dynamic genome/graph
implementation.
</p>
</div>
</div>
<div id="outline-container-org8faf32f" class="outline-3">
<h3 id="org8faf32f"><span class="section-number-3">4.3</span> Pangenome RDF format</h3>
<div class="outline-text-3" id="text-4-3">
<p>
An RDF file that includes the sequences themselves in a variation
graph can be downloaded from
<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">relabeledSeqs-dedup-relabeledSeqs-dedup.ttl.xz</a>.
</p>
</div>
</div>
<div id="outline-container-org0c452f6" class="outline-3">
<h3 id="org0c452f6"><span class="section-number-3">4.4</span> Pangenome Browser format</h3>
<div class="outline-text-3" id="text-4-4">
<p>
The many JSON files with names such as
<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">results/1/chunk001200.bin1.schematic.json</a> are consumed by the
Pangenome browser.
</p>
</div>
</div>
</div>
<div id="outline-container-org4707094" class="outline-2">
<h2 id="org4707094"><span class="section-number-2">5</span> Log of workflow output</h2>
<div class="outline-text-2" id="text-5">
<p>
Included in the link below is a log file of the latest workflow runs.
</p>
</div>
</div>
<div id="outline-container-orgd4d8f91" class="outline-2">
<h2 id="orgd4d8f91"><span class="section-number-2">6</span> All files</h2>
<div class="outline-text-2" id="text-6">
<p>
<a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/</a>
</p>
</div>
</div>
<div id="outline-container-org237b3cf" class="outline-2">
<h2 id="org237b3cf"><span class="section-number-2">7</span> Planned</h2>
<div class="outline-text-2" id="text-7">
<p>
We are planning to add the following output:
</p>
</div>
<div id="outline-container-org66e03ac" class="outline-3">
<h3 id="org66e03ac"><span class="section-number-3">7.1</span> Raw sequence data</h3>
<div class="outline-text-3" id="text-7-1">
<p>
See <a href="https://github.com/arvados/bh20-seq-resource/issues/16">fastq tracker</a> and <a href="https://github.com/arvados/bh20-seq-resource/issues/63">BAM tracker</a>.
</p>
</div>
</div>
<div id="outline-container-orgdfae1b9" class="outline-3">
<h3 id="orgdfae1b9"><span class="section-number-3">7.2</span> Multiple Sequence Alignment (MSA)</h3>
<div class="outline-text-3" id="text-7-2">
<p>
See <a href="https://github.com/arvados/bh20-seq-resource/issues/11">MSA tracker</a>.
</p>
</div>
</div>
<div id="outline-container-orgaedc43e" class="outline-3">
<h3 id="orgaedc43e"><span class="section-number-3">7.3</span> Phylogenetic tree</h3>
<div class="outline-text-3" id="text-7-3">
<p>
See <a href="https://github.com/arvados/bh20-seq-resource/issues/43">Phylo tracker</a>.
</p>
</div>
</div>
<div id="outline-container-org19a6a11" class="outline-3">
<h3 id="org19a6a11"><span class="section-number-3">7.4</span> Protein prediction</h3>
<div class="outline-text-3" id="text-7-4">
<p>
We aim to make protein predictions available.
</p>
</div>
</div>
</div>
<div id="outline-container-org49778b7" class="outline-2">
<h2 id="org49778b7"><span class="section-number-2">8</span> Source code</h2>
<div class="outline-text-2" id="text-8">
<p>
All source code for this website and tooling is available
from
<a href="https://github.com/arvados/bh20-seq-resource">https://github.com/arvados/bh20-seq-resource</a>
</p>
</div>
</div>
</div>
<div id="postamble" class="status">
<hr /><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-06-12 Fri 04:41</small>
</div>
</body>
</html>