<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <!-- 2020-05-24 Sun 11:29 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <title>Download</title> <meta name="generator" content="Org mode" /> <meta name="author" content="Pjotr Prins" /> <style type="text/css"> <!--/*--><![CDATA[/*><!--*/ .title { text-align: center; margin-bottom: .2em; } .subtitle { text-align: center; font-size: medium; font-weight: bold; margin-top:0; } .todo { font-family: monospace; color: red; } .done { font-family: monospace; color: green; } .priority { font-family: monospace; color: orange; } .tag { background-color: #eee; font-family: monospace; padding: 2px; font-size: 80%; font-weight: normal; } .timestamp { color: #bebebe; } .timestamp-kwd { color: #5f9ea0; } .org-right { margin-left: auto; margin-right: 0px; text-align: right; } .org-left { margin-left: 0px; margin-right: auto; text-align: left; } .org-center { margin-left: auto; margin-right: auto; text-align: center; } .underline { text-decoration: underline; } #postamble p, #preamble p { font-size: 90%; margin: .2em; } p.verse { margin-left: 3%; } pre { border: 1px solid #ccc; box-shadow: 3px 3px 3px #eee; padding: 8pt; font-family: monospace; overflow: auto; margin: 1.2em; } pre.src { position: relative; overflow: visible; padding-top: 1.2em; } pre.src:before { display: none; position: absolute; background-color: white; top: -10px; right: 10px; padding: 3px; border: 1px solid black; } pre.src:hover:before { display: inline;} /* Languages per Org manual */ pre.src-asymptote:before { content: 'Asymptote'; } pre.src-awk:before { content: 'Awk'; } pre.src-C:before { content: 'C'; } /* pre.src-C++ doesn't work in CSS */ pre.src-clojure:before { content: 'Clojure'; } 
pre.src-css:before { content: 'CSS'; } pre.src-D:before { content: 'D'; } pre.src-ditaa:before { content: 'ditaa'; } pre.src-dot:before { content: 'Graphviz'; } pre.src-calc:before { content: 'Emacs Calc'; } pre.src-emacs-lisp:before { content: 'Emacs Lisp'; } pre.src-fortran:before { content: 'Fortran'; } pre.src-gnuplot:before { content: 'gnuplot'; } pre.src-haskell:before { content: 'Haskell'; } pre.src-hledger:before { content: 'hledger'; } pre.src-java:before { content: 'Java'; } pre.src-js:before { content: 'Javascript'; } pre.src-latex:before { content: 'LaTeX'; } pre.src-ledger:before { content: 'Ledger'; } pre.src-lisp:before { content: 'Lisp'; } pre.src-lilypond:before { content: 'Lilypond'; } pre.src-lua:before { content: 'Lua'; } pre.src-matlab:before { content: 'MATLAB'; } pre.src-mscgen:before { content: 'Mscgen'; } pre.src-ocaml:before { content: 'Objective Caml'; } pre.src-octave:before { content: 'Octave'; } pre.src-org:before { content: 'Org mode'; } pre.src-oz:before { content: 'OZ'; } pre.src-plantuml:before { content: 'Plantuml'; } pre.src-processing:before { content: 'Processing.js'; } pre.src-python:before { content: 'Python'; } pre.src-R:before { content: 'R'; } pre.src-ruby:before { content: 'Ruby'; } pre.src-sass:before { content: 'Sass'; } pre.src-scheme:before { content: 'Scheme'; } pre.src-screen:before { content: 'Gnu Screen'; } pre.src-sed:before { content: 'Sed'; } pre.src-sh:before { content: 'shell'; } pre.src-sql:before { content: 'SQL'; } pre.src-sqlite:before { content: 'SQLite'; } /* additional languages in org.el's org-babel-load-languages alist */ pre.src-forth:before { content: 'Forth'; } pre.src-io:before { content: 'IO'; } pre.src-J:before { content: 'J'; } pre.src-makefile:before { content: 'Makefile'; } pre.src-maxima:before { content: 'Maxima'; } pre.src-perl:before { content: 'Perl'; } pre.src-picolisp:before { content: 'Pico Lisp'; } pre.src-scala:before { content: 'Scala'; } pre.src-shell:before { content: 'Shell 
Script'; } pre.src-ebnf2ps:before { content: 'ebfn2ps'; } /* additional language identifiers per "defun org-babel-execute" in ob-*.el */ pre.src-cpp:before { content: 'C++'; } pre.src-abc:before { content: 'ABC'; } pre.src-coq:before { content: 'Coq'; } pre.src-groovy:before { content: 'Groovy'; } /* additional language identifiers from org-babel-shell-names in ob-shell.el: ob-shell is the only babel language using a lambda to put the execution function name together. */ pre.src-bash:before { content: 'bash'; } pre.src-csh:before { content: 'csh'; } pre.src-ash:before { content: 'ash'; } pre.src-dash:before { content: 'dash'; } pre.src-ksh:before { content: 'ksh'; } pre.src-mksh:before { content: 'mksh'; } pre.src-posh:before { content: 'posh'; } /* Additional Emacs modes also supported by the LaTeX listings package */ pre.src-ada:before { content: 'Ada'; } pre.src-asm:before { content: 'Assembler'; } pre.src-caml:before { content: 'Caml'; } pre.src-delphi:before { content: 'Delphi'; } pre.src-html:before { content: 'HTML'; } pre.src-idl:before { content: 'IDL'; } pre.src-mercury:before { content: 'Mercury'; } pre.src-metapost:before { content: 'MetaPost'; } pre.src-modula-2:before { content: 'Modula-2'; } pre.src-pascal:before { content: 'Pascal'; } pre.src-ps:before { content: 'PostScript'; } pre.src-prolog:before { content: 'Prolog'; } pre.src-simula:before { content: 'Simula'; } pre.src-tcl:before { content: 'tcl'; } pre.src-tex:before { content: 'TeX'; } pre.src-plain-tex:before { content: 'Plain TeX'; } pre.src-verilog:before { content: 'Verilog'; } pre.src-vhdl:before { content: 'VHDL'; } pre.src-xml:before { content: 'XML'; } pre.src-nxml:before { content: 'XML'; } /* add a generic configuration mode; LaTeX export needs an additional (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */ pre.src-conf:before { content: 'Configuration File'; } table { border-collapse:collapse; } caption.t-above { caption-side: top; } caption.t-bottom { caption-side: 
bottom; } td, th { vertical-align:top; } th.org-right { text-align: center; } th.org-left { text-align: center; } th.org-center { text-align: center; } td.org-right { text-align: right; } td.org-left { text-align: left; } td.org-center { text-align: center; } dt { font-weight: bold; } .footpara { display: inline; } .footdef { margin-bottom: 1em; } .figure { padding: 1em; } .figure p { text-align: center; } .equation-container { display: table; text-align: center; width: 100%; } .equation { vertical-align: middle; } .equation-label { display: table-cell; text-align: right; vertical-align: middle; } .inlinetask { padding: 10px; border: 2px solid gray; margin: 10px; background: #ffffcc; } #org-div-home-and-up { text-align: right; font-size: 70%; white-space: nowrap; } textarea { overflow-x: auto; } .linenr { font-size: smaller } .code-highlighted { background-color: #ffff00; } .org-info-js_info-navigation { border-style: none; } #org-info-js_console-label { font-size: 10px; font-weight: bold; white-space: nowrap; } .org-info-js_search-highlight { background-color: #ffff00; color: #000000; font-weight: bold; } .org-svg { width: 90%; } /*]]>*/--> </style> <script type="text/javascript"> /* @licstart The following is the entire license notice for the JavaScript code in this tag. Copyright (C) 2012-2020 Free Software Foundation, Inc. The JavaScript code in this tag is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License (GNU GPL) as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The code is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU GPL for more details. 
As additional permission under GNU GPL version 3 section 7, you may distribute non-source (e.g., minimized or compacted) forms of that code without the copy of the GNU GPL normally required by section 4, provided you include this license notice and a URL through which recipients can access the Corresponding Source. @licend The above is the entire license notice for the JavaScript code in this tag. */ <!--/*--><![CDATA[/*><!--*/ function CodeHighlightOn(elem, id) { var target = document.getElementById(id); if(null != target) { elem.cacheClassElem = elem.className; elem.cacheClassTarget = target.className; target.className = "code-highlighted"; elem.className = "code-highlighted"; } } function CodeHighlightOff(elem, id) { var target = document.getElementById(id); if(elem.cacheClassElem) elem.className = elem.cacheClassElem; if(elem.cacheClassTarget) target.className = elem.cacheClassTarget; } /*]]>*///--> </script> </head> <body> <div id="content"> <h1 class="title">Download</h1> <div id="table-of-contents"> <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> <li><a href="#org42addaa">1. FASTA files</a></li> <li><a href="#org04bd8c7">2. Metadata</a></li> <li><a href="#orgead2f03">3. Pangenome</a> <ul> <li><a href="#org8ea7d40">3.1. Pangenome GFA format</a></li> <li><a href="#orge7808d6">3.2. Pangenome in ODGI format</a></li> <li><a href="#orgaadcde8">3.3. Pangenome RDF format</a></li> <li><a href="#orga3a0408">3.4. Pangenome Browser format</a></li> </ul> </li> <li><a href="#org1bbb7e6">4. Log of workflow output</a></li> <li><a href="#orgd16b2c8">5. All files</a></li> <li><a href="#org9d40ed2">6. Planned</a> <ul> <li><a href="#org70cb9d5">6.1. Raw sequence data</a></li> <li><a href="#org38cfa2e">6.2. Multiple Sequence Alignment (MSA)</a></li> <li><a href="#org507c7dd">6.3. Phylogenetic tree</a></li> <li><a href="#orgca26edf">6.4. 
Protein prediction</a></li> </ul> </li> </ul> </div> </div> <div id="outline-container-org42addaa" class="outline-2"> <h2 id="org42addaa"><span class="section-number-2">1</span> FASTA files</h2> <div class="outline-text-2" id="text-1"> <p> The <b>public sequence resource</b> provides all uploaded sequences as FASTA files. Each sequence can be referenced individually from the metadata. We also provide all sequences as a single <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/relabeledSeqs_dedup.fasta">FASTA download</a>. </p> </div> </div> <div id="outline-container-org04bd8c7" class="outline-2"> <h2 id="org04bd8c7"><span class="section-number-2">2</span> Metadata</h2> <div class="outline-text-2" id="text-2"> <p> Metadata can be downloaded in <a href="https://www.w3.org/TR/turtle/">Turtle RDF</a> format as <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/mergedmetadata.ttl">mergedmetadata.ttl</a>, which can be loaded into any RDF triple store. We also host a Virtuoso SPARQL endpoint that can be queried at <a href="http://sparql.genenetwork.org/sparql/">http://sparql.genenetwork.org/sparql/</a>. Query examples can be found in our <a href="https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org">BLOG</a>. </p> <p> The Swiss Institute of Bioinformatics has included this data in <a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> and made it part of <a href="https://www.uniprot.org/">UniProt</a>. </p> <p> An RDF file that includes the sequences themselves in a variation graph can be downloaded from the <a href="#orgaadcde8">Pangenome RDF format</a> section below. </p> </div> </div> <div id="outline-container-orgead2f03" class="outline-2"> <h2 id="orgead2f03"><span class="section-number-2">3</span> Pangenome</h2> <div class="outline-text-2" id="text-3"> <p> Pangenome data is made available in multiple formats. Variation graphs (VG) provide a succinct encoding of the sequences of many genomes. 
</p> </div> <div id="outline-container-org8ea7d40" class="outline-3"> <h3 id="org8ea7d40"><span class="section-number-3">3.1</span> Pangenome GFA format</h3> <div class="outline-text-3" id="text-3-1"> <p> <a href="https://github.com/GFA-spec/GFA-spec">GFA</a> (Graphical Fragment Assembly) is a standard format for assembly graphs and is consumed by tools such as <a href="https://github.com/vgteam/vg">vg</a>. </p> </div> </div> <div id="outline-container-orge7808d6" class="outline-3"> <h3 id="orge7808d6"><span class="section-number-3">3.2</span> Pangenome in ODGI format</h3> <div class="outline-text-3" id="text-3-2"> <p> <a href="https://github.com/vgteam/odgi">ODGI</a> is the graph format of the optimised dynamic genome/graph implementation toolkit of the same name. </p> </div> </div> <div id="outline-container-orgaadcde8" class="outline-3"> <h3 id="orgaadcde8"><span class="section-number-3">3.3</span> Pangenome RDF format</h3> <div class="outline-text-3" id="text-3-3"> <p> An RDF file that includes the sequences themselves in a variation graph can be downloaded as <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">relabeledSeqs-dedup-relabeledSeqs-dedup.ttl.xz</a>. </p> </div> </div> <div id="outline-container-orga3a0408" class="outline-3"> <h3 id="orga3a0408"><span class="section-number-3">3.4</span> Pangenome Browser format</h3> <div class="outline-text-3" id="text-3-4"> <p> The many JSON files with names such as <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">results/1/chunk001200.bin1.schematic.json</a> are consumed by the Pangenome browser. </p> </div> </div> </div> <div id="outline-container-org1bbb7e6" class="outline-2"> <h2 id="org1bbb7e6"><span class="section-number-2">4</span> Log of workflow output</h2> <div class="outline-text-2" id="text-4"> <p> The collection linked below includes a log file of the most recent workflow runs. 
</p> </div> </div> <div id="outline-container-orgd16b2c8" class="outline-2"> <h2 id="orgd16b2c8"><span class="section-number-2">5</span> All files</h2> <div class="outline-text-2" id="text-5"> <p> <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/</a> </p> </div> </div> <div id="outline-container-org9d40ed2" class="outline-2"> <h2 id="org9d40ed2"><span class="section-number-2">6</span> Planned</h2> <div class="outline-text-2" id="text-6"> <p> We are planning to add the following outputs. </p> </div> <div id="outline-container-org70cb9d5" class="outline-3"> <h3 id="org70cb9d5"><span class="section-number-3">6.1</span> Raw sequence data</h3> <div class="outline-text-3" id="text-6-1"> <p> See <a href="https://github.com/arvados/bh20-seq-resource/issues/16">FASTQ tracker</a> and <a href="https://github.com/arvados/bh20-seq-resource/issues/63">BAM tracker</a>. </p> </div> </div> <div id="outline-container-org38cfa2e" class="outline-3"> <h3 id="org38cfa2e"><span class="section-number-3">6.2</span> Multiple Sequence Alignment (MSA)</h3> <div class="outline-text-3" id="text-6-2"> <p> See <a href="https://github.com/arvados/bh20-seq-resource/issues/11">MSA tracker</a>. </p> </div> </div> <div id="outline-container-org507c7dd" class="outline-3"> <h3 id="org507c7dd"><span class="section-number-3">6.3</span> Phylogenetic tree</h3> <div class="outline-text-3" id="text-6-3"> <p> See <a href="https://github.com/arvados/bh20-seq-resource/issues/43">Phylo tracker</a>. </p> </div> </div> <div id="outline-container-orgca26edf" class="outline-3"> <h3 id="orgca26edf"><span class="section-number-3">6.4</span> Protein prediction</h3> <div class="outline-text-3" id="text-6-4"> <p> We aim to make protein predictions available. 
</p> </div> </div> </div> </div> <div id="postamble" class="status"> <hr /><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-05-24 Sun 11:29</small>. </div> </body> </html>