Merge pull request #99 from AndreaGuarracino/patch-2

several fixes in the website, added links to video talk and poster, new pangenome generation workflow
author: Peter Amstutz 2020-08-05 16:06:11 -0400
committer: GitHub 2020-08-05 16:06:11 -0400
commit: fdb1b012fc04ee07f401541e181e28fe442c9454
tree: 8486db1087692dffcea9d93814e436d9cf150b47
parent: 86f31ef60f65a820bf9ac25c3fc01c88f2a9ebfe
parent: 2d20bf90497588a297ca98a78ee0fbbcadf95569
19 files changed, 1219 insertions, 598 deletions
diff --git a/README.md b/README.md
index 8c3a589..03e4297 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ web interface. You can use it to upload the genomes of SARS-CoV-2
 samples to make them publicly and freely available to other
 researchers. For more information see the [paper](./paper/paper.md).
 
-![alt text](./image/website.png "Website")
+![alt text](./image/homepage.png "Website")
 
 To get started, first [install the uploader](#installation), and use the `bh20-seq-uploader` command to [upload your data](#usage).
 
diff --git a/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf
new file mode 100644
index 0000000..7da8cd6
--- /dev/null
+++ b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf
Binary files differ
diff --git a/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png
new file mode 100644
index 0000000..eae2721
--- /dev/null
+++ b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png
Binary files differ
diff --git a/bh20simplewebuploader/static/main.css b/bh20simplewebuploader/static/main.css
index bdcc0bc..7c33d9c 100644
--- a/bh20simplewebuploader/static/main.css
+++ b/bh20simplewebuploader/static/main.css
@@ -177,7 +177,7 @@ span.dropt:hover {text-decoration: none; background: #ffffff; z-index: 6; }
 
 .about {
     display: grid;
-    grid-template-columns: 1fr 1fr;
+    grid-template-columns: 1fr 1fr 1fr;
     grid-auto-flow: row;
 }
 
diff --git a/bh20simplewebuploader/templates/blurb.html b/bh20simplewebuploader/templates/blurb.html
index 9eef7c2..067cc3b 100644
--- a/bh20simplewebuploader/templates/blurb.html
+++ b/bh20simplewebuploader/templates/blurb.html
@@ -2,12 +2,12 @@
   This is the COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
   SARS-CoV-2 virus sequences. COVID-19 PubSeq is a repository for
   sequences with a low barrier to entry for uploading sequence data
-  using best practices, including <a href="https://en.wikipedia.org/wiki/FAIR_data">FAIR data</a>. I.e., data published with a creative commons
-  CC0 or CC-4.0 license with metadata using state-of-the art standards
+  using best practices, including <a href="https://en.wikipedia.org/wiki/FAIR_data">FAIR data</a>. Data are published with
+  metadata using state-of-the art standards
   and, perhaps most importantly, providing standardised workflows that
   get triggered on upload, so that results are immediately available
   in standardised data formats.
-
+  
   Your uploaded sequence will automatically be processed and
   incorporated into the public pangenome with metadata using worklows
   from the High Performance Open Biology Lab
diff --git a/bh20simplewebuploader/templates/footer.html b/bh20simplewebuploader/templates/footer.html
index 26ea82a..abf46c3 100644
--- a/bh20simplewebuploader/templates/footer.html
+++ b/bh20simplewebuploader/templates/footer.html
@@ -15,6 +15,11 @@
       </p>
 
     </div>
+    <div>
+      <a href="static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf">
+        <img src="static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png"  alt="BCC2020 Andrea Guarracino COVID19 PubSeq Poster"/>
+      </a>
+    </div>
     <div class="sponsors">
       <div class="sponsorimg">
         <a href="https://github.com/virtual-biohackathons/covid-19-bh20">
diff --git a/doc/blog/using-covid-19-pubseq-part2.html b/doc/blog/using-covid-19-pubseq-part2.html
index c047441..c041ebe 100644
--- a/doc/blog/using-covid-19-pubseq-part2.html
+++ b/doc/blog/using-covid-19-pubseq-part2.html
@@ -259,39 +259,12 @@ for the JavaScript code in this tag.
 </ul>
 </div>
 </div>
-<p>
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences. The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-</p>
 
 <div id="outline-container-org7942167" class="outline-2">
 <h2 id="org7942167"><span class="section-number-2">1</span> Finding output of workflows</h2>
 <div class="outline-text-2" id="text-1">
-<p>
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences. The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-</p>
-</div>
-</div>
 
-<div id="outline-container-org0022bbe" class="outline-2">
-<h2 id="org0022bbe"><span class="section-number-2">2</span> Introduction</h2>
-<div class="outline-text-2" id="text-2">
-<p>
+ <p>
 We are using Arvados to run common workflow language (CWL) pipelines.
 The most recent output is on display on a <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">web page</a> (with time stamp)
 and a full list is generated <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">here</a>. It is nice to start up, but for
@@ -302,7 +275,7 @@ want to wade through thousands of output files!
 </div>
 
 <div id="outline-container-org3929710" class="outline-2">
-<h2 id="org3929710"><span class="section-number-2">3</span> The Arvados file interface</h2>
+<h2 id="org3929710"><span class="section-number-2">2</span> The Arvados file interface</h2>
 <div class="outline-text-2" id="text-3">
 <p>
 Arvados has the web server, but it also has a REST API and associated
@@ -384,7 +357,7 @@ arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839d
 </div>
 
 <div id="outline-container-orgc4dba6e" class="outline-2">
-<h2 id="orgc4dba6e"><span class="section-number-2">4</span> Using the Arvados API</h2>
+<h2 id="orgc4dba6e"><span class="section-number-2">3</span> TODO Using the Arvados API</h2>
 </div>
 </div>
 <div id="postamble" class="status">
diff --git a/doc/blog/using-covid-19-pubseq-part2.org b/doc/blog/using-covid-19-pubseq-part2.org
index d2a1cbc..349fd06 100644
--- a/doc/blog/using-covid-19-pubseq-part2.org
+++ b/doc/blog/using-covid-19-pubseq-part2.org
@@ -8,36 +8,13 @@
 #+HTML_LINK_HOME: http://covid19.genenetwork.org
 #+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />
 
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences. The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-
 * Table of Contents                                                     :TOC:noexport:
  - [[#finding-output-of-workflows][Finding output of workflows]]
- - [[#introduction][Introduction]]
  - [[#the-arvados-file-interface][The Arvados file interface]]
  - [[#using-the-arvados-api][Using the Arvados API]]
 
 * Finding output of workflows
 
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences. The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-
-* Introduction
-
 We are using Arvados to run common workflow language (CWL) pipelines.
 The most recent output is on display on a [[https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca][web page]] (with time stamp)
 and a full list is generated [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][here]]. It is nice to start up, but for
@@ -81,4 +58,4 @@ its listed UUID:
 
 : arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5
 
-* Using the Arvados API
+* TODO Using the Arvados API
diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index 91879b0..df4a286 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -625,7 +625,7 @@ The web interface using this exact same script so it should just work
 <h3 id="org39adf09"><span class="section-number-3">6.2</span> Example: uploading bulk GenBank sequences</h3>
 <div class="outline-text-3" id="text-6-2">
 <p>
-We also use above script to bulk upload GenBank sequences with a <a href="https://github.com/arvados/bh20-seq-resource/blob/master/scripts/from_genbank_to_fasta_and_yaml.py">FASTA
+We also use above script to bulk upload GenBank sequences with a <a href="https://github.com/arvados/bh20-seq-resource/blob/master/scripts/download_genbank_data/from_genbank_to_fasta_and_yaml.py">FASTA
 and YAML</a> extractor specific for GenBank. This means that the steps we
 took above for uploading a GenBank sequence are already automated.
 </p>
diff --git a/doc/blog/using-covid-19-pubseq-part3.org b/doc/blog/using-covid-19-pubseq-part3.org
index 03f37ab..e8fee36 100644
--- a/doc/blog/using-covid-19-pubseq-part3.org
+++ b/doc/blog/using-covid-19-pubseq-part3.org
@@ -234,6 +234,6 @@ The web interface using this exact same script so it should just work
 
 ** Example: uploading bulk GenBank sequences
 
-We also use above script to bulk upload GenBank sequences with a [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/from_genbank_to_fasta_and_yaml.py][FASTA
+We also use above script to bulk upload GenBank sequences with a [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/download_genbank_data/from_genbank_to_fasta_and_yaml.py][FASTA
 and YAML]] extractor specific for GenBank. This means that the steps we
 took above for uploading a GenBank sequence are already automated.
diff --git a/doc/web/about.html b/doc/web/about.html
index dfd4252..c971a4e 100644
--- a/doc/web/about.html
+++ b/doc/web/about.html
@@ -1,549 +1,964 @@
 <?xml version="1.0" encoding="utf-8"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
 <head>
-<!-- 2020-07-18 Sat 03:27 -->
-<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
-<meta name="viewport" content="width=device-width, initial-scale=1" />
-<title>About/FAQ</title>
-<meta name="generator" content="Org mode" />
-<meta name="author" content="Pjotr Prins" />
-<style type="text/css">
- <!--/*--><![CDATA[/*><!--*/
-  .title  { text-align: center;
-             margin-bottom: .2em; }
-  .subtitle { text-align: center;
-              font-size: medium;
-              font-weight: bold;
-              margin-top:0; }
-  .todo   { font-family: monospace; color: red; }
-  .done   { font-family: monospace; color: green; }
-  .priority { font-family: monospace; color: orange; }
-  .tag    { background-color: #eee; font-family: monospace;
-            padding: 2px; font-size: 80%; font-weight: normal; }
-  .timestamp { color: #bebebe; }
-  .timestamp-kwd { color: #5f9ea0; }
-  .org-right  { margin-left: auto; margin-right: 0px;  text-align: right; }
-  .org-left   { margin-left: 0px;  margin-right: auto; text-align: left; }
-  .org-center { margin-left: auto; margin-right: auto; text-align: center; }
-  .underline { text-decoration: underline; }
-  #postamble p, #preamble p { font-size: 90%; margin: .2em; }
-  p.verse { margin-left: 3%; }
-  pre {
-    border: 1px solid #ccc;
-    box-shadow: 3px 3px 3px #eee;
-    padding: 8pt;
-    font-family: monospace;
-    overflow: auto;
-    margin: 1.2em;
-  }
-  pre.src {
-    position: relative;
-    overflow: visible;
-    padding-top: 1.2em;
-  }
-  pre.src:before {
-    display: none;
-    position: absolute;
-    background-color: white;
-    top: -10px;
-    right: 10px;
-    padding: 3px;
-    border: 1px solid black;
-  }
-  pre.src:hover:before { display: inline;}
-  /* Languages per Org manual */
-  pre.src-asymptote:before { content: 'Asymptote'; }
-  pre.src-awk:before { content: 'Awk'; }
-  pre.src-C:before { content: 'C'; }
-  /* pre.src-C++ doesn't work in CSS */
-  pre.src-clojure:before { content: 'Clojure'; }
-  pre.src-css:before { content: 'CSS'; }
-  pre.src-D:before { content: 'D'; }
-  pre.src-ditaa:before { content: 'ditaa'; }
-  pre.src-dot:before { content: 'Graphviz'; }
-  pre.src-calc:before { content: 'Emacs Calc'; }
-  pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
-  pre.src-fortran:before { content: 'Fortran'; }
-  pre.src-gnuplot:before { content: 'gnuplot'; }
-  pre.src-haskell:before { content: 'Haskell'; }
-  pre.src-hledger:before { content: 'hledger'; }
-  pre.src-java:before { content: 'Java'; }
-  pre.src-js:before { content: 'Javascript'; }
-  pre.src-latex:before { content: 'LaTeX'; }
-  pre.src-ledger:before { content: 'Ledger'; }
-  pre.src-lisp:before { content: 'Lisp'; }
-  pre.src-lilypond:before { content: 'Lilypond'; }
-  pre.src-lua:before { content: 'Lua'; }
-  pre.src-matlab:before { content: 'MATLAB'; }
-  pre.src-mscgen:before { content: 'Mscgen'; }
-  pre.src-ocaml:before { content: 'Objective Caml'; }
-  pre.src-octave:before { content: 'Octave'; }
-  pre.src-org:before { content: 'Org mode'; }
-  pre.src-oz:before { content: 'OZ'; }
-  pre.src-plantuml:before { content: 'Plantuml'; }
-  pre.src-processing:before { content: 'Processing.js'; }
-  pre.src-python:before { content: 'Python'; }
-  pre.src-R:before { content: 'R'; }
-  pre.src-ruby:before { content: 'Ruby'; }
-  pre.src-sass:before { content: 'Sass'; }
-  pre.src-scheme:before { content: 'Scheme'; }
-  pre.src-screen:before { content: 'Gnu Screen'; }
-  pre.src-sed:before { content: 'Sed'; }
-  pre.src-sh:before { content: 'shell'; }
-  pre.src-sql:before { content: 'SQL'; }
-  pre.src-sqlite:before { content: 'SQLite'; }
-  /* additional languages in org.el's org-babel-load-languages alist */
-  pre.src-forth:before { content: 'Forth'; }
-  pre.src-io:before { content: 'IO'; }
-  pre.src-J:before { content: 'J'; }
-  pre.src-makefile:before { content: 'Makefile'; }
-  pre.src-maxima:before { content: 'Maxima'; }
-  pre.src-perl:before { content: 'Perl'; }
-  pre.src-picolisp:before { content: 'Pico Lisp'; }
-  pre.src-scala:before { content: 'Scala'; }
-  pre.src-shell:before { content: 'Shell Script'; }
-  pre.src-ebnf2ps:before { content: 'ebfn2ps'; }
-  /* additional language identifiers per "defun org-babel-execute"
-       in ob-*.el */
-  pre.src-cpp:before  { content: 'C++'; }
-  pre.src-abc:before  { content: 'ABC'; }
-  pre.src-coq:before  { content: 'Coq'; }
-  pre.src-groovy:before  { content: 'Groovy'; }
-  /* additional language identifiers from org-babel-shell-names in
-     ob-shell.el: ob-shell is the only babel language using a lambda to put
-     the execution function name together. */
-  pre.src-bash:before  { content: 'bash'; }
-  pre.src-csh:before  { content: 'csh'; }
-  pre.src-ash:before  { content: 'ash'; }
-  pre.src-dash:before  { content: 'dash'; }
-  pre.src-ksh:before  { content: 'ksh'; }
-  pre.src-mksh:before  { content: 'mksh'; }
-  pre.src-posh:before  { content: 'posh'; }
-  /* Additional Emacs modes also supported by the LaTeX listings package */
-  pre.src-ada:before { content: 'Ada'; }
-  pre.src-asm:before { content: 'Assembler'; }
-  pre.src-caml:before { content: 'Caml'; }
-  pre.src-delphi:before { content: 'Delphi'; }
-  pre.src-html:before { content: 'HTML'; }
-  pre.src-idl:before { content: 'IDL'; }
-  pre.src-mercury:before { content: 'Mercury'; }
-  pre.src-metapost:before { content: 'MetaPost'; }
-  pre.src-modula-2:before { content: 'Modula-2'; }
-  pre.src-pascal:before { content: 'Pascal'; }
-  pre.src-ps:before { content: 'PostScript'; }
-  pre.src-prolog:before { content: 'Prolog'; }
-  pre.src-simula:before { content: 'Simula'; }
-  pre.src-tcl:before { content: 'tcl'; }
-  pre.src-tex:before { content: 'TeX'; }
-  pre.src-plain-tex:before { content: 'Plain TeX'; }
-  pre.src-verilog:before { content: 'Verilog'; }
-  pre.src-vhdl:before { content: 'VHDL'; }
-  pre.src-xml:before { content: 'XML'; }
-  pre.src-nxml:before { content: 'XML'; }
-  /* add a generic configuration mode; LaTeX export needs an additional
-     (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
-  pre.src-conf:before { content: 'Configuration File'; }
-
-  table { border-collapse:collapse; }
-  caption.t-above { caption-side: top; }
-  caption.t-bottom { caption-side: bottom; }
-  td, th { vertical-align:top;  }
-  th.org-right  { text-align: center;  }
-  th.org-left   { text-align: center;   }
-  th.org-center { text-align: center; }
-  td.org-right  { text-align: right;  }
-  td.org-left   { text-align: left;   }
-  td.org-center { text-align: center; }
-  dt { font-weight: bold; }
-  .footpara { display: inline; }
-  .footdef  { margin-bottom: 1em; }
-  .figure { padding: 1em; }
-  .figure p { text-align: center; }
-  .equation-container {
-    display: table;
-    text-align: center;
-    width: 100%;
-  }
-  .equation {
-    vertical-align: middle;
-  }
-  .equation-label {
-    display: table-cell;
-    text-align: right;
-    vertical-align: middle;
-  }
-  .inlinetask {
-    padding: 10px;
-    border: 2px solid gray;
-    margin: 10px;
-    background: #ffffcc;
-  }
-  #org-div-home-and-up
-   { text-align: right; font-size: 70%; white-space: nowrap; }
-  textarea { overflow-x: auto; }
-  .linenr { font-size: smaller }
-  .code-highlighted { background-color: #ffff00; }
-  .org-info-js_info-navigation { border-style: none; }
-  #org-info-js_console-label
-    { font-size: 10px; font-weight: bold; white-space: nowrap; }
-  .org-info-js_search-highlight
-    { background-color: #ffff00; color: #000000; font-weight: bold; }
-  .org-svg { width: 90%; }
-  /*]]>*/-->
-</style>
-<script type="text/javascript">
-/*
-@licstart  The following is the entire license notice for the
-JavaScript code in this tag.
-
-Copyright (C) 2012-2020 Free Software Foundation, Inc.
-
-The JavaScript code in this tag is free software: you can
-redistribute it and/or modify it under the terms of the GNU
-General Public License (GNU GPL) as published by the Free Software
-Foundation, either version 3 of the License, or (at your option)
-any later version.  The code is distributed WITHOUT ANY WARRANTY;
-without even the implied warranty of MERCHANTABILITY or FITNESS
-FOR A PARTICULAR PURPOSE.  See the GNU GPL for more details.
-
-As additional permission under GNU GPL version 3 section 7, you
-may distribute non-source (e.g., minimized or compacted) forms of
-that code without the copy of the GNU GPL normally required by
-section 4, provided you include this license notice and a URL
-through which recipients can access the Corresponding Source.
-
-
-@licend  The above is the entire license notice
-for the JavaScript code in this tag.
-*/
-<!--/*--><![CDATA[/*><!--*/
- function CodeHighlightOn(elem, id)
- {
-   var target = document.getElementById(id);
-   if(null != target) {
-     elem.cacheClassElem = elem.className;
-     elem.cacheClassTarget = target.className;
-     target.className = "code-highlighted";
-     elem.className   = "code-highlighted";
-   }
- }
- function CodeHighlightOff(elem, id)
- {
-   var target = document.getElementById(id);
-   if(elem.cacheClassElem)
-     elem.className = elem.cacheClassElem;
-   if(elem.cacheClassTarget)
-     target.className = elem.cacheClassTarget;
- }
-/*]]>*///-->
-</script>
+    <!-- 2020-07-18 Sat 03:27 -->
+    <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
+    <meta name="viewport" content="width=device-width, initial-scale=1"/>
+    <title>About/FAQ</title>
+    <meta name="generator" content="Org mode"/>
+    <meta name="author" content="Pjotr Prins"/>
+    <style type="text/css">
+        <!-- /*--><![CDATA[/*><!--*/
+        .title {
+            text-align: center;
+            margin-bottom: .2em;
+        }
+
+        .subtitle {
+            text-align: center;
+            font-size: medium;
+            font-weight: bold;
+            margin-top: 0;
+        }
+
+        .todo {
+            font-family: monospace;
+            color: red;
+        }
+
+        .done {
+            font-family: monospace;
+            color: green;
+        }
+
+        .priority {
+            font-family: monospace;
+            color: orange;
+        }
+
+        .tag {
+            background-color: #eee;
+            font-family: monospace;
+            padding: 2px;
+            font-size: 80%;
+            font-weight: normal;
+        }
+
+        .timestamp {
+            color: #bebebe;
+        }
+
+        .timestamp-kwd {
+            color: #5f9ea0;
+        }
+
+        .org-right {
+            margin-left: auto;
+            margin-right: 0px;
+            text-align: right;
+        }
+
+        .org-left {
+            margin-left: 0px;
+            margin-right: auto;
+            text-align: left;
+        }
+
+        .org-center {
+            margin-left: auto;
+            margin-right: auto;
+            text-align: center;
+        }
+
+        .underline {
+            text-decoration: underline;
+        }
+
+        #postamble p, #preamble p {
+            font-size: 90%;
+            margin: .2em;
+        }
+
+        p.verse {
+            margin-left: 3%;
+        }
+
+        pre {
+            border: 1px solid #ccc;
+            box-shadow: 3px 3px 3px #eee;
+            padding: 8pt;
+            font-family: monospace;
+            overflow: auto;
+            margin: 1.2em;
+        }
+
+        pre.src {
+            position: relative;
+            overflow: visible;
+            padding-top: 1.2em;
+        }
+
+        pre.src:before {
+            display: none;
+            position: absolute;
+            background-color: white;
+            top: -10px;
+            right: 10px;
+            padding: 3px;
+            border: 1px solid black;
+        }
+
+        pre.src:hover:before {
+            display: inline;
+        }
+
+        /* Languages per Org manual */
+        pre.src-asymptote:before {
+            content: 'Asymptote';
+        }
+
+        pre.src-awk:before {
+            content: 'Awk';
+        }
+
+        pre.src-C:before {
+            content: 'C';
+        }
+
+        /* pre.src-C++ doesn't work in CSS */
+        pre.src-clojure:before {
+            content: 'Clojure';
+        }
+
+        pre.src-css:before {
+            content: 'CSS';
+        }
+
+        pre.src-D:before {
+            content: 'D';
+        }
+
+        pre.src-ditaa:before {
+            content: 'ditaa';
+        }
+
+        pre.src-dot:before {
+            content: 'Graphviz';
+        }
+
+        pre.src-calc:before {
+            content: 'Emacs Calc';
+        }
+
+        pre.src-emacs-lisp:before {
+            content: 'Emacs Lisp';
+        }
+
+        pre.src-fortran:before {
+            content: 'Fortran';
+        }
+
+        pre.src-gnuplot:before {
+            content: 'gnuplot';
+        }
+
+        pre.src-haskell:before {
+            content: 'Haskell';
+        }
+
+        pre.src-hledger:before {
+            content: 'hledger';
+        }
+
+        pre.src-java:before {
+            content: 'Java';
+        }
+
+        pre.src-js:before {
+            content: 'Javascript';
+        }
+
+        pre.src-latex:before {
+            content: 'LaTeX';
+        }
+
+        pre.src-ledger:before {
+            content: 'Ledger';
+        }
+
+        pre.src-lisp:before {
+            content: 'Lisp';
+        }
+
+        pre.src-lilypond:before {
+            content: 'Lilypond';
+        }
+
+        pre.src-lua:before {
+            content: 'Lua';
+        }
+
+        pre.src-matlab:before {
+            content: 'MATLAB';
+        }
+
+        pre.src-mscgen:before {
+            content: 'Mscgen';
+        }
+
+        pre.src-ocaml:before {
+            content: 'Objective Caml';
+        }
+
+        pre.src-octave:before {
+            content: 'Octave';
+        }
+
+        pre.src-org:before {
+            content: 'Org mode';
+        }
+
+        pre.src-oz:before {
+            content: 'OZ';
+        }
+
+        pre.src-plantuml:before {
+            content: 'Plantuml';
+        }
+
+        pre.src-processing:before {
+            content: 'Processing.js';
+        }
+
+        pre.src-python:before {
+            content: 'Python';
+        }
+
+        pre.src-R:before {
+            content: 'R';
+        }
+
+        pre.src-ruby:before {
+            content: 'Ruby';
+        }
+
+        pre.src-sass:before {
+            content: 'Sass';
+        }
+
+        pre.src-scheme:before {
+            content: 'Scheme';
+        }
+
+        pre.src-screen:before {
+            content: 'Gnu Screen';
+        }
+
+        pre.src-sed:before {
+            content: 'Sed';
+        }
+
+        pre.src-sh:before {
+            content: 'shell';
+        }
+
+        pre.src-sql:before {
+            content: 'SQL';
+        }
+
+        pre.src-sqlite:before {
+            content: 'SQLite';
+        }
+
+        /* additional languages in org.el's org-babel-load-languages alist */
+        pre.src-forth:before {
+            content: 'Forth';
+        }
+
+        pre.src-io:before {
+            content: 'IO';
+        }
+
+        pre.src-J:before {
+            content: 'J';
+        }
+
+        pre.src-makefile:before {
+            content: 'Makefile';
+        }
+
+        pre.src-maxima:before {
+            content: 'Maxima';
+        }
+
+        pre.src-perl:before {
+            content: 'Perl';
+        }
+
+        pre.src-picolisp:before {
+            content: 'Pico Lisp';
+        }
+
+        pre.src-scala:before {
+            content: 'Scala';
+        }
+
+        pre.src-shell:before {
+            content: 'Shell Script';
+        }
+
+        pre.src-ebnf2ps:before {
+            content: 'ebfn2ps';
+        }
+
+        /* additional language identifiers per "defun org-babel-execute"
+             in ob-*.el */
+        pre.src-cpp:before {
+            content: 'C++';
+        }
+
+        pre.src-abc:before {
+            content: 'ABC';
+        }
+
+        pre.src-coq:before {
+            content: 'Coq';
+        }
+
+        pre.src-groovy:before {
+            content: 'Groovy';
+        }
+
+        /* additional language identifiers from org-babel-shell-names in
+           ob-shell.el: ob-shell is the only babel language using a lambda to put
+           the execution function name together. */
+        pre.src-bash:before {
+            content: 'bash';
+        }
+
+        pre.src-csh:before {
+            content: 'csh';
+        }
+
+        pre.src-ash:before {
+            content: 'ash';
+        }
+
+        pre.src-dash:before {
+            content: 'dash';
+        }
+
+        pre.src-ksh:before {
+            content: 'ksh';
+        }
+
+        pre.src-mksh:before {
+            content: 'mksh';
+        }
+
+        pre.src-posh:before {
+            content: 'posh';
+        }
+
+        /* Additional Emacs modes also supported by the LaTeX listings package */
+        pre.src-ada:before {
+            content: 'Ada';
+        }
+
+        pre.src-asm:before {
+            content: 'Assembler';
+        }
+
+        pre.src-caml:before {
+            content: 'Caml';
+        }
+
+        pre.src-delphi:before {
+            content: 'Delphi';
+        }
+
+        pre.src-html:before {
+            content: 'HTML';
+        }
+
+        pre.src-idl:before {
+            content: 'IDL';
+        }
+
+        pre.src-mercury:before {
+            content: 'Mercury';
+        }
+
+        pre.src-metapost:before {
+            content: 'MetaPost';
+        }
+
+        pre.src-modula-2:before {
+            content: 'Modula-2';
+        }
+
+        pre.src-pascal:before {
+            content: 'Pascal';
+        }
+
+        pre.src-ps:before {
+            content: 'PostScript';
+        }
+
+        pre.src-prolog:before {
+            content: 'Prolog';
+        }
+
+        pre.src-simula:before {
+            content: 'Simula';
+        }
+
+        pre.src-tcl:before {
+            content: 'tcl';
+        }
+
+        pre.src-tex:before {
+            content: 'TeX';
+        }
+
+        pre.src-plain-tex:before {
+            content: 'Plain TeX';
+        }
+
+        pre.src-verilog:before {
+            content: 'Verilog';
+        }
+
+        pre.src-vhdl:before {
+            content: 'VHDL';
+        }
+
+        pre.src-xml:before {
+            content: 'XML';
+        }
+
+        pre.src-nxml:before {
+            content: 'XML';
+        }
+
+        /* add a generic configuration mode; LaTeX export needs an additional
+           (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
+        pre.src-conf:before {
+            content: 'Configuration File';
+        }
+
+        table {
+            border-collapse: collapse;
+        }
+
+        caption.t-above {
+            caption-side: top;
+        }
+
+        caption.t-bottom {
+            caption-side: bottom;
+        }
+
+        td, th {
+            vertical-align: top;
+        }
+
+        th.org-right {
+            text-align: center;
+        }
+
+        th.org-left {
+            text-align: center;
+        }
+
+        th.org-center {
+            text-align: center;
+        }
+
+        td.org-right {
+            text-align: right;
+        }
+
+        td.org-left {
+            text-align: left;
+        }
+
+        td.org-center {
+            text-align: center;
+        }
+
+        dt {
+            font-weight: bold;
+        }
+
+        .footpara {
+            display: inline;
+        }
+
+        .footdef {
+            margin-bottom: 1em;
+        }
+
+        .figure {
+            padding: 1em;
+        }
+
+        .figure p {
+            text-align: center;
+        }
+
+        .equation-container {
+            display: table;
+            text-align: center;
+            width: 100%;
+        }
+
+        .equation {
+            vertical-align: middle;
+        }
+
+        .equation-label {
+            display: table-cell;
+            text-align: right;
+            vertical-align: middle;
+        }
+
+        .inlinetask {
+            padding: 10px;
+            border: 2px solid gray;
+            margin: 10px;
+            background: #ffffcc;
+        }
+
+        #org-div-home-and-up {
+            text-align: right;
+            font-size: 70%;
+            white-space: nowrap;
+        }
+
+        textarea {
+            overflow-x: auto;
+        }
+
+        .linenr {
+            font-size: smaller
+        }
+
+        .code-highlighted {
+            background-color: #ffff00;
+        }
+
+        .org-info-js_info-navigation {
+            border-style: none;
+        }
+
+        #org-info-js_console-label {
+            font-size: 10px;
+            font-weight: bold;
+            white-space: nowrap;
+        }
+
+        .org-info-js_search-highlight {
+            background-color: #ffff00;
+            color: #000000;
+            font-weight: bold;
+        }
+
+        .org-svg {
+            width: 90%;
+        }
+
+        /*]]>*/
+        -->
+    </style>
+    <script type="text/javascript">
+        /*
+        @licstart  The following is the entire license notice for the
+        JavaScript code in this tag.
+
+        Copyright (C) 2012-2020 Free Software Foundation, Inc.
+
+        The JavaScript code in this tag is free software: you can
+        redistribute it and/or modify it under the terms of the GNU
+        General Public License (GNU GPL) as published by the Free Software
+        Foundation, either version 3 of the License, or (at your option)
+        any later version.  The code is distributed WITHOUT ANY WARRANTY;
+        without even the implied warranty of MERCHANTABILITY or FITNESS
+        FOR A PARTICULAR PURPOSE.  See the GNU GPL for more details.
+
+        As additional permission under GNU GPL version 3 section 7, you
+        may distribute non-source (e.g., minimized or compacted) forms of
+        that code without the copy of the GNU GPL normally required by
+        section 4, provided you include this license notice and a URL
+        through which recipients can access the Corresponding Source.
+
+
+        @licend  The above is the entire license notice
+        for the JavaScript code in this tag.
+        */
+        <!--/*--><![CDATA[/*><!--*/
+        function CodeHighlightOn(elem, id) {
+            var target = document.getElementById(id);
+            if (null != target) {
+                elem.cacheClassElem = elem.className;
+                elem.cacheClassTarget = target.className;
+                target.className = "code-highlighted";
+                elem.className = "code-highlighted";
+            }
+        }
+
+        function CodeHighlightOff(elem, id) {
+            var target = document.getElementById(id);
+            if (elem.cacheClassElem)
+                elem.className = elem.cacheClassElem;
+            if (elem.cacheClassTarget)
+                target.className = elem.cacheClassTarget;
+        }
+
+        /*]]>*///-->
+    </script>
 </head>
 <body>
 <div id="content">
-<h1 class="title">About/FAQ</h1>
-<div id="table-of-contents">
-<h2>Table of Contents</h2>
-<div id="text-table-of-contents">
-<ul>
-<li><a href="#org0db9061">1. What is the 'public sequence resource' about?</a></li>
-<li><a href="#org983877d">2. Who created the public sequence resource?</a></li>
-<li><a href="#org83093c3">3. How does the public sequence resource compare to other data resources?</a></li>
-<li><a href="#org9b31fd4">4. Why should I upload my data here?</a></li>
-<li><a href="#org4e92cb5">5. Why should I not upload by data here?</a></li>
-<li><a href="#orgdfe72f6">6. How does the public sequence resource work?</a></li>
-<li><a href="#orgd0c5abb">7. Who uses the public sequence resource?</a></li>
-<li><a href="#org56f4a54">8. How can I contribute?</a></li>
-<li><a href="#org2240ef7">9. Is this about open data?</a></li>
-<li><a href="#orgbb655e0">10. Is this about free software?</a></li>
-<li><a href="#org4e779f4">11. How do I upload raw data?</a></li>
-<li><a href="#org83f6b7b">12. How do I change metadata?</a></li>
-<li><a href="#org1bc6dab">13. How do I change the work flows?</a></li>
-<li><a href="#org1140d62">14. How do I change the source code?</a></li>
-<li><a href="#orge182714">15. Should I choose CC-BY or CC0?</a></li>
-<li><a href="#orgf4a692b">16. How do I deal with private data and privacy?</a></li>
-<li><a href="#org7757574">17. How do I communicate with you?</a></li>
-<li><a href="#org194006f">18. Who are the sponsors?</a></li>
-</ul>
-</div>
-</div>
-
-<div id="outline-container-org0db9061" class="outline-2">
-<h2 id="org0db9061"><span class="section-number-2">1</span> What is the 'public sequence resource' about?</h2>
-<div class="outline-text-2" id="text-1">
-<p>
-The <b>public sequence resource</b> aims to provide a generic and useful
-resource for COVID-19 research.  The focus is on providing the best
-possible sequence data with associated metadata that can be used for
-sequence comparison and protein prediction.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org983877d" class="outline-2">
-<h2 id="org983877d"><span class="section-number-2">2</span> Who created the public sequence resource?</h2>
-<div class="outline-text-2" id="text-2">
-<p>
-The <b>public sequence resource</b> is an initiative by <a href="https://github.com/arvados/bh20-seq-resource/graphs/contributors">bioinformatics</a> and
-ontology experts who want to create something agile and useful for the
-wider research community. The initiative started at the COVID-19
-biohackathon in April 2020 and is ongoing. The main project drivers
-are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino
-(University of Rome Tor Vergata), Michael Crusoe (Common Workflow
-Language), Thomas Liener (consultant, formerly EBI), Erik Garrison
-(UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics).
-</p>
-
-<p>
-Notably, as this is a free software initiative, the project represents
-major work by hundreds of software developers and ontology and data
-wrangling experts. Thank you everyone!
-</p>
-</div>
-</div>
-
-<div id="outline-container-org83093c3" class="outline-2">
-<h2 id="org83093c3"><span class="section-number-2">3</span> How does the public sequence resource compare to other data resources?</h2>
-<div class="outline-text-2" id="text-3">
-<p>
-The short version is that we use state-of-the-art practices in
-bioinformatics using agile methods. Unlike the resources from large
-institutes we can improve things on a dime and anyone can contribute
-to building out this resource! Sequences from GenBank, EBI/ENA and
-others are regularly added to PubSeq. We encourage people to everyone
-to submit on PubSeq because of its superior live tooling and metadata
-support (see the next question).
-</p>
-
-<p>
-Importantly: all data is published under either the <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons
-4.0 attribution license</a> or the <a href="https://creativecommons.org/share-your-work/public-domain/cc0/">CC0 “No Rights Reserved” license</a> which
-means it data can be published and workflows can run in public
-environments allowing for improved access for research and
-reproducible results. This contrasts with some other public resources,
-such as GISAID.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org9b31fd4" class="outline-2">
-<h2 id="org9b31fd4"><span class="section-number-2">4</span> Why should I upload my data here?</h2>
-<div class="outline-text-2" id="text-4">
-<ol class="org-ol">
-<li>We champion truly shareable data without licensing restrictions - with proper
-attribution</li>
-<li>We provide full metadata support using state-of-the-art ontology's</li>
-<li>We provide a web-based sequence uploader and a command-line version
-for bulk uploads</li>
-<li>We provide a live SPARQL end-point for all metadata</li>
-<li>We provide free data analysis and sequence comparison triggered on data upload</li>
-<li>We do real work for you, with this <a href="https://workbench.lugli.arvadosapi.com/container_requests/lugli-xvhdp-bhhk4nxx1lch5od">link</a> you can see the last
-run took 5.5 hours!</li>
-<li>We provide free downloads of all computed output</li>
-<li>There is no need to set up pipelines and/or compute clusters</li>
-<li>All workflows get triggered on uploading a new sequence</li>
-<li>When someone (you?) improves the software/workflows and everyone benefits</li>
-<li>Your data gets automatically integrated with the Swiss Institure of
-Bioinformatics COVID-19 knowledge base
-<a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> (Elixir Switzerland)</li>
-<li>Your data will be used to develop drug targets</li>
-</ol>
-
-<p>
-Finally, if you upload your data here we have workflows that output
-formatted data suitable for <a href="http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part6">uploading to EBI resources</a> (and soon
-others). Uploading your data here get your data ready for upload to
-multiple resources.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org4e92cb5" class="outline-2">
-<h2 id="org4e92cb5"><span class="section-number-2">5</span> Why should I not upload by data here?</h2>
-<div class="outline-text-2" id="text-5">
-<p>
-Funny question.  There are only good reasons to upload your data here
-and make it available to the widest audience possible.
-</p>
-
-<p>
-In fact, you can upload your data here as well as to other
-resources. It is your data after all. No one can prevent you from
-uploading your data to multiple resources.
-</p>
-
-<p>
-We recommend uploading to EBI and NCBI resources using our data
-conversion tools. It means you only enter data once and make the
-process smooth. You can also use our command line data uploader
-for bulk uploads!
-</p>
-</div>
-</div>
-
-<div id="outline-container-orgdfe72f6" class="outline-2">
-<h2 id="orgdfe72f6"><span class="section-number-2">6</span> How does the public sequence resource work?</h2>
-<div class="outline-text-2" id="text-6">
-<p>
-On uploading a sequence with metadata it will automatically be
-processed and incorporated into the public pangenome with metadata
-using workflows from the High Performance Open Biology Lab defined
-<a href="https://github.com/hpobio-lab/viral-analysis/tree/master/cwl/pangenome-generate">here</a>.
-</p>
-</div>
-</div>
-
-<div id="outline-container-orgd0c5abb" class="outline-2">
-<h2 id="orgd0c5abb"><span class="section-number-2">7</span> Who uses the public sequence resource?</h2>
-<div class="outline-text-2" id="text-7">
-<p>
-The Swiss Institute of Bioinformatics has included this data in
-<a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> and made it part of <a href="https://www.uniprot.org/">Uniprot</a>.
-</p>
-
-<p>
-The Pantograph <a href="https://graph-genome.github.io/">viewer</a> uses PubSeq data for their visualisations.
-</p>
-
-<p>
-<a href="https://uthsc.edu">UTHSC</a> (USA), <a href="https://www.esr.cri.nz/">ESR</a> (New Zealand) and <a href="https://www.ornl.gov/news/ornl-fight-against-covid-19">ORNL</a> (USA) use COVID-19 PubSeq data
-for monitoring, protein prediction and drug development.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org56f4a54" class="outline-2">
-<h2 id="org56f4a54"><span class="section-number-2">8</span> How can I contribute?</h2>
-<div class="outline-text-2" id="text-8">
-<p>
-You can contribute by submitting sequences, updating metadata, submit
-issues on our issue tracker, and more importantly add functionality.
-See 'How do I change the source code' below. Read through our online
-documentation at <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a> as a starting
-point.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org2240ef7" class="outline-2">
-<h2 id="org2240ef7"><span class="section-number-2">9</span> Is this about open data?</h2>
-<div class="outline-text-2" id="text-9">
-<p>
-All data is published under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons 4.0 attribution license</a>
-(CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA)
-data and store it for further processing.
-</p>
-</div>
-</div>
-
-<div id="outline-container-orgbb655e0" class="outline-2">
-<h2 id="orgbb655e0"><span class="section-number-2">10</span> Is this about free software?</h2>
-<div class="outline-text-2" id="text-10">
-<p>
-Absolutely. Free software allows for fully reproducible pipelines. You
-can take our workflows and data and run it elsewhere!
-</p>
-</div>
-</div>
-
-<div id="outline-container-org4e779f4" class="outline-2">
-<h2 id="org4e779f4"><span class="section-number-2">11</span> How do I upload raw data?</h2>
-<div class="outline-text-2" id="text-11">
-<p>
-We are preparing raw sequence data pipelines (fastq and BAM). The
-reason is that we want the best data possible for downstream analysis
-(including protein prediction and test development). The current
-approach where people publish final sequences of SARS-CoV-2 is lacking
-because it hides how this sequence was created. For reasons of
-reproducible and improved results we want/need to work with the raw
-sequence reads (both short reads and long reads) and take alternative
-assembly variations into consideration. This is all work in progress.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org83f6b7b" class="outline-2">
-<h2 id="org83f6b7b"><span class="section-number-2">12</span> How do I change metadata?</h2>
-<div class="outline-text-2" id="text-12">
-<p>
-See the <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a>!
-</p>
-</div>
-</div>
-
-<div id="outline-container-org1bc6dab" class="outline-2">
-<h2 id="org1bc6dab"><span class="section-number-2">13</span> How do I change the work flows?</h2>
-<div class="outline-text-2" id="text-13">
-<p>
-Workflows are on <a href="https://github.com/arvados/bh20-seq-resource/tree/master/workflows">github</a> and can be modified. See also the BLOG
-<a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a> on workflows.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org1140d62" class="outline-2">
-<h2 id="org1140d62"><span class="section-number-2">14</span> How do I change the source code?</h2>
-<div class="outline-text-2" id="text-14">
-<p>
-Go to our <a href="https://github.com/arvados/bh20-seq-resource">source code repositories</a>, fork/clone the repository, change
-something and submit a <a href="https://github.com/arvados/bh20-seq-resource/pulls">pull request</a> (PR). That easy! Check out how
-many PRs we already merged.
-</p>
-</div>
-</div>
-
-<div id="outline-container-orge182714" class="outline-2">
-<h2 id="orge182714"><span class="section-number-2">15</span> Should I choose CC-BY or CC0?</h2>
-<div class="outline-text-2" id="text-15">
-<p>
-Restrictive data licenses are hampering data sharing and reproducible
-research. CC0 is the preferred license because it gives researchers
-the most freedom. Since we provide metadata there is no reason for
-others not to honour your work. We also provide CC-BY as an option
-because we know people like the attribution clause.
-</p>
-
-<p>
-In all honesty: we prefer both data and software to be free.
-</p>
-</div>
-</div>
-
-<div id="outline-container-orgf4a692b" class="outline-2">
-<h2 id="orgf4a692b"><span class="section-number-2">16</span> How do I deal with private data and privacy?</h2>
-<div class="outline-text-2" id="text-16">
-<p>
-A public sequence resource is about public data. Metadata can refer to
-private data. You can use your own (anonymous) identifiers.  We also
-plan to combine identifiers with clinical data stored securely at
-<a href="https://redcap-covid19.elixir-luxembourg.org/redcap/">REDCap</a>. See the relevant <a href="https://github.com/arvados/bh20-seq-resource/issues/21">tracker</a> for more information and contributing.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org7757574" class="outline-2">
-<h2 id="org7757574"><span class="section-number-2">17</span> How do I communicate with you?</h2>
-<div class="outline-text-2" id="text-17">
-<p>
-We use a <a href="https://gitter.im/arvados/pubseq?utm_source=share-link&amp;utm_medium=link&amp;utm_campaign=share-link">gitter channel</a> you can join.
-</p>
-</div>
-</div>
-
-<div id="outline-container-org194006f" class="outline-2">
-<h2 id="org194006f"><span class="section-number-2">18</span> Who are the sponsors?</h2>
-<div class="outline-text-2" id="text-18">
-<p>
-The main sponsors are listed in the footer. In addition to the time
-generously donated by many contributors we also acknowledge Amazon AWS
-for donating COVID-19 related compute time.
-</p>
-</div>
-</div>
+    <h1 class="title">About/FAQ</h1>
+    <div id="table-of-contents">
+        <h2>Table of Contents</h2>
+        <div id="text-table-of-contents">
+            <ul>
+                <li><a href="#org0db9061">1. What is the 'public sequence resource' about?</a></li>
+                <li><a href="#org983877d">2. Who created the public sequence resource?</a></li>
+                <li><a href="#org83093c3">3. How does the public sequence resource compare to other data resources?</a>
+                </li>
+                <li><a href="#org9b31fd4">4. Why should I upload my data here?</a></li>
+                <li><a href="#org4e92cb5">5. Why should I not upload my data here?</a></li>
+                <li><a href="#orgdfe72f6">6. How does the public sequence resource work?</a></li>
+                <li><a href="#orgd0c5abb">7. Who uses the public sequence resource?</a></li>
+                <li><a href="#org56f4a54">8. How can I contribute?</a></li>
+                <li><a href="#org2240ef7">9. Is this about open data?</a></li>
+                <li><a href="#orgbb655e0">10. Is this about free software?</a></li>
+                <li><a href="#org4e779f4">11. How do I upload raw data?</a></li>
+                <li><a href="#org83f6b7b">12. How do I change metadata?</a></li>
+                <li><a href="#org1bc6dab">13. How do I change the workflows?</a></li>
+                <li><a href="#org1140d62">14. How do I change the source code?</a></li>
+                <li><a href="#orge182714">15. Should I choose CC-BY or CC0?</a></li>
+                <li><a href="#orgf4a692b">16. How do I deal with private data and privacy?</a></li>
+                <li><a href="#org7757574">17. How do I communicate with you?</a></li>
+                <li><a href="#org194006f">18. Who are the sponsors?</a></li>
+            </ul>
+        </div>
+    </div>
+
+    <div id="outline-container-org0db9061" class="outline-2">
+        <h2 id="org0db9061"><span class="section-number-2">1</span> What is the 'public sequence resource' about?</h2>
+        <div class="outline-text-2" id="text-1">
+            <p>
+                The <b>public sequence resource</b> aims to provide a generic and useful
+                resource for COVID-19 research. The focus is on providing the best
+                possible sequence data with associated metadata that can be used for
+                sequence comparison and protein prediction.
+            </p>
+            <p>
+                We were at the <strong>Bioinformatics Community Conference 2020</strong>! Have a look at the
+                <a href="https://bcc2020.sched.com/event/coLw">video talk</a>
+                (<a href="https://drive.google.com/file/d/1skXHwVKM_gl73-_4giYIOQ1IlC5X5uBo/view?usp=sharing">alternative link</a>)
+                and the <a href="https://drive.google.com/file/d/1vyEgfvSqhM9yIwWZ6Iys-QxhxtVxPSdp/view?usp=sharing">poster</a>.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org983877d" class="outline-2">
+        <h2 id="org983877d"><span class="section-number-2">2</span> Who created the public sequence resource?</h2>
+        <div class="outline-text-2" id="text-2">
+            <p>
+                The <b>public sequence resource</b> is an initiative by <a
+                    href="https://github.com/arvados/bh20-seq-resource/graphs/contributors">bioinformatics</a> and
+                ontology experts who want to create something agile and useful for the
+                wider research community. The initiative started at the COVID-19
+                biohackathon in April 2020 and is ongoing. The main project drivers
+                are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino
+                (University of Rome Tor Vergata), Michael Crusoe (Common Workflow
+                Language), Thomas Liener (consultant, formerly EBI), Erik Garrison
+                (UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics).
+            </p>
+
+            <p>
+                Notably, as this is a free software initiative, the project represents
+                major work by hundreds of software developers and ontology and data
+                wrangling experts. Thank you everyone!
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org83093c3" class="outline-2">
+        <h2 id="org83093c3"><span class="section-number-2">3</span> How does the public sequence resource compare to
+            other data resources?</h2>
+        <div class="outline-text-2" id="text-3">
+            <p>
+                The short version is that we use state-of-the-art practices in
+                bioinformatics using agile methods. Unlike the resources from large
+                institutes we can improve things on a dime and anyone can contribute
+                to building out this resource! Sequences from GenBank, EBI/ENA and
+                others are regularly added to PubSeq. We encourage everyone
+                to submit to PubSeq because of its superior live tooling and metadata
+                support (see the next question).
+            </p>
+
+            <p>
+                Importantly: all data is published under either the <a
+                    href="https://creativecommons.org/licenses/by/4.0/">Creative Commons
+                4.0 attribution license</a> or the <a
+                    href="https://creativecommons.org/share-your-work/public-domain/cc0/">CC0 “No Rights Reserved”
+                license</a> which
+                means the data can be published and workflows can run in public
+                environments allowing for improved access for research and
+                reproducible results. This contrasts with some other public resources,
+                such as GISAID.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org9b31fd4" class="outline-2">
+        <h2 id="org9b31fd4"><span class="section-number-2">4</span> Why should I upload my data here?</h2>
+        <div class="outline-text-2" id="text-4">
+            <ol class="org-ol">
+                <li>We champion truly shareable data without licensing restrictions - with proper
+                    attribution
+                </li>
+                <li>We provide full metadata support using state-of-the-art ontologies</li>
+                <li>We provide a web-based sequence uploader and a command-line version
+                    for bulk uploads
+                </li>
+                <li>We provide a live SPARQL end-point for all metadata</li>
+                <li>We provide free data analysis and sequence comparison triggered on data upload</li>
+                <li>We do real work for you: with this <a
+                        href="https://workbench.lugli.arvadosapi.com/container_requests/lugli-xvhdp-bhhk4nxx1lch5od">link</a>
+                    you can see the last
+                    run took 5.5 hours!
+                </li>
+                <li>We provide free downloads of all computed output</li>
+                <li>There is no need to set up pipelines and/or compute clusters</li>
+                <li>All workflows get triggered on uploading a new sequence</li>
+                <li>When someone (you?) improves the software/workflows, everyone benefits</li>
+                <li>Your data gets automatically integrated with the Swiss Institute of
+                    Bioinformatics COVID-19 knowledge base
+                    <a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> (Elixir
+                    Switzerland)
+                </li>
+                <li>Your data will be used to develop drug targets</li>
+            </ol>
+
+            <p>
+                Finally, if you upload your data here we have workflows that output
+                formatted data suitable for <a
+                    href="http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part6">uploading to EBI
+                resources</a> (and soon
+                others). Uploading your data here gets your data ready for upload to
+                multiple resources.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org4e92cb5" class="outline-2">
+        <h2 id="org4e92cb5"><span class="section-number-2">5</span> Why should I not upload my data here?</h2>
+        <div class="outline-text-2" id="text-5">
+            <p>
+                Funny question. There are only good reasons to upload your data here
+                and make it available to the widest audience possible.
+            </p>
+
+            <p>
+                In fact, you can upload your data here as well as to other
+                resources. It is your data after all. No one can prevent you from
+                uploading your data to multiple resources.
+            </p>
+
+            <p>
+                We recommend uploading to EBI and NCBI resources using our data
+                conversion tools. This means you only enter data once, which makes the
+                process smooth. You can also use our command line data uploader
+                for bulk uploads!
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-orgdfe72f6" class="outline-2">
+        <h2 id="orgdfe72f6"><span class="section-number-2">6</span> How does the public sequence resource work?</h2>
+        <div class="outline-text-2" id="text-6">
+            <p>
+                On uploading a sequence with metadata it will automatically be
+                processed and incorporated into the public pangenome with metadata
+                using workflows from the High Performance Open Biology Lab defined
+                <a href="https://github.com/hpobio-lab/viral-analysis/tree/master/cwl/pangenome-generate">here</a>.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-orgd0c5abb" class="outline-2">
+        <h2 id="orgd0c5abb"><span class="section-number-2">7</span> Who uses the public sequence resource?</h2>
+        <div class="outline-text-2" id="text-7">
+            <p>
+                The Swiss Institute of Bioinformatics has included this data in
+                <a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> and made it part
+                of <a href="https://www.uniprot.org/">UniProt</a>.
+            </p>
+
+            <p>
+                The Pantograph <a href="https://graph-genome.github.io/">viewer</a> uses PubSeq data for its
+                visualisations.
+            </p>
+
+            <p>
+                <a href="https://uthsc.edu">UTHSC</a> (USA), <a href="https://www.esr.cri.nz/">ESR</a> (New Zealand) and
+                <a href="https://www.ornl.gov/news/ornl-fight-against-covid-19">ORNL</a> (USA) use COVID-19 PubSeq data
+                for monitoring, protein prediction and drug development.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org56f4a54" class="outline-2">
+        <h2 id="org56f4a54"><span class="section-number-2">8</span> How can I contribute?</h2>
+        <div class="outline-text-2" id="text-8">
+            <p>
+                You can contribute by submitting sequences, updating metadata, submitting
+                issues on our issue tracker and, more importantly, adding functionality.
+                See 'How do I change the source code' below. Read through our online
+                documentation at <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a>
+                as a starting
+                point.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org2240ef7" class="outline-2">
+        <h2 id="org2240ef7"><span class="section-number-2">9</span> Is this about open data?</h2>
+        <div class="outline-text-2" id="text-9">
+            <p>
+                All data is published under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons
+                4.0 attribution license</a>
+                (CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA)
+                data and store it for further processing.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-orgbb655e0" class="outline-2">
+        <h2 id="orgbb655e0"><span class="section-number-2">10</span> Is this about free software?</h2>
+        <div class="outline-text-2" id="text-10">
+            <p>
+                Absolutely. Free software allows for fully reproducible pipelines. You
+                can take our workflows and data and run them elsewhere!
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org4e779f4" class="outline-2">
+        <h2 id="org4e779f4"><span class="section-number-2">11</span> How do I upload raw data?</h2>
+        <div class="outline-text-2" id="text-11">
+            <p>
+                We are preparing raw sequence data pipelines (fastq and BAM). The
+                reason is that we want the best data possible for downstream analysis
+                (including protein prediction and test development). The current
+                approach where people publish final sequences of SARS-CoV-2 is lacking
+                because it hides how this sequence was created. For reasons of
+                reproducible and improved results we want/need to work with the raw
+                sequence reads (both short reads and long reads) and take alternative
+                assembly variations into consideration. This is all work in progress.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org83f6b7b" class="outline-2">
+        <h2 id="org83f6b7b"><span class="section-number-2">12</span> How do I change metadata?</h2>
+        <div class="outline-text-2" id="text-12">
+            <p>
+                See the <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a>!
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org1bc6dab" class="outline-2">
+        <h2 id="org1bc6dab"><span class="section-number-2">13</span> How do I change the workflows?</h2>
+        <div class="outline-text-2" id="text-13">
+            <p>
+                Workflows are on <a href="https://github.com/arvados/bh20-seq-resource/tree/master/workflows">GitHub</a>
+                and can be modified. See also the BLOG
+                <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a> on workflows.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org1140d62" class="outline-2">
+        <h2 id="org1140d62"><span class="section-number-2">14</span> How do I change the source code?</h2>
+        <div class="outline-text-2" id="text-14">
+            <p>
+                Go to our <a href="https://github.com/arvados/bh20-seq-resource">source code repository</a>,
+                fork/clone the repository, change
+                something and submit a <a href="https://github.com/arvados/bh20-seq-resource/pulls">pull request</a>
+                (PR). It's that easy! Check out how
+                many PRs we have already merged.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-orge182714" class="outline-2">
+        <h2 id="orge182714"><span class="section-number-2">15</span> Should I choose CC-BY or CC0?</h2>
+        <div class="outline-text-2" id="text-15">
+            <p>
+                Restrictive data licenses are hampering data sharing and reproducible
+                research. CC0 is the preferred license because it gives researchers
+                the most freedom. Since we provide metadata there is no reason for
+                others not to honour your work. We also provide CC-BY as an option
+                because we know people like the attribution clause.
+            </p>
+
+            <p>
+                In all honesty: we prefer both data and software to be free.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-orgf4a692b" class="outline-2">
+        <h2 id="orgf4a692b"><span class="section-number-2">16</span> How do I deal with private data and privacy?</h2>
+        <div class="outline-text-2" id="text-16">
+            <p>
+                A public sequence resource is about public data. Metadata can refer to
+                private data. You can use your own (anonymous) identifiers. We also
+                plan to combine identifiers with clinical data stored securely at
+                <a href="https://redcap-covid19.elixir-luxembourg.org/redcap/">REDCap</a>. See the relevant <a
+                    href="https://github.com/arvados/bh20-seq-resource/issues/21">tracker</a> for more information and
+                to contribute.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org7757574" class="outline-2">
+        <h2 id="org7757574"><span class="section-number-2">17</span> How do I communicate with you?</h2>
+        <div class="outline-text-2" id="text-17">
+            <p>
+                We use a <a
+                    href="https://gitter.im/arvados/pubseq?utm_source=share-link&amp;utm_medium=link&amp;utm_campaign=share-link">gitter
+                channel</a> you can join.
+            </p>
+        </div>
+    </div>
+
+    <div id="outline-container-org194006f" class="outline-2">
+        <h2 id="org194006f"><span class="section-number-2">18</span> Who are the sponsors?</h2>
+        <div class="outline-text-2" id="text-18">
+            <p>
+                The main sponsors are listed in the footer. In addition to the time
+                generously donated by many contributors we also acknowledge Amazon AWS
+                for donating COVID-19 related compute time.
+            </p>
+        </div>
+    </div>
 </div>
 <div id="postamble" class="status">
-<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-18 Sat 03:27</small>.
+    <hr>
+    <small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs
+        org-mode and a healthy dose of Lisp!<br/>Modified 2020-07-18 Sat 03:27</small>.
 </div>
 </body>
 </html>
diff --git a/doc/web/about.org b/doc/web/about.org
index 39fb667..29a80bf 100644
--- a/doc/web/about.org
+++ b/doc/web/about.org
@@ -17,7 +17,10 @@
  - [[#how-do-i-change-the-work-flows][How do I change the work flows?]]
  - [[#how-do-i-change-the-source-code][How do I change the source code?]]
  - [[#should-i-choose-cc-by-or-cc0][Should I choose CC-BY or CC0?]]
+ - [[#are-there-also-variants-in-the-rdf-databases][Are there also variants in the RDF databases?]]
  - [[#how-do-i-deal-with-private-data-and-privacy][How do I deal with private data and privacy?]]
+ - [[#do-you-have-any-checks-or-concerns-if-human-sequence-is-accidentally-submitted-to-your-service-as-part-of-a-fastq][Do you have any checks or concerns if human sequence is accidentally submitted to your service as part of a fastq?]]
+ - [[#does-pubseq-support-only-sars-cov-2-data][Does PubSeq support only SARS-CoV-2 data?]]
  - [[#how-do-i-communicate-with-you][How do I communicate with you?]]
  - [[#who-are-the-sponsors][Who are the sponsors?]]
 
@@ -28,6 +31,8 @@ resource for COVID-19 research.  The focus is on providing the best
 possible sequence data with associated metadata that can be used for
 sequence comparison and protein prediction.
 
+We were at the *Bioinformatics Community Conference 2020*! Have a look at the [[https://bcc2020.sched.com/event/coLw][video talk]] ([[https://drive.google.com/file/d/1skXHwVKM_gl73-_4giYIOQ1IlC5X5uBo/view?usp=sharing][alternative link]]) and the [[https://drive.google.com/file/d/1vyEgfvSqhM9yIwWZ6Iys-QxhxtVxPSdp/view?usp=sharing][poster]].
+
 * Who created the public sequence resource?
 
 The *public sequence resource* is an initiative by [[https://github.com/arvados/bh20-seq-resource/graphs/contributors][bioinformatics]] and
@@ -171,6 +176,12 @@ because we know people like the attribution clause.
 
 In all honesty: we prefer both data and software to be free.
 
+* Are there also variants in the RDF databases?
+
+We output an RDF file with the pangenome built in. The variants are implicit in it, so you can recover them by parsing the file.
+
+We are also writing tools to generate VCF files directly from the pangenome.
+
 * How do I deal with private data and privacy?
 
 A public sequence resource is about public data. Metadata can refer to
@@ -178,6 +189,15 @@ private data. You can use your own (anonymous) identifiers.  We also
 plan to combine identifiers with clinical data stored securely at
 [[https://redcap-covid19.elixir-luxembourg.org/redcap/][REDCap]]. See the relevant [[https://github.com/arvados/bh20-seq-resource/issues/21][tracker]] for more information and contributing.
 
+* Do you have any checks or concerns if human sequence is accidentally submitted to your service as part of a fastq?
+
+We are planning to remove reads that match the human reference.
+
+* Does PubSeq support only SARS-CoV-2 data?
+
+To date, PubSeq is a resource specific to SARS-CoV-2, but we are designing it to be able to support other species in the future.
+
+
 * How do I communicate with you?
 
 We use a [[https://gitter.im/arvados/pubseq?utm_source=share-link&utm_medium=link&utm_campaign=share-link][gitter channel]] you can join.
diff --git a/image/homepage.png b/image/homepage.png
new file mode 100644
index 0000000..f66f9fd
--- /dev/null
+++ b/image/homepage.png
Binary files differ
diff --git a/image/website.png b/image/website.png
deleted file mode 100644
index fa57ca5..0000000
--- a/image/website.png
+++ /dev/null
Binary files differ
diff --git a/workflows/pangenome-generate/odgi-build-from-spoa-gfa.cwl b/workflows/pangenome-generate/odgi-build-from-spoa-gfa.cwl
new file mode 100644
index 0000000..2459ce7
--- /dev/null
+++ b/workflows/pangenome-generate/odgi-build-from-spoa-gfa.cwl
@@ -0,0 +1,29 @@
+cwlVersion: v1.1
+class: CommandLineTool
+inputs:
+  inputGFA: File
+outputs:
+  odgiGraph:
+    type: File
+    outputBinding:
+      glob: $(inputs.inputGFA.nameroot).unchop.sorted.odgi
+requirements:
+  InlineJavascriptRequirement: {}
+  ShellCommandRequirement: {}
+hints:
+  DockerRequirement:
+    dockerPull: "quay.io/biocontainers/odgi:v0.3--py37h8b12597_0"
+  ResourceRequirement:
+    coresMin: 4
+    ramMin: $(7 * 1024)
+    outdirMin: $(Math.ceil((inputs.inputGFA.size/(1024*1024*1024)+1) * 2))
+  InitialWorkDirRequirement:
+    listing:
+      - entry: $(inputs.inputGFA)
+        writable: true
+arguments: [odgi, build, -g, $(inputs.inputGFA), -o, -,
+            {shellQuote: false, valueFrom: "|"},
+            odgi, unchop, -i, -, -o, -,
+            {shellQuote: false, valueFrom: "|"},
+            odgi, sort, -i, -, -p, s, -o, $(inputs.inputGFA.nameroot).unchop.sorted.odgi
+           ]
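As a worked example of the `outdirMin` expression in the tool above, the arithmetic can be reproduced outside CWL. This is only a sketch in Python rather than the JavaScript expression itself; the function name `odgi_outdir_min` and the sample file size are illustrative assumptions and are not part of the workflow.

```python
import math

GIB = 1024 * 1024 * 1024  # bytes per GiB, the divisor used in the CWL expression


def odgi_outdir_min(gfa_size_bytes: int) -> int:
    """Mirror of Math.ceil((inputs.inputGFA.size/(1024*1024*1024)+1) * 2)."""
    return math.ceil((gfa_size_bytes / GIB + 1) * 2)


if __name__ == "__main__":
    # A hypothetical 3 GiB GFA file gives ceil((3 + 1) * 2) = 8.
    print(odgi_outdir_min(3 * GIB))
```

The `+1` keeps the request non-zero for tiny inputs, and the factor of 2 doubles the size-derived estimate.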
diff --git a/workflows/pangenome-generate/pangenome-generate_spoa.cwl b/workflows/pangenome-generate/pangenome-generate_spoa.cwl
new file mode 100644
index 0000000..958ffb6
--- /dev/null
+++ b/workflows/pangenome-generate/pangenome-generate_spoa.cwl
@@ -0,0 +1,122 @@
+#!/usr/bin/env cwl-runner
+cwlVersion: v1.1
+class: Workflow
+requirements:
+  ScatterFeatureRequirement: {}
+  StepInputExpressionRequirement: {}
+inputs:
+  inputReads: File[]
+  metadata: File[]
+  metadataSchema: File
+  subjects: string[]
+  exclude: File?
+  bin_widths:
+    type: int[]
+    default: [ 1, 4, 16, 64, 256, 1000, 4000, 16000]
+    doc: width of each bin in basepairs along the graph vector
+  cells_per_file:
+    type: int
+    default: 100
+    doc: Cells per file on component_segmentation
+outputs:
+  odgiGraph:
+    type: File
+    outputSource: buildGraph/odgiGraph
+  odgiPNG:
+    type: File
+    outputSource: vizGraph/graph_image
+  spoaGFA:
+    type: File
+    outputSource: induceGraph/spoaGFA
+  odgiRDF:
+    type: File
+    outputSource: odgi2rdf/rdf
+  readsMergeDedup:
+    type: File
+    outputSource: dedup/reads_dedup
+  mergedMetadata:
+    type: File
+    outputSource: mergeMetadata/merged
+  indexed_paths:
+    type: File
+    outputSource: index_paths/indexed_paths
+  colinear_components:
+    type: Directory
+    outputSource: segment_components/colinear_components
+steps:
+  relabel:
+    in:
+      readsFA: inputReads
+      subjects: subjects
+      exclude: exclude
+    out: [relabeledSeqs, originalLabels]
+    run: relabel-seqs.cwl
+  dedup:
+    in: {reads: relabel/relabeledSeqs}
+    out: [reads_dedup, dups]
+    run: ../tools/seqkit/seqkit_rmdup.cwl
+  sort_by_quality_and_len:
+    in: {readsFA: dedup/reads_dedup}
+    out: [sortedReadsFA]
+    run: sort_fasta_by_quality_and_len.cwl
+  induceGraph:
+    in:
+      readsFA: sort_by_quality_and_len/sortedReadsFA
+    out: [spoaGFA]
+    run: spoa.cwl
+  buildGraph:
+    in: {inputGFA: induceGraph/spoaGFA}
+    out: [odgiGraph]
+    run: odgi-build-from-spoa-gfa.cwl
+  vizGraph:
+    in:
+      sparse_graph_index: buildGraph/odgiGraph
+      width:
+        default: 50000
+      height:
+        default: 500
+      path_per_row:
+        default: true
+      path_height:
+        default: 4
+    out: [graph_image]
+    run: ../tools/odgi/odgi_viz.cwl
+  odgi2rdf:
+    in: {odgi: buildGraph/odgiGraph}
+    out: [rdf]
+    run: odgi_to_rdf.cwl
+  mergeMetadata:
+    in:
+      metadata: metadata
+      metadataSchema: metadataSchema
+      subjects: subjects
+      dups: dedup/dups
+      originalLabels: relabel/originalLabels
+    out: [merged]
+    run: merge-metadata.cwl
+  bin_paths:
+    run: ../tools/odgi/odgi_bin.cwl
+    in:
+      sparse_graph_index: buildGraph/odgiGraph
+      bin_width: bin_widths
+    scatter: bin_width
+    out: [ bins, pangenome_sequence ]
+  index_paths:
+    label: Create path index
+    run: ../tools/odgi/odgi_pathindex.cwl
+    in:
+      sparse_graph_index: buildGraph/odgiGraph
+    out: [ indexed_paths ]
+  segment_components:
+    label: Run component segmentation
+    run: ../tools/graph-genome-segmentation/component_segmentation.cwl
+    in:
+      bins: bin_paths/bins
+      cells_per_file: cells_per_file
+      pangenome_sequence:
+        source: bin_paths/pangenome_sequence
+        valueFrom: $(self[0])
+        # the bin_paths step is scattered over the bin_width array, but always using the same sparse_graph_index
+        # the pangenome_sequence that is extracted is exactly the same for the same sparse_graph_index
+        # regardless of bin_width, so we take the first pangenome_sequence as input for this step
+    out: [ colinear_components ]
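The comment on `pangenome_sequence` in the `segment_components` step describes how a scattered step turns each output into a list, one element per `bin_width`. The sketch below models that behaviour in plain Python. The function `odgi_bin`, the file names it returns, and the graph name `seqs.odgi` are hypothetical stand-ins, not the real tool or its outputs.

```python
from typing import Dict, List


def odgi_bin(graph: str, bin_width: int) -> Dict[str, str]:
    """Hypothetical stand-in for one odgi_bin.cwl invocation."""
    return {
        "bins": "{}.w{}.json".format(graph, bin_width),
        # Depends only on the graph, not on bin_width.
        "pangenome_sequence": "{}.pangenome.fasta".format(graph),
    }


def scatter_bin_paths(graph: str, bin_widths: List[int]) -> Dict[str, List[str]]:
    """Run odgi_bin once per bin width and collect each output as a list."""
    runs = [odgi_bin(graph, width) for width in bin_widths]
    return {
        "bins": [run["bins"] for run in runs],
        "pangenome_sequence": [run["pangenome_sequence"] for run in runs],
    }


outputs = scatter_bin_paths("seqs.odgi", [1, 4, 16])
# Equivalent of `valueFrom: $(self[0])`: all elements are identical, so keep the first.
pangenome_sequence = outputs["pangenome_sequence"][0]
print(len(outputs["bins"]), pangenome_sequence)
```

Because every scattered run receives the same graph, picking the first element loses no information, while `bins` is passed on as the full list.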
diff --git a/workflows/pangenome-generate/sort_fasta_by_quality_and_len.cwl b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.cwl
new file mode 100644
index 0000000..59f027e
--- /dev/null
+++ b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.cwl
@@ -0,0 +1,18 @@
+cwlVersion: v1.1
+class: CommandLineTool
+inputs:
+  readsFA:
+    type: File
+    inputBinding: {position: 2}
+  script:
+    type: File
+    inputBinding: {position: 1}
+    default: {class: File, location: sort_fasta_by_quality_and_len.py}
+stdout: $(inputs.readsFA.nameroot).sorted_by_quality_and_len.fasta
+outputs:
+  sortedReadsFA:
+    type: stdout
+requirements:
+  InlineJavascriptRequirement: {}
+  ShellCommandRequirement: {}
+baseCommand: [python3]
diff --git a/workflows/pangenome-generate/sort_fasta_by_quality_and_len.py b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.py
new file mode 100644
index 0000000..e48fd68
--- /dev/null
+++ b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.py
@@ -0,0 +1,35 @@
+#!/usr/bin/env python3
+
+# Sort the sequences by quality (fraction of called, i.e. non-N, bases, descending) and then by length (descending).
+# The best sequence is the longest one, with no uncalled bases.
+
+import os
+import sys
+import gzip
+
+def open_gzipsafe(path_file):
+    if path_file.endswith('.gz'):
+        return gzip.open(path_file, 'rt')
+    else:
+        return open(path_file)
+
+path_fasta = sys.argv[1]
+
+header_to_seq_dict = {}
+header_percCalledBases_seqLength_list = []
+
+with open_gzipsafe(path_fasta) as f:
+    for fasta in f.read().strip('\n>').split('>'):
+        header = fasta.strip('\n').split('\n')[0]
+
+        header_to_seq_dict[
+            header
+        ] = ''.join(fasta.strip('\n').split('\n')[1:])
+
+        seq_len = len(header_to_seq_dict[header])
+        if seq_len == 0:
+            continue  # skip empty records (e.g. an empty input file) to avoid dividing by zero
+        header_percCalledBases_seqLength_list.append([header, header_to_seq_dict[header].count('N'), (seq_len - header_to_seq_dict[header].count('N')) / seq_len, seq_len])
+
+for header, _, percCalledBases, seqLength in sorted(header_percCalledBases_seqLength_list, key=lambda x: (x[-2], x[-1]), reverse=True):
+    sys.stdout.write('>{}\n{}\n'.format(header, header_to_seq_dict[header]))
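To illustrate the ordering the script above produces, here is a minimal standalone sketch of the same sort key (fraction of called bases first, then length, both descending). The helper name `sort_key` and the three toy sequences are made up for illustration and are not part of the commit.

```python
def sort_key(seq: str) -> tuple:
    """Return (fraction of non-N bases, sequence length) for one sequence."""
    seq_len = len(seq)
    called = (seq_len - seq.count("N")) / seq_len if seq_len else 0.0
    return (called, seq_len)


# Toy records, header -> sequence (illustrative only).
records = {
    "short_clean": "ACGT",
    "long_clean": "ACGTACGT",
    "long_with_N": "ACGTNCGTAC",
}

# Best first: fully called sequences, longest among them at the top.
ordered = sorted(records, key=lambda header: sort_key(records[header]), reverse=True)
print(ordered)  # ['long_clean', 'short_clean', 'long_with_N']
```

Note that a longer sequence containing an uncalled base ranks below a shorter one that is fully called, because the called fraction is compared before the length.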
diff --git a/workflows/pangenome-generate/spoa.cwl b/workflows/pangenome-generate/spoa.cwl
new file mode 100644
index 0000000..1e390d8
--- /dev/null
+++ b/workflows/pangenome-generate/spoa.cwl
@@ -0,0 +1,24 @@
+cwlVersion: v1.1
+class: CommandLineTool
+inputs:
+  readsFA: File
+stdout: $(inputs.readsFA.nameroot).g6.gfa
+outputs:
+  spoaGFA:
+    type: stdout
+requirements:
+  InlineJavascriptRequirement: {}
+  ShellCommandRequirement: {}
+hints:
+  DockerRequirement:
+    dockerPull: "quay.io/biocontainers/spoa:3.0.2--hc9558a2_0"
+  ResourceRequirement:
+    coresMin: 1
+    ramMin: $(15 * 1024)
+    outdirMin: $(Math.ceil(inputs.readsFA.size/(1024*1024*1024) + 20))
+baseCommand: spoa
+arguments: [
+    $(inputs.readsFA),
+    -G,
+    -g, '-6'
+]