author | Peter Amstutz | 2020-08-05 16:06:11 -0400 |
---|---|---|
committer | GitHub | 2020-08-05 16:06:11 -0400 |
commit | fdb1b012fc04ee07f401541e181e28fe442c9454 (patch) | |
tree | 8486db1087692dffcea9d93814e436d9cf150b47 | |
parent | 86f31ef60f65a820bf9ac25c3fc01c88f2a9ebfe (diff) | |
parent | 2d20bf90497588a297ca98a78ee0fbbcadf95569 (diff) | |
Merge pull request #99 from AndreaGuarracino/patch-2
several fixes in the website, added links to video talk and poster, new pangenome generation workflow
19 files changed, 1219 insertions, 598 deletions
@@ -9,7 +9,7 @@ web interface. You can use it to upload the genomes of SARS-CoV-2 samples to make them publicly and freely available to other researchers. For more information see the [paper](./paper/paper.md).
-![alt text](./image/website.png "Website")
+![alt text](./image/homepage.png "Website")
 To get started, first [install the uploader](#installation), and use the `bh20-seq-uploader` command to [upload your data](#usage).
diff --git a/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf
Binary files differ
new file mode 100644
index 0000000..7da8cd6
--- /dev/null
+++ b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf
diff --git a/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png
Binary files differ
new file mode 100644
index 0000000..eae2721
--- /dev/null
+++ b/bh20simplewebuploader/static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png
diff --git a/bh20simplewebuploader/static/main.css b/bh20simplewebuploader/static/main.css
index bdcc0bc..7c33d9c 100644
--- a/bh20simplewebuploader/static/main.css
+++ b/bh20simplewebuploader/static/main.css
@@ -177,7 +177,7 @@ span.dropt:hover {text-decoration: none; background: #ffffff; z-index: 6; } .about { display: grid;
- grid-template-columns: 1fr 1fr;
+ grid-template-columns: 1fr 1fr 1fr;
 grid-auto-flow: row; }
diff --git a/bh20simplewebuploader/templates/blurb.html b/bh20simplewebuploader/templates/blurb.html
index 9eef7c2..067cc3b 100644
--- a/bh20simplewebuploader/templates/blurb.html
+++ b/bh20simplewebuploader/templates/blurb.html
@@ -2,12 +2,12 @@ This is the COVID-19 Public Sequence Resource (COVID-19 PubSeq) for SARS-CoV-2 virus sequences.
 COVID-19 PubSeq is a repository for sequences with a low barrier to entry for uploading sequence data
- using best practices, including <a href="https://en.wikipedia.org/wiki/FAIR_data">FAIR data</a>. I.e., data published with a creative commons
- CC0 or CC-4.0 license with metadata using state-of-the art standards
+ using best practices, including <a href="https://en.wikipedia.org/wiki/FAIR_data">FAIR data</a>. Data are published with
+ metadata using state-of-the art standards
 and, perhaps most importantly, providing standardised workflows that get triggered on upload, so that results are immediately available in standardised data formats.
-
+
 Your uploaded sequence will automatically be processed and incorporated into the public pangenome with metadata using worklows from the High Performance Open Biology Lab
diff --git a/bh20simplewebuploader/templates/footer.html b/bh20simplewebuploader/templates/footer.html
index 26ea82a..abf46c3 100644
--- a/bh20simplewebuploader/templates/footer.html
+++ b/bh20simplewebuploader/templates/footer.html
@@ -15,6 +15,11 @@ </p> </div>
+ <div>
+ <a href="static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.pdf">
+ <img src=static/image/BCC2020_AndreaGuarracino_COVID19PubSeq_Poster.png" alt="BCC2020 Andrea Guarracino COVID19 PubSeq Poster"/>
+ </a>
+ </div>
 <div class="sponsors"> <div class="sponsorimg"> <a href="https://github.com/virtual-biohackathons/covid-19-bh20">
diff --git a/doc/blog/using-covid-19-pubseq-part2.html b/doc/blog/using-covid-19-pubseq-part2.html
index c047441..c041ebe 100644
--- a/doc/blog/using-covid-19-pubseq-part2.html
+++ b/doc/blog/using-covid-19-pubseq-part2.html
@@ -259,39 +259,12 @@ for the JavaScript code in this tag. </ul> </div> </div>
-<p>
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences.
The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-</p>
 <div id="outline-container-org7942167" class="outline-2"> <h2 id="org7942167"><span class="section-number-2">1</span> Finding output of workflows</h2> <div class="outline-text-2" id="text-1">
-<p>
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences. The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-</p>
-</div>
-</div>
-<div id="outline-container-org0022bbe" class="outline-2">
-<h2 id="org0022bbe"><span class="section-number-2">2</span> Introduction</h2>
-<div class="outline-text-2" id="text-2">
-<p>
+ <p>
 We are using Arvados to run common workflow language (CWL) pipelines. The most recent output is on display on a <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">web page</a> (with time stamp) and a full list is generated <a href="https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/">here</a>. It is nice to start up, but for
@@ -302,7 +275,7 @@ want to wade through thousands of output files!
 </div> <div id="outline-container-org3929710" class="outline-2">
-<h2 id="org3929710"><span class="section-number-2">3</span> The Arvados file interface</h2>
+<h2 id="org3929710"><span class="section-number-2">2</span> The Arvados file interface</h2>
 <div class="outline-text-2" id="text-3"> <p> Arvados has the web server, but it also has a REST API and associated
@@ -384,7 +357,7 @@ arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839d </div> <div id="outline-container-orgc4dba6e" class="outline-2">
-<h2 id="orgc4dba6e"><span class="section-number-2">4</span> Using the Arvados API</h2>
+<h2 id="orgc4dba6e"><span class="section-number-2">3</span> TODO Using the Arvados API</h2>
 </div> </div> <div id="postamble" class="status">
diff --git a/doc/blog/using-covid-19-pubseq-part2.org b/doc/blog/using-covid-19-pubseq-part2.org
index d2a1cbc..349fd06 100644
--- a/doc/blog/using-covid-19-pubseq-part2.org
+++ b/doc/blog/using-covid-19-pubseq-part2.org
@@ -8,36 +8,13 @@ #+HTML_LINK_HOME: http://covid19.genenetwork.org #+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences. The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-
 * Table of Contents :TOC:noexport:
 - [[#finding-output-of-workflows][Finding output of workflows]]
- - [[#introduction][Introduction]]
 - [[#the-arvados-file-interface][The Arvados file interface]]
 - [[#using-the-arvados-api][Using the Arvados API]]
 * Finding output of workflows
-As part of the COVID-19 Biohackathon 2020 we formed a working group to
-create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
-Corona virus sequences. The general idea is to create a repository
-that has a low barrier to entry for uploading sequence data using best
-practices. I.e., data published with a creative commons 4.0 (CC-4.0)
-license with metadata using state-of-the art standards and, perhaps
-most importantly, providing standardised workflows that get triggered
-on upload, so that results are immediately available in standardised
-data formats.
-
-* Introduction
-
 We are using Arvados to run common workflow language (CWL) pipelines. The most recent output is on display on a [[https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca][web page]] (with time stamp) and a full list is generated [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][here]].
It is nice to start up, but for
@@ -81,4 +58,4 @@ its listed UUID: : arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5
-* Using the Arvados API
+* TODO Using the Arvados API
diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index 91879b0..df4a286 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -625,7 +625,7 @@ The web interface using this exact same script so it should just work <h3 id="org39adf09"><span class="section-number-3">6.2</span> Example: uploading bulk GenBank sequences</h3> <div class="outline-text-3" id="text-6-2"> <p>
-We also use above script to bulk upload GenBank sequences with a <a href="https://github.com/arvados/bh20-seq-resource/blob/master/scripts/from_genbank_to_fasta_and_yaml.py">FASTA
+We also use above script to bulk upload GenBank sequences with a <a href="https://github.com/arvados/bh20-seq-resource/blob/master/scripts/download_genbank_data/from_genbank_to_fasta_and_yaml.py">FASTA
 and YAML</a> extractor specific for GenBank. This means that the steps we took above for uploading a GenBank sequence are already automated. </p>
diff --git a/doc/blog/using-covid-19-pubseq-part3.org b/doc/blog/using-covid-19-pubseq-part3.org
index 03f37ab..e8fee36 100644
--- a/doc/blog/using-covid-19-pubseq-part3.org
+++ b/doc/blog/using-covid-19-pubseq-part3.org
@@ -234,6 +234,6 @@ The web interface using this exact same script so it should just work ** Example: uploading bulk GenBank sequences
-We also use above script to bulk upload GenBank sequences with a [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/from_genbank_to_fasta_and_yaml.py][FASTA
+We also use above script to bulk upload GenBank sequences with a [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/download_genbank_data/from_genbank_to_fasta_and_yaml.py][FASTA
 and YAML]] extractor specific for GenBank.
This means that the steps we took above for uploading a GenBank sequence are already automated. diff --git a/doc/web/about.html b/doc/web/about.html index dfd4252..c971a4e 100644 --- a/doc/web/about.html +++ b/doc/web/about.html @@ -1,549 +1,964 @@ <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" -"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-07-18 Sat 03:27 --> -<meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> -<meta name="viewport" content="width=device-width, initial-scale=1" /> -<title>About/FAQ</title> -<meta name="generator" content="Org mode" /> -<meta name="author" content="Pjotr Prins" /> -<style type="text/css"> - <!--/*--><![CDATA[/*><!--*/ - .title { text-align: center; - margin-bottom: .2em; } - .subtitle { text-align: center; - font-size: medium; - font-weight: bold; - margin-top:0; } - .todo { font-family: monospace; color: red; } - .done { font-family: monospace; color: green; } - .priority { font-family: monospace; color: orange; } - .tag { background-color: #eee; font-family: monospace; - padding: 2px; font-size: 80%; font-weight: normal; } - .timestamp { color: #bebebe; } - .timestamp-kwd { color: #5f9ea0; } - .org-right { margin-left: auto; margin-right: 0px; text-align: right; } - .org-left { margin-left: 0px; margin-right: auto; text-align: left; } - .org-center { margin-left: auto; margin-right: auto; text-align: center; } - .underline { text-decoration: underline; } - #postamble p, #preamble p { font-size: 90%; margin: .2em; } - p.verse { margin-left: 3%; } - pre { - border: 1px solid #ccc; - box-shadow: 3px 3px 3px #eee; - padding: 8pt; - font-family: monospace; - overflow: auto; - margin: 1.2em; - } - pre.src { - position: relative; - overflow: visible; - padding-top: 1.2em; - } - pre.src:before { - display: none; - position: 
absolute; - background-color: white; - top: -10px; - right: 10px; - padding: 3px; - border: 1px solid black; - } - pre.src:hover:before { display: inline;} - /* Languages per Org manual */ - pre.src-asymptote:before { content: 'Asymptote'; } - pre.src-awk:before { content: 'Awk'; } - pre.src-C:before { content: 'C'; } - /* pre.src-C++ doesn't work in CSS */ - pre.src-clojure:before { content: 'Clojure'; } - pre.src-css:before { content: 'CSS'; } - pre.src-D:before { content: 'D'; } - pre.src-ditaa:before { content: 'ditaa'; } - pre.src-dot:before { content: 'Graphviz'; } - pre.src-calc:before { content: 'Emacs Calc'; } - pre.src-emacs-lisp:before { content: 'Emacs Lisp'; } - pre.src-fortran:before { content: 'Fortran'; } - pre.src-gnuplot:before { content: 'gnuplot'; } - pre.src-haskell:before { content: 'Haskell'; } - pre.src-hledger:before { content: 'hledger'; } - pre.src-java:before { content: 'Java'; } - pre.src-js:before { content: 'Javascript'; } - pre.src-latex:before { content: 'LaTeX'; } - pre.src-ledger:before { content: 'Ledger'; } - pre.src-lisp:before { content: 'Lisp'; } - pre.src-lilypond:before { content: 'Lilypond'; } - pre.src-lua:before { content: 'Lua'; } - pre.src-matlab:before { content: 'MATLAB'; } - pre.src-mscgen:before { content: 'Mscgen'; } - pre.src-ocaml:before { content: 'Objective Caml'; } - pre.src-octave:before { content: 'Octave'; } - pre.src-org:before { content: 'Org mode'; } - pre.src-oz:before { content: 'OZ'; } - pre.src-plantuml:before { content: 'Plantuml'; } - pre.src-processing:before { content: 'Processing.js'; } - pre.src-python:before { content: 'Python'; } - pre.src-R:before { content: 'R'; } - pre.src-ruby:before { content: 'Ruby'; } - pre.src-sass:before { content: 'Sass'; } - pre.src-scheme:before { content: 'Scheme'; } - pre.src-screen:before { content: 'Gnu Screen'; } - pre.src-sed:before { content: 'Sed'; } - pre.src-sh:before { content: 'shell'; } - pre.src-sql:before { content: 'SQL'; } - pre.src-sqlite:before 
{ content: 'SQLite'; } - /* additional languages in org.el's org-babel-load-languages alist */ - pre.src-forth:before { content: 'Forth'; } - pre.src-io:before { content: 'IO'; } - pre.src-J:before { content: 'J'; } - pre.src-makefile:before { content: 'Makefile'; } - pre.src-maxima:before { content: 'Maxima'; } - pre.src-perl:before { content: 'Perl'; } - pre.src-picolisp:before { content: 'Pico Lisp'; } - pre.src-scala:before { content: 'Scala'; } - pre.src-shell:before { content: 'Shell Script'; } - pre.src-ebnf2ps:before { content: 'ebfn2ps'; } - /* additional language identifiers per "defun org-babel-execute" - in ob-*.el */ - pre.src-cpp:before { content: 'C++'; } - pre.src-abc:before { content: 'ABC'; } - pre.src-coq:before { content: 'Coq'; } - pre.src-groovy:before { content: 'Groovy'; } - /* additional language identifiers from org-babel-shell-names in - ob-shell.el: ob-shell is the only babel language using a lambda to put - the execution function name together. */ - pre.src-bash:before { content: 'bash'; } - pre.src-csh:before { content: 'csh'; } - pre.src-ash:before { content: 'ash'; } - pre.src-dash:before { content: 'dash'; } - pre.src-ksh:before { content: 'ksh'; } - pre.src-mksh:before { content: 'mksh'; } - pre.src-posh:before { content: 'posh'; } - /* Additional Emacs modes also supported by the LaTeX listings package */ - pre.src-ada:before { content: 'Ada'; } - pre.src-asm:before { content: 'Assembler'; } - pre.src-caml:before { content: 'Caml'; } - pre.src-delphi:before { content: 'Delphi'; } - pre.src-html:before { content: 'HTML'; } - pre.src-idl:before { content: 'IDL'; } - pre.src-mercury:before { content: 'Mercury'; } - pre.src-metapost:before { content: 'MetaPost'; } - pre.src-modula-2:before { content: 'Modula-2'; } - pre.src-pascal:before { content: 'Pascal'; } - pre.src-ps:before { content: 'PostScript'; } - pre.src-prolog:before { content: 'Prolog'; } - pre.src-simula:before { content: 'Simula'; } - pre.src-tcl:before { content: 
'tcl'; } - pre.src-tex:before { content: 'TeX'; } - pre.src-plain-tex:before { content: 'Plain TeX'; } - pre.src-verilog:before { content: 'Verilog'; } - pre.src-vhdl:before { content: 'VHDL'; } - pre.src-xml:before { content: 'XML'; } - pre.src-nxml:before { content: 'XML'; } - /* add a generic configuration mode; LaTeX export needs an additional - (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */ - pre.src-conf:before { content: 'Configuration File'; } - - table { border-collapse:collapse; } - caption.t-above { caption-side: top; } - caption.t-bottom { caption-side: bottom; } - td, th { vertical-align:top; } - th.org-right { text-align: center; } - th.org-left { text-align: center; } - th.org-center { text-align: center; } - td.org-right { text-align: right; } - td.org-left { text-align: left; } - td.org-center { text-align: center; } - dt { font-weight: bold; } - .footpara { display: inline; } - .footdef { margin-bottom: 1em; } - .figure { padding: 1em; } - .figure p { text-align: center; } - .equation-container { - display: table; - text-align: center; - width: 100%; - } - .equation { - vertical-align: middle; - } - .equation-label { - display: table-cell; - text-align: right; - vertical-align: middle; - } - .inlinetask { - padding: 10px; - border: 2px solid gray; - margin: 10px; - background: #ffffcc; - } - #org-div-home-and-up - { text-align: right; font-size: 70%; white-space: nowrap; } - textarea { overflow-x: auto; } - .linenr { font-size: smaller } - .code-highlighted { background-color: #ffff00; } - .org-info-js_info-navigation { border-style: none; } - #org-info-js_console-label - { font-size: 10px; font-weight: bold; white-space: nowrap; } - .org-info-js_search-highlight - { background-color: #ffff00; color: #000000; font-weight: bold; } - .org-svg { width: 90%; } - /*]]>*/--> -</style> -<script type="text/javascript"> -/* -@licstart The following is the entire license notice for the -JavaScript code in this tag. 
- -Copyright (C) 2012-2020 Free Software Foundation, Inc. - -The JavaScript code in this tag is free software: you can -redistribute it and/or modify it under the terms of the GNU -General Public License (GNU GPL) as published by the Free Software -Foundation, either version 3 of the License, or (at your option) -any later version. The code is distributed WITHOUT ANY WARRANTY; -without even the implied warranty of MERCHANTABILITY or FITNESS -FOR A PARTICULAR PURPOSE. See the GNU GPL for more details. - -As additional permission under GNU GPL version 3 section 7, you -may distribute non-source (e.g., minimized or compacted) forms of -that code without the copy of the GNU GPL normally required by -section 4, provided you include this license notice and a URL -through which recipients can access the Corresponding Source. - - -@licend The above is the entire license notice -for the JavaScript code in this tag. -*/ -<!--/*--><![CDATA[/*><!--*/ - function CodeHighlightOn(elem, id) - { - var target = document.getElementById(id); - if(null != target) { - elem.cacheClassElem = elem.className; - elem.cacheClassTarget = target.className; - target.className = "code-highlighted"; - elem.className = "code-highlighted"; - } - } - function CodeHighlightOff(elem, id) - { - var target = document.getElementById(id); - if(elem.cacheClassElem) - elem.className = elem.cacheClassElem; - if(elem.cacheClassTarget) - target.className = elem.cacheClassTarget; - } -/*]]>*///--> -</script> + <!-- 2020-07-18 Sat 03:27 --> + <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/> + <meta name="viewport" content="width=device-width, initial-scale=1"/> + <title>About/FAQ</title> + <meta name="generator" content="Org mode"/> + <meta name="author" content="Pjotr Prins"/> + <style type="text/css"> + <!-- /*--><![CDATA[/*><!--*/ + .title { + text-align: center; + margin-bottom: .2em; + } + + .subtitle { + text-align: center; + font-size: medium; + font-weight: bold; + margin-top: 0; + } + 
+ .todo { + font-family: monospace; + color: red; + } + + .done { + font-family: monospace; + color: green; + } + + .priority { + font-family: monospace; + color: orange; + } + + .tag { + background-color: #eee; + font-family: monospace; + padding: 2px; + font-size: 80%; + font-weight: normal; + } + + .timestamp { + color: #bebebe; + } + + .timestamp-kwd { + color: #5f9ea0; + } + + .org-right { + margin-left: auto; + margin-right: 0px; + text-align: right; + } + + .org-left { + margin-left: 0px; + margin-right: auto; + text-align: left; + } + + .org-center { + margin-left: auto; + margin-right: auto; + text-align: center; + } + + .underline { + text-decoration: underline; + } + + #postamble p, #preamble p { + font-size: 90%; + margin: .2em; + } + + p.verse { + margin-left: 3%; + } + + pre { + border: 1px solid #ccc; + box-shadow: 3px 3px 3px #eee; + padding: 8pt; + font-family: monospace; + overflow: auto; + margin: 1.2em; + } + + pre.src { + position: relative; + overflow: visible; + padding-top: 1.2em; + } + + pre.src:before { + display: none; + position: absolute; + background-color: white; + top: -10px; + right: 10px; + padding: 3px; + border: 1px solid black; + } + + pre.src:hover:before { + display: inline; + } + + /* Languages per Org manual */ + pre.src-asymptote:before { + content: 'Asymptote'; + } + + pre.src-awk:before { + content: 'Awk'; + } + + pre.src-C:before { + content: 'C'; + } + + /* pre.src-C++ doesn't work in CSS */ + pre.src-clojure:before { + content: 'Clojure'; + } + + pre.src-css:before { + content: 'CSS'; + } + + pre.src-D:before { + content: 'D'; + } + + pre.src-ditaa:before { + content: 'ditaa'; + } + + pre.src-dot:before { + content: 'Graphviz'; + } + + pre.src-calc:before { + content: 'Emacs Calc'; + } + + pre.src-emacs-lisp:before { + content: 'Emacs Lisp'; + } + + pre.src-fortran:before { + content: 'Fortran'; + } + + pre.src-gnuplot:before { + content: 'gnuplot'; + } + + pre.src-haskell:before { + content: 'Haskell'; + } + + 
pre.src-hledger:before { + content: 'hledger'; + } + + pre.src-java:before { + content: 'Java'; + } + + pre.src-js:before { + content: 'Javascript'; + } + + pre.src-latex:before { + content: 'LaTeX'; + } + + pre.src-ledger:before { + content: 'Ledger'; + } + + pre.src-lisp:before { + content: 'Lisp'; + } + + pre.src-lilypond:before { + content: 'Lilypond'; + } + + pre.src-lua:before { + content: 'Lua'; + } + + pre.src-matlab:before { + content: 'MATLAB'; + } + + pre.src-mscgen:before { + content: 'Mscgen'; + } + + pre.src-ocaml:before { + content: 'Objective Caml'; + } + + pre.src-octave:before { + content: 'Octave'; + } + + pre.src-org:before { + content: 'Org mode'; + } + + pre.src-oz:before { + content: 'OZ'; + } + + pre.src-plantuml:before { + content: 'Plantuml'; + } + + pre.src-processing:before { + content: 'Processing.js'; + } + + pre.src-python:before { + content: 'Python'; + } + + pre.src-R:before { + content: 'R'; + } + + pre.src-ruby:before { + content: 'Ruby'; + } + + pre.src-sass:before { + content: 'Sass'; + } + + pre.src-scheme:before { + content: 'Scheme'; + } + + pre.src-screen:before { + content: 'Gnu Screen'; + } + + pre.src-sed:before { + content: 'Sed'; + } + + pre.src-sh:before { + content: 'shell'; + } + + pre.src-sql:before { + content: 'SQL'; + } + + pre.src-sqlite:before { + content: 'SQLite'; + } + + /* additional languages in org.el's org-babel-load-languages alist */ + pre.src-forth:before { + content: 'Forth'; + } + + pre.src-io:before { + content: 'IO'; + } + + pre.src-J:before { + content: 'J'; + } + + pre.src-makefile:before { + content: 'Makefile'; + } + + pre.src-maxima:before { + content: 'Maxima'; + } + + pre.src-perl:before { + content: 'Perl'; + } + + pre.src-picolisp:before { + content: 'Pico Lisp'; + } + + pre.src-scala:before { + content: 'Scala'; + } + + pre.src-shell:before { + content: 'Shell Script'; + } + + pre.src-ebnf2ps:before { + content: 'ebfn2ps'; + } + + /* additional language identifiers per "defun 
org-babel-execute" + in ob-*.el */ + pre.src-cpp:before { + content: 'C++'; + } + + pre.src-abc:before { + content: 'ABC'; + } + + pre.src-coq:before { + content: 'Coq'; + } + + pre.src-groovy:before { + content: 'Groovy'; + } + + /* additional language identifiers from org-babel-shell-names in + ob-shell.el: ob-shell is the only babel language using a lambda to put + the execution function name together. */ + pre.src-bash:before { + content: 'bash'; + } + + pre.src-csh:before { + content: 'csh'; + } + + pre.src-ash:before { + content: 'ash'; + } + + pre.src-dash:before { + content: 'dash'; + } + + pre.src-ksh:before { + content: 'ksh'; + } + + pre.src-mksh:before { + content: 'mksh'; + } + + pre.src-posh:before { + content: 'posh'; + } + + /* Additional Emacs modes also supported by the LaTeX listings package */ + pre.src-ada:before { + content: 'Ada'; + } + + pre.src-asm:before { + content: 'Assembler'; + } + + pre.src-caml:before { + content: 'Caml'; + } + + pre.src-delphi:before { + content: 'Delphi'; + } + + pre.src-html:before { + content: 'HTML'; + } + + pre.src-idl:before { + content: 'IDL'; + } + + pre.src-mercury:before { + content: 'Mercury'; + } + + pre.src-metapost:before { + content: 'MetaPost'; + } + + pre.src-modula-2:before { + content: 'Modula-2'; + } + + pre.src-pascal:before { + content: 'Pascal'; + } + + pre.src-ps:before { + content: 'PostScript'; + } + + pre.src-prolog:before { + content: 'Prolog'; + } + + pre.src-simula:before { + content: 'Simula'; + } + + pre.src-tcl:before { + content: 'tcl'; + } + + pre.src-tex:before { + content: 'TeX'; + } + + pre.src-plain-tex:before { + content: 'Plain TeX'; + } + + pre.src-verilog:before { + content: 'Verilog'; + } + + pre.src-vhdl:before { + content: 'VHDL'; + } + + pre.src-xml:before { + content: 'XML'; + } + + pre.src-nxml:before { + content: 'XML'; + } + + /* add a generic configuration mode; LaTeX export needs an additional + (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */ + 
pre.src-conf:before { + content: 'Configuration File'; + } + + table { + border-collapse: collapse; + } + + caption.t-above { + caption-side: top; + } + + caption.t-bottom { + caption-side: bottom; + } + + td, th { + vertical-align: top; + } + + th.org-right { + text-align: center; + } + + th.org-left { + text-align: center; + } + + th.org-center { + text-align: center; + } + + td.org-right { + text-align: right; + } + + td.org-left { + text-align: left; + } + + td.org-center { + text-align: center; + } + + dt { + font-weight: bold; + } + + .footpara { + display: inline; + } + + .footdef { + margin-bottom: 1em; + } + + .figure { + padding: 1em; + } + + .figure p { + text-align: center; + } + + .equation-container { + display: table; + text-align: center; + width: 100%; + } + + .equation { + vertical-align: middle; + } + + .equation-label { + display: table-cell; + text-align: right; + vertical-align: middle; + } + + .inlinetask { + padding: 10px; + border: 2px solid gray; + margin: 10px; + background: #ffffcc; + } + + #org-div-home-and-up { + text-align: right; + font-size: 70%; + white-space: nowrap; + } + + textarea { + overflow-x: auto; + } + + .linenr { + font-size: smaller + } + + .code-highlighted { + background-color: #ffff00; + } + + .org-info-js_info-navigation { + border-style: none; + } + + #org-info-js_console-label { + font-size: 10px; + font-weight: bold; + white-space: nowrap; + } + + .org-info-js_search-highlight { + background-color: #ffff00; + color: #000000; + font-weight: bold; + } + + .org-svg { + width: 90%; + } + + /*]]>*/ + --> + </style> + <script type="text/javascript"> + /* + @licstart The following is the entire license notice for the + JavaScript code in this tag. + + Copyright (C) 2012-2020 Free Software Foundation, Inc. 
+ + The JavaScript code in this tag is free software: you can + redistribute it and/or modify it under the terms of the GNU + General Public License (GNU GPL) as published by the Free Software + Foundation, either version 3 of the License, or (at your option) + any later version. The code is distributed WITHOUT ANY WARRANTY; + without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU GPL for more details. + + As additional permission under GNU GPL version 3 section 7, you + may distribute non-source (e.g., minimized or compacted) forms of + that code without the copy of the GNU GPL normally required by + section 4, provided you include this license notice and a URL + through which recipients can access the Corresponding Source. + + + @licend The above is the entire license notice + for the JavaScript code in this tag. + */ + <!--/*--><![CDATA[/*><!--*/ + function CodeHighlightOn(elem, id) { + var target = document.getElementById(id); + if (null != target) { + elem.cacheClassElem = elem.className; + elem.cacheClassTarget = target.className; + target.className = "code-highlighted"; + elem.className = "code-highlighted"; + } + } + + function CodeHighlightOff(elem, id) { + var target = document.getElementById(id); + if (elem.cacheClassElem) + elem.className = elem.cacheClassElem; + if (elem.cacheClassTarget) + target.className = elem.cacheClassTarget; + } + + /*]]>*///--> + </script> </head> <body> <div id="content"> -<h1 class="title">About/FAQ</h1> -<div id="table-of-contents"> -<h2>Table of Contents</h2> -<div id="text-table-of-contents"> -<ul> -<li><a href="#org0db9061">1. What is the 'public sequence resource' about?</a></li> -<li><a href="#org983877d">2. Who created the public sequence resource?</a></li> -<li><a href="#org83093c3">3. How does the public sequence resource compare to other data resources?</a></li> -<li><a href="#org9b31fd4">4. Why should I upload my data here?</a></li> -<li><a href="#org4e92cb5">5. 
Why should I not upload by data here?</a></li> -<li><a href="#orgdfe72f6">6. How does the public sequence resource work?</a></li> -<li><a href="#orgd0c5abb">7. Who uses the public sequence resource?</a></li> -<li><a href="#org56f4a54">8. How can I contribute?</a></li> -<li><a href="#org2240ef7">9. Is this about open data?</a></li> -<li><a href="#orgbb655e0">10. Is this about free software?</a></li> -<li><a href="#org4e779f4">11. How do I upload raw data?</a></li> -<li><a href="#org83f6b7b">12. How do I change metadata?</a></li> -<li><a href="#org1bc6dab">13. How do I change the work flows?</a></li> -<li><a href="#org1140d62">14. How do I change the source code?</a></li> -<li><a href="#orge182714">15. Should I choose CC-BY or CC0?</a></li> -<li><a href="#orgf4a692b">16. How do I deal with private data and privacy?</a></li> -<li><a href="#org7757574">17. How do I communicate with you?</a></li> -<li><a href="#org194006f">18. Who are the sponsors?</a></li> -</ul> -</div> -</div> - -<div id="outline-container-org0db9061" class="outline-2"> -<h2 id="org0db9061"><span class="section-number-2">1</span> What is the 'public sequence resource' about?</h2> -<div class="outline-text-2" id="text-1"> -<p> -The <b>public sequence resource</b> aims to provide a generic and useful -resource for COVID-19 research. The focus is on providing the best -possible sequence data with associated metadata that can be used for -sequence comparison and protein prediction. -</p> -</div> -</div> - -<div id="outline-container-org983877d" class="outline-2"> -<h2 id="org983877d"><span class="section-number-2">2</span> Who created the public sequence resource?</h2> -<div class="outline-text-2" id="text-2"> -<p> -The <b>public sequence resource</b> is an initiative by <a href="https://github.com/arvados/bh20-seq-resource/graphs/contributors">bioinformatics</a> and -ontology experts who want to create something agile and useful for the -wider research community. 
The initiative started at the COVID-19 -biohackathon in April 2020 and is ongoing. The main project drivers -are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino -(University of Rome Tor Vergata), Michael Crusoe (Common Workflow -Language), Thomas Liener (consultant, formerly EBI), Erik Garrison -(UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). -</p> - -<p> -Notably, as this is a free software initiative, the project represents -major work by hundreds of software developers and ontology and data -wrangling experts. Thank you everyone! -</p> -</div> -</div> - -<div id="outline-container-org83093c3" class="outline-2"> -<h2 id="org83093c3"><span class="section-number-2">3</span> How does the public sequence resource compare to other data resources?</h2> -<div class="outline-text-2" id="text-3"> -<p> -The short version is that we use state-of-the-art practices in -bioinformatics using agile methods. Unlike the resources from large -institutes we can improve things on a dime and anyone can contribute -to building out this resource! Sequences from GenBank, EBI/ENA and -others are regularly added to PubSeq. We encourage people to everyone -to submit on PubSeq because of its superior live tooling and metadata -support (see the next question). -</p> - -<p> -Importantly: all data is published under either the <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons -4.0 attribution license</a> or the <a href="https://creativecommons.org/share-your-work/public-domain/cc0/">CC0 “No Rights Reserved” license</a> which -means it data can be published and workflows can run in public -environments allowing for improved access for research and -reproducible results. This contrasts with some other public resources, -such as GISAID. 
-</p> -</div> -</div> - -<div id="outline-container-org9b31fd4" class="outline-2"> -<h2 id="org9b31fd4"><span class="section-number-2">4</span> Why should I upload my data here?</h2> -<div class="outline-text-2" id="text-4"> -<ol class="org-ol"> -<li>We champion truly shareable data without licensing restrictions - with proper -attribution</li> -<li>We provide full metadata support using state-of-the-art ontology's</li> -<li>We provide a web-based sequence uploader and a command-line version -for bulk uploads</li> -<li>We provide a live SPARQL end-point for all metadata</li> -<li>We provide free data analysis and sequence comparison triggered on data upload</li> -<li>We do real work for you, with this <a href="https://workbench.lugli.arvadosapi.com/container_requests/lugli-xvhdp-bhhk4nxx1lch5od">link</a> you can see the last -run took 5.5 hours!</li> -<li>We provide free downloads of all computed output</li> -<li>There is no need to set up pipelines and/or compute clusters</li> -<li>All workflows get triggered on uploading a new sequence</li> -<li>When someone (you?) improves the software/workflows and everyone benefits</li> -<li>Your data gets automatically integrated with the Swiss Institure of -Bioinformatics COVID-19 knowledge base -<a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> (Elixir Switzerland)</li> -<li>Your data will be used to develop drug targets</li> -</ol> - -<p> -Finally, if you upload your data here we have workflows that output -formatted data suitable for <a href="http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part6">uploading to EBI resources</a> (and soon -others). Uploading your data here get your data ready for upload to -multiple resources. -</p> -</div> -</div> - -<div id="outline-container-org4e92cb5" class="outline-2"> -<h2 id="org4e92cb5"><span class="section-number-2">5</span> Why should I not upload by data here?</h2> -<div class="outline-text-2" id="text-5"> -<p> -Funny question. 
There are only good reasons to upload your data here -and make it available to the widest audience possible. -</p> - -<p> -In fact, you can upload your data here as well as to other -resources. It is your data after all. No one can prevent you from -uploading your data to multiple resources. -</p> - -<p> -We recommend uploading to EBI and NCBI resources using our data -conversion tools. It means you only enter data once and make the -process smooth. You can also use our command line data uploader -for bulk uploads! -</p> -</div> -</div> - -<div id="outline-container-orgdfe72f6" class="outline-2"> -<h2 id="orgdfe72f6"><span class="section-number-2">6</span> How does the public sequence resource work?</h2> -<div class="outline-text-2" id="text-6"> -<p> -On uploading a sequence with metadata it will automatically be -processed and incorporated into the public pangenome with metadata -using workflows from the High Performance Open Biology Lab defined -<a href="https://github.com/hpobio-lab/viral-analysis/tree/master/cwl/pangenome-generate">here</a>. -</p> -</div> -</div> - -<div id="outline-container-orgd0c5abb" class="outline-2"> -<h2 id="orgd0c5abb"><span class="section-number-2">7</span> Who uses the public sequence resource?</h2> -<div class="outline-text-2" id="text-7"> -<p> -The Swiss Institute of Bioinformatics has included this data in -<a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> and made it part of <a href="https://www.uniprot.org/">Uniprot</a>. -</p> - -<p> -The Pantograph <a href="https://graph-genome.github.io/">viewer</a> uses PubSeq data for their visualisations. -</p> - -<p> -<a href="https://uthsc.edu">UTHSC</a> (USA), <a href="https://www.esr.cri.nz/">ESR</a> (New Zealand) and <a href="https://www.ornl.gov/news/ornl-fight-against-covid-19">ORNL</a> (USA) use COVID-19 PubSeq data -for monitoring, protein prediction and drug development. 
-</p> -</div> -</div> - -<div id="outline-container-org56f4a54" class="outline-2"> -<h2 id="org56f4a54"><span class="section-number-2">8</span> How can I contribute?</h2> -<div class="outline-text-2" id="text-8"> -<p> -You can contribute by submitting sequences, updating metadata, submit -issues on our issue tracker, and more importantly add functionality. -See 'How do I change the source code' below. Read through our online -documentation at <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a> as a starting -point. -</p> -</div> -</div> - -<div id="outline-container-org2240ef7" class="outline-2"> -<h2 id="org2240ef7"><span class="section-number-2">9</span> Is this about open data?</h2> -<div class="outline-text-2" id="text-9"> -<p> -All data is published under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons 4.0 attribution license</a> -(CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) -data and store it for further processing. -</p> -</div> -</div> - -<div id="outline-container-orgbb655e0" class="outline-2"> -<h2 id="orgbb655e0"><span class="section-number-2">10</span> Is this about free software?</h2> -<div class="outline-text-2" id="text-10"> -<p> -Absolutely. Free software allows for fully reproducible pipelines. You -can take our workflows and data and run it elsewhere! -</p> -</div> -</div> - -<div id="outline-container-org4e779f4" class="outline-2"> -<h2 id="org4e779f4"><span class="section-number-2">11</span> How do I upload raw data?</h2> -<div class="outline-text-2" id="text-11"> -<p> -We are preparing raw sequence data pipelines (fastq and BAM). The -reason is that we want the best data possible for downstream analysis -(including protein prediction and test development). The current -approach where people publish final sequences of SARS-CoV-2 is lacking -because it hides how this sequence was created. 
For reasons of -reproducible and improved results we want/need to work with the raw -sequence reads (both short reads and long reads) and take alternative -assembly variations into consideration. This is all work in progress. -</p> -</div> -</div> - -<div id="outline-container-org83f6b7b" class="outline-2"> -<h2 id="org83f6b7b"><span class="section-number-2">12</span> How do I change metadata?</h2> -<div class="outline-text-2" id="text-12"> -<p> -See the <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a>! -</p> -</div> -</div> - -<div id="outline-container-org1bc6dab" class="outline-2"> -<h2 id="org1bc6dab"><span class="section-number-2">13</span> How do I change the work flows?</h2> -<div class="outline-text-2" id="text-13"> -<p> -Workflows are on <a href="https://github.com/arvados/bh20-seq-resource/tree/master/workflows">github</a> and can be modified. See also the BLOG -<a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a> on workflows. -</p> -</div> -</div> - -<div id="outline-container-org1140d62" class="outline-2"> -<h2 id="org1140d62"><span class="section-number-2">14</span> How do I change the source code?</h2> -<div class="outline-text-2" id="text-14"> -<p> -Go to our <a href="https://github.com/arvados/bh20-seq-resource">source code repositories</a>, fork/clone the repository, change -something and submit a <a href="https://github.com/arvados/bh20-seq-resource/pulls">pull request</a> (PR). That easy! Check out how -many PRs we already merged. -</p> -</div> -</div> - -<div id="outline-container-orge182714" class="outline-2"> -<h2 id="orge182714"><span class="section-number-2">15</span> Should I choose CC-BY or CC0?</h2> -<div class="outline-text-2" id="text-15"> -<p> -Restrictive data licenses are hampering data sharing and reproducible -research. CC0 is the preferred license because it gives researchers -the most freedom. 
Since we provide metadata there is no reason for -others not to honour your work. We also provide CC-BY as an option -because we know people like the attribution clause. -</p> - -<p> -In all honesty: we prefer both data and software to be free. -</p> -</div> -</div> - -<div id="outline-container-orgf4a692b" class="outline-2"> -<h2 id="orgf4a692b"><span class="section-number-2">16</span> How do I deal with private data and privacy?</h2> -<div class="outline-text-2" id="text-16"> -<p> -A public sequence resource is about public data. Metadata can refer to -private data. You can use your own (anonymous) identifiers. We also -plan to combine identifiers with clinical data stored securely at -<a href="https://redcap-covid19.elixir-luxembourg.org/redcap/">REDCap</a>. See the relevant <a href="https://github.com/arvados/bh20-seq-resource/issues/21">tracker</a> for more information and contributing. -</p> -</div> -</div> - -<div id="outline-container-org7757574" class="outline-2"> -<h2 id="org7757574"><span class="section-number-2">17</span> How do I communicate with you?</h2> -<div class="outline-text-2" id="text-17"> -<p> -We use a <a href="https://gitter.im/arvados/pubseq?utm_source=share-link&utm_medium=link&utm_campaign=share-link">gitter channel</a> you can join. -</p> -</div> -</div> - -<div id="outline-container-org194006f" class="outline-2"> -<h2 id="org194006f"><span class="section-number-2">18</span> Who are the sponsors?</h2> -<div class="outline-text-2" id="text-18"> -<p> -The main sponsors are listed in the footer. In addition to the time -generously donated by many contributors we also acknowledge Amazon AWS -for donating COVID-19 related compute time. -</p> -</div> -</div> + <h1 class="title">About/FAQ</h1> + <div id="table-of-contents"> + <h2>Table of Contents</h2> + <div id="text-table-of-contents"> + <ul> + <li><a href="#org0db9061">1. What is the 'public sequence resource' about?</a></li> + <li><a href="#org983877d">2. 
Who created the public sequence resource?</a></li> + <li><a href="#org83093c3">3. How does the public sequence resource compare to other data resources?</a> + </li> + <li><a href="#org9b31fd4">4. Why should I upload my data here?</a></li> + <li><a href="#org4e92cb5">5. Why should I not upload by data here?</a></li> + <li><a href="#orgdfe72f6">6. How does the public sequence resource work?</a></li> + <li><a href="#orgd0c5abb">7. Who uses the public sequence resource?</a></li> + <li><a href="#org56f4a54">8. How can I contribute?</a></li> + <li><a href="#org2240ef7">9. Is this about open data?</a></li> + <li><a href="#orgbb655e0">10. Is this about free software?</a></li> + <li><a href="#org4e779f4">11. How do I upload raw data?</a></li> + <li><a href="#org83f6b7b">12. How do I change metadata?</a></li> + <li><a href="#org1bc6dab">13. How do I change the work flows?</a></li> + <li><a href="#org1140d62">14. How do I change the source code?</a></li> + <li><a href="#orge182714">15. Should I choose CC-BY or CC0?</a></li> + <li><a href="#orgf4a692b">16. How do I deal with private data and privacy?</a></li> + <li><a href="#org7757574">17. How do I communicate with you?</a></li> + <li><a href="#org194006f">18. Who are the sponsors?</a></li> + </ul> + </div> + </div> + + <div id="outline-container-org0db9061" class="outline-2"> + <h2 id="org0db9061"><span class="section-number-2">1</span> What is the 'public sequence resource' about?</h2> + <div class="outline-text-2" id="text-1"> + <p> + The <b>public sequence resource</b> aims to provide a generic and useful + resource for COVID-19 research. The focus is on providing the best + possible sequence data with associated metadata that can be used for + sequence comparison and protein prediction. + </p> + <p> + We were at the <strong>Bioinformatics Community Conference 2020</strong>! 
Have a look at the + <a href="https://bcc2020.sched.com/event/coLw">video talk</a></li> + (<a href="https://drive.google.com/file/d/1skXHwVKM_gl73-_4giYIOQ1IlC5X5uBo/view?usp=sharing">alternative link</a>) + and the <a href="https://drive.google.com/file/d/1vyEgfvSqhM9yIwWZ6Iys-QxhxtVxPSdp/view?usp=sharing">poster</a>. + </p> + </div> + </div> + + <div id="outline-container-org983877d" class="outline-2"> + <h2 id="org983877d"><span class="section-number-2">2</span> Who created the public sequence resource?</h2> + <div class="outline-text-2" id="text-2"> + <p> + The <b>public sequence resource</b> is an initiative by <a + href="https://github.com/arvados/bh20-seq-resource/graphs/contributors">bioinformatics</a> and + ontology experts who want to create something agile and useful for the + wider research community. The initiative started at the COVID-19 + biohackathon in April 2020 and is ongoing. The main project drivers + are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino + (University of Rome Tor Vergata), Michael Crusoe (Common Workflow + Language), Thomas Liener (consultant, formerly EBI), Erik Garrison + (UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). + </p> + + <p> + Notably, as this is a free software initiative, the project represents + major work by hundreds of software developers and ontology and data + wrangling experts. Thank you everyone! + </p> + </div> + </div> + + <div id="outline-container-org83093c3" class="outline-2"> + <h2 id="org83093c3"><span class="section-number-2">3</span> How does the public sequence resource compare to + other data resources?</h2> + <div class="outline-text-2" id="text-3"> + <p> + The short version is that we use state-of-the-art practices in + bioinformatics using agile methods. Unlike the resources from large + institutes we can improve things on a dime and anyone can contribute + to building out this resource! Sequences from GenBank, EBI/ENA and + others are regularly added to PubSeq. 
We encourage people to everyone + to submit on PubSeq because of its superior live tooling and metadata + support (see the next question). + </p> + + <p> + Importantly: all data is published under either the <a + href="https://creativecommons.org/licenses/by/4.0/">Creative Commons + 4.0 attribution license</a> or the <a + href="https://creativecommons.org/share-your-work/public-domain/cc0/">CC0 “No Rights Reserved” + license</a> which + means it data can be published and workflows can run in public + environments allowing for improved access for research and + reproducible results. This contrasts with some other public resources, + such as GISAID. + </p> + </div> + </div> + + <div id="outline-container-org9b31fd4" class="outline-2"> + <h2 id="org9b31fd4"><span class="section-number-2">4</span> Why should I upload my data here?</h2> + <div class="outline-text-2" id="text-4"> + <ol class="org-ol"> + <li>We champion truly shareable data without licensing restrictions - with proper + attribution + </li> + <li>We provide full metadata support using state-of-the-art ontology's</li> + <li>We provide a web-based sequence uploader and a command-line version + for bulk uploads + </li> + <li>We provide a live SPARQL end-point for all metadata</li> + <li>We provide free data analysis and sequence comparison triggered on data upload</li> + <li>We do real work for you, with this <a + href="https://workbench.lugli.arvadosapi.com/container_requests/lugli-xvhdp-bhhk4nxx1lch5od">link</a> + you can see the last + run took 5.5 hours! + </li> + <li>We provide free downloads of all computed output</li> + <li>There is no need to set up pipelines and/or compute clusters</li> + <li>All workflows get triggered on uploading a new sequence</li> + <li>When someone (you?) 
improves the software/workflows and everyone benefits</li> + <li>Your data gets automatically integrated with the Swiss Institure of + Bioinformatics COVID-19 knowledge base + <a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> (Elixir + Switzerland) + </li> + <li>Your data will be used to develop drug targets</li> + </ol> + + <p> + Finally, if you upload your data here we have workflows that output + formatted data suitable for <a + href="http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part6">uploading to EBI + resources</a> (and soon + others). Uploading your data here get your data ready for upload to + multiple resources. + </p> + </div> + </div> + + <div id="outline-container-org4e92cb5" class="outline-2"> + <h2 id="org4e92cb5"><span class="section-number-2">5</span> Why should I not upload by data here?</h2> + <div class="outline-text-2" id="text-5"> + <p> + Funny question. There are only good reasons to upload your data here + and make it available to the widest audience possible. + </p> + + <p> + In fact, you can upload your data here as well as to other + resources. It is your data after all. No one can prevent you from + uploading your data to multiple resources. + </p> + + <p> + We recommend uploading to EBI and NCBI resources using our data + conversion tools. It means you only enter data once and make the + process smooth. You can also use our command line data uploader + for bulk uploads! 
+ </p> + </div> + </div> + + <div id="outline-container-orgdfe72f6" class="outline-2"> + <h2 id="orgdfe72f6"><span class="section-number-2">6</span> How does the public sequence resource work?</h2> + <div class="outline-text-2" id="text-6"> + <p> + On uploading a sequence with metadata it will automatically be + processed and incorporated into the public pangenome with metadata + using workflows from the High Performance Open Biology Lab defined + <a href="https://github.com/hpobio-lab/viral-analysis/tree/master/cwl/pangenome-generate">here</a>. + </p> + </div> + </div> + + <div id="outline-container-orgd0c5abb" class="outline-2"> + <h2 id="orgd0c5abb"><span class="section-number-2">7</span> Who uses the public sequence resource?</h2> + <div class="outline-text-2" id="text-7"> + <p> + The Swiss Institute of Bioinformatics has included this data in + <a href="https://covid-19-sparql.expasy.org/">https://covid-19-sparql.expasy.org/</a> and made it part + of <a href="https://www.uniprot.org/">Uniprot</a>. + </p> + + <p> + The Pantograph <a href="https://graph-genome.github.io/">viewer</a> uses PubSeq data for their + visualisations. + </p> + + <p> + <a href="https://uthsc.edu">UTHSC</a> (USA), <a href="https://www.esr.cri.nz/">ESR</a> (New Zealand) and + <a href="https://www.ornl.gov/news/ornl-fight-against-covid-19">ORNL</a> (USA) use COVID-19 PubSeq data + for monitoring, protein prediction and drug development. + </p> + </div> + </div> + + <div id="outline-container-org56f4a54" class="outline-2"> + <h2 id="org56f4a54"><span class="section-number-2">8</span> How can I contribute?</h2> + <div class="outline-text-2" id="text-8"> + <p> + You can contribute by submitting sequences, updating metadata, submit + issues on our issue tracker, and more importantly add functionality. + See 'How do I change the source code' below. 
Read through our online + documentation at <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a> + as a starting + point. + </p> + </div> + </div> + + <div id="outline-container-org2240ef7" class="outline-2"> + <h2 id="org2240ef7"><span class="section-number-2">9</span> Is this about open data?</h2> + <div class="outline-text-2" id="text-9"> + <p> + All data is published under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons + 4.0 attribution license</a> + (CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) + data and store it for further processing. + </p> + </div> + </div> + + <div id="outline-container-orgbb655e0" class="outline-2"> + <h2 id="orgbb655e0"><span class="section-number-2">10</span> Is this about free software?</h2> + <div class="outline-text-2" id="text-10"> + <p> + Absolutely. Free software allows for fully reproducible pipelines. You + can take our workflows and data and run it elsewhere! + </p> + </div> + </div> + + <div id="outline-container-org4e779f4" class="outline-2"> + <h2 id="org4e779f4"><span class="section-number-2">11</span> How do I upload raw data?</h2> + <div class="outline-text-2" id="text-11"> + <p> + We are preparing raw sequence data pipelines (fastq and BAM). The + reason is that we want the best data possible for downstream analysis + (including protein prediction and test development). The current + approach where people publish final sequences of SARS-CoV-2 is lacking + because it hides how this sequence was created. For reasons of + reproducible and improved results we want/need to work with the raw + sequence reads (both short reads and long reads) and take alternative + assembly variations into consideration. This is all work in progress. 
+ </p> + </div> + </div> + + <div id="outline-container-org83f6b7b" class="outline-2"> + <h2 id="org83f6b7b"><span class="section-number-2">12</span> How do I change metadata?</h2> + <div class="outline-text-2" id="text-12"> + <p> + See the <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a>! + </p> + </div> + </div> + + <div id="outline-container-org1bc6dab" class="outline-2"> + <h2 id="org1bc6dab"><span class="section-number-2">13</span> How do I change the work flows?</h2> + <div class="outline-text-2" id="text-13"> + <p> + Workflows are on <a href="https://github.com/arvados/bh20-seq-resource/tree/master/workflows">github</a> + and can be modified. See also the BLOG + <a href="http://covid19.genenetwork.org/blog">http://covid19.genenetwork.org/blog</a> on workflows. + </p> + </div> + </div> + + <div id="outline-container-org1140d62" class="outline-2"> + <h2 id="org1140d62"><span class="section-number-2">14</span> How do I change the source code?</h2> + <div class="outline-text-2" id="text-14"> + <p> + Go to our <a href="https://github.com/arvados/bh20-seq-resource">source code repositories</a>, + fork/clone the repository, change + something and submit a <a href="https://github.com/arvados/bh20-seq-resource/pulls">pull request</a> + (PR). That easy! Check out how + many PRs we already merged. + </p> + </div> + </div> + + <div id="outline-container-orge182714" class="outline-2"> + <h2 id="orge182714"><span class="section-number-2">15</span> Should I choose CC-BY or CC0?</h2> + <div class="outline-text-2" id="text-15"> + <p> + Restrictive data licenses are hampering data sharing and reproducible + research. CC0 is the preferred license because it gives researchers + the most freedom. Since we provide metadata there is no reason for + others not to honour your work. We also provide CC-BY as an option + because we know people like the attribution clause. + </p> + + <p> + In all honesty: we prefer both data and software to be free. 
+ </p> + </div> + </div> + + <div id="outline-container-orgf4a692b" class="outline-2"> + <h2 id="orgf4a692b"><span class="section-number-2">16</span> How do I deal with private data and privacy?</h2> + <div class="outline-text-2" id="text-16"> + <p> + A public sequence resource is about public data. Metadata can refer to + private data. You can use your own (anonymous) identifiers. We also + plan to combine identifiers with clinical data stored securely at + <a href="https://redcap-covid19.elixir-luxembourg.org/redcap/">REDCap</a>. See the relevant <a + href="https://github.com/arvados/bh20-seq-resource/issues/21">tracker</a> for more information and + contributing. + </p> + </div> + </div> + + <div id="outline-container-org7757574" class="outline-2"> + <h2 id="org7757574"><span class="section-number-2">17</span> How do I communicate with you?</h2> + <div class="outline-text-2" id="text-17"> + <p> + We use a <a + href="https://gitter.im/arvados/pubseq?utm_source=share-link&utm_medium=link&utm_campaign=share-link">gitter + channel</a> you can join. + </p> + </div> + </div> + + <div id="outline-container-org194006f" class="outline-2"> + <h2 id="org194006f"><span class="section-number-2">18</span> Who are the sponsors?</h2> + <div class="outline-text-2" id="text-18"> + <p> + The main sponsors are listed in the footer. In addition to the time + generously donated by many contributors we also acknowledge Amazon AWS + for donating COVID-19 related compute time. + </p> + </div> + </div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-18 Sat 03:27</small>. + <hr> + <small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs + org-mode and a healthy dose of Lisp!<br/>Modified 2020-07-18 Sat 03:27</small>. 
 </div>
 </body>
 </html>
diff --git a/doc/web/about.org b/doc/web/about.org
index 39fb667..29a80bf 100644
--- a/doc/web/about.org
+++ b/doc/web/about.org
@@ -17,7 +17,10 @@
  - [[#how-do-i-change-the-work-flows][How do I change the work flows?]]
  - [[#how-do-i-change-the-source-code][How do I change the source code?]]
  - [[#should-i-choose-cc-by-or-cc0][Should I choose CC-BY or CC0?]]
+ - [[#are-there-also-variant-in-the-RDF-databases]][Are there also variant in the RDF databases?]
  - [[#how-do-i-deal-with-private-data-and-privacy][How do I deal with private data and privacy?]]
+ - [[#do-you-have-any-checks-or-concerns-if-human-sequence-accidentally-submitted-to-your-service-as-part-of-a-fastq][Do you have any checks or concerns if human sequence accidentally submitted to your service as part of a fastq?]
+ - [[#does-PubSeq-support-only-SARS-CoV-2=data]][Does PubSeq support only SARS-CoV-2 data?]
  - [[#how-do-i-communicate-with-you][How do I communicate with you?]]
  - [[#who-are-the-sponsors][Who are the sponsors?]]
 
@@ -28,6 +31,8 @@ resource for COVID-19 research. The focus is on providing the best
 possible sequence data with associated metadata that can be used for
 sequence comparison and protein prediction.
 
+We were at the *Bioinformatics Community Conference 2020*! Have a look at the [[https://bcc2020.sched.com/event/coLw]][video talk] ([[https://drive.google.com/file/d/1skXHwVKM_gl73-_4giYIOQ1IlC5X5uBo/view?usp=sharing]][alternative link]) and the [[https://drive.google.com/file/d/1vyEgfvSqhM9yIwWZ6Iys-QxhxtVxPSdp/view?usp=sharing]][poster].
+
 * Who created the public sequence resource?
 
 The *public sequence resource* is an initiative by [[https://github.com/arvados/bh20-seq-resource/graphs/contributors][bioinformatics]] and
@@ -171,6 +176,12 @@ because we know people like the attribution clause.
 
 In all honesty: we prefer both data and software to be free.
 
+* Are there also variant in the RDF databases? *
+
+We do output a RDF file with the pangenome built in, and you can parse it because it has variants implicitly.
+
+We are also writing tools to generate VCF files directly from the pangenome.
+
 * How do I deal with private data and privacy?
 
 A public sequence resource is about public data. Metadata can refer to
@@ -178,6 +189,15 @@ private data. You can use your own (anonymous) identifiers. We also
 plan to combine identifiers with clinical data stored securely at
 [[https://redcap-covid19.elixir-luxembourg.org/redcap/][REDCap]]. See the relevant [[https://github.com/arvados/bh20-seq-resource/issues/21][tracker]] for more information and contributing.
 
+* Do you have any checks or concerns if human sequence accidentally submitted to your service as part of a fastq? *
+
+We are planning to remove reads that match the human reference.
+
+* Does PubSeq support only SARS-CoV-2 data? *
+
+To date, PubSeq is a resource specific to SARS-CoV-2, but we are designing it to be able to support other species in the future.
+
+
 * How do I communicate with you?
 
 We use a [[https://gitter.im/arvados/pubseq?utm_source=share-link&utm_medium=link&utm_campaign=share-link][gitter channel]] you can join.
diff --git a/image/homepage.png b/image/homepage.png Binary files differnew file mode 100644 index 0000000..f66f9fd --- /dev/null +++ b/image/homepage.png diff --git a/image/website.png b/image/website.png Binary files differdeleted file mode 100644 index fa57ca5..0000000 --- a/image/website.png +++ /dev/null diff --git a/workflows/pangenome-generate/odgi-build-from-spoa-gfa.cwl b/workflows/pangenome-generate/odgi-build-from-spoa-gfa.cwl new file mode 100644 index 0000000..2459ce7 --- /dev/null +++ b/workflows/pangenome-generate/odgi-build-from-spoa-gfa.cwl @@ -0,0 +1,29 @@ +cwlVersion: v1.1 +class: CommandLineTool +inputs: + inputGFA: File +outputs: + odgiGraph: + type: File + outputBinding: + glob: $(inputs.inputGFA.nameroot).unchop.sorted.odgi +requirements: + InlineJavascriptRequirement: {} + ShellCommandRequirement: {} +hints: + DockerRequirement: + dockerPull: "quay.io/biocontainers/odgi:v0.3--py37h8b12597_0" + ResourceRequirement: + coresMin: 4 + ramMin: $(7 * 1024) + outdirMin: $(Math.ceil((inputs.inputGFA.size/(1024*1024*1024)+1) * 2)) + InitialWorkDirRequirement: + listing: + - entry: $(inputs.inputGFA) + writable: true +arguments: [odgi, build, -g, $(inputs.inputGFA), -o, -, + {shellQuote: false, valueFrom: "|"}, + odgi, unchop, -i, -, -o, -, + {shellQuote: false, valueFrom: "|"}, + odgi, sort, -i, -, -p, s, -o, $(inputs.inputGFA.nameroot).unchop.sorted.odgi + ] diff --git a/workflows/pangenome-generate/pangenome-generate_spoa.cwl b/workflows/pangenome-generate/pangenome-generate_spoa.cwl new file mode 100644 index 0000000..958ffb6 --- /dev/null +++ b/workflows/pangenome-generate/pangenome-generate_spoa.cwl @@ -0,0 +1,122 @@ +#!/usr/bin/env cwl-runner +cwlVersion: v1.1 +class: Workflow +requirements: + ScatterFeatureRequirement: {} + StepInputExpressionRequirement: {} +inputs: + inputReads: File[] + metadata: File[] + metadataSchema: File + subjects: string[] + exclude: File? 
+  bin_widths:
+    type: int[]
+    default: [ 1, 4, 16, 64, 256, 1000, 4000, 16000]
+    doc: width of each bin in basepairs along the graph vector
+  cells_per_file:
+    type: int
+    default: 100
+    doc: Cells per file on component_segmentation
+outputs:
+  odgiGraph:
+    type: File
+    outputSource: buildGraph/odgiGraph
+  odgiPNG:
+    type: File
+    outputSource: vizGraph/graph_image
+  spoaGFA:
+    type: File
+    outputSource: induceGraph/spoaGFA
+  odgiRDF:
+    type: File
+    outputSource: odgi2rdf/rdf
+  readsMergeDedup:
+    type: File
+    outputSource: dedup/reads_dedup
+  mergedMetadata:
+    type: File
+    outputSource: mergeMetadata/merged
+  indexed_paths:
+    type: File
+    outputSource: index_paths/indexed_paths
+  colinear_components:
+    type: Directory
+    outputSource: segment_components/colinear_components
+steps:
+  relabel:
+    in:
+      readsFA: inputReads
+      subjects: subjects
+      exclude: exclude
+    out: [relabeledSeqs, originalLabels]
+    run: relabel-seqs.cwl
+  dedup:
+    in: {reads: relabel/relabeledSeqs}
+    out: [reads_dedup, dups]
+    run: ../tools/seqkit/seqkit_rmdup.cwl
+  sort_by_quality_and_len:
+    in: {reads: dedup/reads_dedup}
+    out: [reads_sorted_by_quality_and_len]
+    run: sort_fasta_by_quality_and_len.cwl
+  induceGraph:
+    in:
+      readsFA: sort_by_quality_and_len/reads_sorted_by_quality_and_len
+    out: [spoaGFA]
+    run: spoa.cwl
+  buildGraph:
+    in: {inputGFA: induceGraph/spoaGFA}
+    out: [odgiGraph]
+    run: odgi-build-from-spoa-gfa.cwl
+  vizGraph:
+    in:
+      sparse_graph_index: buildGraph/odgiGraph
+      width:
+        default: 50000
+      height:
+        default: 500
+      path_per_row:
+        default: true
+      path_height:
+        default: 4
+    out: [graph_image]
+    run: ../tools/odgi/odgi_viz.cwl
+  odgi2rdf:
+    in: {odgi: buildGraph/odgiGraph}
+    out: [rdf]
+    run: odgi_to_rdf.cwl
+  mergeMetadata:
+    in:
+      metadata: metadata
+      metadataSchema: metadataSchema
+      subjects: subjects
+      dups: dedup/dups
+      originalLabels: relabel/originalLabels
+    out: [merged]
+    run: merge-metadata.cwl
+  bin_paths:
+    run: ../tools/odgi/odgi_bin.cwl
+    in:
+      sparse_graph_index: buildGraph/odgiGraph
+      bin_width: bin_widths
+    scatter: bin_width
+    out: [ bins, pangenome_sequence ]
+  index_paths:
+    label: Create path index
+    run: ../tools/odgi/odgi_pathindex.cwl
+    in:
+      sparse_graph_index: buildGraph/odgiGraph
+    out: [ indexed_paths ]
+  segment_components:
+    label: Run component segmentation
+    run: ../tools/graph-genome-segmentation/component_segmentation.cwl
+    in:
+      bins: bin_paths/bins
+      cells_per_file: cells_per_file
+      pangenome_sequence:
+        source: bin_paths/pangenome_sequence
+        valueFrom: $(self[0])
+        # the bin_paths step is scattered over the bin_width array, but always using the same sparse_graph_index
+        # the pangenome_sequence that is extracted is exactly the same for the same sparse_graph_index
+        # regardless of bin_width, so we take the first pangenome_sequence as input for this step
+    out: [ colinear_components ]
diff --git a/workflows/pangenome-generate/sort_fasta_by_quality_and_len.cwl b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.cwl
new file mode 100644
index 0000000..59f027e
--- /dev/null
+++ b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.cwl
@@ -0,0 +1,18 @@
+cwlVersion: v1.1
+class: CommandLineTool
+inputs:
+  readsFA:
+    type: File
+    inputBinding: {position: 2}
+  script:
+    type: File
+    inputBinding: {position: 1}
+    default: {class: File, location: sort_fasta_by_quality_and_len.py}
+stdout: $(inputs.readsFA.nameroot).sorted_by_quality_and_len.fasta
+outputs:
+  sortedReadsFA:
+    type: stdout
+requirements:
+  InlineJavascriptRequirement: {}
+  ShellCommandRequirement: {}
+baseCommand: [python]
diff --git a/workflows/pangenome-generate/sort_fasta_by_quality_and_len.py b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.py
new file mode 100644
index 0000000..e48fd68
--- /dev/null
+++ b/workflows/pangenome-generate/sort_fasta_by_quality_and_len.py
@@ -0,0 +1,35 @@
+#!/usr/bin/env python3
+
+# Sort the sequences by quality (percentage of number of N bases not called, descending) and by length (descending).
+# The best sequence is the longest one, with no uncalled bases.
+
+import os
+import sys
+import gzip
+
+def open_gzipsafe(path_file):
+    if path_file.endswith('.gz'):
+        return gzip.open(path_file, 'rt')
+    else:
+        return open(path_file)
+
+path_fasta = sys.argv[1]
+
+header_to_seq_dict = {}
+header_percCalledBases_seqLength_list = []
+
+with open_gzipsafe(path_fasta) as f:
+    for fasta in f.read().strip('\n>').split('>'):
+        header = fasta.strip('\n').split('\n')[0]
+
+        header_to_seq_dict[
+            header
+        ] = ''.join(fasta.strip('\n').split('\n')[1:])
+
+        seq_len = len(header_to_seq_dict[header])
+        header_percCalledBases_seqLength_list.append([
+            header, header_to_seq_dict[header].count('N'), (seq_len - header_to_seq_dict[header].count('N'))/seq_len, seq_len
+        ])
+
+for header, x, percCalledBases, seqLength_list in sorted(header_percCalledBases_seqLength_list, key=lambda x: (x[-2], x[-1]), reverse = True):
+    sys.stdout.write('>{}\n{}\n'.format(header, header_to_seq_dict[header]))
diff --git a/workflows/pangenome-generate/spoa.cwl b/workflows/pangenome-generate/spoa.cwl
new file mode 100644
index 0000000..1e390d8
--- /dev/null
+++ b/workflows/pangenome-generate/spoa.cwl
@@ -0,0 +1,27 @@
+cwlVersion: v1.1
+class: CommandLineTool
+inputs:
+  readsFA: File
+stdout: $(inputs.readsFA.nameroot).g6.gfa
+script:
+  type: File
+  default: {class: File, location: relabel-seqs.py}
+outputs:
+  spoaGFA:
+    type: stdout
+requirements:
+  InlineJavascriptRequirement: {}
+  ShellCommandRequirement: {}
+hints:
+  DockerRequirement:
+    dockerPull: "quay.io/biocontainers/spoa:3.0.2--hc9558a2_0"
+  ResourceRequirement:
+    coresMin: 1
+    ramMin: $(15 * 1024)
+    outdirMin: $(Math.ceil(inputs.readsFA.size/(1024*1024*1024) + 20))
+baseCommand: spoa
+arguments: [
+  $(inputs.readsFA),
+  -G,
+  -g, '-6'
+]
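
The ordering described in the header comment of `sort_fasta_by_quality_and_len.py` (fraction of called bases first, then length, both descending) can be sketched as a small standalone function. This is a minimal illustration, not the committed script: the names `read_fasta`, `sort_key`, and `sort_by_quality_and_len` are hypothetical, and the guard for empty sequences is an assumption added here to avoid a division by zero. Like the script, it counts only uppercase `N` as an uncalled base.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_key(record):
    """Rank by fraction of called (non-N) bases, then by sequence length."""
    _, seq = record
    if not seq:
        # Assumption: an empty record ranks last instead of raising an error.
        return (0.0, 0)
    called = len(seq) - seq.count("N")
    return (called / len(seq), len(seq))


def sort_by_quality_and_len(handle):
    """Return records with the most complete, longest sequences first."""
    return sorted(read_fasta(handle), key=sort_key, reverse=True)


# Hypothetical input: "a" has uncalled bases, "c" spans two lines.
example = io.StringIO(">a\nACGTNN\n>b\nACGT\n>c\nACGTAC\nGT\n")
ranked = sort_by_quality_and_len(example)
for header, seq in ranked:
    print(">{}\n{}".format(header, seq))
```

Reading line by line avoids splitting the whole file on `>`, so a `>` character inside a header does not break a record apart.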