From fc872f15da426926414fb7629bf6660d9880ed1e Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Fri, 10 Apr 2020 17:16:35 -0500 Subject: Draft --- paper/paper.md | 160 ++++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 135 insertions(+), 25 deletions(-) (limited to 'paper/paper.md') diff --git a/paper/paper.md b/paper/paper.md index caa9903..813c91b 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -1,8 +1,9 @@ --- -title: 'Public Sequence Resource for COVID-19' +title: 'CPSR: COVID-19 Public Sequence Resource' +title_short: 'CPSR: COVID-19 Public Sequence Resource' tags: - Sequencing - - COVID + - COVID-19 authors: - name: Pjotr Prins orcid: 0000-0002-8021-9162 @@ -25,16 +26,30 @@ authors: - name: Rutger Vos orcid: 0000 affiliation: 7 - - Michael Heuer + - name: Michael Heuer orcid: 0000 affiliation: 8 - + - name: Adam Novak + orcid: 0000 + affiliation: 9 + - name: Alex Kanitz + orcid: 0000 + affiliation: 10 + - name: Jerven Bolleman + orcid: 0000 + affiliation: 11 + - name: Joep de Ligt + orcid: 0000 + affiliation: 12 affiliations: - name: Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, USA. index: 1 - name: Curii, Boston, USA index: 2 date: 11 April 2020 +event: COVID2020 +group: Public Sequence Uploader +authors_short: Pjotr Prins & Peter Amstutz \emph{et al.} bibliography: paper.bib --- @@ -49,13 +64,48 @@ pasting above link (or yours) with https://github.com/biohackrxiv/bhxiv-gen-pdf +Note that author order will change! + --> # Introduction -As part of the one week COVID-19 Biohackathion 2020, we formed a -working group on creating a public sequence resource for Corona virus. - +As part of the COVID-19 Biohackathion 2020 we formed a working +group to create a COVID-19 Public Sequence Resource (CPSR) for +Corona virus sequences. The general idea was to create a +repository that has a low barrier to entry for uploading sequence +data using best practices. I.e., data published with a creative +commons 4.0 (CC-4.0) license with metadata using state-of-the art +standards and, perhaps most importantly, providing standardized +workflows that get triggered on upload, so that results are +immediately available in standardized data formats. + +Existing data repositories for viral data include GISAID, EBI ENA +and NCBI. These repositories allow for free sharing of data, but +do not add value in terms of running immediate +computations. Also, GISAID, at this point, has the most complete +collection of genetic sequence data of influenza viruses and +related clinical and epidemiological data through its +database. But, due to a restricted license, data submitted to +GISAID can not be used for online web services and on-the-fly +computation. In addition GISAID registration which can take weeks +and, painfully, forces users to download sequences one at a time +to do any type of analysis. In our opinion this does not fit a +pandemic scenario where fast turnaround times are key and data +analysis has to be agile. + +We managed to create a useful sequence uploader utility within +one week by leveraging existing technologies, such as the Arvados +Cloud platform [@Arvados], the Common Workflow Langauge (CWL) +[@CWL], Docker images built with Debian packages, and the many +free and open source software packages that are available for +bioinformatics. + +The source code for the CLI uploader and web uploader can be +found [here](https://github.com/arvados/bh20-seq-resource) +(FIXME: we'll have a full page). The CWL workflow definitions can +be found [here](https://github.com/hpobio-lab/viral-analysis) and +on CWL hub (FIXME). + +We aim to add more workflows to CPSR, for example to prepare +sequence data for submitting in other public repositories, such +as EBI ENA and GISAID. This will allow researchers to share data +in multiple systems without pain, circumventing current sharing +restrictions. + +# Acknowledgements + +We thank the COVID-19 BioHackathon 2020 and ELIXIR for creating a +unique event that triggered many collaborations. We thank Curii +Corporation for their financial support for creating and running +Arvados instances. We thank Amazon AWS for their financial +support to run COVID-19 workflows. We also want to thank the +other working groups in the BioHackathon who generously +contributed onthologies, workflows and software. + # References -- cgit v1.2.3