From fc872f15da426926414fb7629bf6660d9880ed1e Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Fri, 10 Apr 2020 17:16:35 -0500
Subject: Draft

---
 paper/paper.md | 160 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 135 insertions(+), 25 deletions(-)

(limited to 'paper/paper.md')

diff --git a/paper/paper.md b/paper/paper.md
index caa9903..813c91b 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -1,8 +1,9 @@
 ---
-title: 'Public Sequence Resource for COVID-19'
+title: 'CPSR: COVID-19 Public Sequence Resource'
+title_short: 'CPSR: COVID-19 Public Sequence Resource'
 tags:
   - Sequencing
-  - COVID
+  - COVID-19
 authors:
   - name: Pjotr Prins
     orcid: 0000-0002-8021-9162
@@ -25,16 +26,30 @@ authors:
   - name: Rutger Vos
     orcid: 0000
     affiliation: 7
-  - Michael Heuer
+  - name: Michael Heuer
     orcid: 0000
     affiliation: 8
-
+  - name: Adam Novak
+    orcid: 0000
+    affiliation: 9
+  - name: Alex Kanitz
+    orcid: 0000
+    affiliation: 10
+  - name: Jerven Bolleman
+    orcid: 0000
+    affiliation: 11
+  - name: Joep de Ligt
+    orcid: 0000
+    affiliation: 12
 affiliations:
   - name: Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, USA.
     index: 1
   - name: Curii, Boston, USA
     index: 2
 date: 11 April 2020
+event: COVID2020
+group: Public Sequence Uploader
+authors_short: Pjotr Prins & Peter Amstutz \emph{et al.}
 bibliography: paper.bib
 ---
 
@@ -49,13 +64,48 @@ pasting above link (or yours) with
 
   https://github.com/biohackrxiv/bhxiv-gen-pdf
 
+Note that author order will change!
+
 -->
 
 # Introduction
 
-As part of the one week COVID-19 Biohackathion 2020, we formed a
-working group on creating a public sequence resource for Corona virus.
-
+As part of the COVID-19 Biohackathion 2020 we formed a working
+group to create a COVID-19 Public Sequence Resource (CPSR) for
+Corona virus sequences. The general idea was to create a
+repository that has a low barrier to entry for uploading sequence
+data using best practices. I.e., data published with a creative
+commons 4.0 (CC-4.0) license with metadata using state-of-the art
+standards and, perhaps most importantly, providing standardized
+workflows that get triggered on upload, so that results are
+immediately available in standardized data formats.
+
+Existing data repositories for viral data include GISAID, EBI ENA
+and NCBI. These repositories allow for free sharing of data, but
+do not add value in terms of running immediate
+computations. Also, GISAID, at this point, has the most complete
+collection of genetic sequence data of influenza viruses and
+related clinical and epidemiological data through its
+database. But, due to a restricted license, data submitted to
+GISAID can not be used for online web services and on-the-fly
+computation. In addition GISAID registration which can take weeks
+and, painfully, forces users to download sequences one at a time
+to do any type of analysis. In our opinion this does not fit a
+pandemic scenario where fast turnaround times are key and data
+analysis has to be agile.
+
+We managed to create a useful sequence uploader utility within
+one week by leveraging existing technologies, such as the Arvados
+Cloud platform [@Arvados], the Common Workflow Langauge (CWL)
+[@CWL], Docker images built with Debian packages, and the many
+free and open source software packages that are available for
+bioinformatics.
+
+The source code for the CLI uploader and web uploader can be
+found [here](https://github.com/arvados/bh20-seq-resource)
+(FIXME: we'll have a full page). The CWL workflow definitions can
+be found [here](https://github.com/hpobio-lab/viral-analysis) and
+on CWL hub (FIXME).
+
+We aim to add more workflows to CPSR, for example to prepare
+sequence data for submitting in other public repositories, such
+as EBI ENA and GISAID. This will allow researchers to share data
+in multiple systems without pain, circumventing current sharing
+restrictions.
+
+# Acknowledgements
+
+We thank the COVID-19 BioHackathon 2020 and ELIXIR for creating a
+unique event that triggered many collaborations. We thank Curii
+Corporation for their financial support for creating and running
+Arvados instances. We thank Amazon AWS for their financial
+support to run COVID-19 workflows. We also want to thank the
+other working groups in the BioHackathon who generously
+contributed onthologies, workflows and software.
+
 # References
-- 
cgit v1.2.3

From dcd7f12d10e7f6399a0d515606148f85358d9dc7 Mon Sep 17 00:00:00 2001
From: Michael L Heuer
Date: Fri, 10 Apr 2020 17:52:45 -0500
Subject: Add author and affiliation

---
 paper/paper.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'paper/paper.md')

diff --git a/paper/paper.md b/paper/paper.md
index 813c91b..bc7e835 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -27,7 +27,7 @@ authors:
     orcid: 0000
     affiliation: 7
   - name: Michael Heuer
-    orcid: 0000
+    orcid: 0000-0002-9052-6000
     affiliation: 8
   - name: Adam Novak
     orcid: 0000
@@ -46,6 +46,8 @@ affiliations:
     index: 1
   - name: Curii, Boston, USA
     index: 2
+  - name: RISE Lab, University of California Berkeley, Berkeley, CA, USA.
+    index: 8
 date: 11 April 2020
 event: COVID2020
 group: Public Sequence Uploader
-- 
cgit v1.2.3

From 89f996912240cfb2f5adcf95f401dd59319dac3b Mon Sep 17 00:00:00 2001
From: Adam Novak
Date: Fri, 10 Apr 2020 16:28:08 -0700
Subject: Add affiliation info

---
 paper/paper.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

(limited to 'paper/paper.md')

diff --git a/paper/paper.md b/paper/paper.md
index 813c91b..b789f60 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -29,8 +29,8 @@ authors:
   - name: Michael Heuer
     orcid: 0000
     affiliation: 8
-  - name: Adam Novak
-    orcid: 0000
+  - name: Adam M Novak
+    orcid: 0000-0001-5828-047X
     affiliation: 9
   - name: Alex Kanitz
     orcid: 0000
@@ -46,6 +46,8 @@ affiliations:
     index: 1
   - name: Curii, Boston, USA
     index: 2
+  - name: UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA.
+    index: 9
 date: 11 April 2020
 event: COVID2020
 group: Public Sequence Uploader
-- 
cgit v1.2.3

From fcd45e42942750950076553ac995d738c863aa7a Mon Sep 17 00:00:00 2001
From: Adam Novak
Date: Fri, 10 Apr 2020 16:30:21 -0700
Subject: Grab Erik

---
 paper/paper.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'paper/paper.md')

diff --git a/paper/paper.md b/paper/paper.md
index b789f60..e7678dc 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -31,7 +31,7 @@ authors:
     affiliation: 8
   - name: Adam M Novak
     orcid: 0000-0001-5828-047X
-    affiliation: 9
+    affiliation: 5
   - name: Alex Kanitz
     orcid: 0000
     affiliation: 10
@@ -47,7 +47,7 @@ affiliations:
   - name: Curii, Boston, USA
     index: 2
   - name: UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA.
-    index: 9
+    index: 5
 date: 11 April 2020
 event: COVID2020
 group: Public Sequence Uploader
-- 
cgit v1.2.3