About/FAQ
Table of Contents
- 1. What is the 'public sequence resource' about?
- 2. Who created the public sequence resource?
- 3. How does the public sequence resource compare to other data resources?
- 4. Why should I upload my data here?
- 5. Why should I not upload my data here?
- 6. How does the public sequence resource work?
- 7. Is this about open data?
- 8. Is this about free software?
- 9. How do I upload raw data?
- 10. How do I change metadata?
- 11. How do I change the workflows?
- 12. How do I change the source code?
- 13. How do I deal with private data and privacy?
- 14. Who are the sponsors?
1 What is the 'public sequence resource' about?
The public sequence resource aims to provide a generic and useful resource for COVID-19 research. The focus is on providing the best possible sequence data with associated metadata that can be used for sequence comparison and protein prediction.
2 Who created the public sequence resource?
The public sequence resource is an initiative by bioinformatics and ontology experts who want to create something agile and useful for the wider research community. The initiative started at the COVID-19 biohackathon in April 2020 and is ongoing. The main project drivers are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Michael Crusoe (Common Workflow Language) and Thomas Liener (consultant, formerly EBI). But as this is a free software initiative, the project represents major work by hundreds of software developers and ontology and data wrangling experts. Thank you, everyone!
3 How does the public sequence resource compare to other data resources?
The short version is that we apply state-of-the-art bioinformatics practices using agile methods. Unlike the resources from large institutes, we can improve things quickly, and anyone can contribute to building out this resource!
Importantly, all data is published under the Creative Commons 4.0 attribution license, which means the data can be published and workflows can run in public environments, allowing for improved access for research and reproducible results. This contrasts with some other public resources, including GISAID.
4 Why should I upload my data here?
- We champion truly shareable data without licensing restrictions, with proper attribution
- We provide full metadata support using state-of-the-art ontologies
- We provide a web-based sequence uploader and a command-line version for bulk uploads
- We provide a live SPARQL end-point for all metadata
- We provide free data analysis and sequence comparison triggered on data upload
- We provide free downloads of all computed output
- There is no need to set up pipelines and/or compute clusters
- All workflows get triggered on uploading a new sequence
- When someone (you?) improves the software/workflows, everyone benefits
Finally, if you upload your data here, we have workflows that output formatted data suitable for uploading to EBI resources (and soon others). Uploading your data here gets your data ready for upload to multiple resources.
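As an illustration of what the live SPARQL end-point mentioned in the list above makes possible, here is a minimal sketch in Python. The endpoint URL is a placeholder (this FAQ does not give the real address), the function names are made up for this example, and the query is a generic one that lists a few triples rather than one based on the resource's actual metadata schema. It assumes the endpoint follows the standard SPARQL 1.1 protocol and can return results in the standard JSON format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint URL: the real address is not given in this FAQ.
ENDPOINT = "https://example.org/sparql"

# Generic query that works against any RDF store: list ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(endpoint, query)) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(ENDPOINT, QUERY)` with the real end-point address substituted for the placeholder should return one dictionary per result row, which can then be filtered or loaded into a table for further analysis.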
5 Why should I not upload my data here?
Funny question. There is no good reason not to upload your data here! In fact, you can upload your data here as well as to other resources. It is your data after all. No one can prevent you from uploading your data to multiple resources. We recommend uploading to EBI and NCBI resources. Use our data conversion tools so that you only enter data once and the process stays smooth.
6 How does the public sequence resource work?
When you upload a sequence with metadata, it is automatically processed and incorporated into the public pangenome, together with its metadata, using workflows from the High Performance Open Biology Lab defined here.
7 Is this about open data?
All data is published under a Creative Commons 4.0 attribution license (CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) data and store it for further processing.
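As a small example of such further processing, the sketch below reads FASTA-formatted text into (header, sequence) pairs using only the Python standard library. The function name `read_fasta` and the sample record are invented for illustration and are not part of the resource's own tooling. A dedicated library such as Biopython would be a more robust choice for real work.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; collect them for joining.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up record for illustration only, not real resource data.
example = io.StringIO(">sample-1 hypothetical record\nACGT\nTTGA\n")
records = list(read_fasta(example))
```

For a downloaded file, the same function can be used with `open("downloaded.fasta")` in place of the in-memory example, where the file name is again a placeholder.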
8 Is this about free software?
Absolutely. Free software allows for fully reproducible pipelines. You can take our workflows and data and run them elsewhere!
9 How do I upload raw data?
We are preparing raw sequence data pipelines (FASTQ and BAM). The reason is that we want the best data possible for downstream analysis (including protein prediction and test development). The current approach, where people publish final sequences of SARS-CoV-2, is lacking because it hides how each sequence was created. For reproducible and improved results we need to work with the raw sequence reads (both short reads and long reads) and take alternative assembly variations into consideration. This is all work in progress.
10 How do I change metadata?
See the blog at http://covid19.genenetwork.org/blog!
11 How do I change the workflows?
See the blog at http://covid19.genenetwork.org/blog!
12 How do I change the source code?
Go to our source code repositories, fork/clone the repository, change something and submit a pull request (PR). It's that easy! Check out how many PRs we have already merged.
13 How do I deal with private data and privacy?
A public sequence resource is about public data. Metadata can refer to private data. You can use your own (anonymous) identifiers. We also plan to combine identifiers with clinical data stored securely at REDCap. Contact Pjotr Prins if you want to work on this.
14 Who are the sponsors?
The main sponsors are listed in the footer. In addition to the time generously donated by many contributors, we also acknowledge Amazon AWS for donating COVID-19 related compute time.