About/FAQ
-Table of Contents
--
-
- 1. What is the 'public sequence resource' about? -
- 2. Who created the public sequence resource? -
- 3. How does the public sequence resource compare to other data resources? -
- 4. Why should I upload my data here? -
- 5. Why should I not upload by data here? -
- 6. How does the public sequence resource work? -
- 7. Who uses the public sequence resource? -
- 8. How can I contribute? -
- 9. Is this about open data? -
- 10. Is this about free software? -
- 11. How do I upload raw data? -
- 12. How do I change metadata? -
- 13. How do I change the work flows? -
- 14. How do I change the source code? -
- 15. Should I choose CC-BY or CC0? -
- 16. How do I deal with private data and privacy? -
- 17. How do I communicate with you? -
- 18. Who are the sponsors? -
1 What is the 'public sequence resource' about?
--The public sequence resource aims to provide a generic and useful -resource for COVID-19 research. The focus is on providing the best -possible sequence data with associated metadata that can be used for -sequence comparison and protein prediction. -
-2 Who created the public sequence resource?
--The public sequence resource is an initiative by bioinformatics and -ontology experts who want to create something agile and useful for the -wider research community. The initiative started at the COVID-19 -biohackathon in April 2020 and is ongoing. The main project drivers -are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino -(University of Rome Tor Vergata), Michael Crusoe (Common Workflow -Language), Thomas Liener (consultant, formerly EBI), Erik Garrison -(UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). -
- --Notably, as this is a free software initiative, the project represents -major work by hundreds of software developers and ontology and data -wrangling experts. Thank you everyone! -
-3 How does the public sequence resource compare to other data resources?
--The short version is that we use state-of-the-art practices in -bioinformatics using agile methods. Unlike the resources from large -institutes we can improve things on a dime and anyone can contribute -to building out this resource! Sequences from GenBank, EBI/ENA and -others are regularly added to PubSeq. We encourage people to everyone -to submit on PubSeq because of its superior live tooling and metadata -support (see the next question). -
- --Importantly: all data is published under either the Creative Commons -4.0 attribution license or the CC0 “No Rights Reserved” license which -means it data can be published and workflows can run in public -environments allowing for improved access for research and -reproducible results. This contrasts with some other public resources, -such as GISAID. -
-4 Why should I upload my data here?
--
-
- We champion truly shareable data without licensing restrictions - with proper -attribution -
- We provide full metadata support using state-of-the-art ontology's -
- We provide a web-based sequence uploader and a command-line version -for bulk uploads -
- We provide a live SPARQL end-point for all metadata -
- We provide free data analysis and sequence comparison triggered on data upload -
- We do real work for you, with this link you can see the last -run took 5.5 hours! -
- We provide free downloads of all computed output -
- There is no need to set up pipelines and/or compute clusters -
- All workflows get triggered on uploading a new sequence -
- When someone (you?) improves the software/workflows and everyone benefits -
- Your data gets automatically integrated with the Swiss Institure of -Bioinformatics COVID-19 knowledge base -https://covid-19-sparql.expasy.org/ (Elixir Switzerland) -
- Your data will be used to develop drug targets -
-Finally, if you upload your data here we have workflows that output -formatted data suitable for uploading to EBI resources (and soon -others). Uploading your data here get your data ready for upload to -multiple resources. -
-5 Why should I not upload by data here?
--Funny question. There are only good reasons to upload your data here -and make it available to the widest audience possible. -
- --In fact, you can upload your data here as well as to other -resources. It is your data after all. No one can prevent you from -uploading your data to multiple resources. -
- --We recommend uploading to EBI and NCBI resources using our data -conversion tools. It means you only enter data once and make the -process smooth. You can also use our command line data uploader -for bulk uploads! -
-6 How does the public sequence resource work?
--On uploading a sequence with metadata it will automatically be -processed and incorporated into the public pangenome with metadata -using workflows from the High Performance Open Biology Lab defined -here. -
-7 Who uses the public sequence resource?
--The Swiss Institute of Bioinformatics has included this data in -https://covid-19-sparql.expasy.org/ and made it part of Uniprot. -
- --The Pantograph viewer uses PubSeq data for their visualisations. -
- --UTHSC (USA), ESR (New Zealand) and ORNL (USA) use COVID-19 PubSeq data -for monitoring, protein prediction and drug development. -
-8 How can I contribute?
--You can contribute by submitting sequences, updating metadata, submit -issues on our issue tracker, and more importantly add functionality. -See 'How do I change the source code' below. Read through our online -documentation at http://covid19.genenetwork.org/blog as a starting -point. -
-9 Is this about open data?
--All data is published under a Creative Commons 4.0 attribution license -(CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) -data and store it for further processing. -
-10 Is this about free software?
--Absolutely. Free software allows for fully reproducible pipelines. You -can take our workflows and data and run it elsewhere! -
-11 How do I upload raw data?
--We are preparing raw sequence data pipelines (fastq and BAM). The -reason is that we want the best data possible for downstream analysis -(including protein prediction and test development). The current -approach where people publish final sequences of SARS-CoV-2 is lacking -because it hides how this sequence was created. For reasons of -reproducible and improved results we want/need to work with the raw -sequence reads (both short reads and long reads) and take alternative -assembly variations into consideration. This is all work in progress. -
-12 How do I change metadata?
--See the http://covid19.genenetwork.org/blog! -
-13 How do I change the work flows?
--Workflows are on github and can be modified. See also the BLOG -http://covid19.genenetwork.org/blog on workflows. -
-14 How do I change the source code?
--Go to our source code repositories, fork/clone the repository, change -something and submit a pull request (PR). That easy! Check out how -many PRs we already merged. -
-15 Should I choose CC-BY or CC0?
--Restrictive data licenses are hampering data sharing and reproducible -research. CC0 is the preferred license because it gives researchers -the most freedom. Since we provide metadata there is no reason for -others not to honour your work. We also provide CC-BY as an option -because we know people like the attribution clause. -
- --In all honesty: we prefer both data and software to be free. -
-16 How do I deal with private data and privacy?
- -17 How do I communicate with you?
--We use a gitter channel you can join. -
-18 Who are the sponsors?
--The main sponsors are listed in the footer. In addition to the time -generously donated by many contributors we also acknowledge Amazon AWS -for donating COVID-19 related compute time. -
-About/FAQ
+Table of Contents
+-
+
- 1. What is the 'public sequence resource' about? +
- 2. Who created the public sequence resource? +
- 3. How does the public sequence resource compare to other data resources? + +
- 4. Why should I upload my data here? +
- 5. Why should I not upload my data here? +
- 6. How does the public sequence resource work? +
- 7. Who uses the public sequence resource? +
- 8. How can I contribute? +
- 9. Is this about open data? +
- 10. Is this about free software? +
- 11. How do I upload raw data? +
- 12. How do I change metadata? +
- 13. How do I change the workflows? +
- 14. How do I change the source code? +
- 15. Should I choose CC-BY or CC0? +
- 16. How do I deal with private data and privacy? +
- 17. How do I communicate with you? +
- 18. Who are the sponsors? +
1 What is the 'public sequence resource' about?
++ The public sequence resource aims to provide a generic and useful + resource for COVID-19 research. The focus is on providing the best + possible sequence data with associated metadata that can be used for + sequence comparison and protein prediction. +
++ We were at the Bioinformatics Community Conference 2020! Have a look at the + video talk + (alternative link) + and the poster. +
+2 Who created the public sequence resource?
++ The public sequence resource is an initiative by bioinformatics and + ontology experts who want to create something agile and useful for the + wider research community. The initiative started at the COVID-19 + biohackathon in April 2020 and is ongoing. The main project drivers + are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino + (University of Rome Tor Vergata), Michael Crusoe (Common Workflow + Language), Thomas Liener (consultant, formerly EBI), Erik Garrison + (UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). +
+ ++ Notably, as this is a free software initiative, the project represents + major work by hundreds of software developers and ontology and data + wrangling experts. Thank you everyone! +
+3 How does the public sequence resource compare to + other data resources?
++ The short version is that we use state-of-the-art practices in + bioinformatics using agile methods. Unlike the resources from large + institutes we can improve things on a dime and anyone can contribute + to building out this resource! Sequences from GenBank, EBI/ENA and + others are regularly added to PubSeq. We encourage everyone + to submit to PubSeq because of its superior live tooling and metadata + support (see the next question). +
+ ++ Importantly: all data is published under either the Creative Commons + 4.0 attribution license or the CC0 “No Rights Reserved” + license which + means the data can be published and workflows can run in public + environments allowing for improved access for research and + reproducible results. This contrasts with some other public resources, + such as GISAID. +
+4 Why should I upload my data here?
+-
+
- We champion truly shareable data without licensing restrictions - with proper + attribution + +
- We provide full metadata support using state-of-the-art ontologies +
- We provide a web-based sequence uploader and a command-line version + for bulk uploads + +
- We provide a live SPARQL end-point for all metadata +
- We provide free data analysis and sequence comparison triggered on data upload +
- We do real work for you: with this link + you can see that the last + run took 5.5 hours! + +
- We provide free downloads of all computed output +
- There is no need to set up pipelines and/or compute clusters +
- All workflows get triggered on uploading a new sequence +
- When someone (you?) improves the software/workflows, everyone benefits +
- Your data gets automatically integrated with the Swiss Institute of + Bioinformatics COVID-19 knowledge base + https://covid-19-sparql.expasy.org/ (Elixir + Switzerland) + +
- Your data will be used to develop drug targets +
+ Finally, if you upload your data here we have workflows that output + formatted data suitable for uploading to EBI + resources (and soon + others). Uploading your data here gets your data ready for upload to + multiple resources. +
+5 Why should I not upload my data here?
++ Funny question. There are only good reasons to upload your data here + and make it available to the widest audience possible. +
+ ++ In fact, you can upload your data here as well as to other + resources. It is your data after all. No one can prevent you from + uploading your data to multiple resources. +
+ ++ We recommend uploading to EBI and NCBI resources using our data + conversion tools. This means you only enter data once, which makes the + process smooth. You can also use our command line data uploader + for bulk uploads! +
+6 How does the public sequence resource work?
++ On uploading a sequence with metadata it will automatically be + processed and incorporated into the public pangenome with metadata + using workflows from the High Performance Open Biology Lab defined + here. +
+7 Who uses the public sequence resource?
++ The Swiss Institute of Bioinformatics has included this data in + https://covid-19-sparql.expasy.org/ and made it part + of UniProt. +
+ ++ The Pantograph viewer uses PubSeq data for their + visualisations. +
+ ++ UTHSC (USA), ESR (New Zealand) and + ORNL (USA) use COVID-19 PubSeq data + for monitoring, protein prediction and drug development. +
+8 How can I contribute?
++ You can contribute by submitting sequences, updating metadata, submitting + issues on our issue tracker and, more importantly, adding functionality. + See 'How do I change the source code' below. Read through our online + documentation at http://covid19.genenetwork.org/blog + as a starting + point. +
+9 Is this about open data?
++ All data is published under a Creative Commons + 4.0 attribution license + (CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) + data and store it for further processing. +
+10 Is this about free software?
++ Absolutely. Free software allows for fully reproducible pipelines. You + can take our workflows and data and run them elsewhere! +
+11 How do I upload raw data?
++ We are preparing raw sequence data pipelines (fastq and BAM). The + reason is that we want the best data possible for downstream analysis + (including protein prediction and test development). The current + approach where people publish final sequences of SARS-CoV-2 is lacking + because it hides how this sequence was created. For reasons of + reproducible and improved results we want/need to work with the raw + sequence reads (both short reads and long reads) and take alternative + assembly variations into consideration. This is all work in progress. +
+12 How do I change metadata?
++ See the blog at http://covid19.genenetwork.org/blog! +
+13 How do I change the workflows?
++ Workflows are on GitHub + and can be modified. See also the blog + http://covid19.genenetwork.org/blog on workflows. +
+14 How do I change the source code?
++ Go to our source code repositories, + fork/clone the repository, change + something and submit a pull request + (PR). It's that easy! Check out how + many PRs we have already merged. +
+15 Should I choose CC-BY or CC0?
++ Restrictive data licenses are hampering data sharing and reproducible + research. CC0 is the preferred license because it gives researchers + the most freedom. Since we provide metadata there is no reason for + others not to honour your work. We also provide CC-BY as an option + because we know people like the attribution clause. +
+ ++ In all honesty: we prefer both data and software to be free. +
+16 How do I deal with private data and privacy?
+ +17 How do I communicate with you?
++ We use a Gitter + channel you can join. +
+18 Who are the sponsors?
++ The main sponsors are listed in the footer. In addition to the time + generously donated by many contributors, we also acknowledge Amazon AWS + for donating COVID-19-related compute time. +
+