About/FAQ
-Table of Contents
--
-
- 1. What is the 'public sequence resource' about? -
- 2. Who created the public sequence resource? -
- 3. How does the public sequence resource compare to other data resources? - -
- 4. Why should I upload my data here? -
- 5. Why should I not upload my data here? -
- 6. How does the public sequence resource work? -
- 7. Who uses the public sequence resource? -
- 8. How can I contribute? -
- 9. Is this about open data? -
- 10. Is this about free software? -
- 11. How do I upload raw data? -
- 12. How do I change metadata? -
- 13. How do I change the work flows? -
- 14. How do I change the source code? -
- 15. Should I choose CC-BY or CC0? -
- 16. How do I deal with private data and privacy? -
- 17. How do I communicate with you? -
- 18. Who are the sponsors? -
1 What is the 'public sequence resource' about?
-- The public sequence resource aims to provide a generic and useful - resource for COVID-19 research. The focus is on providing the best - possible sequence data with associated metadata that can be used for - sequence comparison and protein prediction. -
-- We were at the Bioinformatics Community Conference 2020! Have a look at the - video talk - (alternative link) - and the poster. -
-2 Who created the public sequence resource?
-- The public sequence resource is an initiative by bioinformatics and - ontology experts who want to create something agile and useful for the - wider research community. The initiative started at the COVID-19 - biohackathon in April 2020 and is ongoing. The main project drivers - are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino - (University of Rome Tor Vergata), Michael Crusoe (Common Workflow - Language), Thomas Liener (consultant, formerly EBI), Erik Garrison - (UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). -
- -- Notably, as this is a free software initiative, the project represents - major work by hundreds of software developers and ontology and data - wrangling experts. Thank you everyone! -
-3 How does the public sequence resource compare to - other data resources?
-- The short version is that we use state-of-the-art practices in - bioinformatics using agile methods. Unlike the resources from large - institutes we can improve things on a dime and anyone can contribute - to building out this resource! Sequences from GenBank, EBI/ENA and - others are regularly added to PubSeq. We encourage everyone - to submit to PubSeq because of its superior live tooling and metadata - support (see the next question). -
- -- Importantly: all data is published under either the Creative Commons - 4.0 attribution license or the CC0 “No Rights Reserved” - license which - means the data can be published and workflows can run in public - environments allowing for improved access for research and - reproducible results. This contrasts with some other public resources, - such as GISAID. -
-4 Why should I upload my data here?
--
-
- We champion truly shareable data without licensing restrictions - with proper - attribution - -
- We provide full metadata support using state-of-the-art ontologies -
- We provide a web-based sequence uploader and a command-line version - for bulk uploads - -
- We provide a live SPARQL end-point for all metadata -
- We provide free data analysis and sequence comparison triggered on data upload -
- We do real work for you, with this link - you can see the last - run took 5.5 hours! - -
- We provide free downloads of all computed output -
- There is no need to set up pipelines and/or compute clusters -
- All workflows get triggered on uploading a new sequence -
- When someone (you?) improves the software/workflows, everyone benefits -
- Your data gets automatically integrated with the Swiss Institute of - Bioinformatics COVID-19 knowledge base - https://covid-19-sparql.expasy.org/ (Elixir - Switzerland) - -
- Your data will be used to develop drug targets -
- Finally, if you upload your data here we have workflows that output - formatted data suitable for uploading to EBI - resources (and soon - others). Uploading your data here gets your data ready for upload to - multiple resources. -
-5 Why should I not upload my data here?
-- Funny question. There are only good reasons to upload your data here - and make it available to the widest audience possible. -
- -- In fact, you can upload your data here as well as to other - resources. It is your data after all. No one can prevent you from - uploading your data to multiple resources. -
- -- We recommend uploading to EBI and NCBI resources using our data - conversion tools. This means you only enter data once, which makes the - process smooth. You can also use our command line data uploader - for bulk uploads! -
-6 How does the public sequence resource work?
-- On uploading a sequence with metadata it will automatically be - processed and incorporated into the public pangenome with metadata - using workflows from the High Performance Open Biology Lab defined - here. -
-7 Who uses the public sequence resource?
-- The Swiss Institute of Bioinformatics has included this data in - https://covid-19-sparql.expasy.org/ and made it part - of Uniprot. -
- -- The Pantograph viewer uses PubSeq data for their - visualisations. -
- -- UTHSC (USA), ESR (New Zealand) and - ORNL (USA) use COVID-19 PubSeq data - for monitoring, protein prediction and drug development. -
-8 How can I contribute?
-- You can contribute by submitting sequences, updating metadata, submitting - issues on our issue tracker and, more importantly, adding functionality. - See 'How do I change the source code' below. Read through our online - documentation at http://covid19.genenetwork.org/blog - as a starting - point. -
-9 Is this about open data?
-- All data is published under a Creative Commons - 4.0 attribution license - (CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) - data and store it for further processing. -
-10 Is this about free software?
-- Absolutely. Free software allows for fully reproducible pipelines. You - can take our workflows and data and run it elsewhere! -
-11 How do I upload raw data?
-- We are preparing raw sequence data pipelines (fastq and BAM). The - reason is that we want the best data possible for downstream analysis - (including protein prediction and test development). The current - approach where people publish final sequences of SARS-CoV-2 is lacking - because it hides how this sequence was created. For reasons of - reproducible and improved results we want/need to work with the raw - sequence reads (both short reads and long reads) and take alternative - assembly variations into consideration. This is all work in progress. -
-12 How do I change metadata?
-- See the http://covid19.genenetwork.org/blog! -
-13 How do I change the work flows?
-- Workflows are on github - and can be modified. See also the BLOG - http://covid19.genenetwork.org/blog on workflows. -
-14 How do I change the source code?
-- Go to our source code repositories, - fork/clone the repository, change - something and submit a pull request - (PR). It is that easy! Check out how - many PRs we have already merged. -
-15 Should I choose CC-BY or CC0?
-- Restrictive data licenses are hampering data sharing and reproducible - research. CC0 is the preferred license because it gives researchers - the most freedom. Since we provide metadata there is no reason for - others not to honour your work. We also provide CC-BY as an option - because we know people like the attribution clause. -
- -- In all honesty: we prefer both data and software to be free. -
-16 How do I deal with private data and privacy?
- -17 How do I communicate with you?
-- We use a gitter - channel you can join. -
-18 Who are the sponsors?
-- The main sponsors are listed in the footer. In addition to the time - generously donated by many contributors we also acknowledge Amazon AWS - for donating COVID-19 related compute time. -
-About/FAQ
+Table of Contents
+-
+
- 1. What is the 'public sequence resource' about? +
- 2. Presentations +
- 3. Who created the public sequence resource? +
- 4. How does the public sequence resource compare to other data resources? +
- 5. Why should I upload my data here? +
- 6. Why should I not upload my data here? +
- 7. How does the public sequence resource work? +
- 8. Who uses the public sequence resource? +
- 9. How can I contribute? +
- 10. Is this about open data? +
- 11. Is this about free software? +
- 12. How do I upload raw data? +
- 13. How do I change metadata? +
- 14. How do I change the work flows? +
- 15. How do I change the source code? +
- 16. Should I choose CC-BY or CC0? +
- 17. Are there also variants in the RDF databases? * +
- 18. How do I deal with private data and privacy? +
- 19. Do you have any checks or concerns if human sequence is accidentally submitted to your service as part of a fastq? * +
- 20. Does PubSeq support only SARS-CoV-2 data? * +
- 21. How do I communicate with you? +
- 22. Who are the sponsors? +
1 What is the 'public sequence resource' about?
++The public sequence resource aims to provide a generic and useful +resource for COVID-19 research. The focus is on providing the best +possible sequence data with associated metadata that can be used for +sequence comparison and protein prediction. +
+2 Presentations
++We presented at BOSC 2020. Have a look at the video (alternative +link) and the poster. +
+3 Who created the public sequence resource?
++The public sequence resource is an initiative by bioinformatics and +ontology experts who want to create something agile and useful for the +wider research community. The initiative started at the COVID-19 +biohackathon in April 2020 and is ongoing. The main project drivers +are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino +(University of Rome Tor Vergata), Michael Crusoe (Common Workflow +Language), Thomas Liener (consultant, formerly EBI), Erik Garrison +(UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). +
+ ++Notably, as this is a free software initiative, the project represents +major work by hundreds of software developers and ontology and data +wrangling experts. Thank you everyone! +
+4 How does the public sequence resource compare to other data resources?
++The short version is that we use state-of-the-art practices in +bioinformatics using agile methods. Unlike the resources from large +institutes we can improve things on a dime and anyone can contribute +to building out this resource! Sequences from GenBank, EBI/ENA and +others are regularly added to PubSeq. We encourage everyone +to submit to PubSeq because of its superior live tooling and metadata +support (see the next question). +
+ ++Importantly: all data is published under either the Creative Commons +4.0 attribution license or the CC0 “No Rights Reserved” license which +means the data can be published and workflows can run in public +environments allowing for improved access for research and +reproducible results. This contrasts with some other public resources, +such as GISAID. +
+5 Why should I upload my data here?
+-
+
- We champion truly shareable data without licensing restrictions - with proper +attribution +
- We provide full metadata support using state-of-the-art ontologies +
- We provide a web-based sequence uploader and a command-line version +for bulk uploads +
- We provide a live SPARQL end-point for all metadata +
- We provide free data analysis and sequence comparison triggered on data upload +
- We do real work for you, with this link you can see the last +run took 5.5 hours! +
- We provide free downloads of all computed output +
- There is no need to set up pipelines and/or compute clusters +
- All workflows get triggered on uploading a new sequence +
- When someone (you?) improves the software/workflows, everyone benefits +
- Your data gets automatically integrated with the Swiss Institute of +Bioinformatics COVID-19 knowledge base +https://covid-19-sparql.expasy.org/ (Elixir Switzerland) +
- Your data will be used to develop drug targets +
+Finally, if you upload your data here we have workflows that output +formatted data suitable for uploading to EBI resources (and soon +others). Uploading your data here gets your data ready for upload to +multiple resources. +
+6 Why should I not upload my data here?
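As a rough illustration of the live SPARQL end-point mentioned in the list above, the following Python sketch builds a query request using only the standard library. The `/sparql` path appended to the SIB address, the function names `build_sparql_request` and `run_query`, and the generic triple query are assumptions made for illustration, not a documented PubSeq API; consult the end-point's own documentation for the real address and schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the "/sparql" path is illustrative; verify the real
# end-point address before use.
ENDPOINT = "https://covid-19-sparql.expasy.org/sparql"

# A schema-agnostic query that simply lists a few triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request in the style of the SPARQL 1.1 Protocol."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(request: Request, timeout: float = 30.0) -> list:
    """Send the request and return the list of result bindings."""
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `run_query(build_sparql_request(ENDPOINT, QUERY))` would send the query over the network and return one dictionary per result row. Building the request separately from sending it makes it easy to inspect the URL first.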
++Funny question. There are only good reasons to upload your data here +and make it available to the widest audience possible. +
+ ++In fact, you can upload your data here as well as to other +resources. It is your data after all. No one can prevent you from +uploading your data to multiple resources. +
+ ++We recommend uploading to EBI and NCBI resources using our data +conversion tools. This means you only enter data once, which makes the +process smooth. You can also use our command line data uploader +for bulk uploads! +
+7 How does the public sequence resource work?
++On uploading a sequence with metadata it will automatically be +processed and incorporated into the public pangenome with metadata +using workflows from the High Performance Open Biology Lab defined +here. +
+8 Who uses the public sequence resource?
++The Swiss Institute of Bioinformatics has included this data in +https://covid-19-sparql.expasy.org/ and made it part of Uniprot. +
+ ++The Pantograph viewer uses PubSeq data for their visualisations. +
+ ++UTHSC (USA), ESR (New Zealand) and ORNL (USA) use COVID-19 PubSeq data +for monitoring, protein prediction and drug development. +
+9 How can I contribute?
++You can contribute by submitting sequences, updating metadata, submitting +issues on our issue tracker and, more importantly, adding functionality. +See 'How do I change the source code' below. Read through our online +documentation at http://covid19.genenetwork.org/blog as a starting +point. +
+10 Is this about open data?
++All data is published under a Creative Commons 4.0 attribution license +(CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) +data and store it for further processing. +
+11 Is this about free software?
++Absolutely. Free software allows for fully reproducible pipelines. You +can take our workflows and data and run it elsewhere! +
+12 How do I upload raw data?
++We are preparing raw sequence data pipelines (fastq and BAM). The +reason is that we want the best data possible for downstream analysis +(including protein prediction and test development). The current +approach where people publish final sequences of SARS-CoV-2 is lacking +because it hides how this sequence was created. For reasons of +reproducible and improved results we want/need to work with the raw +sequence reads (both short reads and long reads) and take alternative +assembly variations into consideration. This is all work in progress. +
+13 How do I change metadata?
++See the http://covid19.genenetwork.org/blog! +
+14 How do I change the work flows?
++Workflows are on github and can be modified. See also the BLOG +http://covid19.genenetwork.org/blog on workflows. +
+15 How do I change the source code?
++Go to our source code repositories, fork/clone the repository, change +something and submit a pull request (PR). It is that easy! Check out how +many PRs we have already merged. +
+16 Should I choose CC-BY or CC0?
++Restrictive data licenses are hampering data sharing and reproducible +research. CC0 is the preferred license because it gives researchers +the most freedom. Since we provide metadata there is no reason for +others not to honour your work. We also provide CC-BY as an option +because we know people like the attribution clause. +
+ ++In all honesty: we prefer both data and software to be free. +
+17 Are there also variants in the RDF databases? *
++We output an RDF file with the pangenome built in. The variants are implicit in the graph, so you can recover them by parsing the file. +
+ ++We are also writing tools to generate VCF files directly from the pangenome. +
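To illustrate what "variants implicitly" can mean, here is a toy sketch, not PubSeq's own tooling. In a pangenome graph a variant shows up as a "bubble": a place where one node branches into alternative nodes that rejoin. The example uses a hand-written graph in GFA 1 (one of the published formats) rather than the RDF output, whose schema is not described here. The data and the helper names `parse_gfa` and `simple_bubbles` are hypothetical, and strand orientation is ignored for brevity.

```python
from collections import defaultdict

# Hypothetical toy graph in GFA 1: ACGT, then either A or G, then TTCA.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tA",
    "S\t3\tG",
    "S\t4\tTTCA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
])


def parse_gfa(text):
    """Return (segments, links) from GFA 1 text, ignoring orientation."""
    segments = {}
    links = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":      # segment: id and sequence
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":    # link: from-id ... to-id
            links[fields[1]].append(fields[3])
    return segments, links


def simple_bubbles(segments, links):
    """Find the simplest bubbles: a source node that branches into two or
    more nodes which all link to one and the same sink node."""
    bubbles = []
    for source, targets in links.items():
        if len(targets) < 2:
            continue
        onward = {tuple(links.get(t, [])) for t in targets}
        if len(onward) == 1:
            (sink,) = onward
            if len(sink) == 1:
                alleles = sorted(segments[t] for t in targets)
                bubbles.append((source, alleles, sink[0]))
    return bubbles


segments, links = parse_gfa(TOY_GFA)
print(simple_bubbles(segments, links))  # [('1', ['A', 'G'], '4')]
```

The single bubble found corresponds to an A/G substitution after the first segment. Real pangenomes contain nested and more complex structures that this sketch does not handle.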
+18 How do I deal with private data and privacy?
+ +19 Do you have any checks or concerns if human sequence is accidentally submitted to your service as part of a fastq? *
++We are planning to remove reads that match the human reference. +
+20 Does PubSeq support only SARS-CoV-2 data? *
++To date, PubSeq is a resource specific to SARS-CoV-2, but we are designing it to be able to support other species in the future. +
+21 How do I communicate with you?
++We use a gitter channel you can join. +
+22 Who are the sponsors?
++The main sponsors are listed in the footer. In addition to the time +generously donated by many contributors we also acknowledge Amazon AWS +for donating COVID-19 related compute time. +
+