From 0a23018b8afced6a145d96efbe5bffe86f092cce Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 22 Aug 2020 09:49:11 +0100
Subject: Small text fixes

---
 doc/blog/using-covid-19-pubseq-part5.org |    4 +-
 doc/web/about.html                       | 1522 +++++++++++------------------
 doc/web/about.org                        |   12 +-
 3 files changed, 580 insertions(+), 958 deletions(-)

(limited to 'doc')

diff --git a/doc/blog/using-covid-19-pubseq-part5.org b/doc/blog/using-covid-19-pubseq-part5.org
index 78eea66..99c8ebf 100644
--- a/doc/blog/using-covid-19-pubseq-part5.org
+++ b/doc/blog/using-covid-19-pubseq-part5.org
@@ -23,8 +23,8 @@ The public sequence resource uses multiple data formats listed on the
 for RDF and semantic web/linked data ontologies. This technology
 allows for querying data in unprescribed ways - that is, you can
 formulate your own queries without dealing with a preset model of that
-data (so typical of CSV files and SQL tables). Examples of exploring
-data are listed [[http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part1][here]].
+data (which is how one has to approach CSV files and SQL
+tables). Examples of exploring data are listed [[http://covid19.genenetwork.org/blog?id=using-covid-19-pubseq-part1][here]].
 
 In this BLOG we are going to look at the metadata entered on the
 COVID-19 PubSeq website (or command line client). It is important to
diff --git a/doc/web/about.html b/doc/web/about.html
index c971a4e..a4ab186 100644
--- a/doc/web/about.html
+++ b/doc/web/about.html
@@ -1,964 +1,582 @@
+"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> - - - - About/FAQ - - - - + + + +About/FAQ + + + +
-

About/FAQ

-
-

Table of Contents

- -
- -
-

1 What is the 'public sequence resource' about?

-
-

- The public sequence resource aims to provide a generic and useful - resource for COVID-19 research. The focus is on providing the best - possible sequence data with associated metadata that can be used for - sequence comparison and protein prediction. -

-

- We were at the Bioinformatics Community Conference 2020! Have a look at the - video talk - (alternative link) - and the poster. -

-
-
- -
-

2 Who created the public sequence resource?

-
-

- The public sequence resource is an initiative by bioinformatics and - ontology experts who want to create something agile and useful for the - wider research community. The initiative started at the COVID-19 - biohackathon in April 2020 and is ongoing. The main project drivers - are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino - (University of Rome Tor Vergata), Michael Crusoe (Common Workflow - Language), Thomas Liener (consultant, formerly EBI), Erik Garrison - (UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). -

- -

- Notably, as this is a free software initiative, the project represents - major work by hundreds of software developers and ontology and data - wrangling experts. Thank you everyone! -

-
-
- -
-

3 How does the public sequence resource compare to - other data resources?

-
-

- The short version is that we use state-of-the-art practices in - bioinformatics using agile methods. Unlike the resources from large - institutes we can improve things on a dime and anyone can contribute - to building out this resource! Sequences from GenBank, EBI/ENA and - others are regularly added to PubSeq. We encourage people to everyone - to submit on PubSeq because of its superior live tooling and metadata - support (see the next question). -

- -

- Importantly: all data is published under either the Creative Commons - 4.0 attribution license or the CC0 “No Rights Reserved” - license which - means it data can be published and workflows can run in public - environments allowing for improved access for research and - reproducible results. This contrasts with some other public resources, - such as GISAID. -

-
-
- -
-

4 Why should I upload my data here?

-
-
    -
  1. We champion truly shareable data without licensing restrictions - with proper - attribution -
  2. -
  3. We provide full metadata support using state-of-the-art ontology's
  4. -
  5. We provide a web-based sequence uploader and a command-line version - for bulk uploads -
  6. -
  7. We provide a live SPARQL end-point for all metadata
  8. -
  9. We provide free data analysis and sequence comparison triggered on data upload
  10. -
  11. We do real work for you, with this link - you can see the last - run took 5.5 hours! -
  12. -
  13. We provide free downloads of all computed output
  14. -
  15. There is no need to set up pipelines and/or compute clusters
  16. -
  17. All workflows get triggered on uploading a new sequence
  18. -
  19. When someone (you?) improves the software/workflows and everyone benefits
  20. -
  21. Your data gets automatically integrated with the Swiss Institure of - Bioinformatics COVID-19 knowledge base - https://covid-19-sparql.expasy.org/ (Elixir - Switzerland) -
  22. -
  23. Your data will be used to develop drug targets
  24. -
- -

- Finally, if you upload your data here we have workflows that output - formatted data suitable for uploading to EBI - resources (and soon - others). Uploading your data here get your data ready for upload to - multiple resources. -

-
-
- -
-

5 Why should I not upload by data here?

-
-

- Funny question. There are only good reasons to upload your data here - and make it available to the widest audience possible. -

- -

- In fact, you can upload your data here as well as to other - resources. It is your data after all. No one can prevent you from - uploading your data to multiple resources. -

- -

- We recommend uploading to EBI and NCBI resources using our data - conversion tools. It means you only enter data once and make the - process smooth. You can also use our command line data uploader - for bulk uploads! -

-
-
- -
-

6 How does the public sequence resource work?

-
-

- On uploading a sequence with metadata it will automatically be - processed and incorporated into the public pangenome with metadata - using workflows from the High Performance Open Biology Lab defined - here. -

-
-
- -
-

7 Who uses the public sequence resource?

-
-

- The Swiss Institute of Bioinformatics has included this data in - https://covid-19-sparql.expasy.org/ and made it part - of Uniprot. -

- -

- The Pantograph viewer uses PubSeq data for their - visualisations. -

- -

- UTHSC (USA), ESR (New Zealand) and - ORNL (USA) use COVID-19 PubSeq data - for monitoring, protein prediction and drug development. -

-
-
- -
-

8 How can I contribute?

-
-

- You can contribute by submitting sequences, updating metadata, submit - issues on our issue tracker, and more importantly add functionality. - See 'How do I change the source code' below. Read through our online - documentation at http://covid19.genenetwork.org/blog - as a starting - point. -

-
-
- -
-

9 Is this about open data?

-
-

- All data is published under a Creative Commons - 4.0 attribution license - (CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) - data and store it for further processing. -

-
-
- -
-

10 Is this about free software?

-
-

- Absolutely. Free software allows for fully reproducible pipelines. You - can take our workflows and data and run it elsewhere! -

-
-
- -
-

11 How do I upload raw data?

-
-

- We are preparing raw sequence data pipelines (fastq and BAM). The - reason is that we want the best data possible for downstream analysis - (including protein prediction and test development). The current - approach where people publish final sequences of SARS-CoV-2 is lacking - because it hides how this sequence was created. For reasons of - reproducible and improved results we want/need to work with the raw - sequence reads (both short reads and long reads) and take alternative - assembly variations into consideration. This is all work in progress. -

-
-
- -
-

12 How do I change metadata?

- -
- -
-

13 How do I change the work flows?

-
-

- Workflows are on github - and can be modified. See also the BLOG - http://covid19.genenetwork.org/blog on workflows. -

-
-
- -
-

14 How do I change the source code?

-
-

- Go to our source code repositories, - fork/clone the repository, change - something and submit a pull request - (PR). That easy! Check out how - many PRs we already merged. -

-
-
- -
-

15 Should I choose CC-BY or CC0?

-
-

- Restrictive data licenses are hampering data sharing and reproducible - research. CC0 is the preferred license because it gives researchers - the most freedom. Since we provide metadata there is no reason for - others not to honour your work. We also provide CC-BY as an option - because we know people like the attribution clause. -

- -

- In all honesty: we prefer both data and software to be free. -

-
-
- -
-

16 How do I deal with private data and privacy?

-
-

- A public sequence resource is about public data. Metadata can refer to - private data. You can use your own (anonymous) identifiers. We also - plan to combine identifiers with clinical data stored securely at - REDCap. See the relevant tracker for more information and - contributing. -

-
-
- -
-

17 How do I communicate with you?

-
-

- We use a gitter - channel you can join. -

-
-
- -
-

18 Who are the sponsors?

-
-

- The main sponsors are listed in the footer. In addition to the time - generously donated by many contributors we also acknowledge Amazon AWS - for donating COVID-19 related compute time. -

-
-
+

About/FAQ

+
+

Table of Contents

+ +
+ +
+

1 What is the 'public sequence resource' about?

+
+

+The public sequence resource aims to provide a generic and useful +resource for COVID-19 research. The focus is on providing the best +possible sequence data with associated metadata that can be used for +sequence comparison and protein prediction. +

+
+
+ +
+

2 Presentations

+
+

+We presented at the BOSC 2020 Have a look at the video (alternative +link) and the poster. +

+
+
+ +
+

3 Who created the public sequence resource?

+
+

+The public sequence resource is an initiative by bioinformatics and +ontology experts who want to create something agile and useful for the +wider research community. The initiative started at the COVID-19 +biohackathon in April 2020 and is ongoing. The main project drivers +are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Andrea Guarracino +(University of Rome Tor Vergata), Michael Crusoe (Common Workflow +Language), Thomas Liener (consultant, formerly EBI), Erik Garrison +(UCSC) and Jerven Bolleman (Swiss Institute of Bioinformatics). +

+ +

+Notably, as this is a free software initiative, the project represents +major work by hundreds of software developers and ontology and data +wrangling experts. Thank you everyone! +

+
+
+ +
+

4 How does the public sequence resource compare to other data resources?

+
+

+The short version is that we use state-of-the-art practices in +bioinformatics using agile methods. Unlike the resources from large +institutes we can improve things on a dime and anyone can contribute +to building out this resource! Sequences from GenBank, EBI/ENA and +others are regularly added to PubSeq. We encourage people to everyone +to submit on PubSeq because of its superior live tooling and metadata +support (see the next question). +

+ +

+Importantly: all data is published under either the Creative Commons +4.0 attribution license or the CC0 “No Rights Reserved” license which +means it data can be published and workflows can run in public +environments allowing for improved access for research and +reproducible results. This contrasts with some other public resources, +such as GISAID. +

+
+
+ +
+

5 Why should I upload my data here?

+
+
    +
  1. We champion truly shareable data without licensing restrictions - with proper +attribution
  2. +
  3. We provide full metadata support using state-of-the-art ontology's
  4. +
  5. We provide a web-based sequence uploader and a command-line version +for bulk uploads
  6. +
  7. We provide a live SPARQL end-point for all metadata
  8. +
  9. We provide free data analysis and sequence comparison triggered on data upload
  10. +
  11. We do real work for you, with this link you can see the last +run took 5.5 hours!
  12. +
  13. We provide free downloads of all computed output
  14. +
  15. There is no need to set up pipelines and/or compute clusters
  16. +
  17. All workflows get triggered on uploading a new sequence
  18. +
  19. When someone (you?) improves the software/workflows and everyone benefits
  20. +
  21. Your data gets automatically integrated with the Swiss Institure of +Bioinformatics COVID-19 knowledge base +https://covid-19-sparql.expasy.org/ (Elixir Switzerland)
  22. +
  23. Your data will be used to develop drug targets
  24. +
+ +

+Finally, if you upload your data here we have workflows that output +formatted data suitable for uploading to EBI resources (and soon +others). Uploading your data here get your data ready for upload to +multiple resources. +

+
+
+ +
+

6 Why should I not upload by data here?

+
+

+Funny question. There are only good reasons to upload your data here +and make it available to the widest audience possible. +

+ +

+In fact, you can upload your data here as well as to other +resources. It is your data after all. No one can prevent you from +uploading your data to multiple resources. +

+ +

+We recommend uploading to EBI and NCBI resources using our data +conversion tools. It means you only enter data once and make the +process smooth. You can also use our command line data uploader +for bulk uploads! +

+
+
+ +
+

7 How does the public sequence resource work?

+
+

+On uploading a sequence with metadata it will automatically be +processed and incorporated into the public pangenome with metadata +using workflows from the High Performance Open Biology Lab defined +here. +

+
+
+ +
+

8 Who uses the public sequence resource?

+
+

+The Swiss Institute of Bioinformatics has included this data in +https://covid-19-sparql.expasy.org/ and made it part of Uniprot. +

+ +

+The Pantograph viewer uses PubSeq data for their visualisations. +

+ +

+UTHSC (USA), ESR (New Zealand) and ORNL (USA) use COVID-19 PubSeq data +for monitoring, protein prediction and drug development. +

+
+
+ +
+

9 How can I contribute?

+
+

+You can contribute by submitting sequences, updating metadata, submit +issues on our issue tracker, and more importantly add functionality. +See 'How do I change the source code' below. Read through our online +documentation at http://covid19.genenetwork.org/blog as a starting +point. +

+
+
+ +
+

10 Is this about open data?

+
+

+All data is published under a Creative Commons 4.0 attribution license +(CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) +data and store it for further processing. +

+
+
+ +
+

11 Is this about free software?

+
+

+Absolutely. Free software allows for fully reproducible pipelines. You +can take our workflows and data and run it elsewhere! +

+
+
+ +
+

12 How do I upload raw data?

+
+

+We are preparing raw sequence data pipelines (fastq and BAM). The +reason is that we want the best data possible for downstream analysis +(including protein prediction and test development). The current +approach where people publish final sequences of SARS-CoV-2 is lacking +because it hides how this sequence was created. For reasons of +reproducible and improved results we want/need to work with the raw +sequence reads (both short reads and long reads) and take alternative +assembly variations into consideration. This is all work in progress. +

+
+
+ +
+

13 How do I change metadata?

+ +
+ +
+

14 How do I change the work flows?

+
+

+Workflows are on github and can be modified. See also the BLOG +http://covid19.genenetwork.org/blog on workflows. +

+
+
+ +
+

15 How do I change the source code?

+
+

+Go to our source code repositories, fork/clone the repository, change +something and submit a pull request (PR). That easy! Check out how +many PRs we already merged. +

+
+
+ +
+

16 Should I choose CC-BY or CC0?

+
+

+Restrictive data licenses are hampering data sharing and reproducible +research. CC0 is the preferred license because it gives researchers +the most freedom. Since we provide metadata there is no reason for +others not to honour your work. We also provide CC-BY as an option +because we know people like the attribution clause. +

+ +

+In all honesty: we prefer both data and software to be free. +

+
+
+ +
+

17 Are there also variant in the RDF databases? *

+
+

+We do output a RDF file with the pangenome built in, and you can parse it because it has variants implicitly. +

+ +

+We are also writing tools to generate VCF files directly from the pangenome. +

+
+
+ +
+

18 How do I deal with private data and privacy?

+
+

+A public sequence resource is about public data. Metadata can refer to +private data. You can use your own (anonymous) identifiers. We also +plan to combine identifiers with clinical data stored securely at +REDCap. See the relevant tracker for more information and contributing. +

+
+
+ +
+

19 Do you have any checks or concerns if human sequence accidentally submitted to your service as part of a fastq? *

+
+

+We are planning to remove reads that match the human reference. +

+
+
+ +
+

20 Does PubSeq support only SARS-CoV-2 data? *

+
+

+To date, PubSeq is a resource specific to SARS-CoV-2, but we are designing it to be able to support other species in the future. +

+
+
+ + +
+

21 How do I communicate with you?

+
+

+We use a gitter channel you can join. +

+
+
+ +
+

22 Who are the sponsors?

+
+

+The main sponsors are listed in the footer. In addition to the time +generously donated by many contributors we also acknowledge Amazon AWS +for donating COVID-19 related compute time. +

+
+
-
- Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs - org-mode and a healthy dose of Lisp!
Modified 2020-07-18 Sat 03:27
. +
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-08-22 Sat 03:48
.
diff --git a/doc/web/about.org b/doc/web/about.org
index 29a80bf..5f864e9 100644
--- a/doc/web/about.org
+++ b/doc/web/about.org
@@ -3,6 +3,7 @@
 
 * Table of Contents :TOC:noexport:
  - [[#what-is-the-public-sequence-resource-about][What is the 'public sequence resource' about?]]
+ - [[#presentations][Presentations]]
  - [[#who-created-the-public-sequence-resource][Who created the public sequence resource?]]
  - [[#how-does-the-public-sequence-resource-compare-to-other-data-resources][How does the public sequence resource compare to other data resources?]]
  - [[#why-should-i-upload-my-data-here][Why should I upload my data here?]]
@@ -17,10 +18,10 @@
  - [[#how-do-i-change-the-work-flows][How do I change the work flows?]]
  - [[#how-do-i-change-the-source-code][How do I change the source code?]]
  - [[#should-i-choose-cc-by-or-cc0][Should I choose CC-BY or CC0?]]
- - [[#are-there-also-variant-in-the-RDF-databases]][Are there also variant in the RDF databases?]
+ - [[#are-there-also-variant-in-the-rdf-databases-][Are there also variant in the RDF databases? *]]
  - [[#how-do-i-deal-with-private-data-and-privacy][How do I deal with private data and privacy?]]
- - [[#do-you-have-any-checks-or-concerns-if-human-sequence-accidentally-submitted-to-your-service-as-part-of-a-fastq][Do you have any checks or concerns if human sequence accidentally submitted to your service as part of a fastq?]
- - [[#does-PubSeq-support-only-SARS-CoV-2=data]][Does PubSeq support only SARS-CoV-2 data?]
+ - [[#do-you-have-any-checks-or-concerns-if-human-sequence-accidentally-submitted-to-your-service-as-part-of-a-fastq-][Do you have any checks or concerns if human sequence accidentally submitted to your service as part of a fastq? *]]
+ - [[#does-pubseq-support-only-sars-cov-2-data-][Does PubSeq support only SARS-CoV-2 data? *]]
  - [[#how-do-i-communicate-with-you][How do I communicate with you?]]
  - [[#who-are-the-sponsors][Who are the sponsors?]]
 
@@ -31,7 +32,10 @@ resource for COVID-19 research. The focus is on providing the best
 possible sequence data with associated metadata that can be used for
 sequence comparison and protein prediction.
 
-We were at the *Bioinformatics Community Conference 2020*! Have a look at the [[https://bcc2020.sched.com/event/coLw]][video talk] ([[https://drive.google.com/file/d/1skXHwVKM_gl73-_4giYIOQ1IlC5X5uBo/view?usp=sharing]][alternative link]) and the [[https://drive.google.com/file/d/1vyEgfvSqhM9yIwWZ6Iys-QxhxtVxPSdp/view?usp=sharing]][poster].
+* Presentations
+
+We presented at the BOSC 2020 Have a look at the [[https://bcc2020.sched.com/event/coLw][video]] ([[https://drive.google.com/file/d/1skXHwVKM_gl73-_4giYIOQ1IlC5X5uBo/view?usp=sharing][alternative
+link]]) and the [[https://drive.google.com/file/d/1vyEgfvSqhM9yIwWZ6Iys-QxhxtVxPSdp/view?usp=sharing][poster]].
 
 * Who created the public sequence resource?
 
-- 
cgit v1.2.3