From c3bbd48601cdb4bec510db72bd2296724874f4f3 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sun, 24 May 2020 10:31:24 -0500
Subject: Display About/FAQ
---
 doc/web/about.html | 462 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 462 insertions(+)
 create mode 100644 doc/web/about.html

(limited to 'doc/web/about.html')

diff --git a/doc/web/about.html b/doc/web/about.html
new file mode 100644
index 0000000..1f8b1a1
--- /dev/null
+++ b/doc/web/about.html
@@ -0,0 +1,462 @@
+ + + + + + + +About/FAQ + + + + + + +
+

About/FAQ

+
+

Table of Contents

+ +
+ +
+

1 What is the 'public sequence resource' about?

+
+

+The public sequence resource aims to provide a generic and useful +resource for COVID-19 research. The focus is on providing the best +possible sequence data with associated metadata that can be used for +sequence comparison and protein prediction. +

+
+
+ +
+

2 Who created the public sequence resource?

+
+

+The public sequence resource is an initiative by bioinformatics and +ontology experts who want to create something agile and useful for +the wider research community. The initiative started at the COVID-19 +biohackathon in April 2020 and is ongoing. The main project drivers +are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Michael Crusoe (Common +Workflow Language) and Thomas Liener (consultant, formerly EBI). But +as this is a free software initiative the project represents major +work by hundreds of software developers and ontology and data +wrangling experts. Thank you everyone! +

+
+
+ +
+

3 How does the public sequence resource compare to other data resources?

+
+

+The short version is that we apply state-of-the-art bioinformatics +practices using agile methods. Unlike the resources from large +institutes we can make improvements quickly and anyone can contribute +to building out this resource! +

+ +

+Importantly: all data is published under the Creative Commons 4.0 +attribution license, which means the data can be published and workflows +can run in public environments, allowing for improved access for +research and reproducible results. This contrasts with some other +public resources, including GISAID. +

+
+
+ +
+

4 Why should I upload my data here?

+
+
    +
  1. We champion truly shareable data without licensing restrictions - with proper +attribution
  2. We provide full metadata support using state-of-the-art ontologies
  3. We provide a web-based sequence uploader and a command-line version +for bulk uploads
  4. We provide a live SPARQL end-point for all metadata
  5. We provide free data analysis and sequence comparison triggered on data upload
  6. We provide free downloads of all computed output
  7. There is no need to set up pipelines and/or compute clusters
  8. All workflows get triggered on uploading a new sequence
  9. When someone (you?) improves the software/workflows, everyone benefits
+ +

+Finally, if you upload your data here we have workflows that output +formatted data suitable for uploading to EBI resources (and soon +others). Uploading your data here gets your data ready for upload to +multiple resources. +
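
The live SPARQL end-point mentioned in the list above can be queried over plain HTTP. The following is a minimal sketch, assuming an end-point that follows the standard SPARQL 1.1 Protocol. The URL `https://sparql.example.org/sparql` is a placeholder (the real address is not given in this FAQ), the helper name `build_sparql_request` is made up for illustration, and the query is deliberately generic because the metadata schema is not described here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address: an assumption, NOT the real end-point of this resource.
ENDPOINT = "https://sparql.example.org/sparql"

# Generic query that lists a few triples; it assumes nothing about the schema.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# To actually send it (requires network access and a real end-point):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       results = json.load(response)
```

Building the request separately from sending it keeps the sketch runnable offline; swap in the real end-point address before calling `urlopen`.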

+
+
+ +
+

5 Why should I not upload my data here?

+
+

+Funny question. There is no good reason not to upload your data here! +In fact, you can upload your data here as well as to other +resources. It is your data after all. No one can prevent you from +uploading your data to multiple resources. We recommend uploading to +EBI and NCBI resources. Use our data conversion tools to enter +data only once and make the process smooth. +

+
+
+ +
+

6 How does the public sequence resource work?

+
+

+On uploading a sequence with metadata it will automatically be +processed and incorporated into the public pangenome with metadata +using workflows from the High Performance Open Biology Lab defined +here. +

+
+
+ +
+

7 Is this about open data?

+
+

+All data is published under a Creative Commons 4.0 attribution license +(CC-BY-4.0). You can download the raw and published (GFA/RDF/FASTA) +data and store it for further processing. +
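
As an illustration of "further processing", downloaded FASTA data can be read with a few lines of Python. This is a minimal sketch, assuming plain multi-line FASTA text; the function name `read_fasta` and the two sample records are made up for the example and are not real data from this resource.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    parts: Dict[str, List[str]] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            parts[header] = []
        elif header is not None:
            parts[header].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical records, for illustration only.
example = """>sample-1 hypothetical record
ATGTTTGTT
TTTCTTGTT
>sample-2 hypothetical record
ATGGAG
""".splitlines()

records = read_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

For a downloaded file, pass an open file handle instead of the sample lines, for example `read_fasta(open("sequences.fasta"))`, where the file name is again only a placeholder.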

+
+
+ +
+

8 Is this about free software?

+
+

+Absolutely. Free software allows for fully reproducible pipelines. You +can take our workflows and data and run them elsewhere! +

+
+
+ +
+

9 How do I upload raw data?

+
+

+We are preparing raw sequence data pipelines (fastq and BAM). The +reason is that we want the best data possible for downstream analysis +(including protein prediction and test development). The current +approach where people publish final sequences of SARS-CoV-2 is lacking +because it hides how these sequences were created. For reasons of +reproducible and improved results we want/need to work with the raw +sequence reads (both short reads and long reads) and take alternative +assembly variations into consideration. This is all work in progress. +

+
+
+ +
+

10 How do I change metadata?

+ +
+ +
+

11 How do I change the workflows?

+ +
+ +
+

12 How do I change the source code?

+
+

+Go to our source code repositories, fork/clone the repository, change +something and submit a pull request (PR). It is that easy! Check out how +many PRs we already merged. +

+
+
+ +
+

13 How do I deal with private data and privacy?

+
+

+A public sequence resource is about public data. Metadata can refer to +private data. You can use your own (anonymous) identifiers. We also +plan to combine identifiers with clinical data stored securely at +REDCap. Contact Pjotr Prins if you want to work on this. +

+
+
+ +
+

14 Who are the sponsors?

+
+

+The main sponsors are listed in the footer. In addition to the time +generously donated by many contributors we also acknowledge Amazon AWS +for donating COVID-19 related compute time. +

+
+
+
+
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-05-24 Sun 10:26
. +
+ + -- cgit v1.2.3