From e4738edf99cb96214db066079adae021c25bc059 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Sun, 24 May 2020 11:16:47 -0500 Subject: Download page --- doc/web/about.org | 15 +- doc/web/download.html | 375 ++++++++++++++++++++++++++++++++++++++++++++++++++ doc/web/download.org | 69 ++++++++++ 3 files changed, 454 insertions(+), 5 deletions(-) create mode 100644 doc/web/download.html create mode 100644 doc/web/download.org (limited to 'doc/web') diff --git a/doc/web/about.org b/doc/web/about.org index fc9d1ff..26b675d 100644 --- a/doc/web/about.org +++ b/doc/web/about.org @@ -27,13 +27,15 @@ sequence comparison and protein prediction. * Who created the public sequence resource? The *public sequence resource* is an initiative by [[https://github.com/arvados/bh20-seq-resource/graphs/contributors][bioinformatics]] and -ontology experts who want to create something agile and useful for -the wider research community. The initiative started at the COVID-19 +ontology experts who want to create something agile and useful for the +wider research community. The initiative started at the COVID-19 biohackathon in April 2020 and is ongoing. The main project drivers are Pjotr Prins (UTHSC), Peter Amstutz (Curii), Michael Crusoe (Common -Workflow Language) and Thomas Liener (consultant, formerly EBI). But -as this is a free software initiative the project represents major -work by hundreds of software developers and ontology and data +Workflow Language), Thomas Liener (consultant, formerly EBI) and +Jerven Bolleman (Swiss Institute of Bioinformatics). + +Notably, as this is a free software initiative, the project represents +major work by hundreds of software developers and ontology and data wrangling experts. Thank you everyone! * How does the public sequence resource compare to other data resources? @@ -62,6 +64,9 @@ public resources, including GISAID. 3. There is no need to set up pipelines and/or compute clusters 4. All workflows get triggered on uploading a new sequence 4. 
When someone (you?) improves the software/workflows and everyone benefits +4. Your data gets automatically integrated with the Swiss Institute of + Bioinformatics COVID-19 knowledge base + https://covid-19-sparql.expasy.org/ (Elixir Switzerland) Finally, if you upload your data here we have workflows that output formatted data suitable for uploading to EBI resources (and soon diff --git a/doc/web/download.html b/doc/web/download.html new file mode 100644 index 0000000..879e8d4 --- /dev/null +++ b/doc/web/download.html @@ -0,0 +1,375 @@ + + + + + + + +Download + + + + + + +
+

Download

+
+

Table of Contents

+ +
+ +
+

1 FASTA files

+
+

+The public sequence resource provides all uploaded sequences as +FASTA files. Individual sequences can be referenced from the metadata. We +also provide a single-file FASTA download.
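As a minimal sketch of reading the single-file FASTA download once it has been saved locally, the following parser yields header/sequence pairs. The function name `read_fasta` and the sample records are illustrative assumptions, not part of the resource's own tooling.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration only; real data comes from the download.
sample = """>seq1 example
ACGT
TTGA
>seq2
GGCC
"""
records = list(read_fasta(sample.splitlines()))
```

With a downloaded copy of the file, the same function can be fed an open file handle, for example `read_fasta(open("relabeledSeqs_dedup.fasta"))`.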

+
+
+ +
+

2 Metadata

+
+

+Metadata can be downloaded as Turtle RDF in a single file, mergedmetadata.ttl, which +can be loaded into any RDF triple-store. We also provide a Virtuoso SPARQL +endpoint, which can be queried at +http://sparql.genenetwork.org/sparql/. Query examples can be found in +our BLOG.
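Querying the SPARQL endpoint can be sketched as follows, assuming the endpoint accepts the standard SPARQL protocol `query` parameter over HTTP GET and honours an `Accept` header for JSON results. The query itself is a generic placeholder, not one of the examples from the BLOG, and the helper name `build_sparql_request` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Endpoint URL as given on this page.
ENDPOINT = "http://sparql.genenetwork.org/sparql/"

# Placeholder query: fetch any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request for a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Assumption: the server returns JSON when asked via the Accept header.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

To run the query, pass `request` to `urllib.request.urlopen` and decode the response with `json.load`; the request is only constructed here so that the sketch works without network access.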

+ +

+The Swiss Institute of Bioinformatics has included this data in +https://covid-19-sparql.expasy.org/ and made it part of UniProt.

+ +

+An RDF file that includes the sequences themselves in a variation +graph can be downloaded from the Pangenome RDF format section below.

+
+
+ +
+

3 Pangenome

+
+

+Pangenome data is made available in multiple guises. Variation graphs +(VG) provide a succinct encoding of the sequences of many genomes. +

+
+ +
+

3.1 Pangenome GFA format

+
+

+GFA is a standard for Graphical Fragment Assembly and is consumed +by tools such as vgtools.
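For readers who want to inspect a GFA file without installing graph tools, here is a minimal sketch that collects segments and links. It assumes GFA version 1 tab-separated records (`S` for segments, `L` for links); the page does not state which GFA version is published, and `parse_gfa` and the sample graph are illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def parse_gfa(lines: Iterable[str]) -> Tuple[Dict[str, str], List[Tuple[str, str, str, str]]]:
    """Collect segment sequences and links from GFA1-style lines."""
    segments: Dict[str, str] = {}
    links: List[Tuple[str, str, str, str]] = []
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # S <name> <sequence>
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 6:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


# Tiny made-up graph: two segments joined by one link.
sample = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tTT\nL\t1\t+\t2\t+\t0M\n"
segments, links = parse_gfa(sample.splitlines())
```

Other record types, such as paths, are ignored by this sketch.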

+
+
+ +
+

3.2 Pangenome in ODGI format

+
+

+ODGI is a format that supports an optimized dynamic genome/graph +implementation. +

+
+
+ +
+

3.3 Pangenome RDF format

+
+

+An RDF file that includes the sequences themselves in a variation +graph can be downloaded from +relabeledSeqs-dedup-relabeledSeqs-dedup.ttl.xz. +
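Because the RDF file is xz-compressed, it can be streamed line by line rather than unpacked first. The sketch below uses Python's standard `lzma` module on an in-memory sample; the helper `iter_xz_lines` and the example triple are assumptions for illustration.

```python
import io
import lzma
from typing import BinaryIO, Iterator


def iter_xz_lines(fileobj: BinaryIO) -> Iterator[str]:
    """Yield decoded text lines from an xz-compressed binary stream."""
    with lzma.open(fileobj, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Made-up Turtle triple, compressed in memory to stand in for the download.
sample_turtle = '<http://example.org/s> <http://example.org/p> "o" .\n'
compressed = io.BytesIO(lzma.compress(sample_turtle.encode("utf-8")))
lines = list(iter_xz_lines(compressed))
```

For a downloaded copy, open the `.ttl.xz` file in binary mode and pass the handle to `iter_xz_lines`.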

+
+
+ + +
+

3.4 Pangenome Browser format

+
+

+The many JSON files with names such as +results/1/chunk001200.bin1.schematic.json are consumed by the +Pangenome browser.

+
+
+
+ +
+

4 Log of workflow output

+
+

+Included in the link below is a log file of the last workflow runs.

+
+
+ +
+

5 All files

+ +
+
+
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-05-24 Sun 11:11
. +
+ + diff --git a/doc/web/download.org b/doc/web/download.org new file mode 100644 index 0000000..498b132 --- /dev/null +++ b/doc/web/download.org @@ -0,0 +1,69 @@ +#+TITLE: Download +#+AUTHOR: Pjotr Prins + +* Table of Contents :TOC:noexport: + - [[#fasta-files][FASTA files]] + - [[#metadata][Metadata]] + - [[#pangenome][Pangenome]] + - [[#pangenome-gfa-format][Pangenome GFA format]] + - [[#pangenome-in-odgi-format][Pangenome in ODGI format]] + - [[#pangenome-rdf-format][Pangenome RDF format]] + - [[#pangenome-browser-format][Pangenome Browser format]] + - [[#log-of-workflow-output][Log of workflow output]] + - [[#all-files][All files]] + +* FASTA files + +The *public sequence resource* provides all uploaded sequences as +FASTA files. Individual sequences can be referenced from the metadata. We +also provide a single-file [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/relabeledSeqs_dedup.fasta][FASTA download]]. + +* Metadata + +Metadata can be downloaded as [[https://www.w3.org/TR/turtle/][Turtle RDF]] in a single file, [[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/mergedmetadata.ttl][mergedmetadata.ttl]], which +can be loaded into any RDF triple-store. We also provide a Virtuoso SPARQL +endpoint, which can be queried at +http://sparql.genenetwork.org/sparql/. Query examples can be found in +our [[https://github.com/arvados/bh20-seq-resource/blob/master/doc/blog/using-covid-19-pubseq-part1.org][BLOG]]. + +The Swiss Institute of Bioinformatics has included this data in +https://covid-19-sparql.expasy.org/ and made it part of [[https://www.uniprot.org/][UniProt]]. + +An RDF file that includes the sequences themselves in a variation +graph can be downloaded from the Pangenome RDF format section below. + +* Pangenome + +Pangenome data is made available in multiple guises. Variation graphs +(VG) provide a succinct encoding of the sequences of many genomes. 
+ +** Pangenome GFA format + +[[https://github.com/GFA-spec/GFA-spec][GFA]] is a standard for Graphical Fragment Assembly and is consumed +by tools such as [[https://github.com/vgteam/vg][vgtools]]. + +** Pangenome in ODGI format + +[[https://github.com/vgteam/odgi][ODGI]] is a format that supports an optimised dynamic genome/graph +implementation. + +** Pangenome RDF format + +An RDF file that includes the sequences themselves in a variation +graph can be downloaded from +[[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][relabeledSeqs-dedup-relabeledSeqs-dedup.ttl.xz]]. + + +** Pangenome Browser format + +The many JSON files with names such as +[[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][results/1/chunk001200.bin1.schematic.json]] are consumed by the +Pangenome browser. + +* Log of workflow output + +Included in the link below is a log file of the last workflow runs. + +* All files + +https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/ -- cgit v1.2.3