Download

1. Workflow runs
2. FASTA files
3. Metadata
4. Pangenome
5. Log of workflow output
6. All files
7. Planned
8. Source code
9. Citing PubSeq

1 Workflow runs

The latest runs can be viewed here. Click on a run to see the workflows that ran under Processes. Output, including intermediate output, is listed under Data collections. All current data is listed here. Note that it takes time for a run to complete and show up.

2 FASTA files

The public sequence resource provides all uploaded sequences as FASTA files. Each file can be referenced individually from the metadata. We also provide a single-file FASTA download.
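As a minimal sketch of how a multi-record FASTA download could be read, the following Python generator yields header and sequence pairs. It assumes the standard FASTA layout (a `>` header line followed by one or more sequence lines). The function name `read_fasta` and the sample records are made up for illustration and are not part of PubSeq.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample data standing in for a downloaded file.
sample = io.StringIO(">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(sample))
for name, sequence in records:
    print(name, len(sequence))
```

For a real download, the `io.StringIO` sample would be replaced by an opened file, for example `open(path, encoding="utf-8")`, where `path` points at the local copy.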

3 Metadata

Metadata can be downloaded in Turtle RDF format as mergedmetadat.ttl, which can be loaded into any RDF triple store. We also provide our own Virtuoso SPARQL endpoint, which can be queried at http://sparql.genenetwork.org/sparql/. Query examples can be found in the DOCS.
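The endpoint can also be queried from a script. The sketch below uses only the Python standard library and the standard SPARQL protocol (a GET request with a `query` parameter and a JSON results media type). The query is deliberately generic and makes no assumption about the PubSeq schema; the function names `build_request` and `run_query` are hypothetical helpers, and whether the endpoint is reachable at any given time is not guaranteed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given in the text above.
ENDPOINT = "http://sparql.genenetwork.org/sparql/"

# Generic query: fetch any five triples, with no schema assumptions.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> list:
    """Send the query over the network and return the result bindings."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]


# Building the request does not touch the network.
request = build_request(QUERY)
print(request.full_url)
```

Calling `run_query(QUERY)` performs the actual request; each returned binding is a dictionary keyed by variable name, so `row["s"]["value"]` gives the subject of a result row.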

The Swiss Institute of Bioinformatics has included this data in https://covid-19-sparql.expasy.org/ and made it part of UniProt.

An RDF file that includes the sequences themselves in a variation graph can be downloaded from the Pangenome RDF format section below.

4 Pangenome

Pangenome data is made available in multiple guises. Variation graphs (VG) provide a succinct encoding of the sequences of many genomes.

4.1 Pangenome GFA format

GFA (Graphical Fragment Assembly) is a standard format that is consumed by tools such as vgtools.
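To give a feel for the format, here is a small sketch that summarizes a GFA file by record type. It assumes GFA version 1, where each line is tab-separated and starts with a one-letter record type (`H` header, `S` segment, `L` link, `P` path). The function name `summarize_gfa` and the sample graph are made up for illustration.

```python
from collections import Counter


def summarize_gfa(lines):
    """Count GFA records by type and total the bases stored in segments."""
    counts = Counter()
    total_bases = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        counts[fields[0]] += 1
        # Segment lines are: S <name> <sequence>; "*" means no sequence stored.
        if fields[0] == "S" and len(fields) > 2 and fields[2] != "*":
            total_bases += len(fields[2])
    return counts, total_bases


# Made-up two-segment graph with one link and one path.
sample = [
    "H\tVN:Z:1.0\n",
    "S\t1\tACGT\n",
    "S\t2\tGG\n",
    "L\t1\t+\t2\t+\t0M\n",
    "P\tsample_path\t1+,2+\t*\n",
]
counts, total_bases = summarize_gfa(sample)
print(dict(counts), total_bases)
```

Because the function only iterates over lines, an open file handle for a downloaded GFA file can be passed in place of the sample list.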

4.2 Pangenome in ODGI format

ODGI is a format that supports an optimised dynamic genome/graph implementation.

4.3 Pangenome RDF format

An RDF file that includes the sequences themselves in a variation graph can be downloaded from relabeledSeqs-dedup-relabeledSeqs-dedup.ttl.xz.
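Since this file is xz-compressed, it can be streamed without unpacking it to disk first. The sketch below uses Python's standard `lzma` module to read a compressed Turtle file line by line and collect its `@prefix` declarations. It assumes each prefix declaration sits on its own line, which Turtle does not require; the function name `turtle_prefixes` and the sample data are made up for illustration. A full RDF parser would be needed to interpret the triples themselves.

```python
import io
import lzma


def turtle_prefixes(source):
    """Collect @prefix declarations from an xz-compressed Turtle file.

    `source` may be a file path or a binary file object.
    Assumes one prefix declaration per line.
    """
    prefixes = {}
    with lzma.open(source, "rt", encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 3 and parts[0] == "@prefix":
                # "@prefix ex: <http://example.org/> ." -> {"ex": "http://example.org/"}
                prefixes[parts[1].rstrip(":")] = parts[2].strip("<>")
    return prefixes


# Made-up Turtle content, compressed in memory to stand in for the download.
turtle = (
    "@prefix ex: <http://example.org/> .\n"
    "ex:node1 ex:linksTo ex:node2 .\n"
)
compressed = io.BytesIO(lzma.compress(turtle.encode("utf-8")))
prefixes = turtle_prefixes(compressed)
print(prefixes)
```

For the real file, the path to the local copy of the `.ttl.xz` download can be passed directly to `turtle_prefixes`.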

4.4 Pangenome Browser format

The many JSON files with names such as results/1/chunk001200.bin1.schematic.json are consumed by the Pangenome browser.

5 Log of workflow output

Included in the link below is a log file of the latest workflow runs.

6 All files

https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/

7 Planned

We are planning to add the following output (see also

7.4 Protein prediction

We aim to make protein predictions available.

8 Source code

All source code for this website and tooling is available from https://github.com/arvados/bh20-seq-resource

9 Citing PubSeq

See the FAQ.