From 5f36706b7c9dc1786e47848f0ce8aabd4e7ab851 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 30 May 2020 10:46:28 -0500
Subject: BLOG: checkout output

---
 doc/blog/using-covid-19-pubseq-part3.html | 127 ++++++++++++++++--------------
 doc/blog/using-covid-19-pubseq-part3.org  |   8 ++
 doc/web/download.html                     |  99 ++++++++++++-----------
 doc/web/download.org                      |   5 +-
 4 files changed, 133 insertions(+), 106 deletions(-)

(limited to 'doc')

diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index 6838bc7..4132784 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- +Work in progress!
@@ -291,8 +292,8 @@ for the JavaScript code in this tag.
The COVID-19 PubSeq allows you to upload your SARS-Cov-2 strains to a
@@ -302,8 +303,8 @@ upload. Read the ABOUT page for more information.
To upload a sequence in the web upload page hit the browse button and
@@ -331,8 +332,8 @@ an improved pangenome.
The web upload page contains fields for adding metadata. Metadata is
@@ -358,12 +359,12 @@ the web form. Here we add some extra information.
This is a string field that defines a unique sample identifier by the
@@ -381,8 +382,8 @@ Here we add the GenBank ID MT536190.1.
Estimated collection date. The GenBank page says April 6, 2020.
@@ -390,8 +391,8 @@ Estimated collection date. The GenBank page says April 6, 2020.
A search on wikidata says Los Angelos is
@@ -400,8 +401,8 @@ A search on wikidata says Los Angelos is
GenBank entry says Illumina, so we can fill that in
@@ -409,8 +410,8 @@ GenBank entry says Illumina, so we can fill that in
GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga
@@ -421,16 +422,16 @@ Freehan,A. and Garcia-Diaz,J.', so we can fill that in.
All other fields are optional. But let's see what we can add.
Sadly, not much is known about the host from GenBank. A little
@@ -444,8 +445,8 @@ did to the person and what the person was like (say age group).
We can fill that in.
@@ -453,8 +454,8 @@ We can fill that in.
We have that: nasopharyngeal swab
@@ -462,8 +463,8 @@ We have that: nasopharyngeal swab
Genbank which is http://identifiers.org/insdc/MT536190.1#sequence.
@@ -472,8 +473,8 @@ Note we plug in our own identifier MT536190.1.
SARS-CoV-2/human/USA/LA-BIE-070/2020
@@ -483,8 +484,8 @@ SARS-CoV-2/human/USA/LA-BIE-070/2020
Once you have the sequence and the metadata together, hit
@@ -492,10 +493,22 @@ the 'Add to Pangenome' button. The data will be checked, submitted and the workflows should kick in!
+The current pipeline takes 5.5 hours to complete! Once it completes
+the updated data can be checked on the DOWNLOAD page. After completion
+of above output this SPARQL query shows some of the metadata we put
+in.
+
+We got an error saying: {"stem": "http://www.wikidata.org/entity/",… which means that our location field was not formed correctly! After
@@ -509,7 +522,7 @@ submit button.
The public sequence resource provides all uploaded sequences as
@@ -292,8 +295,8 @@ also provide a single file
-
Metadata can be downloaded as Turtle RDF as a mergedmetadat.ttl which
@@ -315,8 +318,8 @@ graph can be downloaded from below Pangenome RDF format.
Pangenome data is made available in multiple guises. Variation graphs
@@ -324,8 +327,8 @@ Pangenome data is made available in multiple guises. Variation graphs
ODGI is a format that supports an optimised dynamic genome/graph
@@ -344,8 +347,8 @@ implementation.
An RDF file that includes the sequences themselves in a variation
@@ -356,8 +359,8 @@ graph can be downloaded from
The many JSON files that are named as
@@ -368,8 +371,8 @@ Pangenome browser.
Including in below link is a log file of the last workflow runs.
@@ -377,8 +380,8 @@ Including in below link is a log file of the last workflow runs.
https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/
@@ -386,16 +389,16 @@ Including in below link is a log file of the last workflow runs.
We are planning the add the following output (see also
See fastq tracker and BAM tracker.
@@ -403,8 +406,8 @@ See fastq track
See MSA tracker.
@@ -412,8 +415,8 @@ See MSA tracker
See Phylo tracker.
@@ -421,8 +424,8 @@ See Phylo track
We aim to make protein predictions available.
@@ -432,7 +435,7 @@ We aim to make protein predictions available.
3 Metadata
+3 Metadata
4 Pangenome
+4 Pangenome
4.1 Pangenome GFA format
+4.1 Pangenome GFA format
4.2 Pangenome in ODGI format
+4.2 Pangenome in ODGI format
4.3 Pangenome RDF format
+4.3 Pangenome RDF format
4.4 Pangenome Browser format
+4.4 Pangenome Browser format
5 Log of workflow output
+5 Log of workflow output
6 All files
+6 All files
7 Planned
+7 Planned
7.1 Raw sequence data
+7.1 Raw sequence data
7.2 Multiple Sequence Alignment (MSA)
+7.2 Multiple Sequence Alignment (MSA)
7.3 Phylogenetic tree
+7.3 Phylogenetic tree
7.4 Protein prediction
+7.4 Protein prediction
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-05-29 Fri 08:27.
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-05-30 Sat 10:40.