From 5f36706b7c9dc1786e47848f0ce8aabd4e7ab851 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 30 May 2020 10:46:28 -0500
Subject: BLOG: checkout output

---
 doc/blog/using-covid-19-pubseq-part3.html | 127 ++++++++++++++++--------------
 doc/blog/using-covid-19-pubseq-part3.org  |   8 ++
 doc/web/download.html                     |  99 ++++++++++++-----------
 doc/web/download.org                      |   5 +-
 4 files changed, 133 insertions(+), 106 deletions(-)

diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index 6838bc7..4132784 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -3,7 +3,7 @@
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
 COVID-19 PubSeq Uploading Data (part 3)
@@ -248,42 +248,43 @@ for the JavaScript code in this tag.

Table of Contents

-
-

1 Uploading Data

+
+

1 Uploading Data

Work in progress! @@ -291,8 +292,8 @@ for the JavaScript code in this tag.

-
-

2 Introduction

+
+

2 Introduction

The COVID-19 PubSeq allows you to upload your SARS-Cov-2 strains to a @@ -302,8 +303,8 @@ upload. Read the ABOUT page for more information.

-
-

3 Step 1: Upload sequence

+
+

3 Step 1: Upload sequence

To upload a sequence in the web upload page hit the browse button and @@ -331,8 +332,8 @@ an improved pangenome.

-
-

4 Step 2: Add metadata

+
+

4 Step 2: Add metadata

The web upload page contains fields for adding metadata. Metadata is @@ -358,12 +359,12 @@ the web form. Here we add some extra information.

-
-

4.1 Obligatory fields

+
+

4.1 Obligatory fields

-
-

4.1.1 Sample ID (sampleid)

+
+

4.1.1 Sample ID (sampleid)

This is a string field that defines a unique sample identifier by the @@ -381,8 +382,8 @@ Here we add the GenBank ID MT536190.1.

-
-

4.1.2 Collection date

+
+

4.1.2 Collection date

Estimated collection date. The GenBank page says April 6, 2020. @@ -390,8 +391,8 @@ Estimated collection date. The GenBank page says April 6, 2020.

-
-

4.1.3 Collection location

+
+

4.1.3 Collection location

A search on wikidata says Los Angelos is @@ -400,8 +401,8 @@ A search on wikidata says Los Angelos is

-
-

4.1.4 Sequencing technology

+
+

4.1.4 Sequencing technology

GenBank entry says Illumina, so we can fill that in @@ -409,8 +410,8 @@ GenBank entry says Illumina, so we can fill that in

-
-

4.1.5 Authors

+
+

4.1.5 Authors

GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga @@ -421,16 +422,16 @@ Freehan,A. and Garcia-Diaz,J.', so we can fill that in.

-
-

4.2 Optional fields

+
+

4.2 Optional fields

All other fields are optional. But let's see what we can add.

-
-

4.2.1 Host information

+
+

4.2.1 Host information

Sadly, not much is known about the host from GenBank. A little @@ -444,8 +445,8 @@ did to the person and what the person was like (say age group).

-
-

4.2.2 Collecting institution

+
+

4.2.2 Collecting institution

We can fill that in. @@ -453,8 +454,8 @@ We can fill that in.

-
-

4.2.3 Specimen source

+
+

4.2.3 Specimen source

We have that: nasopharyngeal swab @@ -462,8 +463,8 @@ We have that: nasopharyngeal swab

-
-

4.2.4 Source database accession

+
+

4.2.4 Source database accession

Genbank which is http://identifiers.org/insdc/MT536190.1#sequence. @@ -472,8 +473,8 @@ Note we plug in our own identifier MT536190.1.

-
-

4.2.5 Strain name

+
+

4.2.5 Strain name

SARS-CoV-2/human/USA/LA-BIE-070/2020 @@ -483,8 +484,8 @@ SARS-CoV-2/human/USA/LA-BIE-070/2020

-
-

5 Step 3: Submit to COVID-19 PubSeq

+
+

5 Step 3: Submit to COVID-19 PubSeq

Once you have the sequence and the metadata together, hit @@ -492,10 +493,22 @@ the 'Add to Pangenome' button. The data will be checked, submitted and the workflows should kick in!

+
+ +
+

6 Step 4: Check output

+
+

+The current pipeline takes 5.5 hours to complete! Once it completes +the updated data can be checked on the DOWNLOAD page. After completion +of above output this SPARQL query shows some of the metadata we put +in. +

+
-
-

5.1 Trouble shooting

-
+
+

6.1 Trouble shooting

+

We got an error saying: {"stem": "http://www.wikidata.org/entity/",… which means that our location field was not formed correctly! After @@ -509,7 +522,7 @@ submit button.

-Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp! Modified 2020-05-29 Fri 14:22.
+Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp! Modified 2020-05-30 Sat 10:44.
diff --git a/doc/blog/using-covid-19-pubseq-part3.org b/doc/blog/using-covid-19-pubseq-part3.org
index ade902d..4dd3078 100644
--- a/doc/blog/using-covid-19-pubseq-part3.org
+++ b/doc/blog/using-covid-19-pubseq-part3.org
@@ -18,6 +18,7 @@
 - [[#obligatory-fields][Obligatory fields]]
 - [[#optional-fields][Optional fields]]
 - [[#step-3-submit-to-covid-19-pubseq][Step 3: Submit to COVID-19 PubSeq]]
+ - [[#step-4-check-output][Step 4: Check output]]
 - [[#trouble-shooting][Trouble shooting]]

 * Introduction
@@ -135,6 +136,13 @@ Once you have the sequence and the metadata together, hit
 the 'Add to Pangenome' button. The data will be checked, submitted and
 the workflows should kick in!

+* Step 4: Check output
+
+The current pipeline takes 5.5 hours to complete! Once it completes
+the updated data can be checked on the [[./download][DOWNLOAD]] page. After completion
+of above output this [[http://sparql.genenetwork.org/sparql/?default-graph-uri=&query=PREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0APREFIX+sio%3A+%3Chttp%3A%2F%2Fsemanticscience.org%2Fresource%2F%3E%0D%0Aselect+distinct+%3Fsample+%3Fp+%3Fo%0D%0A%7B%0D%0A+++%3Fsample+sio%3ASIO_000115+%22MT536190.1%22+.%0D%0A+++%3Fsample+%3Fp+%3Fo+.%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+][SPARQL query]] shows some of the metadata we put
+in.
+
 ** Trouble shooting

 We got an error saying: {"stem": "http://www.wikidata.org/entity/",...
diff --git a/doc/web/download.html b/doc/web/download.html
index 2fde013..001b071 100644
--- a/doc/web/download.html
+++ b/doc/web/download.html
@@ -3,7 +3,7 @@
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
 Download
@@ -247,42 +247,45 @@ for the JavaScript code in this tag.

Table of Contents

-
-

1 Workflow runs

+
+

1 Workflow runs

-The last runs can be viewed here. +The last runs can be viewed here. If you click on a run you can see +the workflows that ran under Processes. Output (also intermediate) +is listed under Data collections. All current data is listed +here. Note that it takes time for a run to complete and show.

-
-

2 FASTA files

+
+

2 FASTA files

The public sequence resource provides all uploaded sequences as @@ -292,8 +295,8 @@ also provide a single file -

3 Metadata

+
+

3 Metadata

Metadata can be downloaded as Turtle RDF as a mergedmetadat.ttl which @@ -315,8 +318,8 @@ graph can be downloaded from below Pangenome RDF format.

-
-

4 Pangenome

+
+

4 Pangenome

Pangenome data is made available in multiple guises. Variation graphs @@ -324,8 +327,8 @@ Pangenome data is made available in multiple guises. Variation graphs

-
-

4.1 Pangenome GFA format

+
+

4.1 Pangenome GFA format

GFA is a standard for graphical fragment assembly and consumed @@ -334,8 +337,8 @@ by tools such as vgtools.

-
-

4.2 Pangenome in ODGI format

+
+

4.2 Pangenome in ODGI format

ODGI is a format that supports an optimised dynamic genome/graph @@ -344,8 +347,8 @@ implementation.

-
-

4.3 Pangenome RDF format

+
+

4.3 Pangenome RDF format

An RDF file that includes the sequences themselves in a variation @@ -356,8 +359,8 @@ graph can be downloaded from

-
-

4.4 Pangenome Browser format

+
+

4.4 Pangenome Browser format

The many JSON files that are named as @@ -368,8 +371,8 @@ Pangenome browser.

-
-

5 Log of workflow output

+
+

5 Log of workflow output

Including in below link is a log file of the last workflow runs. @@ -377,8 +380,8 @@ Including in below link is a log file of the last workflow runs.

-
-

6 All files

+
+

6 All files

https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/ @@ -386,16 +389,16 @@ Including in below link is a log file of the last workflow runs.

-
-

7 Planned

+
+

7 Planned

We are planning the add the following output (see also

-
-

7.1 Raw sequence data

+
+

7.1 Raw sequence data

See fastq tracker and BAM tracker. @@ -403,8 +406,8 @@ See fastq track

-
-

7.2 Multiple Sequence Alignment (MSA)

+
-
-

7.3 Phylogenetic tree

+
-
-

7.4 Protein prediction

+
+

7.4 Protein prediction

We aim to make protein predictions available. @@ -432,7 +435,7 @@ We aim to make protein predictions available.

-Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp! Modified 2020-05-29 Fri 08:27.
+Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp! Modified 2020-05-30 Sat 10:40.
diff --git a/doc/web/download.org b/doc/web/download.org
index 2781d67..da6a9a7 100644
--- a/doc/web/download.org
+++ b/doc/web/download.org
@@ -20,7 +20,10 @@

 * Workflow runs

-The last runs can be viewed [[https://workbench.lugli.arvadosapi.com/container_requests/lugli-xvhdp-bhhk4nxx1lch5od][here]].
+The last runs can be viewed [[https://workbench.lugli.arvadosapi.com/projects/lugli-j7d0g-y4k4uswcqi3ku56#Subprojects][here]]. If you click on a run you can see
+the workflows that ran under ~Processes~. Output (also intermediate)
+is listed under ~Data collections~. All current data is listed
+[[https://collections.lugli.arvadosapi.com/c=lugli-4zz18-z513nlpqm03hpca/][here]]. Note that it takes time for a run to complete and show.

 * FASTA files
--
cgit v1.2.3
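
Note on the patch: the long link added under "Step 4: Check output" carries a URL-encoded SPARQL query, which is hard to read inside the diff. The sketch below is one possible way to decode it with the Python standard library. Python and the helper name `extract_sparql_query` are illustrative assumptions and not part of the commit; the link itself is copied from the .org hunk.

```python
from urllib.parse import parse_qs, urlsplit

# Link copied from the "Step 4: Check output" hunk of the .org diff.
# It is split across string literals only for readability.
SPARQL_LINK = (
    "http://sparql.genenetwork.org/sparql/?default-graph-uri=&query="
    "PREFIX+pubseq%3A+%3Chttp%3A%2F%2Fbiohackathon.org%2Fbh20-seq-schema%23MainSchema%2F%3E%0D%0A"
    "PREFIX+sio%3A+%3Chttp%3A%2F%2Fsemanticscience.org%2Fresource%2F%3E%0D%0A"
    "select+distinct+%3Fsample+%3Fp+%3Fo%0D%0A"
    "%7B%0D%0A"
    "+++%3Fsample+sio%3ASIO_000115+%22MT536190.1%22+.%0D%0A"
    "+++%3Fsample+%3Fp+%3Fo+.%0D%0A"
    "%7D"
    "&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+"
)


def extract_sparql_query(link: str) -> str:
    """Return the decoded SPARQL text held in the link's `query` parameter.

    Hypothetical helper for illustration; it only parses the URL and does
    not contact the SPARQL endpoint.
    """
    params = parse_qs(urlsplit(link).query)
    # parse_qs decodes %XX escapes and turns '+' into spaces;
    # the link uses CRLF line endings, normalised here to '\n'.
    return params["query"][0].replace("\r\n", "\n")


if __name__ == "__main__":
    print(extract_sparql_query(SPARQL_LINK))
```

Decoded, the query declares the `pubseq` and `sio` prefixes and selects every predicate/object pair of the sample whose `sio:SIO_000115` value is "MT536190.1", the GenBank ID entered as the sample ID in the blog post.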