From 0495b892fba350096c8b1bd741c55e148e7fc2de Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Fri, 29 May 2020 14:23:25 -0500
Subject: Blog info for uploading sequence

---
doc/blog/using-covid-19-pubseq-part3.html | 245 ++++++++++++++++++++++++++++--
1 file changed, 229 insertions(+), 16 deletions(-)

diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index 7903791..6838bc7 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
COVID-19 PubSeq Uploading Data (part 3)
@@ -248,16 +248,42 @@ for the JavaScript code in this tag.

Table of Contents

-
-

1 Uploading Data

+
+

1 Uploading Data

Work in progress!
@@ -265,8 +291,8 @@ for the JavaScript code in this tag.

-
-

2 Introduction

+
+

2 Introduction

The COVID-19 PubSeq allows you to upload your SARS-CoV-2 strains to a
@@ -276,27 +302,214 @@ upload. Read the ABOUT page for more information.

-
-

3 Step 1: Sequence

+
+

3 Step 1: Upload sequence

+

+To upload a sequence on the web upload page, hit the browse button and +select the FASTA file on your local hard disk. +

+

We start with an assembled or mapped sequence in FASTA format. The PubSeq uploader contains a QC step which checks whether it is a likely SARS-CoV-2 sequence. While PubSeq deduplicates sequences and never -overwrites metadata it probably pays to check whether your data +overwrites metadata, you may still want to check whether your data already is in the system by querying some metadata as described in -Query metadata with SPARQL. +Query metadata with SPARQL or by simply downloading and checking one +of the files on the download page. We find GenBank MT536190.1 has not +been included yet. A FASTA text file can be downloaded to your local +disk and uploaded through our web upload page. Make sure the file does +not include any HTML! +
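Before uploading, a quick local check can catch a file that was saved from a browser with HTML markup in it. The following is a minimal sketch and is not the QC step that PubSeq runs: the function name `looks_like_plain_fasta` and the accepted character set (IUPAC nucleotide codes plus gap characters) are assumptions made for illustration.

```python
import re

# Assumption: nucleotide FASTA using IUPAC codes, plus '-' and '*' as gap/stop marks.
SEQUENCE_LINE = re.compile(r"^[ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-]+$")
# Anything that looks like an HTML tag, e.g. <html>, </pre>, <!DOCTYPE ...>
HTML_TAG = re.compile(r"<\s*/?\s*[A-Za-z!][^>]*>")


def looks_like_plain_fasta(text: str) -> bool:
    """Return True if text starts with a '>' header, holds only
    sequence lines after that, and contains no HTML tags."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    if HTML_TAG.search(text):
        return False
    return all(ln.startswith(">") or SEQUENCE_LINE.match(ln) for ln in lines)
```

To use it, read the downloaded file into a string and pass it in, for example `looks_like_plain_fasta(open("MT536190.1.fasta").read())`, where the file name is whatever you saved the GenBank record as.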

+ +

+Note: we currently only allow FASTA uploads. In the near future we'll +allow for uploading raw sequence files. This is important for creating +an improved pangenome. +

+
+
+ +
+

4 Step 2: Add metadata

+
+

+The web upload page contains fields for adding metadata. Metadata is +not only important for attribution, it is also important for +analysis. The metadata is available for queries, see Query metadata +with SPARQL, and can be used to annotate variations of the virus in +different ways. +

+ +

+A number of fields are obligatory: sample id, date, location, +technology and authors. The others are optional, but it is valuable to +enter them when information is available. Metadata is defined in this +schema. From this schema we generate the input form. Note that +optional fields have a question mark in the type. You can add +metadata yourself, btw, because this is a public resource! See also +Modify metadata for more information. +
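If you collect metadata in a script before filling in the form, a small check for the five obligatory fields can save a failed submission. This is only a sketch: apart from `sampleid`, the key names below are hypothetical placeholders and are not taken from the PubSeq schema, so adjust them to whatever the schema actually uses.

```python
# Hypothetical key names; only "sampleid" appears in the blog text.
REQUIRED_FIELDS = (
    "sampleid",
    "collection_date",
    "collection_location",
    "sequencing_technology",
    "authors",
)


def missing_required(metadata: dict) -> list:
    """Return the obligatory fields that are absent or empty, in schema order."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

An empty list means all obligatory fields have a value. The check says nothing about whether the values are well formed.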

+ +

+To get more information about a field click on the question mark on +the web form. Here we add some extra information. +

+
+ +
+

4.1 Obligatory fields

+
+
+
+

4.1.1 Sample ID (sampleid)

+
+

+This is a string field that defines a unique sample identifier by the +submitter. In addition to sampleid we also have hostid, +providersampleid and submittersampleid where host is the host the +sample came from, provider sample is the institution sample id and +submitter is the submitting individual id. hostid is important when +multiple sequences come from the same host. Make sure not to have +spaces in the sampleid. +

+ +

+Here we add the GenBank ID MT536190.1. +

+
+
+ +
+

4.1.2 Collection date

+
+

+Estimated collection date. The GenBank page says April 6, 2020. +

+
+
+ +
+

4.1.3 Collection location

+
+

+A search on Wikidata says Los Angeles is +https://www.wikidata.org/entity/Q65 +

+
+
+ +
+

4.1.4 Sequencing technology

+
+

+The GenBank entry says Illumina, so we can fill that in. +

+
+
+ +
+

4.1.5 Authors

+
+

+GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga +Amador,D., Yang,T., Caruso,L., Navia,W., Von Borstel,L., Hui Zhou,X., +Freehan,A. and Garcia-Diaz,J.', so we can fill that in. +

+
+
+
+ +
+

4.2 Optional fields

+
+

+All other fields are optional. But let's see what we can add. +

+
+ +
+

4.2.1 Host information

+
+

+Sadly, not much is known about the host from GenBank. A little +sleuthing renders an interesting paper by some of the authors titled +SARS-CoV-2 is consistent across multiple samples and methodologies +which dates after the sample, but has no reference other than that the +raw data came from the SRA database, so it probably does not describe +this particular sample. We don't know what this strain of SARS-CoV-2 +did to the person and what the person was like (say age group). +

+
+
+ +
+

4.2.2 Collecting institution

+
+

+We can fill that in. +

+
+
+ +
+

4.2.3 Specimen source

+
+

+We have that: nasopharyngeal swab

+
+

4.2.4 Source database accession

+
+

+GenBank, which is http://identifiers.org/insdc/MT536190.1#sequence. +Note we plug in our own identifier MT536190.1. +

+
+
-
-

4 Step 2: Metadata

+
+

4.2.5 Strain name

+
+

+SARS-CoV-2/human/USA/LA-BIE-070/2020 +

+
+
+
+
+ +
+

5 Step 3: Submit to COVID-19 PubSeq

+
+

+Once you have the sequence and the metadata together, hit +the 'Add to Pangenome' button. The data will be checked, +submitted and the workflows should kick in! +

+
+ +
+

5.1 Troubleshooting

+
+

+We got an error saying: {"stem": "http://www.wikidata.org/entity/",… +which means that our location field was not formed correctly! After +fixing it to look like http://www.wikidata.org/entity/Q65 (note http +instead of https and entity instead of wiki) the submission went +through. Reload the page (it won't empty the fields) to re-enable the +submit button. +
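The location fix can also be done mechanically. Below is a minimal sketch that rewrites the common Wikidata URL variants (https, `wiki` instead of `entity`) into the `http://www.wikidata.org/entity/` form that the error message names as the stem. The function name `normalize_wikidata_uri` is made up for this example, and the pattern assumes item identifiers of the form Q followed by digits.

```python
import re

# The stem reported in the error message above.
WIKIDATA_STEM = "http://www.wikidata.org/entity/"

# Accept http or https, with or without "www.", and either /wiki/ or /entity/.
_WIKIDATA_URL = re.compile(
    r"^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/(Q[1-9][0-9]*)$"
)


def normalize_wikidata_uri(uri: str) -> str:
    """Return the item URI in http://www.wikidata.org/entity/Q... form.

    Raises ValueError if the input is not a recognisable Wikidata item URL.
    """
    match = _WIKIDATA_URL.match(uri.strip())
    if match is None:
        raise ValueError(f"not a recognisable Wikidata item URL: {uri!r}")
    return WIKIDATA_STEM + match.group(1)
```

For example, `normalize_wikidata_uri("https://www.wikidata.org/wiki/Q65")` gives `http://www.wikidata.org/entity/Q65`, which is the value that went through in our submission.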

+
+
-
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-05-29 Fri 10:00
. +
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-05-29 Fri 14:22
.
-- cgit v1.2.3