From 0495b892fba350096c8b1bd741c55e148e7fc2de Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Fri, 29 May 2020 14:23:25 -0500 Subject: Blog info for uploading sequence --- doc/blog/using-covid-19-pubseq-part3.html | 245 ++++++++++++++++++++++++++++-- 1 file changed, 229 insertions(+), 16 deletions(-) (limited to 'doc/blog/using-covid-19-pubseq-part3.html') diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html index 7903791..6838bc7 100644 --- a/doc/blog/using-covid-19-pubseq-part3.html +++ b/doc/blog/using-covid-19-pubseq-part3.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- +Work in progress! @@ -265,8 +291,8 @@ for the JavaScript code in this tag.
The COVID-19 PubSeq allows you to upload your SARS-CoV-2 strains to a @@ -276,27 +302,214 @@ upload. Read the ABOUT page for more information.
+To upload a sequence in the web upload page hit the browse button and +select the FASTA file on your local hard disk. +
+We start with an assembled or mapped sequence in FASTA format. The PubSeq uploader contains a QC step which checks whether it is a likely SARS-CoV-2 sequence. While PubSeq deduplicates sequences and never -overwrites metadata it probably pays to check whether your data +overwrites metadata, you may still want to check whether your data already is in the system by querying some metadata as described in -Query metadata with SPARQL. +Query metadata with SPARQL or by simply downloading and checking one +of the files on the download page. We find GenBank MT536190.1 has not +been included yet. A FASTA text file can be downloaded to your local +disk and uploaded through our web upload page. Make sure the file does +not include any HTML! +
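A quick local check before uploading can catch a file that was saved as a web page instead of raw FASTA. The following is a minimal sketch and not the PubSeq QC step itself; the function name `looks_like_plain_fasta` and the accepted alphabet (IUPAC nucleotide codes plus gap characters) are assumptions made for illustration.

```python
import re

# Assumption: sequence lines contain only IUPAC nucleotide codes, '-' or '*'.
SEQUENCE_LINE = re.compile(r"^[ACGTUNRYKMSWBDHVacgtunrykmswbdhv*-]+$")
# Anything that looks like an HTML tag, e.g. <html>, </pre>, <br>.
HTML_TAG = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")


def looks_like_plain_fasta(text):
    """Return True if text looks like nucleotide FASTA without HTML markup."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # A FASTA file starts with a '>' header line.
    if not lines or not lines[0].startswith(">"):
        return False
    # HTML-like tags suggest a saved web page rather than the raw file.
    if HTML_TAG.search(text):
        return False
    has_sequence = False
    for line in lines:
        if line.startswith(">"):
            continue  # header line, free text
        if not SEQUENCE_LINE.match(line):
            return False
        has_sequence = True
    return has_sequence
```

A file that passes this check can still be rejected by the uploader, which runs its own QC on whether the sequence is likely SARS-CoV-2.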
+ ++Note: we currently only allow FASTA uploads. In the near future we'll +allow for uploading raw sequence files. This is important for creating +an improved pangenome. +
++The web upload page contains fields for adding metadata. Metadata is +not only important for attribution, it is also important for +analysis. The metadata is available for queries, see Query metadata +with SPARQL, and can be used to annotate variations of the virus in +different ways. +
+ +
+A number of fields are obligatory: sample id, date, location,
+technology and authors. The others are optional, but it is valuable to
+enter them when information is available. Metadata is defined in this
+schema. From this schema we generate the input form. Note that
+optional fields have a question mark in the type
. You can add
+metadata yourself, btw, because this is a public resource! See also
+Modify metadata for more information.
+
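The obligatory fields listed above can be checked before filling in the form. This is a small sketch only: the dictionary keys (`sample_id`, `date`, `location`, `technology`, `authors`) are hypothetical names chosen to mirror the list in the text, and the real field names are the ones defined in the schema.

```python
# Hypothetical keys mirroring the obligatory fields named in the text;
# the schema itself defines the real field names.
REQUIRED_FIELDS = ("sample_id", "date", "location", "technology", "authors")


def missing_required(metadata):
    """Return the obligatory fields that are absent or empty in metadata."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


record = {
    "sample_id": "MT536190.1",
    "date": "2020-04-06",
    "technology": "Illumina",
    "authors": "Lamers,S., Nolan,D.J., Rose,R.",
}
print(missing_required(record))  # location has not been filled in yet
```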
+To get more information about a field click on the question mark on +the web form. Here we add some extra information. +
++This is a string field that defines a unique sample identifier by the +submitter. In addition to sampleid we also have hostid, +providersampleid and submittersampleid where host is the host the +sample came from, provider sample is the institution sample id and +submitter is the submitting individual id. hostid is important when +multiple sequences come from the same host. Make sure not to have +spaces in the sampleid. +
+ ++Here we add the GenBank ID MT536190.1. +
++Estimated collection date. The GenBank page says April 6, 2020. +
++A search on Wikidata says Los Angeles is +https://www.wikidata.org/entity/Q65 +
++GenBank entry says Illumina, so we can fill that in +
++GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga +Amador,D., Yang,T., Caruso,L., Navia,W., Von Borstel,L., Hui Zhou,X., +Freehan,A. and Garcia-Diaz,J.', so we can fill that in. +
++All other fields are optional. But let's see what we can add. +
++Sadly, not much is known about the host from GenBank. A little +sleuthing renders an interesting paper by some of the authors titled +SARS-CoV-2 is consistent across multiple samples and methodologies +which dates after the sample, but has no reference other than that the +raw data came from the SRA database, so it probably does not describe +this particular sample. We don't know what this strain of SARS-CoV-2 +did to the person and what the person was like (say age group). +
++We can fill that in. +
++We have that: nasopharyngeal swab
GenBank which is http://identifiers.org/insdc/MT536190.1#sequence. +Note we plug in our own identifier MT536190.1. +
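The identifier URL above follows a fixed pattern around the accession. A minimal sketch of building it, assuming the same `http://identifiers.org/insdc/<accession>#sequence` layout holds for other GenBank accessions (the helper name `insdc_sequence_url` is made up for this example):

```python
def insdc_sequence_url(accession):
    """Build the identifiers.org sequence URL for a GenBank accession."""
    accession = accession.strip()
    # An accession with embedded whitespace would produce a broken URL.
    if not accession or any(ch.isspace() for ch in accession):
        raise ValueError("accession must be non-empty and contain no spaces")
    return "http://identifiers.org/insdc/" + accession + "#sequence"


print(insdc_sequence_url("MT536190.1"))
```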
++SARS-CoV-2/human/USA/LA-BIE-070/2020 +
++Once you have the sequence and the metadata together, hit +the 'Add to Pangenome' button. The data will be checked, +submitted and the workflows should kick in! +
++We got an error saying: {"stem": "http://www.wikidata.org/entity/",… +which means that our location field was not formed correctly! After +fixing it to look like http://www.wikidata.org/entity/Q65 (note http +instead of https and entity instead of wiki) the submission went +through. Reload the page (it won't empty the fields) to re-enable the +submit button. +
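The location fix described here can be automated when preparing metadata by hand. The sketch below rewrites the common Wikidata URL variants into the `http://www.wikidata.org/entity/Q…` form that the error message asked for. The function name and the set of accepted input variants are assumptions; this is not part of the PubSeq code.

```python
import re

# The stem reported in the error message.
WIKIDATA_STEM = "http://www.wikidata.org/entity/"
# Accept .../entity/Q65 and .../wiki/Q65, over http or https.
WIKIDATA_ITEM = re.compile(r"wikidata\.org/(?:entity|wiki)/(Q[0-9]+)/?$")


def normalize_wikidata_uri(value):
    """Return the http://www.wikidata.org/entity/Q... form of a Wikidata URL."""
    match = WIKIDATA_ITEM.search(value.strip())
    if match is None:
        raise ValueError("not a Wikidata item URL: " + repr(value))
    return WIKIDATA_STEM + match.group(1)


print(normalize_wikidata_uri("https://www.wikidata.org/wiki/Q65"))
```

Plain place names are rejected rather than guessed, since looking up the right Wikidata item still needs a manual search.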
+