From 02d761902d49491f5b85c117dcb37db072be034d Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Tue, 25 Aug 2020 12:13:53 +0100
Subject: Fix underscores

---
doc/blog/using-covid-19-pubseq-part3.html | 140 +++++++++++++++---------------
doc/blog/using-covid-19-pubseq-part3.org | 1 +
2 files changed, 71 insertions(+), 70 deletions(-)

diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index 15c1b78..e2eb996 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- +The COVID-19 PubSeq allows you to upload your SARS-Cov-2 strains to a @@ -301,8 +301,8 @@ gets triggered on upload. Read the ABOUT page for more inf
To upload a sequence in the web upload page hit the browse button and
@@ -330,8 +330,8 @@ an improved pangenome.
The web upload page contains fields for adding metadata. Metadata is
@@ -357,21 +357,21 @@ the web form. Here we add some extra information.
This is a string field that defines a unique sample identifier by the
-submitter. In addition to sampleid we also have hostid,
-providersampleid and submittersampleid where host is the host the
+submitter. In addition to sample_id we also have host_id,
+provider_sample_id and submitter_sample_id where host is the host the
sample came from, provider sample is the institution sample id and
-submitter is the submitting individual id. hostid is important when
+submitter is the submitting individual id. host_id is important when
multiple sequences come from the same host. Make sure not to have
-spaces in the sampleid.
+spaces in the sample_id.
@@ -380,8 +380,8 @@ Here we add the GenBank ID MT536190.1.
Estimated collection date. The GenBank page says April 6, 2020.
@@ -389,8 +389,8 @@ Estimated collection date. The GenBank page says April 6, 2020.
A search on wikidata says Los Angeles is
@@ -399,8 +399,8 @@ A search on wikidata says Los Angeles is
GenBank entry says Illumina, so we can fill that in
@@ -408,8 +408,8 @@ GenBank entry says Illumina, so we can fill that in
GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga
@@ -420,16 +420,16 @@ Freehan,A. and Garcia-Diaz,J.', so we can fill that in.
All other fields are optional. But let's see what we can add.
Sadly, not much is known about the host from GenBank. A little
@@ -443,8 +443,8 @@ did to the person and what the person was like (say age group).
We can fill that in.
@@ -452,8 +452,8 @@ We can fill that in.
We have that: nasopharyngeal swab
@@ -461,8 +461,8 @@ We have that: nasopharyngeal swab
Genbank which is http://identifiers.org/insdc/MT536190.1#sequence.
@@ -471,8 +471,8 @@ Note we plug in our own identifier MT536190.1.
SARS-CoV-2/human/USA/LA-BIE-070/2020
@@ -482,8 +482,8 @@ SARS-CoV-2/human/USA/LA-BIE-070/2020
Once you have the sequence and the metadata together, hit
@@ -493,8 +493,8 @@ submitted and the workflows should kick in!
We got an error saying: {"stem": "http://www.wikidata.org/entity/",…
@@ -508,8 +508,8 @@ submit button.
The current pipeline takes 5.5 hours to complete! Once it completes
@@ -520,8 +520,8 @@ in.
Above steps require a manual upload of one sequence with metadata.
@@ -584,8 +584,8 @@ submitter:
Installing with pip you should be
@@ -620,8 +620,8 @@ The web interface using this exact same script so it should just work
We also use above script to bulk upload GenBank sequences with a FASTA
@@ -632,7 +632,7 @@ took above for uploading a GenBank sequence are already automated.
The steps are: from the
bh20-seq-resource/scripts/download_genbank_data/
directory using the
-fromgenbanktofastaandyaml.py script:
+from_genbank_to_fasta_and_yaml.py script:
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-08-24 Mon 04:34.
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-08-25 Tue 06:13.