From 73824fe1f94cb965f6de9d5b43bf2eb48241d3ea Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Tue, 27 Oct 2020 12:17:23 +0000
Subject: Updating docs

---
 doc/INSTALL.md                            |   6 +-
 doc/blog/using-covid-19-pubseq-part3.html | 239 ++++++++++++++++--------------
 doc/blog/using-covid-19-pubseq-part3.org  |  22 ++-
 3 files changed, 154 insertions(+), 113 deletions(-)

diff --git a/doc/INSTALL.md b/doc/INSTALL.md
index e31b7d7..df825c6 100644
--- a/doc/INSTALL.md
+++ b/doc/INSTALL.md
@@ -38,7 +38,11 @@ Note that python-pyshex is packaged in
 http://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics
 so you'll need it to the GUIX_PACKAGE_PATH - see the README in that
-repository.
+repository. E.g.
+
+```sh
+env GUIX_PACKAGE_PATH=~/iwrk/opensource/guix/guix-bioinformatics/ ~/opt/guix/bin/guix environment -C guix --ad-hoc git python python-flask python-pyyaml python-pycurl python-magic nss-certs python-pyshex python-pyyaml --network openssl python-pyshex python-pyshexc minimap2 python-schema-salad python-arvados-python-client --share=/export/tmp -- env TMPDIR=/export/tmp python3 bh20sequploader/main.py --help
+```
 
 ### Using the Web Uploader
diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html
index e2eb996..788c1d2 100644
--- a/doc/blog/using-covid-19-pubseq-part3.html
+++ b/doc/blog/using-covid-19-pubseq-part3.html
@@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
The COVID-19 PubSeq allows you to upload your SARS-Cov-2 strains to a
@@ -301,8 +278,8 @@ gets triggered on upload. Read the ABOUT page for more inf
To upload a sequence in the web upload page hit the browse button and
@@ -330,8 +307,8 @@ an improved pangenome.
The web upload page contains fields for adding metadata. Metadata is
@@ -357,12 +334,12 @@ the web form. Here we add some extra information.
This is a string field that defines a unique sample identifier by the
@@ -380,8 +357,8 @@ Here we add the GenBank ID MT536190.1.
Estimated collection date. The GenBank page says April 6, 2020.
@@ -389,8 +366,8 @@ Estimated collection date. The GenBank page says April 6, 2020.
A search on wikidata says Los Angeles is
@@ -399,8 +376,8 @@ A search on wikidata says Los Angeles is
GenBank entry says Illumina, so we can fill that in
@@ -408,8 +385,8 @@ GenBank entry says Illumina, so we can fill that in
GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga
@@ -420,16 +397,16 @@ Freehan,A. and Garcia-Diaz,J.', so we can fill that in.
All other fields are optional. But let's see what we can add.
Sadly, not much is known about the host from GenBank. A little
@@ -443,8 +420,8 @@ did to the person and what the person was like (say age group).
We can fill that in.
@@ -452,8 +429,8 @@ We can fill that in.
We have that: nasopharyngeal swab
@@ -461,8 +438,8 @@ We have that: nasopharyngeal swab
Genbank which is http://identifiers.org/insdc/MT536190.1#sequence.
@@ -471,8 +448,8 @@ Note we plug in our own identifier MT536190.1.
SARS-CoV-2/human/USA/LA-BIE-070/2020
@@ -482,8 +459,8 @@ SARS-CoV-2/human/USA/LA-BIE-070/2020
Once you have the sequence and the metadata together, hit
@@ -493,8 +470,8 @@ submitted and the workflows should kick in!
We got an error saying: {"stem": "http://www.wikidata.org/entity/",…
@@ -508,8 +485,8 @@ submit button.
The current pipeline takes 5.5 hours to complete! Once it completes
@@ -520,8 +497,8 @@ in.
Above steps require a manual upload of one sequence with metadata.
@@ -584,8 +561,8 @@ submitter:
Installing with pip you should be
@@ -610,9 +587,28 @@ python3 bh20sequploader/main.py example/sequence.fasta example/maximum_metadata_
after installing dependencies (also described in INSTALL with the GNU
-Guix package manager).
+Guix package manager). The --help
shows
Entering sequence uploader
+usage: main.py [-h] [--validate] [--skip-qc] [--trusted] metadata sequence_p1 [sequence_p2]
+
+Upload SARS-CoV-19 sequences for analysis
+
+positional arguments:
+  metadata     sequence metadata json
+  sequence_p1  sequence FASTA/FASTQ
+  sequence_p2  sequence FASTQ pair
+
+optional arguments:
+  -h, --help   show this help message and exit
+  --validate   Dry run, validate only
+  --skip-qc    Skip local qc check
+  --trusted    Trust local validation and add directly to validated project
+
+
The web interface using this exact same script so it should just work (TM).
@@ -620,8 +616,9 @@ The web interface using this exact same script so it should just work
We also use above script to bulk upload GenBank sequences with a FASTA
@@ -646,10 +643,32 @@ ls $dir_fasta_and_yaml/*.yaml |
+Metadata are usually available in tabular form, such as spreadsheets. As an example, we provide the script
+esr_samples.py, which shows how to convert
+your metadata into YAML files ready for upload. To run the script, go to the bh20-seq-resource/scripts/esr_samples directory
+and execute
+
+
+python3 esr_samples.py
+
+
+The script writes the YAML files to a `yaml` folder, which it creates in the same directory.
+
+
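The esr_samples.py script itself is not shown in this patch. As an illustration only, here is a minimal sketch of the same idea: it writes one YAML metadata file per spreadsheet row, assuming the spreadsheet has been exported to CSV. The column names, the YAML section and field names (modelled on the sample_id, collection_date and collection_location fields discussed in this post) and the `convert` helper are all hypothetical. Check them against the uploader's metadata schema and add the remaining required fields before uploading.

```python
import csv
import io
import json
from pathlib import Path

# Hypothetical mapping: spreadsheet column -> (YAML section, YAML field).
# These names are assumptions, not the official PubSeq metadata schema.
COLUMN_MAP = {
    "sample_id": ("sample", "sample_id"),
    "collection_date": ("sample", "collection_date"),
    "collection_location": ("sample", "collection_location"),
}


def row_to_nested(row):
    """Group the non-empty cells of one CSV row into YAML sections."""
    nested = {}
    for column, (section, field) in COLUMN_MAP.items():
        value = (row.get(column) or "").strip()
        if value:
            nested.setdefault(section, {})[field] = value
    return nested


def to_yaml(nested):
    """Emit a two-level mapping as YAML text.

    A JSON string is also a valid YAML double-quoted scalar, so json.dumps
    quotes values safely (URLs, dates kept as strings) without PyYAML.
    """
    lines = []
    for section, fields in nested.items():
        lines.append(f"{section}:")
        for field, value in fields.items():
            lines.append(f"  {field}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


def convert(csv_text, out_dir):
    """Write one <sample_id>.yaml per CSV row into out_dir; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        nested = row_to_nested(row)
        sample_id = nested.get("sample", {}).get("sample_id")
        if not sample_id:
            continue  # rows without an identifier cannot be named, so skip them
        # Replace '/' so identifiers such as strain names stay one file name.
        path = out_dir / (sample_id.replace("/", "_") + ".yaml")
        path.write_text(to_yaml(nested))
        written.append(path)
    return written
```

For example, `convert(Path("samples.csv").read_text(), "yaml")` would write the files into a `yaml` folder. The sketch avoids a PyYAML dependency for portability. If PyYAML is installed (it is in the GNU Guix environment above), `yaml.safe_dump` on the nested dictionaries is the more usual choice.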