From fbbec51e604964d18ab72cbf0ac24b102ecc0376 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Fri, 6 Nov 2020 07:45:10 +0000 Subject: Working on upload --- doc/blog/using-covid-19-pubseq-part3.html | 261 +++++++++++++++++++----------- doc/blog/using-covid-19-pubseq-part3.org | 161 +++++++++++------- 2 files changed, 272 insertions(+), 150 deletions(-) (limited to 'doc/blog') diff --git a/doc/blog/using-covid-19-pubseq-part3.html b/doc/blog/using-covid-19-pubseq-part3.html index 788c1d2..b49830b 100644 --- a/doc/blog/using-covid-19-pubseq-part3.html +++ b/doc/blog/using-covid-19-pubseq-part3.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> - + COVID-19 PubSeq Uploading Data (part 3) @@ -224,52 +224,66 @@

Table of Contents

+
+

1 Introduction

+
+

+In this document we explain how to upload data into COVID-19 PubSeq. +This can happen through a web page, or through a command line +script. We'll also show how to parametrize uploads by using templates. +The procedure is much easier than with other repositories and can be +fully automated. Once uploaded you can use our export API to prepare +for other repositories. +

+
+
-
-

1 Uploading Data

-
+
+

2 Uploading data

+

The COVID-19 PubSeq allows you to upload your SARS-CoV-2 strains to a public resource for global comparisons. A recompute of the pangenome @@ -278,9 +292,9 @@ gets triggered on upload. Read the ABOUT page for more inf

-
-

2 Step 1: Upload sequence

-
+
+

3 Step 1: Upload sequence

+

To upload a sequence on the web upload page, hit the browse button and select the FASTA file on your local hard disk. @@ -307,9 +321,9 @@ an improved pangenome.

-
-

3 Step 2: Add metadata

-
+
+

4 Step 2: Add metadata

+

The web upload page contains fields for adding metadata. Metadata is not only important for attribution, it is also important for @@ -334,13 +348,13 @@ the web form. Here we add some extra information.

-
-

3.1 Obligatory fields

-
+
+

4.1 Obligatory fields

+
-
-

3.1.1 Sample ID (sample_id)

-
+
+

4.1.1 Sample ID (sample_id)

+

This is a string field that defines a unique sample identifier by the submitter. In addition to sample_id we also have host_id, @@ -357,18 +371,18 @@ Here we add the GenBank ID MT536190.1.

-
-

3.1.2 Collection date

-
+
+

4.1.2 Collection date

+

Estimated collection date. The GenBank page says April 6, 2020.
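The metadata examples store the collection date as a quoted ISO 8601 string such as `"2020-01-01"`. Here is a minimal sketch that converts the GenBank date above using only Python's standard library. The function name `to_iso_date` and its default input format are our own assumptions, so adjust the format string to match your source:

```python
from datetime import datetime


def to_iso_date(text: str, fmt: str = "%B %d, %Y") -> str:
    """Convert a date such as 'April 6, 2020' to 'YYYY-MM-DD'.

    The default format expects an English month name, as on the
    GenBank page; pass another strptime format for other layouts.
    """
    return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")


iso = to_iso_date("April 6, 2020")
# Quote the value in YAML so it stays a string rather than a date object.
print(f'collection_date: "{iso}"')
```

The uthsc_samples.py script in this patch series uses dateutil's `parse` for the same step. That function accepts more input layouts, at the cost of an extra dependency.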

-
-

3.1.3 Collection location

-
+
+

4.1.3 Collection location

+

A search on wikidata says Los Angeles is https://www.wikidata.org/entity/Q65 @@ -376,18 +390,18 @@ A search on wikidata says Los Angeles is

-
-

3.1.4 Sequencing technology

-
+
+

4.1.4 Sequencing technology

+

GenBank entry says Illumina, so we can fill that in

-
-

3.1.5 Authors

-
+
+

4.1.5 Authors

+

GenBank entry says 'Lamers,S., Nolan,D.J., Rose,R., Cross,S., Moraga Amador,D., Yang,T., Caruso,L., Navia,W., Von Borstel,L., Hui Zhou,X., @@ -397,17 +411,17 @@ Freehan,A. and Garcia-Diaz,J.', so we can fill that in.
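GenBank lists authors as `Surname,Initials.`, separated by commas, with `and` before the last name. The small sketch below turns such a string into the authors list. The helper names are hypothetical, and the splitting rule (split on a comma that follows a period) is an assumption that fits the format shown here but may not hold for every GenBank record. Each name itself contains a comma, so the names are written as double-quoted strings, which YAML accepts in a flow list:

```python
import json
import re


def genbank_authors(text: str) -> list:
    """Split 'Lamers,S., Nolan,D.J. and Garcia-Diaz,J.' into names."""
    text = text.strip().replace(" and ", ", ")
    # Split only on a comma that directly follows a period (the end of the
    # initials), so the comma inside 'Lamers,S.' is kept.
    return [name for name in re.split(r"(?<=\.),\s*", text) if name]


def yaml_flow_list(items: list) -> str:
    # JSON strings are valid YAML double-quoted scalars; the quotes stop
    # the commas inside each name from splitting the YAML list.
    return json.dumps(items, ensure_ascii=False)


# A shortened version of the author string above.
authors = genbank_authors("Lamers,S., Nolan,D.J., Freehan,A. and Garcia-Diaz,J.")
print("authors:", yaml_flow_list(authors))
```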

-
-

3.2 Optional fields

-
+
+

4.2 Optional fields

+

All other fields are optional. But let's see what we can add.

-
-

3.2.1 Host information

-
+
+

4.2.1 Host information

+

Sadly, not much is known about the host from GenBank. A little sleuthing renders an interesting paper by some of the authors titled @@ -420,27 +434,27 @@ did to the person and what the person was like (say age group).

-
-

3.2.2 Collecting institution

-
+
+

4.2.2 Collecting institution

+

We can fill that in.

-
-

3.2.3 Specimen source

-
+
+

4.2.3 Specimen source

+

We have that: nasopharyngeal swab

-
-

3.2.4 Source database accession

-
+
+

4.2.4 Source database accession

+

GenBank, which is http://identifiers.org/insdc/MT536190.1#sequence. Note that we plug in our own identifier MT536190.1. @@ -448,9 +462,9 @@ Note we plug in our own identifier MT536190.1.

-
-

3.2.5 Strain name

-
+
+

4.2.5 Strain name

+

SARS-CoV-2/human/USA/LA-BIE-070/2020
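When preparing a batch of records, both the accession URI and the strain name can be generated from the sample identifiers. The sketch below uses hypothetical helper names. The strain layout `SARS-CoV-2/host/country/sample/year` is inferred from the example above and from uthsc_samples.py in this patch series; it is not taken from the schema:

```python
def insdc_accession_uri(accession: str) -> str:
    """identifiers.org URI for a GenBank/INSDC accession."""
    return f"http://identifiers.org/insdc/{accession}#sequence"


def strain_name(sample: str, country: str = "USA", host: str = "human",
                year: int = 2020) -> str:
    """Strain name in the SARS-CoV-2/human/USA/LA-BIE-070/2020 layout."""
    return f"SARS-CoV-2/{host}/{country}/{sample}/{year}"


print(insdc_accession_uri("MT536190.1"))
print(strain_name("LA-BIE-070"))
```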

@@ -459,9 +473,9 @@ SARS-CoV-2/human/USA/LA-BIE-070/2020
-
-

4 Step 3: Submit to COVID-19 PubSeq

-
+
+

5 Step 3: Submit to COVID-19 PubSeq

+

Once you have the sequence and the metadata together, hit the 'Add to Pangenome' button. The data will be checked, @@ -470,9 +484,9 @@ submitted and the workflows should kick in!

-
-

4.1 Trouble shooting

-
+
+

5.1 Troubleshooting

+

We got an error saying: {"stem": "http://www.wikidata.org/entity/",… which means that our location field was not formed correctly! After @@ -485,9 +499,9 @@ submit button.
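The error names the stem that the validator expects: `http://www.wikidata.org/entity/`. A URL with any other prefix, such as the `https` scheme or a `/wiki/` page link, does not match it. The following minimal sketch rewrites such variants to the expected form before the metadata is written. The function name and the accepted input forms are our assumptions:

```python
import re

WIKIDATA_STEM = "http://www.wikidata.org/entity/"


def wikidata_entity_uri(value: str) -> str:
    """Normalise a Wikidata item reference to the entity URI form.

    Accepts e.g. 'Q65', 'https://www.wikidata.org/entity/Q65' or
    'https://www.wikidata.org/wiki/Q65'.
    """
    match = re.search(r"(Q\d+)/?$", value.strip())
    if match is None:
        raise ValueError(f"no Wikidata item id found in {value!r}")
    return WIKIDATA_STEM + match.group(1)


print(wikidata_entity_uri("https://www.wikidata.org/entity/Q65"))
```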

-
-

5 Step 4: Check output

-
+
+

6 Step 4: Check output

+

The current pipeline takes 5.5 hours to complete! Once it completes, the updated data can be checked on the DOWNLOAD page. After completion @@ -497,9 +511,9 @@ in.

-
-

6 Bulk sequence uploader

-
+
+

7 Bulk sequence uploader

+ + +

+A more elaborate example (note that most fields are optional) may look like +

+
id: placeholder
 
@@ -559,11 +606,20 @@ submitter:
     additional_submitter_information: Optional free text field for additional information
 
+ +

+More metadata is yummy. Yummydata is useful to a wider community. Note +that many of the terms in the above example are URIs, such as +host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606. We use +web ontologies for these to make the data less ambiguous and more +FAIR. Check out the optional fields as defined in the schema. If a term is not listed, +a little web searching may be required, or contact us. +
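Before uploading, a quick local check can catch missing keys and values that are not URIs. The rough sketch below assumes the YAML has already been loaded into a Python dict (for example with a YAML library). The field lists are copied by hand from the minimal metadata example, not derived from the schema, so the uploader's own validation remains authoritative:

```python
from urllib.parse import urlparse

# Keys taken from the minimal metadata example; the schema
# (bh20seq-schema.yml) is the authoritative definition.
MINIMAL_FIELDS = {
    "license": ["license_type"],
    "host": ["host_species"],
    "sample": ["sample_id", "collection_date", "collection_location"],
    "virus": ["virus_species"],
    "technology": ["sample_sequencing_technology"],
    "submitter": ["authors"],
}

# Fields whose value should be an ontology term written as a URI.
URI_FIELDS = [
    ("license", "license_type"),
    ("host", "host_species"),
    ("sample", "collection_location"),
    ("virus", "virus_species"),
]


def is_uri(value) -> bool:
    parsed = urlparse(str(value))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_metadata(meta: dict) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for section, keys in MINIMAL_FIELDS.items():
        block = meta.get(section) or {}
        for key in keys:
            if key not in block:
                problems.append(f"missing {section}.{key}")
    for section, key in URI_FIELDS:
        value = (meta.get(section) or {}).get(key)
        if value is not None and not is_uri(value):
            problems.append(f"{section}.{key} is not a URI: {value!r}")
    return problems


minimal = {
    "id": "placeholder",
    "license": {"license_type": "http://creativecommons.org/licenses/by/4.0/"},
    "host": {"host_species": "http://purl.obolibrary.org/obo/NCBITaxon_9606"},
    "sample": {
        "sample_id": "XX",
        "collection_date": "2020-01-01",
        "collection_location": "http://www.wikidata.org/entity/Q148",
    },
    "virus": {"virus_species": "http://purl.obolibrary.org/obo/NCBITaxon_2697049"},
    "technology": {"sample_sequencing_technology": ["http://www.ebi.ac.uk/efo/EFO_0008632"]},
    "submitter": {"authors": ["John Doe"]},
}
print(check_metadata(minimal))  # []
```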

-
-

6.1 Run the uploader (CLI)

-
+
+

7.1 Run the uploader (CLI)

+

After installing with pip, you should be able to run @@ -574,7 +630,6 @@ bh20sequploader sequence.fasta metadata.yaml -

Alternatively, the script can be installed from GitHub. Run on the command line @@ -617,9 +672,9 @@ The web interface uses this exact same script, so it should just work

-
-

6.2 Example: uploading bulk GenBank sequences

-
+
+

7.2 Example: uploading bulk GenBank sequences

+

We also use the above script to bulk upload GenBank sequences with a FASTA and YAML extractor specific to GenBank. This means that the steps we @@ -645,14 +700,15 @@ ls $dir_fasta_and_yaml/*.yaml |

-

6.3 Example: preparing metadata

-
+
+

7.3 Example: preparing metadata

+

-Usually, metadata are available in tabular format, like spreadsheets. As an example, we provide a script -esr_samples.py to show you how to parse -your metadata in YAML files ready for the upload. To execute the script, go in the ~bh20-seq-resource/scripts/esr_samples -and execute +Usually, metadata are available in a tabular format, such as +spreadsheets. As an example, we provide a script esr_samples.py to +show you how to parse your metadata into YAML files ready for the +upload. To execute the script, go to the +~bh20-seq-resource/scripts/esr_samples directory and execute

@@ -661,14 +717,27 @@ and execute

-You will find the YAML files in the `yaml` folder which will be created in the same directory. +You will find the YAML files in the `yaml` folder which will be +created in the same directory. +

+ +

+In the example we use Python pandas to read the spreadsheet into a +tabular structure. Next we use a template.yaml file that gets filled +in by esr_samples.py so we get a metadata YAML file for each sample. +
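The same approach can also be sketched using only the Python standard library. This version reads a CSV export instead of the spreadsheet (esr_samples.py itself uses pandas). The template and column names are illustrative assumptions, not the contents of the real template.yaml:

```python
import csv
import io
import os
import tempfile
from string import Template

# Hypothetical, cut-down template using the same $placeholder style as
# template.yaml; the real template has many more fields.
TEMPLATE = Template("""\
id: placeholder

sample:
  sample_id: "$sample_id"
  collection_date: "$collection_date"
  collection_location: $location
""")


def write_yaml_per_row(rows, out_dir: str) -> list:
    """Fill TEMPLATE once per table row and write <sample_id>.yaml files."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for row in rows:
        path = os.path.join(out_dir, f"{row['sample_id']}.yaml")
        with open(path, "w") as fw:
            # substitute() raises KeyError if a placeholder has no column.
            fw.write(TEMPLATE.substitute(row))
        written.append(path)
    return written


# A CSV export stands in for the spreadsheet; column names are assumptions.
table = io.StringIO(
    "sample_id,collection_date,location\n"
    "S1,2020-04-06,http://www.wikidata.org/entity/Q65\n"
)
# Written to a scratch directory here; esr_samples.py writes to ./yaml.
print(write_yaml_per_row(csv.DictReader(table), tempfile.mkdtemp()))
```

Using `substitute` rather than `safe_substitute` makes a missing column fail loudly instead of leaving a literal `$placeholder` in the uploaded metadata.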

+ +

+Next run the earlier CLI uploader for each YAML and FASTA combination. +It can't be much easier than this. For ESR we uploaded a batch of 600 +sequences this way. See example.
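Here is a minimal Python sketch of that loop. It assumes that each YAML file has a FASTA file with the same name next to it; the pairing rule and the function name are our assumptions, not part of the uploader. By default, it performs a dry run that only lists the commands:

```python
import subprocess
from pathlib import Path


def upload_pairs(directory: str, dry_run: bool = True) -> list:
    """Run bh20sequploader for every <name>.yaml with a matching FASTA.

    Assumes each metadata file sits next to its sequence under the same
    stem (sample1.yaml + sample1.fasta or sample1.fa). With dry_run=True
    the commands are only collected, not executed.
    """
    commands = []
    for yaml_file in sorted(Path(directory).glob("*.yaml")):
        candidates = [yaml_file.with_suffix(ext) for ext in (".fasta", ".fa")]
        fasta = next((p for p in candidates if p.exists()), None)
        if fasta is None:
            print(f"skipping {yaml_file.name}: no matching FASTA file")
            continue
        # Argument order as in: bh20sequploader sequence.fasta metadata.yaml
        cmd = ["bh20sequploader", str(fasta), str(yaml_file)]
        commands.append(cmd)
        if not dry_run:
            # check=True stops the batch at the first failed upload.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in upload_pairs("yaml"):
        print(" ".join(cmd))
```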

-
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-10-27 Tue 06:43
. +
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-11-05 Thu 07:27
.
diff --git a/doc/blog/using-covid-19-pubseq-part3.org b/doc/blog/using-covid-19-pubseq-part3.org index fb68251..f3ba073 100644 --- a/doc/blog/using-covid-19-pubseq-part3.org +++ b/doc/blog/using-covid-19-pubseq-part3.org @@ -7,10 +7,19 @@ #+HTML_HEAD: #+OPTIONS: ^:nil +* Introduction + +In this document we explain how to upload data into COVID-19 PubSeq. +This can happen through a web page, or through a command line +script. We'll also show how to parametrize uploads by using templates. +The procedure is much easier than with other repositories and can be +fully automated. Once uploaded you can use our export API to prepare +for other repositories. * Table of Contents :TOC:noexport: - - [[#uploading-data][Uploading Data]] + - [[#introduction][Introduction]] + - [[#uploading-data][Uploading data]] - [[#step-1-upload-sequence][Step 1: Upload sequence]] - [[#step-2-add-metadata][Step 2: Add metadata]] - [[#obligatory-fields][Obligatory fields]] @@ -23,7 +32,7 @@ - [[#example-uploading-bulk-genbank-sequences][Example: uploading bulk GenBank sequences]] - [[#example-preparing-metadata][Example: preparing metadata]] -* Uploading Data +* Uploading data The COVID-19 PubSeq allows you to upload your SARS-Cov-2 strains to a public resource for global comparisons. A recompute of the pangenome @@ -165,55 +174,90 @@ file an associated metadata in [[https://github.com/arvados/bh20-seq-resource/bl the web form and gets validated from the same [[https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/bh20seq-schema.yml][schema]] looks. 
The YAML that you need to create/generate for your samples looks like +A minimal example of metadata looks like + +#+begin_src json + id: placeholder + + license: + license_type: http://creativecommons.org/licenses/by/4.0/ + + host: + host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606 + + sample: + sample_id: XX + collection_date: "2020-01-01" + collection_location: http://www.wikidata.org/entity/Q148 + + virus: + virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049 + + technology: + sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0008632] + + submitter: + authors: [John Doe] +#+end_src + +a more elaborate example (note most fields are optional) may look like + #+begin_src json -id: placeholder - -host: - host_id: XX1 - host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606 - host_sex: http://purl.obolibrary.org/obo/PATO_0000384 - host_age: 20 - host_age_unit: http://purl.obolibrary.org/obo/UO_0000036 - host_health_status: http://purl.obolibrary.org/obo/NCIT_C25269 - host_treatment: Process in which the act is intended to modify or alter host status (Compounds) - host_vaccination: [vaccines1,vaccine2] - ethnicity: http://purl.obolibrary.org/obo/HANCESTRO_0010 - additional_host_information: Optional free text field for additional information - -sample: - sample_id: Id of the sample as defined by the submitter - collector_name: Name of the person that took the sample - collecting_institution: Institute that was responsible of sampling - specimen_source: [http://purl.obolibrary.org/obo/NCIT_C155831,http://purl.obolibrary.org/obo/NCIT_C155835] - collection_date: "2020-01-01" - collection_location: http://www.wikidata.org/entity/Q148 - sample_storage_conditions: frozen specimen - source_database_accession: [http://identifiers.org/insdc/LC522350.1#sequence] - additional_collection_information: Optional free text field for additional information - -virus: - virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049 - virus_strain: 
SARS-CoV-2/human/CHN/HS_8/2020 - -technology: - sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0009173,http://www.ebi.ac.uk/efo/EFO_0009173] - sequence_assembly_method: Protocol used for assembly - sequencing_coverage: [70.0, 100.0] - additional_technology_information: Optional free text field for additional information - -submitter: - authors: [John Doe, Joe Boe, Jonny Oe] - submitter_name: [John Doe] - submitter_address: John Doe's address - originating_lab: John Doe kitchen - lab_address: John Doe's address - provider_sample_id: XXX1 - submitter_sample_id: XXX2 - publication: PMID00001113 - submitter_orcid: [https://orcid.org/0000-0000-0000-0000,https://orcid.org/0000-0000-0000-0001] - additional_submitter_information: Optional free text field for additional information + id: placeholder + + host: + host_id: XX1 + host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606 + host_sex: http://purl.obolibrary.org/obo/PATO_0000384 + host_age: 20 + host_age_unit: http://purl.obolibrary.org/obo/UO_0000036 + host_health_status: http://purl.obolibrary.org/obo/NCIT_C25269 + host_treatment: Process in which the act is intended to modify or alter host status (Compounds) + host_vaccination: [vaccine1,vaccine2] + ethnicity: http://purl.obolibrary.org/obo/HANCESTRO_0010 + additional_host_information: Optional free text field for additional information + + sample: + sample_id: Id of the sample as defined by the submitter + collector_name: Name of the person that took the sample + collecting_institution: Institute that was responsible for sampling + specimen_source: [http://purl.obolibrary.org/obo/NCIT_C155831,http://purl.obolibrary.org/obo/NCIT_C155835] + collection_date: "2020-01-01" + collection_location: http://www.wikidata.org/entity/Q148 + sample_storage_conditions: frozen specimen + source_database_accession: [http://identifiers.org/insdc/LC522350.1#sequence] + additional_collection_information: Optional free text field for additional information + + virus: 
+ virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049 + virus_strain: SARS-CoV-2/human/CHN/HS_8/2020 + + technology: + sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0009173,http://www.ebi.ac.uk/efo/EFO_0009173] + sequence_assembly_method: Protocol used for assembly + sequencing_coverage: [70.0, 100.0] + additional_technology_information: Optional free text field for additional information + + submitter: + authors: [John Doe, Joe Boe, Jonny Oe] + submitter_name: [John Doe] + submitter_address: John Doe's address + originating_lab: John Doe kitchen + lab_address: John Doe's address + provider_sample_id: XXX1 + submitter_sample_id: XXX2 + publication: PMID00001113 + submitter_orcid: [https://orcid.org/0000-0000-0000-0000,https://orcid.org/0000-0000-0000-0001] + additional_submitter_information: Optional free text field for additional information #+end_src +More metadata is yummy when stored in RDF. [[https://yummydata.org/][Yummydata]] is useful to a wider community. Note +that many of the terms in the above example are URIs, such as +host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606. We use +web ontologies for these to make the data less ambiguous and more +FAIR. Check out the optional fields as defined in the schema. If a term is not listed, +a little web searching may be required, or [[./contact][contact]] us. + ** Run the uploader (CLI) Installing with pip you should be @@ -221,7 +265,6 @@ able to run : bh20sequploader sequence.fasta metadata.yaml - Alternatively the script can be installed from [[https://github.com/arvados/bh20-seq-resource#installation][github]]. Run on the command line @@ -274,13 +317,13 @@ done ** Example: preparing metadata -Usually, metadata are available in tabular format, like spreadsheets. As an example, we provide a script -[[https://github.com/arvados/bh20-seq-resource/tree/master/scripts/esr_samples][esr_samples.py]] to show you how to parse -your metadata in YAML files ready for the upload. 
To execute the script, go in the ~bh20-seq-resource/scripts/esr_samples -and execute +Usually, metadata are available in a tabular format, such as +spreadsheets. As an example, we provide a script [[https://github.com/arvados/bh20-seq-resource/tree/master/scripts/esr_samples][esr_samples.py]] to +show you how to parse your metadata into YAML files ready for the +upload. To execute the script, go to the +~bh20-seq-resource/scripts/esr_samples directory and execute #+BEGIN_SRC sh python3 esr_samples.py #+END_SRC -You will find the YAML files in the `yaml` folder which will be created in the same directory. +You will find the YAML files in the `yaml` folder which will be +created in the same directory. + +In the example we use Python pandas to read the spreadsheet into a +tabular structure. Next we use a [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/esr_samples/template.yaml][template.yaml]] file that gets filled +in by ~esr_samples.py~ so we get a metadata YAML file for each sample. + +Next run the earlier CLI uploader for each YAML and FASTA combination. +It can't be much easier than this. For ESR we uploaded a batch of 600 +sequences this way, writing a few lines of Python [[https://github.com/arvados/bh20-seq-resource/blob/master/scripts/esr_samples/esr_samples.py][code]]. See [[http://covid19.genenetwork.org/resource/20VR0995][example]].
-- cgit v1.2.3 From 43d7264dda8061a024befbc9ca0a89d7159b1e40 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Fri, 6 Nov 2020 09:52:32 +0000 Subject: UTHSC upload info --- doc/blog/using-covid-19-pubseq-part3.org | 3 +- scripts/uthsc_samples/.gitignore | 1 + scripts/uthsc_samples/template.yaml | 35 ++++++++++++++++++++ scripts/uthsc_samples/uthsc_samples.py | 57 ++++++++++++++++++++++++++++++++ 4 files changed, 95 insertions(+), 1 deletion(-) create mode 100644 scripts/uthsc_samples/.gitignore create mode 100644 scripts/uthsc_samples/template.yaml create mode 100644 scripts/uthsc_samples/uthsc_samples.py (limited to 'doc/blog') diff --git a/doc/blog/using-covid-19-pubseq-part3.org b/doc/blog/using-covid-19-pubseq-part3.org index f3ba073..d0d6c7f 100644 --- a/doc/blog/using-covid-19-pubseq-part3.org +++ b/doc/blog/using-covid-19-pubseq-part3.org @@ -255,7 +255,8 @@ more metadata is yummy when stored in RDF. [[https://yummydata.org/][Yummydata]] that many of the terms in above example are URIs, such as host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606. We use web ontologies for these to make the data less ambiguous and more -FAIR. Check out the option fields as defined in the schema. If it is not listed +FAIR. Check out the optional fields as defined in the schema. If a term is not listed, +check the [[https://github.com/arvados/bh20-seq-resource/blob/master/semantic_enrichment/labels.ttl][labels.ttl]] file. Also, a little web searching may be required, or [[./contact][contact]] us.
** Run the uploader (CLI) diff --git a/scripts/uthsc_samples/.gitignore b/scripts/uthsc_samples/.gitignore new file mode 100644 index 0000000..8786e3f --- /dev/null +++ b/scripts/uthsc_samples/.gitignore @@ -0,0 +1 @@ +yaml diff --git a/scripts/uthsc_samples/template.yaml b/scripts/uthsc_samples/template.yaml new file mode 100644 index 0000000..1175ac8 --- /dev/null +++ b/scripts/uthsc_samples/template.yaml @@ -0,0 +1,35 @@ +id: placeholder + +license: + license_type: http://creativecommons.org/licenses/by/4.0/ + title: "$sample_name - $locationx" + attribution_name: "Mariah Taylor, Colleen Jonsson" + attribution_url: https://www.uthsc.edu/medicine/molecular-sciences/faculty-directory/jonsson.php + +host: + host_id: "$sample_id" + host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606 + +sample: + sample_id: "$sample_id" + specimen_source: [http://purl.obolibrary.org/obo/NCIT_C155831] + collection_date: "$collection_date" + collection_location: $location + +virus: + virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049 + virus_strain: "$strain" + +technology: + sample_sequencing_technology: [http://www.ebi.ac.uk/efo/EFO_0008632] + sequence_assembly_method: https://bio.tools/BWA#! + additional_technology_information: Oxford Nanopore MiniIon RNA long reads + +submitter: + authors: [Mariah Taylor, Colleen Jonsson] + submitter_name: [Mariah Taylor, Colleen B. 
Jonsson, Pjotr Prins] + submitter_address: UTHSC, Memphis, Tennessee 38163, USA + originating_lab: Regional Biocontainment Laboratory, Memphis, TN + provider_sample_id: $sample_id + submitter_sample_id: $sample_id + submitter_orcid: [https://orcid.org/0000-0002-2640-7672,https://orcid.org/0000-0002-8021-9162] diff --git a/scripts/uthsc_samples/uthsc_samples.py b/scripts/uthsc_samples/uthsc_samples.py new file mode 100644 index 0000000..5c39398 --- /dev/null +++ b/scripts/uthsc_samples/uthsc_samples.py @@ -0,0 +1,57 @@ +import os +import pandas as pd +from string import Template +from dateutil.parser import parse +import re + +import sys + +# Metadata in tabular format in a spreadsheet(?!) +xlsx = '../../test/data/10_samples.xlsx' + +# Template in a text file +template_yaml = 'template.yaml' + +dir_output = 'yaml' + +if not os.path.exists(dir_output): + os.makedirs(dir_output) + +table = pd.read_excel(xlsx) + +print(table) + +for index, row in table.iterrows(): + sample = row['Sample ID'] + print(f"Processing sample {sample}...") + + with open(template_yaml) as f: + text = Template(f.read()) + with open(os.path.join(dir_output,f"{sample}.yaml"), 'w') as fw: + sample_id = sample + sample_name = sample + collection_date = parse(str(row['Collection Date'])).strftime('%Y-%m-%d') + locationx = row['City']+", "+row['State']+", USA" + location = "http://www.wikidata.org/entity/Q16563" # Memphis by default + location_map = { + "Pegram": "http://www.wikidata.org/entity/Q3289517", + "Alexander": "http://www.wikidata.org/entity/Q79663", + "Smithville": "http://www.wikidata.org/entity/Q2145339", + "Nashville": "http://www.wikidata.org/entity/Q23197", + "Madison": "http://www.wikidata.org/entity/Q494755" + } + + for name in location_map: + p = re.compile(name) + if p.match(locationx): + location = location_map[name] + break + + strain = f"SARS-CoV-2/human/USA/{sample}/2020" + fw.write(text.substitute(sample_id=sample_id, + sample_name=sample_name, + collection_date=collection_date, + 
location=location, +
locationx=locationx, + strain=strain + )) -- cgit v1.2.3 From 6eef898f8080e64a2eab9b60f54cacbd419c279e Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Tue, 10 Nov 2020 11:08:26 +0000 Subject: Document Arvados runner --- doc/blog/using-covid-19-pubseq-part2.html | 127 +++++++++++++++--------------- doc/blog/using-covid-19-pubseq-part2.org | 21 +++++ 2 files changed, 84 insertions(+), 64 deletions(-) (limited to 'doc/blog') diff --git a/doc/blog/using-covid-19-pubseq-part2.html b/doc/blog/using-covid-19-pubseq-part2.html index 567980d..eff6fcd 100644 --- a/doc/blog/using-covid-19-pubseq-part2.html +++ b/doc/blog/using-covid-19-pubseq-part2.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> - + COVID-19 PubSeq - Arvados @@ -40,7 +40,7 @@ } pre.src { position: relative; - overflow: visible; + overflow: auto; padding-top: 1.2em; } pre.src:before { @@ -195,50 +195,26 @@ @@ -252,18 +228,18 @@ for the JavaScript code in this tag.

Table of Contents

-
-

1 The Arvados Web Server

+
+

1 The Arvados Web Server

We are using Arvados to run Common Workflow Language (CWL) pipelines. @@ -283,8 +259,8 @@ workflows and the output of analysis pipelines (here CWL workflows).

-
-

2 The Arvados file interface

+
+

2 The Arvados file interface

Arvados has the web server, but it also has a REST API and associated @@ -361,8 +337,8 @@ arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839d

-
-

3 The PubSeq Arvados shell

+
+

3 The PubSeq Arvados shell

When you log in to Arvados (you can request permission from us) it is @@ -414,11 +390,34 @@ the git repo and starts a new run calling into /data/pubseq/bh20-seq-resource/venv3/bin/bh20-seq-analyzer which is essentially monitoring for uploads.

+ +

+Calling run --help shows the available options: +

+ +
+optional arguments:
+  -h, --help            show this help message and exit
+  --uploader-project UPLOADER_PROJECT
+  --pangenome-analysis-project PANGENOME_ANALYSIS_PROJECT
+  --fastq-project FASTQ_PROJECT
+  --validated-project VALIDATED_PROJECT
+  --workflow-def-project WORKFLOW_DEF_PROJECT
+  --pangenome-workflow-uuid PANGENOME_WORKFLOW_UUID
+  --fastq-workflow-uuid FASTQ_WORKFLOW_UUID
+  --exclude-list EXCLUDE_LIST
+  --latest-result-collection LATEST_RESULT_COLLECTION
+  --kickoff
+  --no-start-analysis
+  --once
+  --print-status PRINT_STATUS
+  --revalidate
+
-
-

4 Wiring up CWL

+
+

4 Wiring up CWL

In the above script bh20-seq-analyzer, you can see that the Common @@ -459,8 +458,8 @@ For more see -

5 Using the Arvados API

+
+

5 Using the Arvados API

Arvados provides a rich API for accessing internals of the Cloud @@ -476,8 +475,8 @@ get a list of -

6 Troubleshooting

+
+

6 Troubleshooting

When workflows have errors we should check the logs in Arvados. @@ -494,7 +493,7 @@ see what parts failed.

-
Created by
Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-08-26 Wed 05:01
. +
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-11-09 Mon 01:20
.
diff --git a/doc/blog/using-covid-19-pubseq-part2.org b/doc/blog/using-covid-19-pubseq-part2.org index 4b827f5..d7816ba 100644 --- a/doc/blog/using-covid-19-pubseq-part2.org +++ b/doc/blog/using-covid-19-pubseq-part2.org @@ -96,6 +96,27 @@ the git repo and starts a new run calling into /data/pubseq/bh20-seq-resource/venv3/bin/bh20-seq-analyzer which is essentially [[https://github.com/arvados/bh20-seq-resource/blob/2baa88b766ec540bd34b96599014dd16e393af39/bh20seqanalyzer/main.py#L354][monitoring]] for uploads. +Calling ~run --help~ shows the available options: + +#+begin_example +optional arguments: + -h, --help show this help message and exit + --uploader-project UPLOADER_PROJECT + --pangenome-analysis-project PANGENOME_ANALYSIS_PROJECT + --fastq-project FASTQ_PROJECT + --validated-project VALIDATED_PROJECT + --workflow-def-project WORKFLOW_DEF_PROJECT + --pangenome-workflow-uuid PANGENOME_WORKFLOW_UUID + --fastq-workflow-uuid FASTQ_WORKFLOW_UUID + --exclude-list EXCLUDE_LIST + --latest-result-collection LATEST_RESULT_COLLECTION + --kickoff + --no-start-analysis + --once + --print-status PRINT_STATUS + --revalidate +#+end_example + * Wiring up CWL In the above script ~bh20-seq-analyzer~, you can see that the [[https://www.commonwl.org/][Common -- cgit v1.2.3