From 0be9983ef88fd3b925d8fa53e7f9ab2a28703bc0 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Tue, 14 Jul 2020 11:29:44 +0100 Subject: Started documenting EBI submission --- doc/blog/using-covid-19-pubseq-part6.org | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 doc/blog/using-covid-19-pubseq-part6.org diff --git a/doc/blog/using-covid-19-pubseq-part6.org b/doc/blog/using-covid-19-pubseq-part6.org new file mode 100644 index 0000000..2a7c593 --- /dev/null +++ b/doc/blog/using-covid-19-pubseq-part6.org @@ -0,0 +1,96 @@ +#+TITLE: COVID-19 PubSeq (part 6) +#+AUTHOR: Pjotr Prins +# C-c C-e h h publish +# C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time) +# C-c C-t task rotate +# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png + +#+HTML_HEAD: + + +* Table of Contents :TOC:noexport: + - [[#generating-output-for-ebi][Generating output for EBI]] + - [[#defining-the-ebi-study][Defining the EBI study]] + - [[#define-the-ebi-sample][Define the EBI sample]] + - [[#define-the-ebi-sequence][Define the EBI sequence]] + +* Generating output for EBI + +Would it not be great if an uploader to PubSeq could also export samples +to, say, EBI? That is what we discuss in this section. The submission +process is somewhat laborious and when you have submitted to PubSeq +why not export the same to EBI too with the least amount of effort? + +COVID-19 PubSeq is a data source - both sequence data and metadata - +that can be used to push data to other sources, such as EBI. You can +register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] with a specific XML interface. + +EBI sequence resources are presented through ENA. For example +[[https://www.ebi.ac.uk/ena/browser/view/MT394864][Sequence: MT394864.1]]. 
+ +EBI has XML Formats for + +- SUBMISSION +- STUDY +- SAMPLE +- EXPERIMENT +- RUN +- ANALYSIS +- DAC +- POLICY +- DATASET +- PROJECT + +with the schemas listed [[ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/][here]]. Since we are submitting sequences we +should follow submitting [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly guidelines]] and [[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA +guidelines]]. The first step is to define the study, next the sample and +finally the sequence (assembly). + +* Defining the EBI study + +A study is defined [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html][here]] and looks like + +#+BEGIN_SRC xml + + + Sequencing SARS-CoV-2 in the Washington DC area + This study collects samples from COVID-19 patients in the Washington DC area + + + + + +#+END_SRC + +also a submission 'command' is required looking like + +#+BEGIN_SRC xml + + + + + + + + + + + +#+END_SRC + +The webin system accepts such sources using a command like + +: curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" + +as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]]. 
+ +/work in progress (WIP)/ + +* Define the EBI sample + + +/work in progress (WIP)/ + +* Define the EBI sequence + +/work in progress (WIP)/ -- cgit v1.2.3 From f76e14fe7d737fc50ed19e6f4dbe3613b4004380 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Thu, 16 Jul 2020 11:37:47 +0100 Subject: Remove extra exclamation mark --- bh20simplewebuploader/main.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bh20simplewebuploader/main.py b/bh20simplewebuploader/main.py index 8089883..77b3832 100644 --- a/bh20simplewebuploader/main.py +++ b/bh20simplewebuploader/main.py @@ -446,7 +446,7 @@ def receive_files(): def edit_button(url,text="Edit text!"): - return '

'+text+'!

' + return '

'+text+'

' def get_html_body(fn,source="https://github.com/arvados/bh20-seq-resource/tree/master/doc"): buf = edit_button(source) -- cgit v1.2.3 From 375df836387d11bbe1fa0b1bce14eed24507a6d8 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Thu, 16 Jul 2020 13:00:24 +0100 Subject: Blog: workflows --- doc/blog/using-covid-19-pubseq-part4.org | 6 ++++++ doc/web/about.org | 3 ++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/doc/blog/using-covid-19-pubseq-part4.org b/doc/blog/using-covid-19-pubseq-part4.org index 5fe71d1..8ad5e2d 100644 --- a/doc/blog/using-covid-19-pubseq-part4.org +++ b/doc/blog/using-covid-19-pubseq-part4.org @@ -10,6 +10,7 @@ * Table of Contents :TOC:noexport: - [[#what-does-this-mean][What does this mean?]] + - [[#where-can-i-find-the-workflows][Where can I find the workflows?]] - [[#modify-workflow][Modify Workflow]] * What does this mean? @@ -18,6 +19,11 @@ This means that when someone uploads a SARS-CoV-2 sequence using one of our tools (CLI or web-based) they add a sequence and some metadata which triggers a rerun of our workflows. +* Where can I find the workflows? + +Workflows are written in the Common Workflow Language (CWL) and listed +on [[https://github.com/arvados/bh20-seq-resource/tree/master/workflows][GitHub]]. Because PubSeq is an open project, these workflows can be studied +and modified! * Modify Workflow diff --git a/doc/web/about.org b/doc/web/about.org index ad13bc3..1949e2d 100644 --- a/doc/web/about.org +++ b/doc/web/about.org @@ -140,7 +140,8 @@ See the [[http://covid19.genenetwork.org/blog]]! * How do I change the work flows? -See the [[http://covid19.genenetwork.org/blog]]! +Workflows are on [[https://github.com/arvados/bh20-seq-resource/tree/master/workflows][GitHub]] and can be modified. See also the +[[http://covid19.genenetwork.org/blog][workflow blog]]. * How do I change the source code? 
-- cgit v1.2.3 From 87f3fb187ea7a956a3bd2fe224a3ea06ff1d760b Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Fri, 17 Jul 2020 11:42:15 +0100 Subject: Started EBI submission --- doc/blog/using-covid-19-pubseq-part6.org | 11 +++++++---- scripts/submit_ebi/example/project-submission.xml | 11 +++++++++++ scripts/submit_ebi/example/project.xml | 9 +++++++++ 3 files changed, 27 insertions(+), 4 deletions(-) create mode 100644 scripts/submit_ebi/example/project-submission.xml create mode 100644 scripts/submit_ebi/example/project.xml diff --git a/doc/blog/using-covid-19-pubseq-part6.org b/doc/blog/using-covid-19-pubseq-part6.org index 2a7c593..2d1c5e0 100644 --- a/doc/blog/using-covid-19-pubseq-part6.org +++ b/doc/blog/using-covid-19-pubseq-part6.org @@ -23,7 +23,10 @@ why not export the same to EBI too with the least amount of effort? COVID-19 PubSeq is a data source - both sequence data and metadata - that can be used to push data to other sources, such as EBI. You can -register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] with a specific XML interface. +register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] with a specific XML interface. Note +that (at this point) if you want to submit a sequence (FASTA) it can +only be done through the [[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html][Webin-CLI]]. Raw data (FASTQ) can go through +the XML interface. EBI sequence resources are presented through ENA. For example [[https://www.ebi.ac.uk/ena/browser/view/MT394864][Sequence: MT394864.1]]. @@ -42,9 +45,9 @@ EBI has XML Formats for - PROJECT with the schemas listed [[ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/][here]]. 
Since we are submitting sequences we -should follow submitting [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly guidelines]] and [[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA -guidelines]]. The first step is to define the study, next the sample and -finally the sequence (assembly). +should follow submitting [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly guidelines]] and +[[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA guidelines]]. The first step is to define the study, next the sample +and finally the sequence (assembly). * Defining the EBI study diff --git a/scripts/submit_ebi/example/project-submission.xml b/scripts/submit_ebi/example/project-submission.xml new file mode 100644 index 0000000..2d3ddc1 --- /dev/null +++ b/scripts/submit_ebi/example/project-submission.xml @@ -0,0 +1,11 @@ + + + + + + + + + + + diff --git a/scripts/submit_ebi/example/project.xml b/scripts/submit_ebi/example/project.xml new file mode 100644 index 0000000..90704ab --- /dev/null +++ b/scripts/submit_ebi/example/project.xml @@ -0,0 +1,9 @@ + + + Testing PubSeq Sample uploads + This study aimed to allow for uploading sequences from PubSeq + + + + + -- cgit v1.2.3 From 04ab343e57c7a23451164843d1922622c5f4f9f5 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Fri, 17 Jul 2020 12:05:53 +0100 Subject: Preparing for EBI submission --- bh20simplewebuploader/templates/blog.html | 8 + doc/blog/using-covid-19-pubseq-part6.html | 393 ++++++++++++++++++++++ doc/blog/using-covid-19-pubseq-part6.org | 7 +- scripts/submit_ebi/example/project-submission.xml | 3 +- scripts/submit_ebi/example/project.xml | 3 +- scripts/submit_ebi/example/sample-submission.xml | 8 + scripts/submit_ebi/example/sample.xml | 68 ++++ 7 files changed, 486 insertions(+), 4 deletions(-) create mode 100644 doc/blog/using-covid-19-pubseq-part6.html create mode 100644 
scripts/submit_ebi/example/sample-submission.xml create mode 100644 scripts/submit_ebi/example/sample.xml diff --git a/bh20simplewebuploader/templates/blog.html b/bh20simplewebuploader/templates/blog.html index 823f8a1..f4c2a85 100644 --- a/bh20simplewebuploader/templates/blog.html +++ b/bh20simplewebuploader/templates/blog.html @@ -63,6 +63,14 @@ We explore the Arvados command line and API +
+
+ Prepare for uploading to EBI/ENA +
+
+ Generate the files needed for uploading to EBI/ENA +
+
diff --git a/doc/blog/using-covid-19-pubseq-part6.html b/doc/blog/using-covid-19-pubseq-part6.html new file mode 100644 index 0000000..278abe8 --- /dev/null +++ b/doc/blog/using-covid-19-pubseq-part6.html @@ -0,0 +1,393 @@ + + + + + + + +COVID-19 PubSeq (part 6) + + + + + + + +
+

COVID-19 PubSeq (part 6)

+
+

Table of Contents

+ +
+ + +
+

1 Generating output for EBI

+
+

+Would it not be great if an uploader to PubSeq could also export samples +to, say, EBI? That is what we discuss in this section. The submission +process is somewhat laborious and when you have submitted to PubSeq +why not export the same to EBI too with the least amount of effort? +

+ +

+COVID-19 PubSeq is a data source - both sequence data and metadata - +that can be used to push data to other sources, such as EBI. You can +register samples programmatically with a specific XML interface. Note +that (at this point) if you want to submit a sequence (FASTA) it can +only be done through the Webin-CLI. Raw data (FASTQ) can go through +the XML interface. +

+ +

+EBI sequence resources are presented through ENA. For example +Sequence: MT394864.1. +

+ +

+EBI has XML Formats for +

+ +
    +
  • SUBMISSION
  • +
  • STUDY
  • +
  • SAMPLE
  • +
  • EXPERIMENT
  • +
  • RUN
  • +
  • ANALYSIS
  • +
  • DAC
  • +
  • POLICY
  • +
  • DATASET
  • +
  • PROJECT
  • +
+ +

+with the schemas listed here. Since we are submitting sequences we +should follow submitting full genome assembly guidelines and +ENA guidelines. The first step is to define the study, next the sample +and finally the sequence (assembly). +

+
+
+ +
+

2 Defining the EBI study

+
+

+A study is defined here and looks like +

+ +
+
<PROJECT_SET>
+   <PROJECT alias="COVID-19 Washington DC">
+      <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE>
+      <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION>
+      <SUBMISSION_PROJECT>
+         <SEQUENCING_PROJECT/>
+      </SUBMISSION_PROJECT>
+   </PROJECT>
+</PROJECT_SET>
+
+
+ +

+also a submission 'command' is required looking like +

+ +
+
<SUBMISSION>
+   <ACTIONS>
+      <ACTION>
+         <ADD/>
+      </ACTION>
+      <ACTION>
+         <HOLD HoldUntilDate="TODO: release date"/>
+      </ACTION>
+   </ACTIONS>
+</SUBMISSION>
+
+
+
+ +
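The two XML documents shown above can also be generated rather than written by hand. The following is a minimal sketch using only Python's standard library; the function names `build_project_xml` and `build_submission_xml` are hypothetical helpers and not part of PubSeq, and the `YYYY-MM-DD` form of the hold date is an assumption, since the example above leaves the release date as a TODO.

```python
import xml.etree.ElementTree as ET


def build_project_xml(alias, title, description):
    """Return a PROJECT_SET document shaped like the study example above.

    Hypothetical helper: the element names are copied from the example,
    nothing else about the ENA schema is assumed.
    """
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    return ET.tostring(project_set, encoding="unicode")


def build_submission_xml(hold_until_date):
    """Return a SUBMISSION document with one ADD and one HOLD action.

    hold_until_date is passed through unchanged; a YYYY-MM-DD string is
    an assumption, the example above only says "TODO: release date".
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    ET.SubElement(ET.SubElement(actions, "ACTION"), "HOLD",
                  HoldUntilDate=hold_until_date)
    return ET.tostring(submission, encoding="unicode")
```

Writing the two returned strings to `project.xml` and `submission.xml` gives the files that the submission command expects. Building the tree with ElementTree, instead of pasting strings together, means titles and descriptions containing `&` or `<` are escaped automatically.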

+The webin system accepts such sources using a command like +

+ +
+curl -u username:password -F "SUBMISSION=@submission.xml" \
+  -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
+
+ + +

+as described here. Note that this is the test server. For the final +version use www.ebi.ac.uk instead of wwwdev.ebi.ac.uk. You may also +need the --insecure switch to circumvent certificate checking. +

+ +
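For scripted submissions the curl command above can be assembled in Python. This is a sketch under stated assumptions: `webin_curl_command` and `submit` are hypothetical names, the production URL is derived only from the note above (www.ebi.ac.uk in place of wwwdev.ebi.ac.uk), and the only curl options used are the ones already shown in this post.

```python
import subprocess

# Test server from the post; the production URL follows the note above.
TEST_URL = "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
PRODUCTION_URL = "https://www.ebi.ac.uk/ena/submit/drop-box/submit/"


def webin_curl_command(username, password, submission_xml, project_xml,
                       production=False, insecure=False):
    """Build the curl argument list for a project submission.

    Hypothetical helper: it only mirrors the command shown in the post
    and defaults to the test server.
    """
    cmd = [
        "curl",
        "-u", f"{username}:{password}",
        "-F", f"SUBMISSION=@{submission_xml}",
        "-F", f"PROJECT=@{project_xml}",
    ]
    if insecure:
        # Skips certificate checking, as mentioned in the post.
        cmd.append("--insecure")
    cmd.append(PRODUCTION_URL if production else TEST_URL)
    return cmd


def submit(cmd):
    """Run the command and return curl's standard output (the server reply)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the command as a list avoids shell quoting problems with passwords and file names. Keeping the test server as the default means a real submission needs an explicit `production=True`.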

+work in progress (WIP) +

+
+
+ +
+

3 Define the EBI sample

+
+

+work in progress (WIP) +

+
+
+ +
+

4 Define the EBI sequence

+
+

+work in progress (WIP) +

+
+
+
+
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-07-17 Fri 06:05
. +
+ + diff --git a/doc/blog/using-covid-19-pubseq-part6.org b/doc/blog/using-covid-19-pubseq-part6.org index 2d1c5e0..8964700 100644 --- a/doc/blog/using-covid-19-pubseq-part6.org +++ b/doc/blog/using-covid-19-pubseq-part6.org @@ -83,9 +83,12 @@ also a submission 'command' is required looking like The webin system accepts such sources using a command like -: curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" +: curl -u username:password -F "SUBMISSION=@submission.xml" \ +: -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" -as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]]. +as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]]. Note that this is the test server. For the final +version use www.ebi.ac.uk instead of wwwdev.ebi.ac.uk. You may also +need the --insecure switch to circumvent certificate checking. 
/work in progress (WIP)/ diff --git a/scripts/submit_ebi/example/project-submission.xml b/scripts/submit_ebi/example/project-submission.xml index 2d3ddc1..1abb827 100644 --- a/scripts/submit_ebi/example/project-submission.xml +++ b/scripts/submit_ebi/example/project-submission.xml @@ -1,3 +1,4 @@ + @@ -6,6 +7,6 @@ - + diff --git a/scripts/submit_ebi/example/project.xml b/scripts/submit_ebi/example/project.xml index 90704ab..6a817e7 100644 --- a/scripts/submit_ebi/example/project.xml +++ b/scripts/submit_ebi/example/project.xml @@ -1,7 +1,8 @@ + Testing PubSeq Sample uploads - This study aimed to allow for uploading sequences from PubSeq + This is a test to allow for uploading sequences from PubSeq diff --git a/scripts/submit_ebi/example/sample-submission.xml b/scripts/submit_ebi/example/sample-submission.xml new file mode 100644 index 0000000..9d13512 --- /dev/null +++ b/scripts/submit_ebi/example/sample-submission.xml @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/scripts/submit_ebi/example/sample.xml b/scripts/submit_ebi/example/sample.xml new file mode 100644 index 0000000..694c471 --- /dev/null +++ b/scripts/submit_ebi/example/sample.xml @@ -0,0 +1,68 @@ + + + + human gastric microbiota, mucosal + + 1284369 + stomach metagenome + + + + + investigation type + mimarks-survey + + + sequencing method + pyrosequencing + + + collection date + 2010 + + + host body site + Mucosa of stomach + + + human-associated environmental package + human-associated + + + geographic location (latitude) + 1.81 + DD + + + geographic location (longitude) + -78.76 + DD + + + geographic location (country and/or sea) + Colombia + + + geographic location (region and locality) + Tumaco + + + environment (biome) + coast + + + environment (feature) + human-associated habitat + + + environment (material) + gastric biopsy + + + ENA-CHECKLIST + ERC000011 + + + + + -- cgit v1.2.3