From 04ab343e57c7a23451164843d1922622c5f4f9f5 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Fri, 17 Jul 2020 12:05:53 +0100
Subject: Preparing for EBI submission

---
 doc/blog/using-covid-19-pubseq-part6.html | 393 ++++++++++++++++++++++++++++++
 1 file changed, 393 insertions(+)
 create mode 100644 doc/blog/using-covid-19-pubseq-part6.html

diff --git a/doc/blog/using-covid-19-pubseq-part6.html b/doc/blog/using-covid-19-pubseq-part6.html
new file mode 100644
index 0000000..278abe8
--- /dev/null
+++ b/doc/blog/using-covid-19-pubseq-part6.html
@@ -0,0 +1,393 @@
+
+

COVID-19 PubSeq (part 6)

+
+

Table of Contents

+ +
+ + +
+

1 Generating output for EBI

+
+

+Would it not be great if an uploader to PubSeq could also export samples
+to, say, EBI? That is what we discuss in this section. The EBI submission
+process is somewhat laborious, so once you have submitted to PubSeq,
+why not export the same data to EBI with the least amount of effort?
+

+ +

+COVID-19 PubSeq is a data source, holding both sequence data and metadata,
+that can be used to push data to other repositories, such as EBI. You can
+register samples programmatically through a specific XML interface. Note
+that (at this point) a sequence (FASTA) can only be submitted
+through the Webin-CLI. Raw data (FASTQ) can go through
+the XML interface.
+

+ +

+EBI sequence resources are presented through ENA. For example +Sequence: MT394864.1. +

+ +

+EBI has XML Formats for +

+ +
    +
  • SUBMISSION
  • +
  • STUDY
  • +
  • SAMPLE
  • +
  • EXPERIMENT
  • +
  • RUN
  • +
  • ANALYSIS
  • +
  • DAC
  • +
  • POLICY
  • +
  • DATASET
  • +
  • PROJECT
  • +
+ +

+with the schemas listed here. Since we are submitting sequences, we
+should follow the guidelines for submitting full genome assemblies and
+the ENA guidelines. The first step is to define the study, next the sample,
+and finally the sequence (assembly).
+

+
+
+ +
+

2 Defining the EBI study

+
+

+A study is defined here and looks like +

+ +
+
<PROJECT_SET>
+   <PROJECT alias="COVID-19 Washington DC">
+      <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE>
+      <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION>
+      <SUBMISSION_PROJECT>
+         <SEQUENCING_PROJECT/>
+      </SUBMISSION_PROJECT>
+   </PROJECT>
+</PROJECT_SET>
+
+
+ +
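Because PubSeq already holds the metadata, the project XML above could be generated rather than written by hand. The following is a minimal sketch using only the Python standard library; the function name `build_project_xml` and its parameters are hypothetical and not part of PubSeq or of any EBI tooling. It only reproduces the element layout shown in the example above.

```python
import xml.etree.ElementTree as ET


def build_project_xml(alias: str, title: str, description: str) -> str:
    """Return a PROJECT_SET document holding one sequencing project.

    Hypothetical helper: it mirrors the element layout of the example
    in this post and does not validate against the ENA schema.
    """
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    return ET.tostring(project_set, encoding="unicode")


# Same values as the hand-written example above.
project_xml = build_project_xml(
    alias="COVID-19 Washington DC",
    title="Sequencing SARS-CoV-2 in the Washington DC area",
    description="This study collects samples from COVID-19 patients "
                "in the Washington DC area",
)

if __name__ == "__main__":
    print(project_xml)
```

Writing the returned string to `project.xml` gives a file that can be passed to the submission command used later in this post. ElementTree escapes special characters in the title and description.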

+A submission 'command' is also required, which looks like
+

+ +
+
<SUBMISSION>
+   <ACTIONS>
+      <ACTION>
+         <ADD/>
+      </ACTION>
+      <ACTION>
+         <HOLD HoldUntilDate="TODO: release date"/>
+      </ACTION>
+   </ACTIONS>
+</SUBMISSION>
+
+
+
+ +
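The submission XML above can be generated in the same way, with the release date passed in as a parameter. This is a sketch only: `build_submission_xml` is a hypothetical name, and writing the date as ISO `YYYY-MM-DD` is an assumption that should be checked against the ENA documentation. The date in the usage line is a made-up example, not a real release date.

```python
import datetime
import xml.etree.ElementTree as ET


def build_submission_xml(hold_until: datetime.date) -> str:
    """Return a SUBMISSION document with an ADD and a HOLD action.

    Hypothetical helper; the ISO date format used for HoldUntilDate
    is an assumption, not confirmed by this post.
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    # First action: add the objects that accompany this submission.
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    # Second action: keep the data private until the given date.
    ET.SubElement(
        ET.SubElement(actions, "ACTION"),
        "HOLD",
        HoldUntilDate=hold_until.isoformat(),
    )
    return ET.tostring(submission, encoding="unicode")


# Illustrative date only.
example_submission_xml = build_submission_xml(datetime.date(2021, 1, 1))

if __name__ == "__main__":
    print(example_submission_xml)
```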

+The Webin system accepts these files using a command like
+

+ +
+curl -u username:password -F "SUBMISSION=@submission.xml" \
+  -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
+
+ + +

+as described here. Note that this is the test server. For the final
+submission use www.ebi.ac.uk instead of wwwdev.ebi.ac.uk. You may also
+need the --insecure switch to circumvent certificate checking.
+
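To avoid posting to the wrong server by accident, the curl invocation can be assembled by a small helper that defaults to the test host. This is a sketch under assumptions: the function `curl_submit_command` and its parameters are hypothetical, and it only builds the argument list from the command shown above without running it.

```python
import shlex
from typing import List

TEST_HOST = "wwwdev.ebi.ac.uk"       # test server used in this post
PRODUCTION_HOST = "www.ebi.ac.uk"    # server for the final submission


def curl_submit_command(
    username: str,
    password: str,
    submission_path: str = "submission.xml",
    project_path: str = "project.xml",
    production: bool = False,
    insecure: bool = False,
) -> List[str]:
    """Build (but do not run) the curl argument list for a Webin submit.

    Hypothetical helper: it defaults to the test server so that a
    production submission has to be requested explicitly.
    """
    host = PRODUCTION_HOST if production else TEST_HOST
    command = [
        "curl",
        "-u", f"{username}:{password}",
        "-F", f"SUBMISSION=@{submission_path}",
        "-F", f"PROJECT=@{project_path}",
    ]
    if insecure:
        # Skips certificate checking; only use when really needed.
        command.append("--insecure")
    command.append(f"https://{host}/ena/submit/drop-box/submit/")
    return command


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection.
    print(shlex.join(curl_submit_command("username", "password")))
```

The returned list can be passed to `subprocess.run` once the XML files exist. Keeping the password out of shell history (for example by reading it from the environment) is left to the caller.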

+ +

+work in progress (WIP) +

+
+
+ +
+

3 Define the EBI sample

+
+

+work in progress (WIP) +

+
+
+ +
+

4 Define the EBI sequence

+
+

+work in progress (WIP) +

+
+
+
+
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-07-17 Fri 06:05.
+
+ + -- cgit v1.2.3