From 0be9983ef88fd3b925d8fa53e7f9ab2a28703bc0 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Tue, 14 Jul 2020 11:29:44 +0100
Subject: Started documenting EBI submission

---
 doc/blog/using-covid-19-pubseq-part6.org | 96 ++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 doc/blog/using-covid-19-pubseq-part6.org

(limited to 'doc/blog')

diff --git a/doc/blog/using-covid-19-pubseq-part6.org b/doc/blog/using-covid-19-pubseq-part6.org
new file mode 100644
index 0000000..2a7c593
--- /dev/null
+++ b/doc/blog/using-covid-19-pubseq-part6.org
@@ -0,0 +1,96 @@
+#+TITLE: COVID-19 PubSeq (part 6)
+#+AUTHOR: Pjotr Prins
+# C-c C-e h h publish
+# C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time)
+# C-c C-t task rotate
+# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png
+
+#+HTML_HEAD:
+
+
+* Table of Contents :TOC:noexport:
+ - [[#generating-output-for-ebi][Generating output for EBI]]
+ - [[#defining-the-ebi-study][Defining the EBI study]]
+ - [[#define-the-ebi-sample][Define the EBI sample]]
+ - [[#define-the-ebi-sequence][Define the EBI sequence]]
+
+* Generating output for EBI
+
+Would it not be great if an uploader to PubSeq could also export samples
+to, say, EBI? That is what we discuss in this section. The submission
+process is somewhat laborious, and once you have submitted to PubSeq,
+why not export the same data to EBI with the least amount of effort?
+
+COVID-19 PubSeq is a data source - both sequence data and metadata -
+that can be used to push data to other repositories, such as EBI. You can
+register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] with a specific XML interface.
+
+EBI sequence resources are presented through ENA; see, for example,
+[[https://www.ebi.ac.uk/ena/browser/view/MT394864][Sequence: MT394864.1]].
+
+EBI has XML formats for
+
+- SUBMISSION
+- STUDY
+- SAMPLE
+- EXPERIMENT
+- RUN
+- ANALYSIS
+- DAC
+- POLICY
+- DATASET
+- PROJECT
+
+with the schemas listed [[ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/][here]]. Since we are submitting sequences, we
+should follow the [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly guidelines]] and the
+[[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA guidelines]]. The first step is to define the study, next the sample, and
+finally the sequence (assembly).
+
+* Defining the EBI study
+
+A study is defined [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html][here]] and looks like
+
+#+BEGIN_SRC xml
+
+
+ Sequencing SARS-CoV-2 in the Washington DC area
+ This study collects samples from COVID-19 patients in the Washington DC area
+
+
+
+
+
+#+END_SRC
+
+A submission 'command' is also required, which looks like
+
+#+BEGIN_SRC xml
+
+
+
+
+
+
+
+
+
+
+#+END_SRC
+
+The Webin system accepts these files using a command like
+
+: curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
+
+as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]]. Note that =wwwdev= is the ENA test service.
+
+/work in progress (WIP)/
+
+* Define the EBI sample
+
+
+/work in progress (WIP)/
+
+* Define the EBI sequence
+
+/work in progress (WIP)/
-- 
cgit v1.2.3
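
The two files passed to curl in the study section, =project.xml= and =submission.xml=, could be generated from PubSeq metadata rather than written by hand. The following is a minimal sketch, not the PubSeq implementation. It assumes the element names used in the ENA programmatic study documentation (PROJECT_SET, PROJECT, TITLE, DESCRIPTION, SUBMISSION_PROJECT, SEQUENCING_PROJECT, and SUBMISSION, ACTIONS, ACTION, ADD). The function names and the alias =pubseq-dc-study= are hypothetical, and only a single ADD action is emitted.

```python
import xml.etree.ElementTree as ET


def build_project_xml(alias, title, description):
    """Return a PROJECT_SET document (as a string) describing one study.

    Element names are assumed from the ENA programmatic study docs;
    the alias is a submitter-chosen identifier (hypothetical here).
    """
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    return ET.tostring(project_set, encoding="unicode")


def build_submission_xml():
    """Return a SUBMISSION document (as a string) with a single ADD action."""
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    action = ET.SubElement(actions, "ACTION")
    ET.SubElement(action, "ADD")
    return ET.tostring(submission, encoding="unicode")


if __name__ == "__main__":
    # Title and description are taken from the study example in the post.
    print(build_project_xml(
        "pubseq-dc-study",  # hypothetical alias
        "Sequencing SARS-CoV-2 in the Washington DC area",
        "This study collects samples from COVID-19 patients "
        "in the Washington DC area",
    ))
    print(build_submission_xml())
```

Saving the two printed documents as =project.xml= and =submission.xml= would give the inputs expected by the curl command. Building the XML with ElementTree rather than string templates means titles and descriptions containing characters such as =&= or =<= are escaped automatically.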