-rw-r--r-- | doc/blog/using-covid-19-pubseq-part6.org | 96 |
1 files changed, 96 insertions, 0 deletions
diff --git a/doc/blog/using-covid-19-pubseq-part6.org b/doc/blog/using-covid-19-pubseq-part6.org
new file mode 100644
index 0000000..2a7c593
--- /dev/null
+++ b/doc/blog/using-covid-19-pubseq-part6.org
@@ -0,0 +1,96 @@
+#+TITLE: COVID-19 PubSeq (part 6)
+#+AUTHOR: Pjotr Prins
+# C-c C-e h h   publish
+# C-c !         insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time)
+# C-c C-t       task rotate
+# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png
+
+#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />
+
+
+* Table of Contents                                                     :TOC:noexport:
+ - [[#generating-output-for-ebi][Generating output for EBI]]
+ - [[#defining-the-ebi-study][Defining the EBI study]]
+ - [[#define-the-ebi-sample][Define the EBI sample]]
+ - [[#define-the-ebi-sequence][Define the EBI sequence]]
+
+* Generating output for EBI
+
+Would it not be great if an uploader to PubSeq could also export samples
+to, say, EBI? That is what we discuss in this section. The submission
+process is somewhat laborious, so once you have submitted to PubSeq,
+why not export the same data to EBI with the least amount of effort?
+
+COVID-19 PubSeq is a data source - both sequence data and metadata -
+that can be used to push data to other repositories, such as EBI. You can
+register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] through a dedicated XML interface.
+
+EBI presents its sequence resources through ENA. See, for example,
+[[https://www.ebi.ac.uk/ena/browser/view/MT394864][Sequence: MT394864.1]].
+
+EBI has XML formats for
+
+- SUBMISSION
+- STUDY
+- SAMPLE
+- EXPERIMENT
+- RUN
+- ANALYSIS
+- DAC
+- POLICY
+- DATASET
+- PROJECT
+
+with the schemas listed [[ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/][here]]. Since we are submitting sequences, we
+should follow the [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly guidelines]] and the
+[[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA guidelines]] for programmatic submission. The first step is to define the study, next the sample, and
+finally the sequence (assembly).
+
+* Defining the EBI study
+
+A study is defined as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html][here]] and looks like
+
+#+BEGIN_SRC xml
+<PROJECT_SET>
+  <PROJECT alias="COVID-19 Washington DC">
+    <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE>
+    <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION>
+    <SUBMISSION_PROJECT>
+      <SEQUENCING_PROJECT/>
+    </SUBMISSION_PROJECT>
+  </PROJECT>
+</PROJECT_SET>
+#+END_SRC
+
+A submission 'command' is also required. It looks like
+
+#+BEGIN_SRC xml
+<SUBMISSION>
+  <ACTIONS>
+    <ACTION>
+      <ADD/>
+    </ACTION>
+    <ACTION>
+      <HOLD HoldUntilDate="TODO: release date"/>
+    </ACTION>
+  </ACTIONS>
+</SUBMISSION>
+
+#+END_SRC
+
+The Webin system accepts these XML files through a command like
+
+: curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
+
+as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]].
+
+/work in progress (WIP)/
+
+* Define the EBI sample
+
+
+/work in progress (WIP)/
+
+* Define the EBI sequence
+
+/work in progress (WIP)/
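Note on the "Defining the EBI study" section: since the goal is to export from PubSeq with little effort, the two XML documents shown there could be generated rather than written by hand. The following is a minimal sketch using only Python's standard library. The function names (`build_project_set`, `build_submission`, `to_xml`) are hypothetical and not part of PubSeq, and the `YYYY-MM-DD` hold date is an assumed placeholder, because the post leaves the release date as a TODO.

```python
import xml.etree.ElementTree as ET


def build_project_set(alias, title, description):
    """Build a PROJECT_SET element with the same shape as the study example."""
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    return project_set


def build_submission(hold_until=None):
    """Build a SUBMISSION element with an ADD action and an optional HOLD action.

    hold_until is passed through unchanged as the HoldUntilDate attribute;
    the date format is an assumption, not taken from the post.
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    if hold_until is not None:
        hold_action = ET.SubElement(actions, "ACTION")
        ET.SubElement(hold_action, "HOLD", HoldUntilDate=hold_until)
    return submission


def to_xml(element):
    """Serialize an element tree to a unicode string."""
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    project = build_project_set(
        "COVID-19 Washington DC",
        "Sequencing SARS-CoV-2 in the Washington DC area",
        "This study collects samples from COVID-19 patients in the Washington DC area",
    )
    print(to_xml(project))
    # "2021-01-01" is a placeholder date for illustration only.
    print(to_xml(build_submission(hold_until="2021-01-01")))
```

Saving the two strings as `project.xml` and `submission.xml` would give the files that the curl command in the post expects. The output has not been validated against the ENA schemas here.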