#+TITLE: COVID-19 PubSeq (part 6)
#+AUTHOR: Pjotr Prins
# C-c C-e h h   publish
# C-c !         insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time)
# C-c C-t       task rotate
# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png

#+HTML_HEAD:

* Table of Contents                                                     :TOC:noexport:
 - [[#generating-output-for-ebi][Generating output for EBI]]
 - [[#defining-the-ebi-study][Defining the EBI study]]
 - [[#define-the-ebi-sample][Define the EBI sample]]
 - [[#define-the-ebi-sequence][Define the EBI sequence]]

* Generating output for EBI

Would it not be great if an uploader to PubSeq could also export
samples to, say, EBI? That is what we discuss in this section. The
submission process is somewhat laborious, so once you have submitted
to PubSeq, why not export the same data to EBI with the least amount
of effort?

COVID-19 PubSeq is a data source - both sequence data and metadata -
that can be used to push data to other sources, such as EBI. You can
register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] with a specific XML interface. Note
that (at this point) a sequence (FASTA) can only be submitted
through the [[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html][Webin-CLI]]. Raw data (FASTQ) can go through the XML
interface.

EBI sequence resources are presented through ENA. For example
[[https://www.ebi.ac.uk/ena/browser/view/MT394864][Sequence: MT394864.1]].

EBI has XML formats for

- SUBMISSION
- STUDY
- SAMPLE
- EXPERIMENT
- RUN
- ANALYSIS
- DAC
- POLICY
- DATASET
- PROJECT

with the schemas listed [[ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/][here]]. Since we are submitting sequences we
should follow the [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly guidelines]] and the
[[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA guidelines]].

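To make the programmatic sample registration mentioned above more
concrete, here is a minimal sketch of how a PubSeq metadata record
might be serialized to a sample XML document with Python's standard
library. The element names (=SAMPLE_SET=, =SAMPLE=, =SAMPLE_NAME=,
=SAMPLE_ATTRIBUTES=) follow our reading of the ENA sample
documentation linked above and should be checked against it. The
function name =sample_set_xml=, the record layout and all values in
the record are hypothetical, and this is not the exporter PubSeq
itself uses.

#+BEGIN_SRC python
import xml.etree.ElementTree as ET


def sample_set_xml(record):
    """Serialize one flat metadata record to an ENA-style SAMPLE_SET string.

    The record keys are an assumption made for this sketch, not a
    PubSeq or ENA convention.
    """
    sample_set = ET.Element("SAMPLE_SET")
    sample = ET.SubElement(sample_set, "SAMPLE", alias=record["alias"])
    ET.SubElement(sample, "TITLE").text = record["title"]

    # Organism the sample was taken from / contains
    name = ET.SubElement(sample, "SAMPLE_NAME")
    ET.SubElement(name, "TAXON_ID").text = str(record["taxon_id"])
    ET.SubElement(name, "SCIENTIFIC_NAME").text = record["scientific_name"]

    # Free-form metadata goes in as TAG/VALUE pairs
    attributes = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")
    for tag, value in record.get("attributes", {}).items():
        attribute = ET.SubElement(attributes, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attribute, "TAG").text = tag
        ET.SubElement(attribute, "VALUE").text = str(value)

    return ET.tostring(sample_set, encoding="unicode")


# Hypothetical record; only the NCBI taxon id and name of SARS-CoV-2 are real
record = {
    "alias": "pubseq-example-1",
    "title": "SARS-CoV-2 sample exported from PubSeq",
    "taxon_id": 2697049,
    "scientific_name": "Severe acute respiratory syndrome coronavirus 2",
    "attributes": {
        "collection date": "2020-03-15",
        "geographic location (country and/or sea)": "USA",
    },
}

sample_xml = sample_set_xml(record)
print(sample_xml)
#+END_SRC

Building the document with =ElementTree= rather than string
templates means that metadata values containing characters such as
=&= or =<= are escaped automatically.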
The first step is to define the study, next the sample and finally
the sequence (assembly).

* Defining the EBI study

A study is defined [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html][here]] and looks like

#+BEGIN_SRC xml
<PROJECT_SET>
   <!-- alias is a submitter-chosen unique name; the value here is a placeholder -->
   <PROJECT alias="pubseq_washington_dc">
      <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE>
      <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION>
      <SUBMISSION_PROJECT>
         <SEQUENCING_PROJECT/>
      </SUBMISSION_PROJECT>
   </PROJECT>
</PROJECT_SET>
#+END_SRC

A submission 'command' is also required. It looks like

#+BEGIN_SRC xml
<SUBMISSION>
   <ACTIONS>
      <ACTION>
         <ADD/>
      </ACTION>
   </ACTIONS>
</SUBMISSION>
#+END_SRC

The Webin system accepts these files using a command like

: curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"

as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]]. Note that the =wwwdev= host is the ENA test
service.

/work in progress (WIP)/

* Define the EBI sample

/work in progress (WIP)/

* Define the EBI sequence

/work in progress (WIP)/
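
As a starting point for this section: sequences (FASTA) go through
the Webin-CLI, which reads a plain-text manifest file of field names
and values. Below is a minimal sketch of generating such a manifest
from Python. The field names are taken from our reading of the ENA
genome assembly documentation and should be verified against it. The
function name =manifest_text=, the choice of which fields to treat as
required, and every value in the example (accessions, assembly name,
program, platform, file name) are placeholders for illustration only.

#+BEGIN_SRC python
# Fields this sketch insists on; an assumption based on the ENA
# genome assembly documentation, to be verified before real use.
REQUIRED = ("STUDY", "SAMPLE", "ASSEMBLYNAME", "ASSEMBLY_TYPE",
            "COVERAGE", "PROGRAM", "PLATFORM", "FASTA")


def manifest_text(fields):
    """Return manifest contents: one 'NAME<TAB>value' line per field."""
    missing = [key for key in REQUIRED if key not in fields]
    if missing:
        raise ValueError("missing manifest fields: " + ", ".join(missing))
    # Required fields first, then any extra fields in the order given
    extra = [key for key in fields if key not in REQUIRED]
    return "".join(f"{key}\t{fields[key]}\n" for key in list(REQUIRED) + extra)


# Placeholder values; the accessions would come from the study and
# sample registered in the previous steps.
fields = {
    "STUDY": "PRJEBxxxxx",
    "SAMPLE": "ERSxxxxxxx",
    "ASSEMBLYNAME": "pubseq-example-1",
    "ASSEMBLY_TYPE": "clone or isolate",
    "COVERAGE": "100",
    "PROGRAM": "example-assembler",
    "PLATFORM": "example-platform",
    "MOLECULETYPE": "genomic RNA",
    "FASTA": "sequence.fasta.gz",
}

manifest = manifest_text(fields)
print(manifest, end="")
#+END_SRC

The resulting text would be written to a file and handed to the
Webin-CLI through its =-manifest= option, together with
=-context genome=. Running with =-validate= before =-submit= is a
cheap way to catch mistakes in the manifest.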