Diffstat (limited to 'doc/blog')
-rw-r--r-- doc/blog/using-covid-19-pubseq-part6.org 96
1 files changed, 96 insertions, 0 deletions
diff --git a/doc/blog/using-covid-19-pubseq-part6.org b/doc/blog/using-covid-19-pubseq-part6.org
new file mode 100644
index 0000000..2a7c593
--- /dev/null
+++ b/doc/blog/using-covid-19-pubseq-part6.org
@@ -0,0 +1,96 @@
+#+TITLE: COVID-19 PubSeq (part 6)
+#+AUTHOR: Pjotr Prins
+# C-c C-e h h publish
+# C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time)
+# C-c C-t task rotate
+# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png
+
+#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />
+
+
+* Table of Contents :TOC:noexport:
+ - [[#generating-output-for-ebi][Generating output for EBI]]
+ - [[#defining-the-ebi-study][Defining the EBI study]]
+ - [[#define-the-ebi-sample][Define the EBI sample]]
+ - [[#define-the-ebi-sequence][Define the EBI sequence]]
+
+* Generating output for EBI
+
+Would it not be great if an uploader to PubSeq could also export
+samples to, say, EBI? That is what we discuss in this section. The
+submission process is somewhat laborious, so once you have submitted to
+PubSeq, why not export the same data to EBI with the least effort?
+
+COVID-19 PubSeq is a data source - both sequence data and metadata -
+that can be used to push data to other repositories, such as EBI. You can
+register [[https://ena-docs.readthedocs.io/en/latest/submit/samples/programmatic.html][samples programmatically]] through a specific XML interface.
+
+EBI sequence resources are presented through the European Nucleotide Archive (ENA). See, for example,
+[[https://www.ebi.ac.uk/ena/browser/view/MT394864][Sequence: MT394864.1]].
+
+EBI has XML formats for
+
+- SUBMISSION
+- STUDY
+- SAMPLE
+- EXPERIMENT
+- RUN
+- ANALYSIS
+- DAC
+- POLICY
+- DATASET
+- PROJECT
+
+with the schemas listed [[ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/][here]]. Since we are submitting sequences, we
+should follow the [[https://ena-docs.readthedocs.io/en/latest/submit/assembly.html][full genome assembly submission guidelines]] and the [[https://ena-docs.readthedocs.io/en/latest/submit/general-guide/programmatic.html][ENA
+guidelines]]. The first step is to define the study, next the sample, and
+finally the sequence (assembly).
+
+* Defining the EBI study
+
+A study is defined as documented [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html][here]] and looks like
+
+#+BEGIN_SRC xml
+<PROJECT_SET>
+  <PROJECT alias="COVID-19 Washington DC">
+    <TITLE>Sequencing SARS-CoV-2 in the Washington DC area</TITLE>
+    <DESCRIPTION>This study collects samples from COVID-19 patients in the Washington DC area</DESCRIPTION>
+    <SUBMISSION_PROJECT>
+      <SEQUENCING_PROJECT/>
+    </SUBMISSION_PROJECT>
+  </PROJECT>
+</PROJECT_SET>
+#+END_SRC
+
+A submission 'command' is also required, which looks like
+
+#+BEGIN_SRC xml
+<SUBMISSION>
+  <ACTIONS>
+    <ACTION>
+      <ADD/>
+    </ACTION>
+    <ACTION>
+      <HOLD HoldUntilDate="TODO: release date"/>
+    </ACTION>
+  </ACTIONS>
+</SUBMISSION>
+
+#+END_SRC
+
+The Webin system accepts these XML files using a command like
+
+: curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
+
+as described [[https://ena-docs.readthedocs.io/en/latest/submit/study/programmatic.html#submit-the-xmls-using-curl][here]]. Note that the =wwwdev= host in this URL is the Webin test service.
+
+/work in progress (WIP)/
+
+* Define the EBI sample
+
+
+/work in progress (WIP)/
+
+* Define the EBI sequence
+
+/work in progress (WIP)/
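
The project and submission XML shown in the post above are small enough to
generate directly from PubSeq metadata. What follows is only a minimal sketch
in Python using the standard library, not the PubSeq exporter itself. The
function names (build_project_xml, build_submission_xml, webin_curl_args) are
hypothetical, and the ISO date used for HoldUntilDate is an assumption, since
the post leaves that value as a TODO. The URL and form field names are copied
from the curl line in the post.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Test endpoint taken verbatim from the curl example in the post.
WEBIN_TEST_URL = "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"


def build_project_xml(alias: str, title: str, description: str) -> str:
    """Return a PROJECT_SET document describing one sequencing project."""
    project_set = ET.Element("PROJECT_SET")
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    return ET.tostring(project_set, encoding="unicode")


def build_submission_xml(hold_until: date) -> str:
    """Return a SUBMISSION document with an ADD and a HOLD action.

    Assumption: HoldUntilDate takes an ISO date (YYYY-MM-DD).
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    add_action = ET.SubElement(actions, "ACTION")
    ET.SubElement(add_action, "ADD")
    hold_action = ET.SubElement(actions, "ACTION")
    ET.SubElement(hold_action, "HOLD", HoldUntilDate=hold_until.isoformat())
    return ET.tostring(submission, encoding="unicode")


def webin_curl_args(username: str, password: str,
                    submission_path: str = "submission.xml",
                    project_path: str = "project.xml",
                    url: str = WEBIN_TEST_URL) -> list:
    """Build the argument list for the curl call shown in the post.

    The list is only constructed here, nothing is sent.
    """
    return [
        "curl", "-u", f"{username}:{password}",
        "-F", f"SUBMISSION=@{submission_path}",
        "-F", f"PROJECT=@{project_path}",
        url,
    ]


if __name__ == "__main__":
    print(build_project_xml(
        "COVID-19 Washington DC",
        "Sequencing SARS-CoV-2 in the Washington DC area",
        "This study collects samples from COVID-19 patients "
        "in the Washington DC area",
    ))
    # The release date below is an arbitrary illustrative value.
    print(build_submission_xml(date(2021, 1, 31)))
    print(" ".join(webin_curl_args("username", "password")))
```

Building the documents with ElementTree rather than string templates means
titles and descriptions taken from user metadata are escaped automatically.
The argument list from webin_curl_args could be passed to subprocess.run once
the two XML strings have been written to the named files.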