#+TITLE: COVID-19 PubSeq Uploading Data (part 3)
#+AUTHOR: Pjotr Prins
# C-c C-e h h   publish
# C-c !         insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time)
# C-c C-t       task rotate
# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png

#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" />

* Uploading Data

/Work in progress!/

* Table of Contents                                                     :TOC:noexport:
 - [[#uploading-data][Uploading Data]]
 - [[#introduction][Introduction]]
 - [[#step-1-sequence][Step 1: Sequence]]
 - [[#step-2-metadata][Step 2: Metadata]]

* Introduction

The COVID-19 PubSeq allows you to upload your SARS-CoV-2 strains to a
public resource for global comparisons. Computation is triggered on
upload. Read the [[./about][ABOUT]] page for more information.

* Step 1: Sequence

We start with an assembled or mapped sequence in FASTA format. The
PubSeq uploader contains a [[https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/qc_fasta.py][QC step]] which checks whether it is a likely
SARS-CoV-2 sequence. While PubSeq deduplicates sequences and never
overwrites metadata, it probably pays to check whether your data is
already in the system by querying some metadata, as described in
[[./blog?id=using-covid-19-pubseq-part1][Query metadata with SPARQL]].
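
Before uploading it can help to run a quick local sanity check on the
FASTA file. The sketch below is *not* the uploader's QC step linked
above; it is a rough illustration of the kind of plausibility check one
could do by hand. The function names and the thresholds (a length
window around the roughly 30 kb genome and a cap on ambiguous bases)
are assumptions made for this example only.

#+BEGIN_SRC python
# Illustrative pre-upload check; NOT the logic used by qc_fasta.py.
# All thresholds below are assumptions chosen for this example.
MIN_LENGTH = 25_000              # assumed lower bound in bases
MAX_LENGTH = 35_000              # assumed upper bound in bases
MAX_AMBIGUOUS_FRACTION = 0.05    # assumed cap on non-ACGT bases
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def looks_like_sars_cov_2(sequence):
    """Rough plausibility check on length and nucleotide alphabet."""
    sequence = sequence.upper()
    if not MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
        return False
    if set(sequence) - IUPAC_DNA:
        return False  # contains characters that are not IUPAC DNA codes
    ambiguous = sum(1 for base in sequence if base not in "ACGT")
    return ambiguous / len(sequence) <= MAX_AMBIGUOUS_FRACTION
#+END_SRC

To use it, open your FASTA file, loop over =read_fasta(handle)= and
call =looks_like_sars_cov_2= on each sequence. A sequence that fails
this rough check is worth inspecting before you upload it; a sequence
that passes may still be rejected by the real QC step.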


* Step 2: Metadata