aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--README.md20
-rw-r--r--bh20seqanalyzer/__init__.py0
-rw-r--r--bh20seqanalyzer/main.py32
-rw-r--r--bh20sequploader/__init__.py0
-rw-r--r--bh20sequploader/bh20seq-schema.yml56
-rw-r--r--bh20sequploader/main.py6
-rw-r--r--bh20sequploader/qc_fasta.py4
-rw-r--r--bh20sequploader/qc_metadata.py26
-rw-r--r--bh20sequploader/rdf-mappings.ttl0
-rw-r--r--bh20simplewebuploader/main.py21
-rw-r--r--bh20simplewebuploader/static/image/CWL-Logo-Header.pngbin0 -> 4771 bytes
-rw-r--r--bh20simplewebuploader/static/image/arvados-logo.pngbin0 -> 9514 bytes
-rw-r--r--bh20simplewebuploader/static/image/coronasmallcomp.gifbin0 -> 7335879 bytes
-rw-r--r--bh20simplewebuploader/static/image/covid19biohackathon.pngbin0 -> 21756 bytes
-rw-r--r--bh20simplewebuploader/templates/error.html5
-rw-r--r--bh20simplewebuploader/templates/form.html144
-rw-r--r--example/metadata.yaml15
-rw-r--r--example/minimal_example.yaml8
-rw-r--r--example/sequence.fasta642
-rw-r--r--image/website.pngbin0 -> 220860 bytes
-rw-r--r--paper/paper.bib16
-rw-r--r--paper/paper.md177
22 files changed, 823 insertions, 349 deletions
diff --git a/README.md b/README.md
index 8a5a6dd..b0b22ff 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,13 @@
# Sequence uploader
-This repository provides a sequence uploader for the COVID-19 Virtual Biohackathon's Public Sequence Resource project. You can use it to upload the genomes of SARS-CoV-2 samples to make them publicly and freely available to other researchers.
+This repository provides a sequence uploader for the COVID-19 Virtual
+Biohackathon's Public Sequence Resource project. There are two
+versions, one that runs on the command line and another that acts as
+web interface. You can use it to upload the genomes of SARS-CoV-2
+samples to make them publicly and freely available to other
+researchers.
+
+![alt text](./image/website.png "Website")
To get started, first [install the uploader](#installation), and use the `bh20-seq-uploader` command to [upload your data](#usage).
@@ -59,7 +66,7 @@ sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-
pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master
```
-3. **Make sure the tool is on your `PATH`.** THe `pip3` command will install the uploader in `.local/bin` inside your home directory. Your shell may not know to look for commands there by default. To fix this for the terminal you currently have open, run:
+3. **Make sure the tool is on your `PATH`.** The `pip3` command will install the uploader in `.local/bin` inside your home directory. Your shell may not know to look for commands there by default. To fix this for the terminal you currently have open, run:
```sh
export PATH=$PATH:$HOME/.local/bin
@@ -126,10 +133,10 @@ For running/developing the uploader with GNU Guix see [INSTALL.md](./doc/INSTALL
# Usage
-Run the uploader with a FASTA file and accompanying metadata file in [JSON-LD format](https://json-ld.org/):
+Run the uploader with a FASTA or FASTQ file and accompanying metadata file in JSON or YAML:
```sh
-bh20-seq-uploader example/sequence.fasta example/metadata.json
+bh20-seq-uploader example/sequence.fasta example/metadata.yaml
```
## Workflow for Generating a Pangenome
@@ -174,7 +181,4 @@ pip3 install gunicorn
gunicorn bh20simplewebuploader.main:app
```
-This runs on [http://127.0.0.1:8000/](http://127.0.0.1:8000/) by default, but can be adjusted with various [gunicorn options](http://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments)
-
-
-
+This runs on [http://127.0.0.1:8000/](http://127.0.0.1:8000/) by default, but can be adjusted with various [gunicorn options](http://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments).
diff --git a/bh20seqanalyzer/__init__.py b/bh20seqanalyzer/__init__.py
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/bh20seqanalyzer/__init__.py
diff --git a/bh20seqanalyzer/main.py b/bh20seqanalyzer/main.py
index c05b402..193a268 100644
--- a/bh20seqanalyzer/main.py
+++ b/bh20seqanalyzer/main.py
@@ -29,7 +29,7 @@ def validate_upload(api, collection, validated_project,
else:
try:
metadata_content = ruamel.yaml.round_trip_load(col.open("metadata.yaml"))
- metadata_content["id"] = "keep:%s/metadata.yaml" % collection["portable_data_hash"]
+ metadata_content["id"] = "http://arvados.org/keep:%s/metadata.yaml" % collection["portable_data_hash"]
add_lc_filename(metadata_content, metadata_content["id"])
valid = qc_metadata(metadata_content) and valid
except Exception as e:
@@ -39,19 +39,21 @@ def validate_upload(api, collection, validated_project,
logging.warn("Failed metadata qc")
if valid:
- if "sequence.fasta" in col:
- try:
- qc_fasta(col.open("sequence.fasta"))
- except Exception as e:
- logging.warn(e)
- valid = False
- else:
- if "reads.fastq" in col:
- start_fastq_to_fasta(api, collection, fastq_project, fastq_workflow_uuid)
- return False
- else:
- valid = False
- logging.warn("Upload '%s' missing sequence.fasta", collection["name"])
+ tgt = None
+ for n in ("sequence.fasta", "reads.fastq"):
+ if n not in col:
+ continue
+ with col.open(n) as qf:
+ tgt = qc_fasta(qf)
+ if tgt != n:
+ logging.info("Expected %s but magic says it should be %s", n, tgt)
+ valid = False
+ elif tgt == "reads.fastq":
+ start_fastq_to_fasta(api, collection, fastq_project, fastq_workflow_uuid)
+ return False
+ if tgt is None:
+ valid = False
+ logging.warn("Upload '%s' does not contain sequence.fasta or reads.fastq", collection["name"])
dup = api.collections().list(filters=[["owner_uuid", "=", validated_project],
["portable_data_hash", "=", col.portable_data_hash()]]).execute()
@@ -144,7 +146,7 @@ def start_pangenome_analysis(api,
"class": "File",
"location": "keep:%s/metadata.yaml" % v["portable_data_hash"]
})
- inputobj["subjects"].append("keep:%s/sequence.fasta" % v["portable_data_hash"])
+ inputobj["subjects"].append("http://arvados.org/keep:%s/sequence.fasta" % v["portable_data_hash"])
run_workflow(api, analysis_project, pangenome_workflow_uuid, "Pangenome analysis", inputobj)
diff --git a/bh20sequploader/__init__.py b/bh20sequploader/__init__.py
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/bh20sequploader/__init__.py
diff --git a/bh20sequploader/bh20seq-schema.yml b/bh20sequploader/bh20seq-schema.yml
index a072bd7..2d2e4c9 100644
--- a/bh20sequploader/bh20seq-schema.yml
+++ b/bh20sequploader/bh20seq-schema.yml
@@ -13,41 +13,54 @@ $graph:
type: record
fields:
host_species:
+ doc: Host species as defined in NCBITaxon (e.g. http://purl.obolibrary.org/obo/NCBITaxon_9606 for Homo sapiens)
type: string
jsonldPredicate:
_id: http://www.ebi.ac.uk/efo/EFO_0000532
+ _type: "@id"
host_id:
+ doc: Identifer for the host. If you submit multiple samples from the same host, use the same host_id for those samples
type: string
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_000115
host_common_name:
+ doc: Text label for the host species (e.g. homo sapiens)
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/NOMEN_0000037
host_sex:
+ doc: Sex of the host as define in NCIT, IRI expected (http://purl.obolibrary.org/obo/C20197 (Male), http://purl.obolibrary.org/obo/NCIT_C27993 (Female) or unkown (http://purl.obolibrary.org/obo/NCIT_C17998))
type: string
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/PATO_0000047
+ _type: "@id"
host_age:
+ doc: Age of the host as number (e.g. 50)
type: int?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/PATO_0000011
host_age_unit:
+ doc: Unit of host age e.g. http://purl.obolibrary.org/obo/UO_0000036
type: string?
jsonldPredicate:
- _id: http://purl.obolibrary.org/obo/UO_0000036
+ _id: http://purl.obolibrary.org/obo/NCIT_C42574
+ _type: "@id"
host_health_status:
+ doc: A condition or state at a particular time
type: string?
jsonldPredicate: http://purl.obolibrary.org/obo/NCIT_C25688
host_treatment:
+ doc: Process in which the act is intended to modify or alter
type: string?
jsonldPredicate:
_id: http://www.ebi.ac.uk/efo/EFO_0000727
host_vaccination:
+ doc: Field is unstable
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/VO_0000001
additional_host_information:
+ doc: Field for additional host information
type: string?
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_001167
@@ -56,38 +69,48 @@ $graph:
type: record
fields:
collector_name:
+ doc: Name of the person that took the sample
type: string
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/OBI_0001895
collecting_institution:
+ doc: Institute that was responsible of sampeling
type: string
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_001167
specimen_source:
+ doc: A specimen that derives from an anatomical part or substance arising from an organism, e.g. tissue, organ
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/OBI_0001479
collection_date:
+ doc: Date when the sample was taken
type: string?
jsonldPredicate:
_id: http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C25164
collection_location:
+ doc: Geographical location where the sample was collected as Gazetteer (https://www.ebi.ac.uk/ols/ontologies/gaz) reference, e.g. http://purl.obolibrary.org/obo/GAZ_00002845 (China)
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/GAZ_00000448
+ _type: "@id"
sample_storage_conditions:
+ doc: Information aboout storage of a specified type, e.g. frozen specimen, paraffin, fresh ....
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/OBI_0001472
additional_collection_information:
+ doc: Add additional comment about the circumstances that a sample was taken
type: string?
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_001167
sample_id:
+ doc: Id of the sample as defined by the submitter
type: string
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_000115
source_database_accession:
+ doc: If data is deposit at a public resource (e.g. Genbank, ENA) enter the Accession Id here
type: string?
jsonldPredicate:
_id: http://edamontology.org/data_2091
@@ -96,10 +119,13 @@ $graph:
type: record
fields:
virus_species:
+ doc: The name of a taxon from the NCBI taxonomy database
type: string?
jsonldPredicate:
_id: http://edamontology.org/data_1875
+ _type: "@id"
virus_strain:
+ doc: Name of the virus strain
type: string?
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_010055
@@ -108,14 +134,17 @@ $graph:
type: record
fields:
sample_sequencing_technology:
+ doc: Technology that was used to sequence this sample (e.g Sanger, Nanopor MiniION)
type: string
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/OBI_0600047
sequence_assembly_method:
+ doc: Protocol which provides instructions on the alignment of sequencing reads to reference genome
type: string?
jsonldPredicate:
_id: http://www.ebi.ac.uk/efo/EFO_0002699
sequencing_coverage:
+ doc: Sequence coverage defined as the average number of reads representing a given nucleotide (e.g. 100x)
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/FLU_0000848
@@ -124,22 +153,22 @@ $graph:
type: record
fields:
submitter_name:
+ doc: Name of the submitter
type: string
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_000116
- submitter_date:
- type: string
- jsonldPredicate:
- _id: http://purl.obolibrary.org/obo/NCIT_C94162
submitter_address:
+ doc: Address of the submitter
type: string?
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_000172
originating_lab:
+ doc: Name of the laboratory that took the sample
type: string
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/NCIT_C37984
lab_address:
+ doc: Address of the laboratory where the sample was taken
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/OBI_0600047
@@ -152,10 +181,17 @@ $graph:
jsonldPredicate:
_id: http://www.ebi.ac.uk/efo/EFO_0001741
authors:
+ doc: Name of the author(s)
type: string?
jsonldPredicate:
_id: http://purl.obolibrary.org/obo/NCIT_C42781
- submitter_id:
+ publication:
+ doc: Reference to publication of this sample (e.g. DOI, pubmed ID, ...)
+ type: string?
+ jsonldPredicate:
+ _id: http://purl.obolibrary.org/obo/NCIT_C19026
+ submitter_orchid:
+ doc: ORCHID of the submitter
type: string?
jsonldPredicate:
_id: http://semanticscience.org/resource/SIO_000115
@@ -169,14 +205,10 @@ $graph:
virus: virusSchema?
technology: technologySchema
submitter: submitterSchema
- submission:
- type: string
- jsonldPredicate:
- _id: "@id"
- #_type: "@id"
id:
doc: The subject (eg the fasta/fastq file) that the metadata describes
- type: string?
+ type: string
jsonldPredicate:
_id: "@id"
_type: "@id"
+ noLinkCheck: true
diff --git a/bh20sequploader/main.py b/bh20sequploader/main.py
index 4a225f6..e0a6a9a 100644
--- a/bh20sequploader/main.py
+++ b/bh20sequploader/main.py
@@ -8,8 +8,10 @@ from pathlib import Path
import urllib.request
import socket
import getpass
-from .qc_metadata import qc_metadata
-from .qc_fasta import qc_fasta
+import sys
+sys.path.insert(0,'.')
+from bh20sequploader.qc_metadata import qc_metadata
+from bh20sequploader.qc_fasta import qc_fasta
ARVADOS_API_HOST='lugli.arvadosapi.com'
ARVADOS_API_TOKEN='2fbebpmbo3rw3x05ueu2i6nx70zhrsb1p22ycu3ry34m4x4462'
diff --git a/bh20sequploader/qc_fasta.py b/bh20sequploader/qc_fasta.py
index e3d4fe7..e47d66b 100644
--- a/bh20sequploader/qc_fasta.py
+++ b/bh20sequploader/qc_fasta.py
@@ -21,8 +21,8 @@ def qc_fasta(sequence):
raise ValueError("FASTA file contains multiple entries")
break
sequence.seek(0)
- return "reads.fastq"
- elif seq_type == "text/fastq":
return "sequence.fasta"
+ elif seq_type == "text/fastq":
+ return "reads.fastq"
else:
raise ValueError("Sequence file does not look like FASTA or FASTQ")
diff --git a/bh20sequploader/qc_metadata.py b/bh20sequploader/qc_metadata.py
index 38edcaa..e477f21 100644
--- a/bh20sequploader/qc_metadata.py
+++ b/bh20sequploader/qc_metadata.py
@@ -5,21 +5,10 @@ import pkg_resources
import logging
import traceback
-class CustomFetcher(schema_salad.ref_resolver.DefaultFetcher):
- def check_exists(sup, url):
- if url.startswith("keep:"):
- return True
- else:
- return super().check_exists(url)
-
- def supported_schemes(self): # type: () -> List[str]
- return ["file", "http", "https", "mailto", "keep"]
-
-
def qc_metadata(metadatafile):
schema_resource = pkg_resources.resource_stream(__name__, "bh20seq-schema.yml")
cache = {"https://raw.githubusercontent.com/arvados/bh20-seq-resource/master/bh20sequploader/bh20seq-schema.yml": schema_resource.read().decode("utf-8")}
- (loader,
+ (document_loader,
avsc_names,
schema_metadata,
metaschema_loader) = schema_salad.schema.load_schema("https://raw.githubusercontent.com/arvados/bh20-seq-resource/master/bh20sequploader/bh20seq-schema.yml", cache=cache)
@@ -28,19 +17,6 @@ def qc_metadata(metadatafile):
print(avsc_names)
return False
- document_loader = schema_salad.ref_resolver.Loader(
- loader.ctx,
- schemagraph=loader.graph,
- foreign_properties=loader.foreign_properties,
- idx=loader.idx,
- cache=loader.cache,
- fetcher_constructor=CustomFetcher,
- skip_schemas=loader.skip_schemas,
- url_fields=loader.url_fields,
- allow_attachments=loader.allow_attachments,
- session=loader.session,
- )
-
try:
doc, metadata = schema_salad.schema.load_and_validate(document_loader, avsc_names, metadatafile, True)
return True
diff --git a/bh20sequploader/rdf-mappings.ttl b/bh20sequploader/rdf-mappings.ttl
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/bh20sequploader/rdf-mappings.ttl
diff --git a/bh20simplewebuploader/main.py b/bh20simplewebuploader/main.py
index bfc7762..f5324a5 100644
--- a/bh20simplewebuploader/main.py
+++ b/bh20simplewebuploader/main.py
@@ -3,12 +3,18 @@ import tempfile
import shutil
import subprocess
import os
+import sys
import re
import string
import yaml
import urllib.request
from flask import Flask, request, redirect, send_file, send_from_directory, render_template
+import os.path
+
+if not os.path.isfile('bh20sequploader/mainx.py'):
+ print("WARNING: run FLASK from the root of the source repository!", file=sys.stderr)
+
app = Flask(__name__, static_url_path='/static', static_folder='static')
# Limit file upload size. We shouldn't be working with anything over 1 MB; these are small genomes.
@@ -126,6 +132,7 @@ def generate_form(schema):
return list(walk_fields(root_name))
+
# At startup, we need to load the current metadata schema so we can make a form for it
METADATA_SCHEMA = yaml.safe_load(urllib.request.urlopen('https://raw.githubusercontent.com/arvados/bh20-seq-resource/master/bh20sequploader/bh20seq-schema.yml'))
FORM_ITEMS = generate_form(METADATA_SCHEMA)
@@ -184,15 +191,17 @@ def receive_files():
# We're going to work in one directory per request
dest_dir = tempfile.mkdtemp()
+ # The uploader will happily accept a FASTQ with this name
fasta_dest = os.path.join(dest_dir, 'fasta.fa')
metadata_dest = os.path.join(dest_dir, 'metadata.json')
try:
if 'fasta' not in request.files:
return (render_template('error.html',
- error_message="You did not include a FASTA file."), 403)
+ error_message="You did not include a FASTA or FASTQ file."), 403)
try:
with open(fasta_dest, 'wb') as out_stream:
- copy_with_limit(request.files.get('fasta').stream, out_stream)
+ # Use a plausible file size limit for a little FASTQ
+ copy_with_limit(request.files.get('fasta').stream, out_stream, limit=50*1024*1024)
except FileTooBigError as e:
# Delegate to the 413 error handler
return handle_large_file(e)
@@ -247,12 +256,16 @@ def receive_files():
error_message="You did not include metadata."), 403)
# Try and upload files to Arvados using the sequence uploader CLI
- result = subprocess.run(['bh20-seq-uploader', fasta_dest, metadata_dest],
+
+ cmd = ['python3','bh20sequploader/main.py', fasta_dest, metadata_dest]
+ print(" ".join(cmd),file=sys.stderr)
+ result = subprocess.run(cmd,
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
if result.returncode != 0:
# It didn't work. Complain.
- error_message="Upload failed. Uploader returned {} and said:\n{}".format(result.returncode, result.stderr)
+ error_message="Uploader returned value {} and said:".format(result.returncode) + str(result.stderr.decode('utf-8'))
+ print(error_message, file=sys.stderr)
return (render_template('error.html', error_message=error_message), 403)
else:
# It worked. Say so.
diff --git a/bh20simplewebuploader/static/image/CWL-Logo-Header.png b/bh20simplewebuploader/static/image/CWL-Logo-Header.png
new file mode 100644
index 0000000..86855b4
--- /dev/null
+++ b/bh20simplewebuploader/static/image/CWL-Logo-Header.png
Binary files differ
diff --git a/bh20simplewebuploader/static/image/arvados-logo.png b/bh20simplewebuploader/static/image/arvados-logo.png
new file mode 100644
index 0000000..fdbd88e
--- /dev/null
+++ b/bh20simplewebuploader/static/image/arvados-logo.png
Binary files differ
diff --git a/bh20simplewebuploader/static/image/coronasmallcomp.gif b/bh20simplewebuploader/static/image/coronasmallcomp.gif
new file mode 100644
index 0000000..7a16637
--- /dev/null
+++ b/bh20simplewebuploader/static/image/coronasmallcomp.gif
Binary files differ
diff --git a/bh20simplewebuploader/static/image/covid19biohackathon.png b/bh20simplewebuploader/static/image/covid19biohackathon.png
new file mode 100644
index 0000000..7a744b6
--- /dev/null
+++ b/bh20simplewebuploader/static/image/covid19biohackathon.png
Binary files differ
diff --git a/bh20simplewebuploader/templates/error.html b/bh20simplewebuploader/templates/error.html
index c2ab0a4..b1d9402 100644
--- a/bh20simplewebuploader/templates/error.html
+++ b/bh20simplewebuploader/templates/error.html
@@ -9,7 +9,10 @@
<h1>Upload Failed</h1>
<hr>
<p>
- Your upload has failed. {{error_message}}
+ Your upload has failed.
+ <pre>
+ {{error_message|safe}}
+ </pre>
</p>
<p>
<a href="/">Click here to try again.</a>
diff --git a/bh20simplewebuploader/templates/form.html b/bh20simplewebuploader/templates/form.html
index d960bb2..df66e8c 100644
--- a/bh20simplewebuploader/templates/form.html
+++ b/bh20simplewebuploader/templates/form.html
@@ -7,10 +7,12 @@
body {
color: #101010;
+ background-color: #F9EDE1;
}
- h1, h4 {
+ h1, h2, h3, h4 {
font-family: 'Roboto Slab', serif;
+ color: darkblue;
}
h1 {
@@ -21,8 +23,19 @@
color: #505050;
font-style: italic;
}
+ .header {
+ background-color: white;
+ margin: 0 auto;
+ padding: 20px;
+ text-align: center;
+ height: 150px;
+ }
- p, form {
+ .logo {
+ float: right;
+ }
+
+ p, form, .about, .footer {
font-family: 'Raleway', sans-serif;
line-height: 1.5;
}
@@ -36,9 +49,23 @@
}
.intro {
+ background-color: lightgrey;
+ margin: 0 auto;
+ padding: 20px;
+ }
+
+ .about {
+ background-color: lightgrey;
margin: 0 auto;
padding: 20px;
}
+ .footer {
+ background-color: white;
+ margin: 0 auto;
+ }
+
+ span.dropt {border-bottom: thin dotted; background: #ffeedd;}
+ span.dropt:hover {text-decoration: none; background: #ffffff; z-index: 6; }
.grid-container {
display: grid;
@@ -59,10 +86,16 @@
}
.fasta-file-select {
+ padding: 1em;
grid-area: b;
}
.metadata {
+ padding: 1em;
+ grid-area: c;
+ }
+ .metadata_upload_form {
+ padding: 1em;
grid-area: c;
}
@@ -115,40 +148,65 @@
<meta charset="UTF-8">
<link href="https://fonts.googleapis.com/css2?family=Raleway:wght@500&family=Roboto+Slab&display=swap" rel="stylesheet">
<meta name="viewport" content="width=device-width, initial-scale=1">
- <title>Simple Web Uploader for Public SARS-CoV-2 Sequence Resource</title>
+ <title>Web uploader for Public SARS-CoV-2 Sequence Resource</title>
</head>
<body>
- <h1>Simple Web Uploader for Public SARS-CoV-2 Sequence Resource</h1>
+ <section class="header">
+ <div class="logo"><a href="http://covid-19.genenetwork.org/"><img src="static/image/coronasmallcomp.gif" width="150" title="COVID-19 image by Tyler Morgan-Wall"></a></div>
+ <h1>Web uploader for Public SARS-CoV-2 Sequence Resource</h1>
+
+<small>Disabled until we got everything wired up</small>
+
+ </section>
<hr>
+
<section>
<form action="/submit" method="POST" enctype="multipart/form-data" id="main_form" class="grid-container">
- <p class="intro">
- This tool can be used to upload sequenced genomes of SARS-CoV-2 samples to the <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">Public SARS-CoV-2 Sequence Resource</a>. Your uploaded sequence will automatically be processed and incorporated into the public pangenome.
- </p>
+ <p class="intro">
+ Upload your SARS-CoV-2 sequence (FASTA or FASTQ formats) with metadata (JSONLD) to the <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">public sequence resource</a>. The upload will trigger a
+ recompute with all available sequences into a Pangenome
+ available for
+ <a href="https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca">download</a>!
+ Your uploaded sequence will automatically be processed
+ and incorporated into the public pangenome with
+ metadata using worklows from the High Performance Open Biology Lab defined <a href="https://github.com/hpobio-lab/viral-analysis/tree/master/cwl/pangenome-generate">here</a>. All data is published under
+ a <a href="https://creativecommons.org/licenses/by/4.0/">Creative
+ Commons 4.0 attribution license</a> (CC-BY-4.0). You
+ can take the published (GFA/RDF/FASTA) data and store it in
+ a triple store for further processing. We also plan to
+ combine identifiers with clinical data stored securely at <a href="https://redcap-covid19.elixir-luxembourg.org/redcap/">REDCap</a>.
+ A free command line version of the uploader can be
+ installed from <a href="https://github.com/arvados/bh20-seq-resource">source</a>.
+ </p>
+
<div class="fasta-file-select">
- <label for="fasta">Select FASTA file for assembled genome (max 1MB):</label>
+ <label for="fasta">Select FASTA file of assembled genome (max 50K), or FASTQ of reads (<span class="dropt" title="For a larger fastq file you'll need to use a CLI uploader">max 150MB<span style="width:500px;"></span></span>) : </label>
<br>
- <input type="file" id="fasta" name="fasta" accept=".fa,.fasta,.fna" required>
+ <input type="file" id="fasta" name="fasta" accept=".fa,.fasta,.fna,.fq" required>
<br>
+ <small>
+ Note that by uploading your data you automatically agree to a <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY-4.0 license</a>. </small>
</div>
<div class="metadata">
- <label>Select metadata submission method:</label>
- <br>
- <input type="radio" id="metadata_upload" name="metadata_type" value="upload" onchange="setMode()" checked required>
- <label for="metadata_upload">Upload metadata file</label>
- <input type="radio" id="metadata_form" name="metadata_type" value="fill" onchange="setMode()" required>
- <label for="metadata_form">Fill in metadata manually</label>
- <br>
- </div>
+ <label>Select metadata submission method:</label>
+ <br>
+ <input type="radio" id="metadata_form" name="metadata_type" value="fill" onchange="setMode()" checked required>
+ <label for="metadata_form">Fill in metadata manually</label>
+ <input type="radio" id="metadata_upload" name="metadata_type" value="upload" onchange="setMode()" required>
+ <label for="metadata_upload">Upload metadata file</label>
+ <br>
+ <small>Make sure the metadata has submitter attribution details.</small>
- <div id="metadata_upload_form_spot">
+ <div id="metadata_upload_form_spot">
<div id="metadata_upload_form">
- <label for="metadata">Select JSON or YAML metadata file following <a href="https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/bh20seq-schema.yml" target="_blank">this schema</a> (<a href="https://github.com/arvados/bh20-seq-resource/blob/master/example/metadata.yaml" target="_blank">Example 1</a>, <a href="https://github.com/arvados/bh20-seq-resource/blob/master/example/minimal_example.yaml" target="_blank">Example 2</a>, max 1MB):</label>
- <br>
- <input type="file" id="metadata" name="metadata" accept=".json,.yml,.yaml" required>
- <br>
+ <br>
+ <label for="metadata">Select JSON or YAML metadata file following <a href="https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/bh20seq-schema.yml" target="_blank">this schema</a> and <a href="https://github.com/arvados/bh20-seq-resource/blob/master/example/metadata.yaml" target="_blank">example</a> (max 50K):</label>
+ <br>
+ <input type="file" id="metadata" name="metadata" accept=".json,.yml,.yaml" required>
+ <br>
</div>
+ </div>
</div>
<div id="metadata_fill_form_spot">
@@ -157,7 +215,7 @@
{% for record in fields %}
{% if 'heading' in record %}
- {% if loop.index > 1 %}
+ {% if loop.index > 1 and 2 < 3 %}
</div>
{% endif %}
<div class="record">
@@ -180,11 +238,45 @@
</div>
- <input class="submit" type="submit" value="Add to Pangenome">
+<input class="submit" type="submit" value="Add to Pangenome">
</form>
</section>
- <hr>
- <small><a href="https://github.com/adamnovak/bh20-simple-web-uploader">Source</a> &middot; Made for <a href="https://github.com/virtual-biohackathons/covid-19-bh20">COVID-19-BH20</a></small>
+<hr>
+<br>
+<div class="about">
+ <h3>ABOUT</h3>
+ <p>
+ This a public repository created at the COVID-19 BioHackathon
+ that has a low barrier to entry for uploading sequence data using
+ best practices. I.e., data is published with a creative commons
+ 4.0 (CC-4.0) license with metadata using state-of-the art
+ standards and, perhaps most importantly, providing standardized
+ workflows that get triggered on upload, so that results are
+ immediately available in standardized data formats. The repository
+ will be maintained and expanded for the duration of the
+ pandemic. To contribute data simply upload it! To contribute code
+ and/or workflows see
+ the <a href="https://github.com/arvados/bh20-seq-resource">project
+ repository</a>. For more information see the <a href="https://github.com/arvados/bh20-seq-resource/blob/master/paper/paper.md">paper</a> (WIP).
+ </p>
+ <br>
+</div>
+
+ <hr>
+<div class="footer">
+ <a href="https://arvados.org/"><img src="static/image/arvados-logo.png" align="top"></a>
+ <a href="https://www.commonwl.org/"><img src="static/image/CWL-Logo-Header.png" height="70"></a>
+
+ <a href="https://github.com/virtual-biohackathons/covid-19-bh20">
+ <img src="static/image/covid19biohackathon.png" align="right" height="70"></a>
+
+ <center>
+ <small><a href="https://github.com/arvados/bh20-seq-resource">Source code</a> &middot; Powered by <a href="https://www.commonwl.org/">Common Workflow Language</a> &amp; <a href="https://arvados.org/">Arvados</a>; Made for <a href="https://github.com/virtual-biohackathons/covid-19-bh20">COVID-19-BH20</a>
+ </small>
+ </center>
+
+
+</div>
<script type="text/javascript">
let uploadForm = document.getElementById('metadata_upload_form')
diff --git a/example/metadata.yaml b/example/metadata.yaml
index a2f6e57..15e4e44 100644
--- a/example/metadata.yaml
+++ b/example/metadata.yaml
@@ -1,12 +1,12 @@
-submission: publicSequenceResource
+id: placeholder
host:
host_id: XX1
- host_species: string
+ host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606
host_common_name: string
- host_sex: string
+ host_sex: http://purl.obolibrary.org/obo/NCIT_C27993
host_age: 20
- host_age_unit: string
+ host_age_unit: http://purl.obolibrary.org/obo/UO_0000036
host_health_status: string
host_treatment: string
additional_host_information: string
@@ -17,12 +17,12 @@ sample:
collecting_institution: XXX
specimen_source: XXX
collection_date: XXX
- collection_location: XXX
+ collection_location: http://purl.obolibrary.org/obo/NCIT_C16428
sample_storage_conditions: XXX
additional_collection_information: XXX
virus:
- virus_species: XX
+ virus_species: http://purl.obolibrary.org/obo/NCBITaxon_2697049
virus_strain: XX
technology:
@@ -38,5 +38,4 @@ submitter:
provider_sample_id: string
submitter_sample_id: string
authors: testAuthor
- submitter_id: X12
- submitter_date: Subdate
+ submitter_orchid: X12
diff --git a/example/minimal_example.yaml b/example/minimal_example.yaml
index f312ab7..43adfa8 100644
--- a/example/minimal_example.yaml
+++ b/example/minimal_example.yaml
@@ -1,8 +1,9 @@
-submission: publicSequenceResource
+id: placeholder
host:
host_id: XX
- host_species: string
+ host_species: http://purl.obolibrary.org/obo/NCBITaxon_9606
+ host_sex: http://purl.obolibrary.org/obo/NCIT_C20197
sample:
sample_id: XXX
@@ -14,5 +15,4 @@ technology:
submitter:
submitter_name: tester
- originating_lab: testLab
- submitter_date: Subdate \ No newline at end of file
+ originating_lab: testLab \ No newline at end of file
diff --git a/example/sequence.fasta b/example/sequence.fasta
index 3c4c0ef..b364687 100644
--- a/example/sequence.fasta
+++ b/example/sequence.fasta
@@ -1,214 +1,430 @@
->AF324493.2 HIV-1 vector pNL4-3, complete sequence
-TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACACACAAGGCTA
-CTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTTC
-AAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAATGAAGGAGAGAACAACAGCTTGTTACACC
-CTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGTGTGGAAGTTTGACAGCCTCCTAGC
-ATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTACTACAAAGACTGCTGACATCGAGCTTTCTACAAG
-GGACTTTCCGCTGGGGACTTTCCAGGGAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGAT
-GCTACATATAAGCAGCTGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGC
-TCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAGTGTG
-TGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCT
-AGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGAGGAGATCTCTCGACGCAGGACTCG
-GCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGC
-GGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAA
-AAAATTCGGTTAAGGCCAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGC
-TAGAACGATTCGCAGTTAATCCTGGCCTTTTAGAGACATCAGAAGGCTGTAGACAAATACTGGGACAGCT
-ACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAATACAATAGCAGTCCTCTATTGT
-GTGCATCAAAGGATAGATGTAAAAGACACCAAGGAAGCCTTAGATAAGATAGAGGAAGAGCAAAACAAAA
-GTAAGAAAAAGGCACAGCAAGCAGCAGCTGACACAGGAAACAACAGCCAGGTCAGCCAAAATTACCCTAT
-AGTGCAGAACCTCCAGGGGCAAATGGTACATCAGGCCATATCACCTAGAACTTTAAATGCATGGGTAAAA
-GTAGTAGAAGAGAAGGCTTTCAGCCCAGAAGTAATACCCATGTTTTCAGCATTATCAGAAGGAGCCACCC
-CACAAGATTTAAATACCATGCTAAACACAGTGGGGGGACATCAAGCAGCCATGCAAATGTTAAAAGAGAC
-CATCAATGAGGAAGCTGCAGAATGGGATAGATTGCATCCAGTGCATGCAGGGCCTATTGCACCAGGCCAG
-ATGAGAGAACCAAGGGGAAGTGACATAGCAGGAACTACTAGTACCCTTCAGGAACAAATAGGATGGATGA
-CACATAATCCACCTATCCCAGTAGGAGAAATCTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGT
-AAGAATGTATAGCCCTACCAGCATTCTGGACATAAGACAAGGACCAAAGGAACCCTTTAGAGACTATGTA
-GACCGATTCTATAAAACTCTAAGAGCCGAGCAAGCTTCACAAGAGGTAAAAAATTGGATGACAGAAACCT
-TGTTGGTCCAAAATGCGAACCCAGATTGTAAGACTATTTTAAAAGCATTGGGACCAGGAGCGACACTAGA
-AGAAATGATGACAGCATGTCAGGGAGTGGGGGGACCCGGCCATAAAGCAAGAGTTTTGGCTGAAGCAATG
-AGCCAAGTAACAAATCCAGCTACCATAATGATACAGAAAGGCAATTTTAGGAACCAAAGAAAGACTGTTA
-AGTGTTTCAATTGTGGCAAAGAAGGGCACATAGCCAAAAATTGCAGGGCCCCTAGGAAAAAGGGCTGTTG
-GAAATGTGGAAAGGAAGGACACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAAGATC
-TGGCCTTCCCACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAGCCCCACCAGAAG
-AGAGCTTCAGGTTTGGGGAAGAGACAACAACTCCCTCTCAGAAGCAGGAGCCGATAGACAAGGAACTGTA
-TCCTTTAGCTTCCCTCAGATCACTCTTTGGCAGCGACCCCTCGTCACAATAAAGATAGGGGGGCAATTAA
-AGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAGATGGAA
-ACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATC
-TGCGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGT
-TGACTCAGATTGGCTGCACTTTAAATTTTCCCATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCC
-AGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATT
-TGTACAGAAATGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTAT
-TTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAAC
-TCAAGATTTCTGGGAAGTTCAATTAGGAATACCACATCCTGCAGGGTTAAAACAGAAAAAATCAGTAACA
-GTACTGGATGTGGGCGATGCATATTTTTCAGTTCCCTTAGATAAAGACTTCAGGAAGTATACTGCATTTA
-CCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAA
-AGGATCACCAGCAATATTCCAGTGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGAC
-ATAGTCATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAA
-AAATAGAGGAACTGAGACAACATCTGTTGAGGTGGGGATTTACCACACCAGACAAAAAACATCAGAAAGA
-ACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCA
-GAAAAGGACAGCTGGACTGTCAATGACATACAGAAATTAGTGGGAAAATTGAATTGGGCAAGTCAGATTT
-ATGCAGGGATTAAAGTAAGGCAATTATGTAAACTTCTTAGGGGAACCAAAGCACTAACAGAAGTAGTACC
-ACTAACAGAAGAAGCAGAGCTAGAACTGGCAGAAAACAGGGAGATTCTAAAAGAACCGGTACATGGAGTG
-TATTATGACCCATCAAAAGACTTAATAGCAGAAATACAGAAGCAGGGGCAAGGCCAATGGACATATCAAA
-TTTATCAAGAGCCATTTAAAAATCTGAAAACAGGAAAGTATGCAAGAATGAAGGGTGCCCACACTAATGA
-TGTGAAACAATTAACAGAGGCAGTACAAAAAATAGCCACAGAAAGCATAGTAATATGGGGAAAGACTCCT
-AAATTTAAATTACCCATACAAAAGGAAACATGGGAAGCATGGTGGACAGAGTATTGGCAAGCCACCTGGA
-TTCCTGAGTGGGAGTTTGTCAATACCCCTCCCTTAGTGAAGTTATGGTACCAGTTAGAGAAAGAACCCAT
-AATAGGAGCAGAAACTTTCTATGTAGATGGGGCAGCCAATAGGGAAACTAAATTAGGAAAAGCAGGATAT
-GTAACTGACAGAGGAAGACAAAAAGTTGTCCCCCTAACGGACACAACAAATCAGAAGACTGAGTTACAAG
-CAATTCATCTAGCTTTGCAGGATTCGGGATTAGAAGTAAACATAGTGACAGACTCACAATATGCATTGGG
-AATCATTCAAGCACAACCAGATAAGAGTGAATCAGAGTTAGTCAGTCAAATAATAGAGCAGTTAATAAAA
-AAGGAAAAAGTCTACCTGGCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAAAT
-TGGTCAGTGCTGGAATCAGGAAAGTACTATTTTTAGATGGAATAGATAAGGCCCAAGAAGAACATGAGAA
-ATATCACAGTAATTGGAGAGCAATGGCTAGTGATTTTAACCTACCACCTGTAGTAGCAAAAGAAATAGTA
-GCCAGCTGTGATAAATGTCAGCTAAAAGGGGAAGCCATGCATGGACAAGTAGACTGTAGCCCAGGAATAT
-GGCAGCTAGATTGTACACATTTAGAAGGAAAAGTTATCTTGGTAGCAGTTCATGTAGCCAGTGGATATAT
-AGAAGCAGAAGTAATTCCAGCAGAGACAGGGCAAGAAACAGCATACTTCCTCTTAAAATTAGCAGGAAGA
-TGGCCAGTAAAAACAGTACATACAGACAATGGCAGCAATTTCACCAGTACTACAGTTAAGGCCGCCTGTT
-GGTGGGCGGGGATCAAGCAGGAATTTGGCATTCCCTACAATCCCCAAAGTCAAGGAGTAATAGAATCTAT
-GAATAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAA
-ATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAG
-TAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCG
-GGTTTATTACAGGGACAGCAGAGATCCAGTTTGGAAAGGACCAGCAAAGCTCCTCTGGAAAGGTGAAGGG
-GCAGTAGTAATACAAGATAATAGTGACATAAAAGTAGTGCCAAGAAGAAAAGCAAAGATCATCAGGGATT
-ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGATTAACACATGGAAAAGAT
-TAGTAAAACACCATATGTATATTTCAAGGAAAGCTAAGGACTGGTTTTATAGACATCACTATGAAAGTAC
-TAATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGGGGATGCTAAATTAGTAATAACAACATATTGG
-GGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAAAGAGAT
-ATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGCACTATTTTGATTGTTTTTCAGA
-ATCTGCTATAAGAAATACCATATTAGGACGTATAGTTAGTCCTAGGTGTGAATATCAAGCAGGACATAAC
-AAGGTAGGATCTCTACAGTACTTGGCACTAGCAGCATTAATAAAACCAAAACAGATAAAGCCACCTTTGC
-CTAGTGTTAGGAAACTGACAGAGGACAGATGGAACAAGCCCCAGAAGACCAAGGGCCACAGAGGGAGCCA
-TACAATGAATGGACACTAGAGCTTTTAGAGGAACTTAAGAGTGAAGCTGTTAGACATTTTCCTAGGATAT
-GGCTCCATAACTTAGGACAACATATCTATGAAACTTACGGGGATACTTGGGCAGGAGTGGAAGCCATAAT
-AAGAATTCTGCAACAACTGCTGTTTATCCATTTCAGAATTGGGTGTCGACATAGCAGAATAGGCGTTACT
-CGACAGAGGAGAGCAAGAAATGGAGCCAGTAGATCCTAGACTAGAGCCCTGGAAGCATCCAGGAAGTCAG
-CCTAAAACTGCTTGTACCAATTGCTATTGTAAAAAGTGTTGCTTTCATTGCCAAGTTTGTTTCATGACAA
-AAGCCTTAGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCATCAGAACAGTCAGAC
-TCATCAAGCTTCTCTATCAAAGCAGTAAGTAGTACATGTAATGCAACCTATAATAGTAGCAATAGTAGCA
-TTAGTAGTAGCAATAATAATAGCAATAGTTGTGTGGTCCATAGTAATCATAGAATATAGGAAAATATTAA
-GACAAAGAAAAATAGACAGGTTAATTGATAGACTAATAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGA
-AGGAGAAGTATCAGCACTTGTGGAGATGGGGGTGGAAATGGGGCACCATGCTCCTTGGGATATTGATGAT
-CTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAGGAAGCAACCACC
-ACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCT
-GTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGGAA
-AAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTA
-AAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTGATTTGAAGAATGATACTAATACCAATAGTAGTA
-GCGGGAGAATGATAATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCATAAGAGA
-TAAGGTGCAGAAAGAATATGCATTCTTTTATAAACTTGATATAGTACCAATAGATAATACCAGCTATAGG
-TTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATAC
-ATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATG
-TACAAATGTCAGCACAGTACAATGTACACATGGAATCAGGCCAGTAGTATCAACTCAACTGCTGTTAAAT
-GGCAGTCTAGCAGAAGAAGATGTAGTAATTAGATCTGCCAATTTCACAGACAATGCTAAAACCATAATAG
-TACAGCTGAACACATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTAT
-CCAGAGGGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACAAGCACATTGTAAC
-ATTAGTAGAGCAAAATGGAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAATA
-ATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGG
-AGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTGGAGTACT
-GAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACAATTTATAAACA
-TGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAATAT
-TACTGGGCTGCTATTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAGATCTTCAGACCTGGAGGA
-GGCGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAG
-TAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTT
-CCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGA
-CAATTATTGTCTGATATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGT
-TGCAACTCACAGTCTGGGGCATCAAACAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGA
-TCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCT
-AGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACAGAGAAATTA
-ACAATTACACAAGCTTAATACACTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGA
-ATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATA
-AAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGA
-ATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAATCCCGAGGGGACCCGACAG
-GCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCC
-TTAGCACTTATCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCT
-TGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCT
-CCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAACTTGCTCAATGCCACAGCCATAGCAGTA
-GCTGAGGGGACAGATAGGGTTATAGAAGTATTACAAGCAGCTTATAGAGCTATTCGCCACATACCTAGAA
-GAATAAGACAGGGCTTGGAAAGGATTTTGCTATAAGATGGGTGGCAAGTGGTCAAAAAGTAGTGTGATTG
-GATGGCCTGCTGTAAGGGAAAGAATGAGACGAGCTGAGCCAGCAGCAGATGGGGTGGGAGCAGTATCTCG
-AGACCTAGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTAACAATGCTGCTTGTGCCTGGCTA
-GAAGCACAAGAGGAGGAAGAGGTGGGTTTTCCAGTCACACCTCAGGTACCTTTAAGACCAATGACTTACA
-AGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAAAG
-AAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACA
-CCAGGGCCAGGGGTCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCCAGATA
-AGGTAGAAGAGGCCAATAAAGGAGAGAACACCAGCTTGTTACACCCTGTGAGCCTGCATGGAATGGATGA
-CCCTGAGAGAGAAGTGTTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCACGTGGCCCGAGAGCTG
-CATCCGGAGTACTTCAAGAACTGCTGACATCGAGCTTGCTACAAGGGACTTTCCGCTGGGGACTTTCCAG
-GGAGGCGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTGCATATAAGCAGCTGCTTTTTG
-CCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACT
-GCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGT
-AACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCACCCAGGAGGTAGAGGTTGCAG
-TGAGCCAAGATCGCGCCACTGCATTCCAGCCTGGGCAAGAAAACAAGACTGTCTAAAATAATAATAATAA
-GTTAAGGGTATTAAATATATTTATACATGGAGGTCATAAAAATATATATATTTGGGCTGGGCGCAGTGGC
-TCACACCTGCGCCCGGCCCTTTGGGAGGCCGAGGCAGGTGGATCACCTGAGTTTGGGAGTTCCAGACCAG
-CCTGACCAACATGGAGAAACCCCTTCTCTGTGTATTTTTAGTAGATTTTATTTTATGTGTATTTTATTCA
-CAGGTATTTCTGGAAAACTGAAACTGTTTTTCCTCTACTCTGATACCACAAGAATCATCAGCACAGAGGA
-AGACTTCTGTGATCAAATGTGGTGGGAGAGGGAGGTTTTCACCAGCACATGAGCAGTCAGTTCTGCCGCA
-GACTCGGCGGGTGTCCTTCGGTTCAGTTCCAACACCGCCTGCCTGGAGAGAGGTCAGACCACAGGGTGAG
-GGCTCAGTCCCCAAGACATAAACACCCAAGACATAAACACCCAACAGGTCCACCCCGCCTGCTGCCCAGG
-CAGAGCCGATTCACCAAGACGGGAATTAGGATAGAGAAAGAGTAAGTCACACAGAGCCGGCTGTGCGGGA
-GAACGGAGTTCTATTATGACTCAAATCAGTCTCCCCAAGCATTCGGGGATCAGAGTTTTTAAGGATAACT
-TAGTGTGTAGGGGGCCAGTGAGTTGGAGATGAAAGCGTAGGGAGTCGAAGGTGTCCTTTTGCGCCGAGTC
-AGTTCCTGGGTGGGGGCCACAAGATCGGATGAGCCAGTTTATCAATCCGGGGGTGCCAGCTGATCCATGG
-AGTGCAGGGTCTGCAAAATATCTCAAGCACTGATTGATCTTAGGTTTTACAATAGTGATGTTACCCCAGG
-AACAATTTGGGGAAGGTCAGAATCTTGTAGCCTGTAGCTGCATGACTCCTAAACCATAATTTCTTTTTTG
-TTTTTTTTTTTTTATTTTTGAGACAGGGTCTCACTCTGTCACCTAGGCTGGAGTGCAGTGGTGCAATCAC
-AGCTCACTGCAGCCTCAACGTCGTAAGCTCAAGCGATCCTCCCACCTCAGCCTGCCTGGTAGCTGAGACT
-ACAAGCGACGCCCCAGTTAATTTTTGTATTTTTGGTAGAGGCAGCGTTTTGCCGTGTGGCCCTGGCTGGT
-CTCGAACTCCTGGGCTCAAGTGATCCAGCCTCAGCCTCCCAAAGTGCTGGGACAACCGGGGCCAGTCACT
-GCACCTGGCCCTAAACCATAATTTCTAATCTTTTGGCTAATTTGTTAGTCCTACAAAGGCAGTCTAGTCC
-CCAGGCAAAAAGGGGGTTTGTTTCGGGAAAGGGCTGTTACTGTCTTTGTTTCAAACTATAAACTAAGTTC
-CTCCTAAACTTAGTTCGGCCTACACCCAGGAATGAACAAGGAGAGCTTGGAGGTTAGAAGCACGATGGAA
-TTGGTTAGGTCAGATCTCTTTCACTGTCTGAGTTATAATTTTGCAATGGTGGTTCAAAGACTGCCCGCTT
-CTGACACCAGTCGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCT
-TCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAA
-AGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCA
-AAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCAT
-CACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCC
-CTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCC
-TTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCC
-AAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTG
-AGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAG
-GTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTT
-GGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAA
-CCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGA
-AGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTC
-ATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAA
-GTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTG
-TCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCA
-TCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACC
-AGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTG
-TTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGC
-ATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTA
-CATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTT
-GGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGA
-TGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCT
-CTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAA
-ACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGT
-GCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAA
-ATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTA
-TTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAA
-ATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTCTAAGAAACCATTATTATCATGACAT
-TAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCGTCTCGCGCGTTTCGGTGATGACGGTGAAAACCT
-CTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGT
-CAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCTGGCTTAACTATGCGGCATCAGAGCAGATTGTAC
-TGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCCA
-TTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATCGGTGCGGGCCTCTTCGCTATTACGCCAGGGG
-AGGCAGAGATTGCAGTAAGCTGAGATCGCAGCACTGCACTCCAGCCTGGGCGACAGAGTAAGACTCTGTC
-TCAAAAATAAAATAAATAAATCAATCAGATATTCCAATCTTTTCCTTTATTTATTTATTTATTTTCTATT
-TTGGAAACACAGTCCTTCCTTATTCCAGAATTACACATATATTCTATTTTTCTTTATATGCTCCAGTTTT
-TTTTAGACCTTCACCTGAAATGTGTGTATACAAAATCTAGGCCAGTCCAGCAGAGCCTAAAGGTAAAAAA
-TAAAATAATAAAAAATAAATAAAATCTAGCTCACTCCTTCACATCAAAATGGAGATACAGCTGTTAGCAT
-TAAATACCAAATAACCCATCTTGTCCTCAATAATTTTAAGCGCCTCTCTCCACCACATCTAACTCCTGTC
-AAAGGCATGTGCCCCTTCCGGGCGCTCTGCTGTGCTGCCAACCAACTGGCATGTGGACTCTGCAGGGTCC
-CTAACTGCCAAGCCCCACAGTGTGCCCTGAGGCTGCCCCTTCCTTCTAGCGGCTGCCCCCACTCGGCTTT
-GCTTTCCCTAGTTTCAGTTACTTGCGTTCAGCCAAGGTCTGAAACTAGGTGCGCACAGAGCGGTAAGACT
-GCGAGAGAAAGAGACCAGCTTTACAGGGGGTTTATCACAGTGCACCCTGACAGTCGTCAGCCTCACAGGG
-GGTTTATCACATTGCACCCTGACAGTCGTCAGCCTCACAGGGGGTTTATCACAGTGCACCCTTACAATCA
-TTCCATTTGATTCACAATTTTTTTAGTCTCTACTGTGCCTAACTTGTAAGTTAAATTTGATCAGAGGTGT
-GTTCCCAGAGGGGAAAACAGTATATACAGGGTTCAGTACTATCGCATTTCAGGCCTCCACCTGGGTCTTG
-GAATGTGTCCCCCGAGGGGTGATGACTACCTCAGTTGGATCTCCACAGGTCACAGTGACACAAGATAACC
-AAGACACCTCCCAAGGCTACCACAATGGGCCGCCCTCCACGTGCACATGGCCGGAGGAACTGCCATGTCG
-GAGGTGCAAGCACACCTGCGCATCAGAGTCCTTGGTGTGGAGGGAGGGACCAGCGCAGCTTCCAGCCATC
-CACCTGATGAACAGAACCTAGGGAAAGCCCCAGTTCTACTTACACCAGGAAAGGC
+>NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
+ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA
+CGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAAC
+TAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTG
+TTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTC
+CCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTAC
+GTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
+CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAAACGTTCGGAT
+GCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTC
+GTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCT
+TCTTCGTAAGAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTA
+GGCGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTG
+TTACCCGTGAACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGG
+CCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCACGTGCTGGTAAAGCTTCATGCACTTTG
+TCCGAACAACTGGACTTTATTGACACTAAGAGGGGTGTATACTGCTGCCGTGAACATGAGCATGAAATTG
+CTTGGTACACGGAACGTTCTGAAAAGAGCTATGAATTGCAGACACCTTTTGAAATTAAATTGGCAAAGAA
+ATTTGACACCTTCAATGGGGAATGTCCAAATTTTGTATTTCCCTTAAATTCCATAATCAAGACTATTCAA
+CCAAGGGTTGAAAAGAAAAAGCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCAC
+CAAATGAATGCAACCAAATGTGCCTTTCAACTCTCATGAAGTGTGATCATTGTGGTGAAACTTCATGGCA
+GACGGGCGATTTTGTTAAAGCCACTTGCGAATTTTGTGGCACTGAGAATTTGACTAAAGAAGGTGCCACT
+ACTTGTGGTTACTTACCCCAAAATGCTGTTGTTAAAATTTATTGTCCAGCATGTCACAATTCAGAAGTAG
+GACCTGAGCATAGTCTTGCCGAATACCATAATGAATCTGGCTTGAAAACCATTCTTCGTAAGGGTGGTCG
+CACTATTGCCTTTGGAGGCTGTGTGTTCTCTTATGTTGGTTGCCATAACAAGTGTGCCTATTGGGTTCCA
+CGTGCTAGCGCTAACATAGGTTGTAACCATACAGGTGTTGTTGGAGAAGGTTCCGAAGGTCTTAATGACA
+ACCTTCTTGAAATACTCCAAAAAGAGAAAGTCAACATCAATATTGTTGGTGACTTTAAACTTAATGAAGA
+GATCGCCATTATTTTGGCATCTTTTTCTGCTTCCACAAGTGCTTTTGTGGAAACTGTGAAAGGTTTGGAT
+TATAAAGCATTCAAACAAATTGTTGAATCCTGTGGTAATTTTAAAGTTACAAAAGGAAAAGCTAAAAAAG
+GTGCCTGGAATATTGGTGAACAGAAATCAATACTGAGTCCTCTTTATGCATTTGCATCAGAGGCTGCTCG
+TGTTGTACGATCAATTTTCTCCCGCACTCTTGAAACTGCTCAAAATTCTGTGCGTGTTTTACAGAAGGCC
+GCTATAACAATACTAGATGGAATTTCACAGTATTCACTGAGACTCATTGATGCTATGATGTTCACATCTG
+ATTTGGCTACTAACAATCTAGTTGTAATGGCCTACATTACAGGTGGTGTTGTTCAGTTGACTTCGCAGTG
+GCTAACTAACATCTTTGGCACTGTTTATGAAAAACTCAAACCCGTCCTTGATTGGCTTGAAGAGAAGTTT
+AAGGAAGGTGTAGAGTTTCTTAGAGACGGTTGGGAAATTGTTAAATTTATCTCAACCTGTGCTTGTGAAA
+TTGTCGGTGGACAAATTGTCACCTGTGCAAAGGAAATTAAGGAGAGTGTTCAGACATTCTTTAAGCTTGT
+AAATAAATTTTTGGCTTTGTGTGCTGACTCTATCATTATTGGTGGAGCTAAACTTAAAGCCTTGAATTTA
+GGTGAAACATTTGTCACGCACTCAAAGGGATTGTACAGAAAGTGTGTTAAATCCAGAGAAGAAACTGGCC
+TACTCATGCCTCTAAAAGCCCCAAAAGAAATTATCTTCTTAGAGGGAGAAACACTTCCCACAGAAGTGTT
+AACAGAGGAAGTTGTCTTGAAAACTGGTGATTTACAACCATTAGAACAACCTACTAGTGAAGCTGTTGAA
+GCTCCATTGGTTGGTACACCAGTTTGTATTAACGGGCTTATGTTGCTCGAAATCAAAGACACAGAAAAGT
+ACTGTGCCCTTGCACCTAATATGATGGTAACAAACAATACCTTCACACTCAAAGGCGGTGCACCAACAAA
+GGTTACTTTTGGTGATGACACTGTGATAGAAGTGCAAGGTTACAAGAGTGTGAATATCACTTTTGAACTT
+GATGAAAGGATTGATAAAGTACTTAATGAGAAGTGCTCTGCCTATACAGTTGAACTCGGTACAGAAGTAA
+ATGAGTTCGCCTGTGTTGTGGCAGATGCTGTCATAAAAACTTTGCAACCAGTATCTGAATTACTTACACC
+ACTGGGCATTGATTTAGATGAGTGGAGTATGGCTACATACTACTTATTTGATGAGTCTGGTGAGTTTAAA
+TTGGCTTCACATATGTATTGTTCTTTCTACCCTCCAGATGAGGATGAAGAAGAAGGTGATTGTGAAGAAG
+AAGAGTTTGAGCCATCAACTCAATATGAGTATGGTACTGAAGATGATTACCAAGGTAAACCTTTGGAATT
+TGGTGCCACTTCTGCTGCTCTTCAACCTGAAGAAGAGCAAGAAGAAGATTGGTTAGATGATGATAGTCAA
+CAAACTGTTGGTCAACAAGACGGCAGTGAGGACAATCAGACAACTACTATTCAAACAATTGTTGAGGTTC
+AACCTCAATTAGAGATGGAACTTACACCAGTTGTTCAGACTATTGAAGTGAATAGTTTTAGTGGTTATTT
+AAAACTTACTGACAATGTATACATTAAAAATGCAGACATTGTGGAAGAAGCTAAAAAGGTAAAACCAACA
+GTGGTTGTTAATGCAGCCAATGTTTACCTTAAACATGGAGGAGGTGTTGCAGGAGCCTTAAATAAGGCTA
+CTAACAATGCCATGCAAGTTGAATCTGATGATTACATAGCTACTAATGGACCACTTAAAGTGGGTGGTAG
+TTGTGTTTTAAGCGGACACAATCTTGCTAAACACTGTCTTCATGTTGTCGGCCCAAATGTTAACAAAGGT
+GAAGACATTCAACTTCTTAAGAGTGCTTATGAAAATTTTAATCAGCACGAAGTTCTACTTGCACCATTAT
+TATCAGCTGGTATTTTTGGTGCTGACCCTATACATTCTTTAAGAGTTTGTGTAGATACTGTTCGCACAAA
+TGTCTACTTAGCTGTCTTTGATAAAAATCTCTATGACAAACTTGTTTCAAGCTTTTTGGAAATGAAGAGT
+GAAAAGCAAGTTGAACAAAAGATCGCTGAGATTCCTAAAGAGGAAGTTAAGCCATTTATAACTGAAAGTA
+AACCTTCAGTTGAACAGAGAAAACAAGATGATAAGAAAATCAAAGCTTGTGTTGAAGAAGTTACAACAAC
+TCTGGAAGAAACTAAGTTCCTCACAGAAAACTTGTTACTTTATATTGACATTAATGGCAATCTTCATCCA
+GATTCTGCCACTCTTGTTAGTGACATTGACATCACTTTCTTAAAGAAAGATGCTCCATATATAGTGGGTG
+ATGTTGTTCAAGAGGGTGTTTTAACTGCTGTGGTTATACCTACTAAAAAGGCTGGTGGCACTACTGAAAT
+GCTAGCGAAAGCTTTGAGAAAAGTGCCAACAGACAATTATATAACCACTTACCCGGGTCAGGGTTTAAAT
+GGTTACACTGTAGAGGAGGCAAAGACAGTGCTTAAAAAGTGTAAAAGTGCCTTTTACATTCTACCATCTA
+TTATCTCTAATGAGAAGCAAGAAATTCTTGGAACTGTTTCTTGGAATTTGCGAGAAATGCTTGCACATGC
+AGAAGAAACACGCAAATTAATGCCTGTCTGTGTGGAAACTAAAGCCATAGTTTCAACTATACAGCGTAAA
+TATAAGGGTATTAAAATACAAGAGGGTGTGGTTGATTATGGTGCTAGATTTTACTTTTACACCAGTAAAA
+CAACTGTAGCGTCACTTATCAACACACTTAACGATCTAAATGAAACTCTTGTTACAATGCCACTTGGCTA
+TGTAACACATGGCTTAAATTTGGAAGAAGCTGCTCGGTATATGAGATCTCTCAAAGTGCCAGCTACAGTT
+TCTGTTTCTTCACCTGATGCTGTTACAGCGTATAATGGTTATCTTACTTCTTCTTCTAAAACACCTGAAG
+AACATTTTATTGAAACCATCTCACTTGCTGGTTCCTATAAAGATTGGTCCTATTCTGGACAATCTACACA
+ACTAGGTATAGAATTTCTTAAGAGAGGTGATAAAAGTGTATATTACACTAGTAATCCTACCACATTCCAC
+CTAGATGGTGAAGTTATCACCTTTGACAATCTTAAGACACTTCTTTCTTTGAGAGAAGTGAGGACTATTA
+AGGTGTTTACAACAGTAGACAACATTAACCTCCACACGCAAGTTGTGGACATGTCAATGACATATGGACA
+ACAGTTTGGTCCAACTTATTTGGATGGAGCTGATGTTACTAAAATAAAACCTCATAATTCACATGAAGGT
+AAAACATTTTATGTTTTACCTAATGATGACACTCTACGTGTTGAGGCTTTTGAGTACTACCACACAACTG
+ATCCTAGTTTTCTGGGTAGGTACATGTCAGCATTAAATCACACTAAAAAGTGGAAATACCCACAAGTTAA
+TGGTTTAACTTCTATTAAATGGGCAGATAACAACTGTTATCTTGCCACTGCATTGTTAACACTCCAACAA
+ATAGAGTTGAAGTTTAATCCACCTGCTCTACAAGATGCTTATTACAGAGCAAGGGCTGGTGAAGCTGCTA
+ACTTTTGTGCACTTATCTTAGCCTACTGTAATAAGACAGTAGGTGAGTTAGGTGATGTTAGAGAAACAAT
+GAGTTACTTGTTTCAACATGCCAATTTAGATTCTTGCAAAAGAGTCTTGAACGTGGTGTGTAAAACTTGT
+GGACAACAGCAGACAACCCTTAAGGGTGTAGAAGCTGTTATGTACATGGGCACACTTTCTTATGAACAAT
+TTAAGAAAGGTGTTCAGATACCTTGTACGTGTGGTAAACAAGCTACAAAATATCTAGTACAACAGGAGTC
+ACCTTTTGTTATGATGTCAGCACCACCTGCTCAGTATGAACTTAAGCATGGTACATTTACTTGTGCTAGT
+GAGTACACTGGTAATTACCAGTGTGGTCACTATAAACATATAACTTCTAAAGAAACTTTGTATTGCATAG
+ACGGTGCTTTACTTACAAAGTCCTCAGAATACAAAGGTCCTATTACGGATGTTTTCTACAAAGAAAACAG
+TTACACAACAACCATAAAACCAGTTACTTATAAATTGGATGGTGTTGTTTGTACAGAAATTGACCCTAAG
+TTGGACAATTATTATAAGAAAGACAATTCTTATTTCACAGAGCAACCAATTGATCTTGTACCAAACCAAC
+CATATCCAAACGCAAGCTTCGATAATTTTAAGTTTGTATGTGATAATATCAAATTTGCTGATGATTTAAA
+CCAGTTAACTGGTTATAAGAAACCTGCTTCAAGAGAGCTTAAAGTTACATTTTTCCCTGACTTAAATGGT
+GATGTGGTGGCTATTGATTATAAACACTACACACCCTCTTTTAAGAAAGGAGCTAAATTGTTACATAAAC
+CTATTGTTTGGCATGTTAACAATGCAACTAATAAAGCCACGTATAAACCAAATACCTGGTGTATACGTTG
+TCTTTGGAGCACAAAACCAGTTGAAACATCAAATTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA
+ATGGATAATCTTGCCTGCGAAGATCTAAAACCAGTCTCTGAAGAAGTAGTGGAAAATCCTACCATACAGA
+AAGACGTTCTTGAGTGTAATGTGAAAACTACCGAAGTTGTAGGAGACATTATACTTAAACCAGCAAATAA
+TAGTTTAAAAATTACAGAAGAGGTTGGCCACACAGATCTAATGGCTGCTTATGTAGACAATTCTAGTCTT
+ACTATTAAGAAACCTAATGAATTATCTAGAGTATTAGGTTTGAAAACCCTTGCTACTCATGGTTTAGCTG
+CTGTTAATAGTGTCCCTTGGGATACTATAGCTAATTATGCTAAGCCTTTTCTTAACAAAGTTGTTAGTAC
+AACTACTAACATAGTTACACGGTGTTTAAACCGTGTTTGTACTAATTATATGCCTTATTTCTTTACTTTA
+TTGCTACAATTGTGTACTTTTACTAGAAGTACAAATTCTAGAATTAAAGCATCTATGCCGACTACTATAG
+CAAAGAATACTGTTAAGAGTGTCGGTAAATTTTGTCTAGAGGCTTCATTTAATTATTTGAAGTCACCTAA
+TTTTTCTAAACTGATAAATATTATAATTTGGTTTTTACTATTAAGTGTTTGCCTAGGTTCTTTAATCTAC
+TCAACCGCTGCTTTAGGTGTTTTAATGTCTAATTTAGGCATGCCTTCTTACTGTACTGGTTACAGAGAAG
+GCTATTTGAACTCTACTAATGTCACTATTGCAACCTACTGTACTGGTTCTATACCTTGTAGTGTTTGTCT
+TAGTGGTTTAGATTCTTTAGACACCTATCCTTCTTTAGAAACTATACAAATTACCATTTCATCTTTTAAA
+TGGGATTTAACTGCTTTTGGCTTAGTTGCAGAGTGGTTTTTGGCATATATTCTTTTCACTAGGTTTTTCT
+ATGTACTTGGATTGGCTGCAATCATGCAATTGTTTTTCAGCTATTTTGCAGTACATTTTATTAGTAATTC
+TTGGCTTATGTGGTTAATAATTAATCTTGTACAAATGGCCCCGATTTCAGCTATGGTTAGAATGTACATC
+TTCTTTGCATCATTTTATTATGTATGGAAAAGTTATGTGCATGTTGTAGACGGTTGTAATTCATCAACTT
+GTATGATGTGTTACAAACGTAATAGAGCAACAAGAGTCGAATGTACAACTATTGTTAATGGTGTTAGAAG
+GTCCTTTTATGTCTATGCTAATGGAGGTAAAGGCTTTTGCAAACTACACAATTGGAATTGTGTTAATTGT
+GATACATTCTGTGCTGGTAGTACATTTATTAGTGATGAAGTTGCGAGAGACTTGTCACTACAGTTTAAAA
+GACCAATAAATCCTACTGACCAGTCTTCTTACATCGTTGATAGTGTTACAGTGAAGAATGGTTCCATCCA
+TCTTTACTTTGATAAAGCTGGTCAAAAGACTTATGAAAGACATTCTCTCTCTCATTTTGTTAACTTAGAC
+AACCTGAGAGCTAATAACACTAAAGGTTCATTGCCTATTAATGTTATAGTTTTTGATGGTAAATCAAAAT
+GTGAAGAATCATCTGCAAAATCAGCGTCTGTTTACTACAGTCAGCTTATGTGTCAACCTATACTGTTACT
+AGATCAGGCATTAGTGTCTGATGTTGGTGATAGTGCGGAAGTTGCAGTTAAAATGTTTGATGCTTACGTT
+AATACGTTTTCATCAACTTTTAACGTACCAATGGAAAAACTCAAAACACTAGTTGCAACTGCAGAAGCTG
+AACTTGCAAAGAATGTGTCCTTAGACAATGTCTTATCTACTTTTATTTCAGCAGCTCGGCAAGGGTTTGT
+TGATTCAGATGTAGAAACTAAAGATGTTGTTGAATGTCTTAAATTGTCACATCAATCTGACATAGAAGTT
+ACTGGCGATAGTTGTAATAACTATATGCTCACCTATAACAAAGTTGAAAACATGACACCCCGTGACCTTG
+GTGCTTGTATTGACTGTAGTGCGCGTCATATTAATGCGCAGGTAGCAAAAAGTCACAACATTGCTTTGAT
+ATGGAACGTTAAAGATTTCATGTCATTGTCTGAACAACTACGAAAACAAATACGTAGTGCTGCTAAAAAG
+AATAACTTACCTTTTAAGTTGACATGTGCAACTACTAGACAAGTTGTTAATGTTGTAACAACAAAGATAG
+CACTTAAGGGTGGTAAAATTGTTAATAATTGGTTGAAGCAGTTAATTAAAGTTACACTTGTGTTCCTTTT
+TGTTGCTGCTATTTTCTATTTAATAACACCTGTTCATGTCATGTCTAAACATACTGACTTTTCAAGTGAA
+ATCATAGGATACAAGGCTATTGATGGTGGTGTCACTCGTGACATAGCATCTACAGATACTTGTTTTGCTA
+ACAAACATGCTGATTTTGACACATGGTTTAGCCAGCGTGGTGGTAGTTATACTAATGACAAAGCTTGCCC
+ATTGATTGCTGCAGTCATAACAAGAGAAGTGGGTTTTGTCGTGCCTGGTTTGCCTGGCACGATATTACGC
+ACAACTAATGGTGACTTTTTGCATTTCTTACCTAGAGTTTTTAGTGCAGTTGGTAACATCTGTTACACAC
+CATCAAAACTTATAGAGTACACTGACTTTGCAACATCAGCTTGTGTTTTGGCTGCTGAATGTACAATTTT
+TAAAGATGCTTCTGGTAAGCCAGTACCATATTGTTATGATACCAATGTACTAGAAGGTTCTGTTGCTTAT
+GAAAGTTTACGCCCTGACACACGTTATGTGCTCATGGATGGCTCTATTATTCAATTTCCTAACACCTACC
+TTGAAGGTTCTGTTAGAGTGGTAACAACTTTTGATTCTGAGTACTGTAGGCACGGCACTTGTGAAAGATC
+AGAAGCTGGTGTTTGTGTATCTACTAGTGGTAGATGGGTACTTAACAATGATTATTACAGATCTTTACCA
+GGAGTTTTCTGTGGTGTAGATGCTGTAAATTTACTTACTAATATGTTTACACCACTAATTCAACCTATTG
+GTGCTTTGGACATATCAGCATCTATAGTAGCTGGTGGTATTGTAGCTATCGTAGTAACATGCCTTGCCTA
+CTATTTTATGAGGTTTAGAAGAGCTTTTGGTGAATACAGTCATGTAGTTGCCTTTAATACTTTACTATTC
+CTTATGTCATTCACTGTACTCTGTTTAACACCAGTTTACTCATTCTTACCTGGTGTTTATTCTGTTATTT
+ACTTGTACTTGACATTTTATCTTACTAATGATGTTTCTTTTTTAGCACATATTCAGTGGATGGTTATGTT
+CACACCTTTAGTACCTTTCTGGATAACAATTGCTTATATCATTTGTATTTCCACAAAGCATTTCTATTGG
+TTCTTTAGTAATTACCTAAAGAGACGTGTAGTCTTTAATGGTGTTTCCTTTAGTACTTTTGAAGAAGCTG
+CGCTGTGCACCTTTTTGTTAAATAAAGAAATGTATCTAAAGTTGCGTAGTGATGTGCTATTACCTCTTAC
+GCAATATAATAGATACTTAGCTCTTTATAATAAGTACAAGTATTTTAGTGGAGCAATGGATACAACTAGC
+TACAGAGAAGCTGCTTGTTGTCATCTCGCAAAGGCTCTCAATGACTTCAGTAACTCAGGTTCTGATGTTC
+TTTACCAACCACCACAAACCTCTATCACCTCAGCTGTTTTGCAGAGTGGTTTTAGAAAAATGGCATTCCC
+ATCTGGTAAAGTTGAGGGTTGTATGGTACAAGTAACTTGTGGTACAACTACACTTAACGGTCTTTGGCTT
+GATGACGTAGTTTACTGTCCAAGACATGTGATCTGCACCTCTGAAGACATGCTTAACCCTAATTATGAAG
+ATTTACTCATTCGTAAGTCTAATCATAATTTCTTGGTACAGGCTGGTAATGTTCAACTCAGGGTTATTGG
+ACATTCTATGCAAAATTGTGTACTTAAGCTTAAGGTTGATACAGCCAATCCTAAGACACCTAAGTATAAG
+TTTGTTCGCATTCAACCAGGACAGACTTTTTCAGTGTTAGCTTGTTACAATGGTTCACCATCTGGTGTTT
+ACCAATGTGCTATGAGGCCCAATTTCACTATTAAGGGTTCATTCCTTAATGGTTCATGTGGTAGTGTTGG
+TTTTAACATAGATTATGACTGTGTCTCTTTTTGTTACATGCACCATATGGAATTACCAACTGGAGTTCAT
+GCTGGCACAGACTTAGAAGGTAACTTTTATGGACCTTTTGTTGACAGGCAAACAGCACAAGCAGCTGGTA
+CGGACACAACTATTACAGTTAATGTTTTAGCTTGGTTGTACGCTGCTGTTATAAATGGAGACAGGTGGTT
+TCTCAATCGATTTACCACAACTCTTAATGACTTTAACCTTGTGGCTATGAAGTACAATTATGAACCTCTA
+ACACAAGACCATGTTGACATACTAGGACCTCTTTCTGCTCAAACTGGAATTGCCGTTTTAGATATGTGTG
+CTTCATTAAAAGAATTACTGCAAAATGGTATGAATGGACGTACCATATTGGGTAGTGCTTTATTAGAAGA
+TGAATTTACACCTTTTGATGTTGTTAGACAATGCTCAGGTGTTACTTTCCAAAGTGCAGTGAAAAGAACA
+ATCAAGGGTACACACCACTGGTTGTTACTCACAATTTTGACTTCACTTTTAGTTTTAGTCCAGAGTACTC
+AATGGTCTTTGTTCTTTTTTTTGTATGAAAATGCCTTTTTACCTTTTGCTATGGGTATTATTGCTATGTC
+TGCTTTTGCAATGATGTTTGTCAAACATAAGCATGCATTTCTCTGTTTGTTTTTGTTACCTTCTCTTGCC
+ACTGTAGCTTATTTTAATATGGTCTATATGCCTGCTAGTTGGGTGATGCGTATTATGACATGGTTGGATA
+TGGTTGATACTAGTTTGTCTGGTTTTAAGCTAAAAGACTGTGTTATGTATGCATCAGCTGTAGTGTTACT
+AATCCTTATGACAGCAAGAACTGTGTATGATGATGGTGCTAGGAGAGTGTGGACACTTATGAATGTCTTG
+ACACTCGTTTATAAAGTTTATTATGGTAATGCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCT
+CTGTTACTTCTAACTACTCAGGTGTAGTTACAACTGTCATGTTTTTGGCCAGAGGTATTGTTTTTATGTG
+TGTTGAGTATTGCCCTATTTTCTTCATAACTGGTAATACACTTCAGTGTATAATGCTAGTTTATTGTTTC
+TTAGGCTATTTTTGTACTTGTTACTTTGGCCTCTTTTGTTTACTCAACCGCTACTTTAGACTGACTCTTG
+GTGTTTATGATTACTTAGTTTCTACACAGGAGTTTAGATATATGAATTCACAGGGACTACTCCCACCCAA
+GAATAGCATAGATGCCTTCAAACTCAACATTAAATTGTTGGGTGTTGGTGGCAAACCTTGTATCAAAGTA
+GCCACTGTACAGTCTAAAATGTCAGATGTAAAGTGCACATCAGTAGTCTTACTCTCAGTTTTGCAACAAC
+TCAGAGTAGAATCATCATCTAAATTGTGGGCTCAATGTGTCCAGTTACACAATGACATTCTCTTAGCTAA
+AGATACTACTGAAGCCTTTGAAAAAATGGTTTCACTACTTTCTGTTTTGCTTTCCATGCAGGGTGCTGTA
+GACATAAACAAGCTTTGTGAAGAAATGCTGGACAACAGGGCAACCTTACAAGCTATAGCCTCAGAGTTTA
+GTTCCCTTCCATCATATGCAGCTTTTGCTACTGCTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGA
+TTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTGGCTAAATCTGAATTTGACCGTGATGCA
+GCCATGCAACGTAAGTTGGAAAAGATGGCTGATCAAGCTATGACCCAAATGTATAAACAGGCTAGATCTG
+AGGACAAGAGGGCAAAAGTTACTAGTGCTATGCAGACAATGCTTTTCACTATGCTTAGAAAGTTGGATAA
+TGATGCACTCAACAACATTATCAACAATGCAAGAGATGGTTGTGTTCCCTTGAACATAATACCTCTTACA
+ACAGCAGCCAAACTAATGGTTGTCATACCAGACTATAACACATATAAAAATACGTGTGATGGTACAACAT
+TTACTTATGCATCAGCATTGTGGGAAATCCAACAGGTTGTAGATGCAGATAGTAAAATTGTTCAACTTAG
+TGAAATTAGTATGGACAATTCACCTAATTTAGCATGGCCTCTTATTGTAACAGCTTTAAGGGCCAATTCT
+GCTGTCAAATTACAGAATAATGAGCTTAGTCCTGTTGCACTACGACAGATGTCTTGTGCTGCCGGTACTA
+CACAAACTGCTTGCACTGATGACAATGCGTTAGCTTACTACAACACAACAAAGGGAGGTAGGTTTGTACT
+TGCACTGTTATCCGATTTACAGGATTTGAAATGGGCTAGATTCCCTAAGAGTGATGGAACTGGTACTATC
+TATACAGAACTGGAACCACCTTGTAGGTTTGTTACAGACACACCTAAAGGTCCTAAAGTGAAGTATTTAT
+ACTTTATTAAAGGATTAAACAACCTAAATAGAGGTATGGTACTTGGTAGTTTAGCTGCCACAGTACGTCT
+ACAAGCTGGTAATGCAACAGAAGTGCCTGCCAATTCAACTGTATTATCTTTCTGTGCTTTTGCTGTAGAT
+GCTGCTAAAGCTTACAAAGATTATCTAGCTAGTGGGGGACAACCAATCACTAATTGTGTTAAGATGTTGT
+GTACACACACTGGTACTGGTCAGGCAATAACAGTTACACCGGAAGCCAATATGGATCAAGAATCCTTTGG
+TGGTGCATCGTGTTGTCTGTACTGCCGTTGCCACATAGATCATCCAAATCCTAAAGGATTTTGTGACTTA
+AAAGGTAAGTATGTACAAATACCTACAACTTGTGCTAATGACCCTGTGGGTTTTACACTTAAAAACACAG
+TCTGTACCGTCTGCGGTATGTGGAAAGGTTATGGCTGTAGTTGTGATCAACTCCGCGAACCCATGCTTCA
+GTCAGCTGATGCACAATCGTTTTTAAACGGGTTTGCGGTGTAAGTGCAGCCCGTCTTACACCGTGCGGCA
+CAGGCACTAGTACTGATGTCGTATACAGGGCTTTTGACATCTACAATGATAAAGTAGCTGGTTTTGCTAA
+ATTCCTAAAAACTAATTGTTGTCGCTTCCAAGAAAAGGACGAAGATGACAATTTAATTGATTCTTACTTT
+GTAGTTAAGAGACACACTTTCTCTAACTACCAACATGAAGAAACAATTTATAATTTACTTAAGGATTGTC
+CAGCTGTTGCTAAACATGACTTCTTTAAGTTTAGAATAGACGGTGACATGGTACCACATATATCACGTCA
+ACGTCTTACTAAATACACAATGGCAGACCTCGTCTATGCTTTAAGGCATTTTGATGAAGGTAATTGTGAC
+ACATTAAAAGAAATACTTGTCACATACAATTGTTGTGATGATGATTATTTCAATAAAAAGGACTGGTATG
+ATTTTGTAGAAAACCCAGATATATTACGCGTATACGCCAACTTAGGTGAACGTGTACGCCAAGCTTTGTT
+AAAAACAGTACAATTCTGTGATGCCATGCGAAATGCTGGTATTGTTGGTGTACTGACATTAGATAATCAA
+GATCTCAATGGTAACTGGTATGATTTCGGTGATTTCATACAAACCACGCCAGGTAGTGGAGTTCCTGTTG
+TAGATTCTTATTATTCATTGTTAATGCCTATATTAACCTTGACCAGGGCTTTAACTGCAGAGTCACATGT
+TGACACTGACTTAACAAAGCCTTACATTAAGTGGGATTTGTTAAAATATGACTTCACGGAAGAGAGGTTA
+AAACTCTTTGACCGTTATTTTAAATATTGGGATCAGACATACCACCCAAATTGTGTTAACTGTTTGGATG
+ACAGATGCATTCTGCATTGTGCAAACTTTAATGTTTTATTCTCTACAGTGTTCCCACCTACAAGTTTTGG
+ACCACTAGTGAGAAAAATATTTGTTGATGGTGTTCCATTTGTAGTTTCAACTGGATACCACTTCAGAGAG
+CTAGGTGTTGTACATAATCAGGATGTAAACTTACATAGCTCTAGACTTAGTTTTAAGGAATTACTTGTGT
+ATGCTGCTGACCCTGCTATGCACGCTGCTTCTGGTAATCTATTACTAGATAAACGCACTACGTGCTTTTC
+AGTAGCTGCACTTACTAACAATGTTGCTTTTCAAACTGTCAAACCCGGTAATTTTAACAAAGACTTCTAT
+GACTTTGCTGTGTCTAAGGGTTTCTTTAAGGAAGGAAGTTCTGTTGAATTAAAACACTTCTTCTTTGCTC
+AGGATGGTAATGCTGCTATCAGCGATTATGACTACTATCGTTATAATCTACCAACAATGTGTGATATCAG
+ACAACTACTATTTGTAGTTGAAGTTGTTGATAAGTACTTTGATTGTTACGATGGTGGCTGTATTAATGCT
+AACCAAGTCATCGTCAACAACCTAGACAAATCAGCTGGTTTTCCATTTAATAAATGGGGTAAGGCTAGAC
+TTTATTATGATTCAATGAGTTATGAGGATCAAGATGCACTTTTCGCATATACAAAACGTAATGTCATCCC
+TACTATAACTCAAATGAATCTTAAGTATGCCATTAGTGCAAAGAATAGAGCTCGCACCGTAGCTGGTGTC
+TCTATCTGTAGTACTATGACCAATAGACAGTTTCATCAAAAATTATTGAAATCAATAGCCGCCACTAGAG
+GAGCTACTGTAGTAATTGGAACAAGCAAATTCTATGGTGGTTGGCACAACATGTTAAAAACTGTTTATAG
+TGATGTAGAAAACCCTCACCTTATGGGTTGGGATTATCCTAAATGTGATAGAGCCATGCCTAACATGCTT
+AGAATTATGGCCTCACTTGTTCTTGCTCGCAAACATACAACGTGTTGTAGCTTGTCACACCGTTTCTATA
+GATTAGCTAATGAGTGTGCTCAAGTATTGAGTGAAATGGTCATGTGTGGCGGTTCACTATATGTTAAACC
+AGGTGGAACCTCATCAGGAGATGCCACAACTGCTTATGCTAATAGTGTTTTTAACATTTGTCAAGCTGTC
+ACGGCCAATGTTAATGCACTTTTATCTACTGATGGTAACAAAATTGCCGATAAGTATGTCCGCAATTTAC
+AACACAGACTTTATGAGTGTCTCTATAGAAATAGAGATGTTGACACAGACTTTGTGAATGAGTTTTACGC
+ATATTTGCGTAAACATTTCTCAATGATGATACTCTCTGACGATGCTGTTGTGTGTTTCAATAGCACTTAT
+GCATCTCAAGGTCTAGTGGCTAGCATAAAGAACTTTAAGTCAGTTCTTTATTATCAAAACAATGTTTTTA
+TGTCTGAAGCAAAATGTTGGACTGAGACTGACCTTACTAAAGGACCTCATGAATTTTGCTCTCAACATAC
+AATGCTAGTTAAACAGGGTGATGATTATGTGTACCTTCCTTACCCAGATCCATCAAGAATCCTAGGGGCC
+GGCTGTTTTGTAGATGATATCGTAAAAACAGATGGTACACTTATGATTGAACGGTTCGTGTCTTTAGCTA
+TAGATGCTTACCCACTTACTAAACATCCTAATCAGGAGTATGCTGATGTCTTTCATTTGTACTTACAATA
+CATAAGAAAGCTACATGATGAGTTAACAGGACACATGTTAGACATGTATTCTGTTATGCTTACTAATGAT
+AACACTTCAAGGTATTGGGAACCTGAGTTTTATGAGGCTATGTACACACCGCATACAGTCTTACAGGCTG
+TTGGGGCTTGTGTTCTTTGCAATTCACAGACTTCATTAAGATGTGGTGCTTGCATACGTAGACCATTCTT
+ATGTTGTAAATGCTGTTACGACCATGTCATATCAACATCACATAAATTAGTCTTGTCTGTTAATCCGTAT
+GTTTGCAATGCTCCAGGTTGTGATGTCACAGATGTGACTCAACTTTACTTAGGAGGTATGAGCTATTATT
+GTAAATCACATAAACCACCCATTAGTTTTCCATTGTGTGCTAATGGACAAGTTTTTGGTTTATATAAAAA
+TACATGTGTTGGTAGCGATAATGTTACTGACTTTAATGCAATTGCAACATGTGACTGGACAAATGCTGGT
+GATTACATTTTAGCTAACACCTGTACTGAAAGACTCAAGCTTTTTGCAGCAGAAACGCTCAAAGCTACTG
+AGGAGACATTTAAACTGTCTTATGGTATTGCTACTGTACGTGAAGTGCTGTCTGACAGAGAATTACATCT
+TTCATGGGAAGTTGGTAAACCTAGACCACCACTTAACCGAAATTATGTCTTTACTGGTTATCGTGTAACT
+AAAAACAGTAAAGTACAAATAGGAGAGTACACCTTTGAAAAAGGTGACTATGGTGATGCTGTTGTTTACC
+GAGGTACAACAACTTACAAATTAAATGTTGGTGATTATTTTGTGCTGACATCACATACAGTAATGCCATT
+AAGTGCACCTACACTAGTGCCACAAGAGCACTATGTTAGAATTACTGGCTTATACCCAACACTCAATATC
+TCAGATGAGTTTTCTAGCAATGTTGCAAATTATCAAAAGGTTGGTATGCAAAAGTATTCTACACTCCAGG
+GACCACCTGGTACTGGTAAGAGTCATTTTGCTATTGGCCTAGCTCTCTACTACCCTTCTGCTCGCATAGT
+GTATACAGCTTGCTCTCATGCCGCTGTTGATGCACTATGTGAGAAGGCATTAAAATATTTGCCTATAGAT
+AAATGTAGTAGAATTATACCTGCACGTGCTCGTGTAGAGTGTTTTGATAAATTCAAAGTGAATTCAACAT
+TAGAACAGTATGTCTTTTGTACTGTAAATGCATTGCCTGAGACGACAGCAGATATAGTTGTCTTTGATGA
+AATTTCAATGGCCACAAATTATGATTTGAGTGTTGTCAATGCCAGATTACGTGCTAAGCACTATGTGTAC
+ATTGGCGACCCTGCTCAATTACCTGCACCACGCACATTGCTAACTAAGGGCACACTAGAACCAGAATATT
+TCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCC
+TGCTGAAATTGTTGACACTGTGAGTGCTTTGGTTTATGATAATAAGCTTAAAGCACATAAAGACAAATCA
+GCTCAATGCTTTAAAATGTTTTATAAGGGTGTTATCACGCATGATGTTTCATCTGCAATTAACAGGCCAC
+AAATAGGCGTGGTAAGAGAATTCCTTACACGTAACCCTGCTTGGAGAAAAGCTGTCTTTATTTCACCTTA
+TAATTCACAGAATGCTGTAGCCTCAAAGATTTTGGGACTACCAACTCAAACTGTTGATTCATCACAGGGC
+TCAGAATATGACTATGTCATATTCACTCAAACCACTGAAACAGCTCACTCTTGTAATGTAAACAGATTTA
+ATGTTGCTATTACCAGAGCAAAAGTAGGCATACTTTGCATAATGTCTGATAGAGACCTTTATGACAAGTT
+GCAATTTACAAGTCTTGAAATTCCACGTAGGAATGTGGCAACTTTACAAGCTGAAAATGTAACAGGACTC
+TTTAAAGATTGTAGTAAGGTAATCACTGGGTTACATCCTACACAGGCACCTACACACCTCAGTGTTGACA
+CTAAATTCAAAACTGAAGGTTTATGTGTTGACATACCTGGCATACCTAAGGACATGACCTATAGAAGACT
+CATCTCTATGATGGGTTTTAAAATGAATTATCAAGTTAATGGTTACCCTAACATGTTTATCACCCGCGAA
+GAAGCTATAAGACATGTACGTGCATGGATTGGCTTCGATGTCGAGGGGTGTCATGCTACTAGAGAAGCTG
+TTGGTACCAATTTACCTTTACAGCTAGGTTTTTCTACAGGTGTTAACCTAGTTGCTGTACCTACAGGTTA
+TGTTGATACACCTAATAATACAGATTTTTCCAGAGTTAGTGCTAAACCACCGCCTGGAGATCAATTTAAA
+CACCTCATACCACTTATGTACAAAGGACTTCCTTGGAATGTAGTGCGTATAAAGATTGTACAAATGTTAA
+GTGACACACTTAAAAATCTCTCTGACAGAGTCGTATTTGTCTTATGGGCACATGGCTTTGAGTTGACATC
+TATGAAGTATTTTGTGAAAATAGGACCTGAGCGCACCTGTTGTCTATGTGATAGACGTGCCACATGCTTT
+TCCACTGCTTCAGACACTTATGCCTGTTGGCATCATTCTATTGGATTTGATTACGTCTATAATCCGTTTA
+TGATTGATGTTCAACAATGGGGTTTTACAGGTAACCTACAAAGCAACCATGATCTGTATTGTCAAGTCCA
+TGGTAATGCACATGTAGCTAGTTGTGATGCAATCATGACTAGGTGTCTAGCTGTCCACGAGTGCTTTGTT
+AAGCGTGTTGACTGGACTATTGAATATCCTATAATTGGTGATGAACTGAAGATTAATGCGGCTTGTAGAA
+AGGTTCAACACATGGTTGTTAAAGCTGCATTATTAGCAGACAAATTCCCAGTTCTTCACGACATTGGTAA
+CCCTAAAGCTATTAAGTGTGTACCTCAAGCTGATGTAGAATGGAAGTTCTATGATGCACAGCCTTGTAGT
+GACAAAGCTTATAAAATAGAAGAATTATTCTATTCTTATGCCACACATTCTGACAAATTCACAGATGGTG
+TATGCCTATTTTGGAATTGCAATGTCGATAGATATCCTGCTAATTCCATTGTTTGTAGATTTGACACTAG
+AGTGCTATCTAACCTTAACTTGCCTGGTTGTGATGGTGGCAGTTTGTATGTAAATAAACATGCATTCCAC
+ACACCAGCTTTTGATAAAAGTGCTTTTGTTAATTTAAAACAATTACCATTTTTCTATTACTCTGACAGTC
+CATGTGAGTCTCATGGAAAACAAGTAGTGTCAGATATAGATTATGTACCACTAAAGTCTGCTACGTGTAT
+AACACGTTGCAATTTAGGTGGTGCTGTCTGTAGACATCATGCTAATGAGTACAGATTGTATCTCGATGCT
+TATAACATGATGATCTCAGCTGGCTTTAGCTTGTGGGTTTACAAACAATTTGATACTTATAACCTCTGGA
+ACACTTTTACAAGACTTCAGAGTTTAGAAAATGTGGCTTTTAATGTTGTAAATAAGGGACACTTTGATGG
+ACAACAGGGTGAAGTACCAGTTTCTATCATTAATAACACTGTTTACACAAAAGTTGATGGTGTTGATGTA
+GAATTGTTTGAAAATAAAACAACATTACCTGTTAATGTAGCATTTGAGCTTTGGGCTAAGCGCAACATTA
+AACCAGTACCAGAGGTGAAAATACTCAATAATTTGGGTGTGGACATTGCTGCTAATACTGTGATCTGGGA
+CTACAAAAGAGATGCTCCAGCACATATATCTACTATTGGTGTTTGTTCTATGACTGACATAGCCAAGAAA
+CCAACTGAAACGATTTGTGCACCACTCACTGTCTTTTTTGATGGTAGAGTTGATGGTCAAGTAGACTTAT
+TTAGAAATGCCCGTAATGGTGTTCTTATTACAGAAGGTAGTGTTAAAGGTTTACAACCATCTGTAGGTCC
+CAAACAAGCTAGTCTTAATGGAGTCACATTAATTGGAGAAGCCGTAAAAACACAGTTCAATTATTATAAG
+AAAGTTGATGGTGTTGTCCAACAATTACCTGAAACTTACTTTACTCAGAGTAGAAATTTACAAGAATTTA
+AACCCAGGAGTCAAATGGAAATTGATTTCTTAGAATTAGCTATGGATGAATTCATTGAACGGTATAAATT
+AGAAGGCTATGCCTTCGAACATATCGTTTATGGAGATTTTAGTCATAGTCAGTTAGGTGGTTTACATCTA
+CTGATTGGACTAGCTAAACGTTTTAAGGAATCACCTTTTGAATTAGAAGATTTTATTCCTATGGACAGTA
+CAGTTAAAAACTATTTCATAACAGATGCGCAAACAGGTTCATCTAAGTGTGTGTGTTCTGTTATTGATTT
+ATTACTTGATGATTTTGTTGAAATAATAAAATCCCAAGATTTATCTGTAGTTTCTAAGGTTGTCAAAGTG
+ACTATTGACTATACAGAAATTTCATTTATGCTTTGGTGTAAAGATGGCCATGTAGAAACATTTTACCCAA
+AATTACAATCTAGTCAAGCGTGGCAACCGGGTGTTGCTATGCCTAATCTTTACAAAATGCAAAGAATGCT
+ATTAGAAAAGTGTGACCTTCAAAATTATGGTGATAGTGCAACATTACCTAAAGGCATAATGATGAATGTC
+GCAAAATATACTCAACTGTGTCAATATTTAAACACATTAACATTAGCTGTACCCTATAATATGAGAGTTA
+TACATTTTGGTGCTGGTTCTGATAAAGGAGTTGCACCAGGTACAGCTGTTTTAAGACAGTGGTTGCCTAC
+GGGTACGCTGCTTGTCGATTCAGATCTTAATGACTTTGTCTCTGATGCAGATTCAACTTTGATTGGTGAT
+TGTGCAACTGTACATACAGCTAATAAATGGGATCTCATTATTAGTGATATGTACGACCCTAAGACTAAAA
+ATGTTACAAAAGAAAATGACTCTAAAGAGGGTTTTTTCACTTACATTTGTGGGTTTATACAACAAAAGCT
+AGCTCTTGGAGGTTCCGTGGCTATAAAGATAACAGAACATTCTTGGAATGCTGATCTTTATAAGCTCATG
+GGACACTTCGCATGGTGGACAGCCTTTGTTACTAATGTGAATGCGTCATCATCTGAAGCATTTTTAATTG
+GATGTAATTATCTTGGCAAACCACGCGAACAAATAGATGGTTATGTCATGCATGCAAATTACATATTTTG
+GAGGAATACAAATCCAATTCAGTTGTCTTCCTATTCTTTATTTGACATGAGTAAATTTCCCCTTAAATTA
+AGGGGTACTGCTGTTATGTCTTTAAAAGAAGGTCAAATCAATGATATGATTTTATCTCTTCTTAGTAAAG
+GTAGACTTATAATTAGAGAAAACAACAGAGTTGTTATTTCTAGTGATGTTCTTGTTAACAACTAAACGAA
+CAATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCA
+ATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCA
+GTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATG
+TCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGC
+TTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCC
+CTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCAT
+TTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAGAGTTTATTCTAGTGC
+GAATAATTGCACTTTTGAATATGTCTCTCAGCCTTTTCTTATGGACCTTGAAGGAAAACAGGGTAATTTC
+AAAAATCTTAGGGAATTTGTGTTTAAGAATATTGATGGTTATTTTAAAATATATTCTAAGCACACGCCTA
+TTAATTTAGTGCGTGATCTCCCTCAGGGTTTTTCGGCTTTAGAACCATTGGTAGATTTGCCAATAGGTAT
+TAACATCACTAGGTTTCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCA
+GGTTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATA
+ATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTT
+GAAATCCTTCACTGTAGAAAAAGGAATCTATCAAACTTCTAACTTTAGAGTCCAACCAACAGAATCTATT
+GTTAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTTTAACGCCACCAGATTTGCATCTG
+TTTATGCTTGGAACAGGAAGAGAATCAGCAACTGTGTTGCTGATTATTCTGTCCTATATAATTCCGCATC
+ATTTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATGTCTAT
+GCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTG
+ATTATAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGCTTGGAATTCTAACAATCTTGATTC
+TAAGGTTGGTGGTAATTATAATTACCTGTATAGATTGTTTAGGAAGTCTAATCTCAAACCTTTTGAGAGA
+GATATTTCAACTGAAATCTATCAGGCCGGTAGCACACCTTGTAATGGTGTTGAAGGTTTTAATTGTTACT
+TTCCTTTACAATCATATGGTTTCCAACCCACTAATGGTGTTGGTTACCAACCATACAGAGTAGTAGTACT
+TTCTTTTGAACTTCTACATGCACCAGCAACTGTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAAC
+AAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTC
+TGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCCACAGACACTTGA
+GATTCTTGACATTACACCATGTTCTTTTGGTGGTGTCAGTGTTATAACACCAGGAACAAATACTTCTAAC
+CAGGTTGCTGTTCTTTATCAGGATGTTAACTGCACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTA
+CTCCTACTTGGCGTGTTTATTCTACAGGTTCTAATGTTTTTCAAACACGTGCAGGCTGTTTAATAGGGGC
+TGAACATGTCAACAACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACT
+CAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTG
+GTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCATACCCACAAATTTTACTATTAGTGTTAC
+CACAGAAATTCTACCAGTGTCTATGACCAAGACATCAGTAGATTGTACAATGTACATTTGTGGTGATTCA
+ACTGAATGCAGCAATCTTTTGTTGCAATATGGCAGTTTTTGTACACAATTAAACCGTGCTTTAACTGGAA
+TAGCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACC
+AATTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTCA
+TTTATTGAAGATCTACTTTTCAACAAAGTGACACTTGCAGATGCTGGCTTCATCAAACAATATGGTGATT
+GCCTTGGTGATATTGCTGCTAGAGACCTCATTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACC
+TTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGGGTACAATCACTTCTGGTTGG
+ACCTTTGGTGCAGGTGCTGCATTACAAATACCATTTGCTATGCAAATGGCTTATAGGTTTAATGGTATTG
+GAGTTACACAGAATGTTCTCTATGAGAACCAAAAATTGATTGCCAACCAATTTAATAGTGCTATTGGCAA
+AATTCAAGACTCACTTTCTTCCACAGCAAGTGCACTTGGAAAACTTCAAGATGTGGTCAACCAAAATGCA
+CAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATA
+TCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCACAGGCAGACTTCAAAG
+TTTGCAGACATATGTGACTCAACAATTAATTAGAGCTGCAGAAATCAGAGCTTCTGCTAATCTTGCTGCT
+ACTAAAATGTCAGAGTGTGTACTTGGACAATCAAAAAGAGTTGATTTTTGTGGAAAGGGCTATCATCTTA
+TGTCCTTCCCTCAGTCAGCACCTCATGGTGTAGTCTTCTTGCATGTGACTTATGTCCCTGCACAAGAAAA
+GAACTTCACAACTGCTCCTGCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTT
+TCAAATGGCACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACA
+CATTTGTGTCTGGTAACTGTGATGTTGTAATAGGAATTGTCAACAACACAGTTTATGATCCTTTGCAACC
+TGAATTAGACTCATTCAAGGAGGAGTTAGATAAATATTTTAAGAATCATACATCACCAGATGTTGATTTA
+GGTGACATCTCTGGCATTAATGCTTCAGTTGTAAACATTCAAAAAGAAATTGACCGCCTCAATGAGGTTG
+CCAAGAATTTAAATGAATCTCTCATCGATCTCCAAGAACTTGGAAAGTATGAGCAGTATATAAAATGGCC
+ATGGTACATTTGGCTAGGTTTTATAGCTGGCTTGATTGCCATAGTAATGGTGACAATTATGCTTTGCTGT
+ATGACCAGTTGCTGTAGTTGTCTCAAGGGCTGTTGTTCTTGTGGATCCTGCTGCAAATTTGATGAAGACG
+ACTCTGAGCCAGTGCTCAAAGGAGTCAAATTACATTACACATAAACGAACTTATGGATTTGTTTATGAGA
+ATCTTCACAATTGGAACTGTAACTTTGAAGCAAGGTGAAATCAAGGATGCTACTCCTTCAGATTTTGTTC
+GCGCTACTGCAACGATACCGATACAAGCCTCACTCCCTTTCGGATGGCTTATTGTTGGCGTTGCACTTCT
+TGCTGTTTTTCAGAGCGCTTCCAAAATCATAACCCTCAAAAAGAGATGGCAACTAGCACTCTCCAAGGGT
+GTTCACTTTGTTTGCAACTTGCTGTTGTTGTTTGTAACAGTTTACTCACACCTTTTGCTCGTTGCTGCTG
+GCCTTGAAGCCCCTTTTCTCTATCTTTATGCTTTAGTCTACTTCTTGCAGAGTATAAACTTTGTAAGAAT
+AATAATGAGGCTTTGGCTTTGCTGGAAATGCCGTTCCAAAAACCCATTACTTTATGATGCCAACTATTTT
+CTTTGCTGGCATACTAATTGTTACGACTATTGTATACCTTACAATAGTGTAACTTCTTCAATTGTCATTA
+CTTCAGGTGATGGCACAACAAGTCCTATTTCTGAACATGACTACCAGATTGGTGGTTATACTGAAAAATG
+GGAATCTGGAGTAAAAGACTGTGTTGTATTACACAGTTACTTCACTTCAGACTATTACCAGCTGTACTCA
+ACTCAATTGAGTACAGACACTGGTGTTGAACATGTTACCTTCTTCATCTACAATAAAATTGTTGATGAGC
+CTGAAGAACATGTCCAAATTCACACAATCGACGGTTCATCCGGAGTTGTTAATCCAGTAATGGAACCAAT
+TTATGATGAACCGACGACGACTACTAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACGAACTTATGTAC
+TCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTAT
+TCTTGCTAGTTACACTAGCCATCCTTACTGCGCTTCGATTGTGTGCGTACTGCTGCAATATTGTTAACGT
+GAGTCTTGTAAAACCTTCTTTTTACGTTTACTCTCGTGTTAAAAATCTGAATTCTTCTAGAGTTCCTGAT
+CTTCTGGTCTAAACGAACTAAATATTATATTAGTTTTTCTGTTTGGAACTTTAATTTTAGCCATGGCAGA
+TTCCAACGGTACTATTACCGTTGAAGAGCTTAAAAAGCTCCTTGAACAATGGAACCTAGTAATAGGTTTC
+CTATTCCTTACATGGATTTGTCTTCTACAATTTGCCTATGCCAACAGGAATAGGTTTTTGTATATAATTA
+AGTTAATTTTCCTCTGGCTGTTATGGCCAGTAACTTTAGCTTGTTTTGTGCTTGCTGCTGTTTACAGAAT
+AAATTGGATCACCGGTGGAATTGCTATCGCAATGGCTTGTCTTGTAGGCTTGATGTGGCTCAGCTACTTC
+ATTGCTTCTTTCAGACTGTTTGCGCGTACGCGTTCCATGTGGTCATTCAATCCAGAAACTAACATTCTTC
+TCAACGTGCCACTCCATGGCACTATTCTGACCAGACCGCTTCTAGAAAGTGAACTCGTAATCGGAGCTGT
+GATCCTTCGTGGACATCTTCGTATTGCTGGACACCATCTAGGACGCTGTGACATCAAGGACCTGCCTAAA
+GAAATCACTGTTGCTACATCACGAACGCTTTCTTATTACAAATTGGGAGCTTCGCAGCGTGTAGCAGGTG
+ACTCAGGTTTTGCTGCATACAGTCGCTACAGGATTGGCAACTATAAATTAAACACAGACCATTCCAGTAG
+CAGTGACAATATTGCTTTGCTTGTACAGTAAGTGACAACAGATGTTTCATCTCGTTGACTTTCAGGTTAC
+TATAGCAGAGATATTACTAATTATTATGAGGACTTTTAAAGTTTCCATTTGGAATCTTGATTACATCATA
+AACCTCATAATTAAAAATTTATCTAAGTCACTAACTGAGAATAAATATTCTCAATTAGATGAAGAGCAAC
+CAATGGAGATTGATTAAACGAACATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGA
+GCTTTATCACTACCAAGAGTGTGTTAGAGGTACAACAGTACTTTTAAAAGAACCTTGCTCTTCTGGAACA
+TACGAGGGCAATTCACCATTTCATCCTCTAGCTGATAACAAATTTGCACTGACTTGCTTTAGCACTCAAT
+TTGCTTTTGCTTGTCCTGACGGCGTAAAACACGTCTATCAGTTACGTGCCAGATCAGTTTCACCTAAACT
+GTTCATCAGACAAGAGGAAGTTCAAGAACTTTACTCTCCAATTTTTCTTATTGTTGCGGCAATAGTGTTT
+ATAACACTTTGCTTCACACTCAAAAGAAAGACAGAATGATTGAACTTTCATTAATTGACTTCTATTTGTG
+CTTTTTAGCCTTTCTGCTATTCCTTGTTTTAATTATGCTTATTATCTTTTGGTTCTCACTTGAACTGCAA
+GATCATAATGAAACTTGTCACGCCTAAACGAACATGAAATTTCTTGTTTTCTTAGGAATCATCACAACTG
+TAGCTGCATTTCACCAAGAATGTAGTTTACAGTCATGTACTCAACATCAACCATATGTAGTTGATGACCC
+GTGTCCTATTCACTTCTATTCTAAATGGTATATTAGAGTAGGAGCTAGAAAATCAGCACCTTTAATTGAA
+TTGTGCGTGGATGAGGCTGGTTCTAAATCACCCATTCAGTACATCGATATCGGTAATTATACAGTTTCCT
+GTTTACCTTTTACAATTAATTGCCAGGAACCTAAATTGGGTAGTCTTGTAGTGCGTTGTTCGTTCTATGA
+AGACTTTTTAGAGTATCATGACGTTCGTGTTGTTTTAGATTTCATCTAAACGAACAAACTAAAATGTCTG
+ATAATGGACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAG
+TAACCAGAATGGAGAACGCAGTGGGGCGCGATCAAAACAACGTCGGCCCCAAGGTTTACCCAATAATACT
+GCGTCTTGGTTCACCGCTCTCACTCAACATGGCAAGGAAGACCTTAAATTCCCTCGAGGACAAGGCGTTC
+CAATTAACACCAATAGCAGTCCAGATGACCAAATTGGCTACTACCGAAGAGCTACCAGACGAATTCGTGG
+TGGTGACGGTAAAATGAAAGATCTCAGTCCAAGATGGTATTTCTACTACCTAGGAACTGGGCCAGAAGCT
+GGACTTCCCTATGGTGCTAACAAAGACGGCATCATATGGGTTGCAACTGAGGGAGCCTTGAATACACCAA
+AAGATCACATTGGCACCCGCAATCCTGCTAACAATGCTGCAATCGTGCTACAACTTCCTCAAGGAACAAC
+ATTGCCAAAAGGCTTCTACGCAGAAGGGAGCAGAGGCGGCAGTCAAGCCTCTTCTCGTTCCTCATCACGT
+AGTCGCAACAGTTCAAGAAATTCAACTCCAGGCAGCAGTAGGGGAACTTCTCCTGCTAGAATGGCTGGCA
+ATGGCGGTGATGCTGCTCTTGCTTTGCTGCTGCTTGACAGATTGAACCAGCTTGAGAGCAAAATGTCTGG
+TAAAGGCCAACAACAACAAGGCCAAACTGTCACTAAGAAATCTGCTGCTGAGGCTTCTAAGAAGCCTCGG
+CAAAAACGTACTGCCACTAAAGCATACAATGTAACACAAGCTTTCGGCAGACGTGGTCCAGAACAAACCC
+AAGGAAATTTTGGGGACCAGGAACTAATCAGACAAGGAACTGATTACAAACATTGGCCGCAAATTGCACA
+ATTTGCCCCCAGCGCTTCAGCGTTCTTCGGAATGTCGCGCATTGGCATGGAAGTCACACCTTCGGGAACG
+TGGTTGACCTACACAGGTGCCATCAAATTGGATGACAAAGATCCAAATTTCAAAGATCAAGTCATTTTGC
+TGAATAAGCATATTGACGCATACAAAACATTCCCACCAACAGAGCCTAAAAAGGACAAAAAGAAGAAGGC
+TGATGAAACTCAAGCCTTACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCTTCTTCCTGCTGCAGAT
+TTGGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAAACTCATG
+CAGACCACACAAGGCAGATGGGCTATATAAACGTTTTCGCTTTTCCGTTTACGATATATAGTCTACTCTT
+GTGCAGAATGAATTCTCGTAACTACATAGCACAAGTAGATGTAGTTAACTTTAATCTCACATAGCAATCT
+TTAATCAGTGTGTAACATTAGGGAGGACTTGAAAGAGCCACCACATTTTCACCGAGGCCACGCGGAGTAC
+GATCGAGTGTACAGTGAACAATGCTAGGGAGAGCTGCCTATATGGAAGAGCCCTAATGTGTAAAATTAAT
+TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAA
diff --git a/image/website.png b/image/website.png
new file mode 100644
index 0000000..fa57ca5
--- /dev/null
+++ b/image/website.png
Binary files differ
diff --git a/paper/paper.bib b/paper/paper.bib
index e69de29..bcb9c0b 100644
--- a/paper/paper.bib
+++ b/paper/paper.bib
@@ -0,0 +1,16 @@
+@book{CWL,
+title = "Common Workflow Language, v1.0",
+abstract = "The Common Workflow Language (CWL) is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. Our goal is to create specifications that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility.CWL builds on technologies such as JSON-LD and Avro for data modeling and Docker for portable runtime environments. CWL is designed to express workflows for data-intensive science, such as Bioinformatics, Medical Imaging, Chemistry, Physics, and Astronomy.This is v1.0 of the CWL tool and workflow specification, released on 2016-07-08",
+keywords = "cwl, workflow, specification",
+author = "Brad Chapman and John Chilton and Michael Heuer and Andrey Kartashov and Dan Leehr and Herv{\'e} M{\'e}nager and Maya Nedeljkovich and Matt Scales and Stian Soiland-Reyes and Luka Stojanovic",
+editor = "Peter Amstutz and Crusoe, {Michael R.} and Nebojša Tijanić",
+note = "Specification, product of the Common Workflow Language working group. http://www.commonwl.org/v1.0/",
+year = "2016",
+month = "7",
+day = "8",
+doi = "10.6084/m9.figshare.3115156.v2",
+language = "English",
+publisher = "figshare",
+address = "United States",
+
+} \ No newline at end of file
diff --git a/paper/paper.md b/paper/paper.md
index caa9903..48eac07 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -1,8 +1,9 @@
---
-title: 'Public Sequence Resource for COVID-19'
+title: 'CPSR: COVID-19 Public Sequence Resource'
+title_short: 'CPSR: COVID-19 Public Sequence Resource'
tags:
- Sequencing
- - COVID
+ - COVID-19
authors:
- name: Pjotr Prins
orcid: 0000-0002-8021-9162
@@ -19,22 +20,45 @@ authors:
- name: Erik Garrison
orcid: 0000
affiliation: 5
- - name: Michael Crusoe
- orcid: 0000
- affiliation: 6
+ - name: Michael R. Crusoe
+ orcid: 0000-0002-2961-9670
+ affiliation: 6, 2
- name: Rutger Vos
orcid: 0000
affiliation: 7
- - Michael Heuer
- orcid: 0000
+ - name: Michael Heuer
+ orcid: 0000-0002-9052-6000
affiliation: 8
-
+ - name: Adam M Novak
+ orcid: 0000-0001-5828-047X
+ affiliation: 5
+ - name: Alex Kanitz
+ orcid: 0000
+ affiliation: 10
+ - name: Jerven Bolleman
+ orcid: 0000
+ affiliation: 11
+ - name: Joep de Ligt
+ orcid: 0000
+ affiliation: 12
+ - name: Bonface Munyoki
+ orcid: 0000
+ affiliation: 13
affiliations:
- name: Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, USA.
index: 1
- name: Curii, Boston, USA
index: 2
+ - name: UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA.
+ index: 5
+ - name: Department of Computer Science, Faculty of Sciences, Vrije Universiteit Amsterdam, The Netherlands
+ index: 6
+ - name: RISE Lab, University of California Berkeley, Berkeley, CA, USA.
+ index: 8
date: 11 April 2020
+event: COVID2020
+group: Public Sequence Uploader
+authors_short: Pjotr Prins & Peter Amstutz \emph{et al.}
bibliography: paper.bib
---
@@ -49,13 +73,48 @@ pasting above link (or yours) with
https://github.com/biohackrxiv/bhxiv-gen-pdf
+Note that author order will change!
+
-->
# Introduction
-As part of the one week COVID-19 Biohackathion 2020, we formed a
-working group on creating a public sequence resource for Corona virus.
-
+As part of the COVID-19 Biohackathion 2020 we formed a working
+group to create a COVID-19 Public Sequence Resource (CPSR) for
+Corona virus sequences. The general idea was to create a
+repository that has a low barrier to entry for uploading sequence
+data using best practices. I.e., data published with a creative
+commons 4.0 (CC-4.0) license with metadata using state-of-the art
+standards and, perhaps most importantly, providing standardized
+workflows that get triggered on upload, so that results are
+immediately available in standardized data formats.
+
+Existing data repositories for viral data include GISAID, EBI ENA
+and NCBI. These repositories allow for free sharing of data, but
+do not add value in terms of running immediate
+computations. Also, GISAID, at this point, has the most complete
+collection of genetic sequence data of influenza viruses and
+related clinical and epidemiological data through its
+database. But, due to a restricted license, data submitted to
+GISAID can not be used for online web services and on-the-fly
+computation. In addition GISAID registration which can take weeks
+and, painfully, forces users to download sequences one at a time
+to do any type of analysis. In our opinion this does not fit a
+pandemic scenario where fast turnaround times are key and data
+analysis has to be agile.
+
+We managed to create a useful sequence uploader utility within
+one week by leveraging existing technologies, such as the Arvados
+Cloud platform [@Arvados], the Common Workflow Langauge (CWL)
+[@CWL], Docker images built with Debian packages, and the many
+free and open source software packages that are available for
+bioinformatics.
+
+The source code for the CLI uploader and web uploader can be
+found [here](https://github.com/arvados/bh20-seq-resource)
+(FIXME: we'll have a full page). The CWL workflow definitions can
+be found [here](https://github.com/hpobio-lab/viral-analysis) and
+on CWL hub (FIXME).
<!--
@@ -73,38 +132,98 @@ working group on creating a public sequence resource for Corona virus.
## Cloud computing backend
-Peter, Pjotr, MichaelC
+The development of CPSR was accelerated by using the Arvados
+Cloud platform. Arvados is an open source platform for managing,
+processing, and sharing genomic and other large scientific and
+biomedical data. The Arvados instance was deployed on Amazon AWS
+for testing and development and a project was created that
+allows for uploading data.
-## A command-line sequence uploader
+## Sequence uploader
-Peter, Pjotr
+We wrote a Python-based uploader that authenticates with Arvados
+using a token. Data gets validated for being a FASTA sequence,
+FASTQ raw data and/or metadata in the form of JSON LD that gets
+validated against a schema. The uploader can be used
+from a command line or using a simple web interface.
-## Metadata uploader
+## Creating a Pangenome
-With Thomas
+### FASTA to GFA workflow
-## FASTA to GFA workflow
+The first workflow (1) we implemented was a FASTA to Graphical
+Fragment Assembly (GFA) Format conversion. When someone uploads a
+sequence in FASTA format it gets combined with all known viral
+sequences in our storage to generate a pangenome or variation
+graph (VG). The full pangenome is made available as a
+downloadable GFA file together with a visualisation (Figure 1).
-Michael Heuer
+### FASTQ to GFA workflow
-## BAM to GFA workflow
+In the next step we introduced a workflow (2) that takes raw
+sequence data in fastq format and converts that into FASTA.
+This FASTA file, in turn, gets fed to workflow (1) to generate
+the pangenome.
-Tazro & Erik
+## Creating linked data workflow
-## Phylogeny app
+We created a workflow (3) that takes GFA and turns that into
+RDF. Together with the metadata at upload time a single RDF
+resource is compiled that can be linked against external
+resources such as Uniprot and Wikidata. The generated RDF file
+can be hosted in any triple store and queried using SPARQL.
-With Rutger
+## Creating a Phylogeny workflow
-## RDF app
+WIP
-Jerven?
-
-## EBI app
-
-?
+## Other workflows?
# Discussion
-Future work...
+CPSR is a data repository with computational pipelines that will
+persist during pandemics. Unlike other data repositories for
+Sars-COV-2 we created a repository that immediately computes the
+pangenome of all available data and presents that in useful
+formats for futher analysis, including visualisations, GFA and
+RDF. Code and data are available and written using best practises
+and state-of-the-art standards. CPSR can be deployed by anyone,
+anywhere.
+
+CPSR is designed to abide by FAIR data principles (expand...)
+
+CPSR is primed with viral data coming from repositories that have
+no sharing restrictions. The metadata includes relevant
+attribution to uploaders. Some institutes have already committed
+to uploading their data to CPSR first so as to warrant sharing
+for computation.
+
+CPSR is currently running on an Arvados cluster in the cloud. To
+ascertain the service remains running we will source money from
+project during pandemics. The workflows are written in CWL which
+means they can be deployed on any infrastructure that runs
+CWL. One of the advantages of the CC-4.0 license is that we make
+available all uploaded sequence and meta data, as well as
+results, online to anyone. So the data can be mirrored by any
+party. This guarantees the data will live on.
+
+<!-- Future work... -->
+
+We aim to add more workflows to CPSR, for example to prepare
+sequence data for submitting in other public repositories, such
+as EBI ENA and GISAID. This will allow researchers to share data
+in multiple systems without pain, circumventing current sharing
+restrictions.
+
+# Acknowledgements
+
+We thank the COVID-19 BioHackathon 2020 and ELIXIR for creating a
+unique event that triggered many collaborations. We thank Curii
+Corporation for their financial support for creating and running
+Arvados instances. We thank Amazon AWS for their financial
+support to run COVID-19 workflows. We also want to thank the
+other working groups in the BioHackathon who generously
+contributed onthologies, workflows and software.
+
# References