From 37652786cb6605a4862e820f2ba85f2fe818952f Mon Sep 17 00:00:00 2001
From: Adam Novak
Date: Tue, 7 Apr 2020 11:58:33 -0700
Subject: Make README more didactic

---
 README.md | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 141 insertions(+), 27 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index ec9afb1..a6fe052 100644
--- a/README.md
+++ b/README.md
@@ -1,48 +1,162 @@
 # Sequence uploader

-This repository provides a sequence uploader for the
+This repository provides a sequence uploader for the COVID-19 Virtual Biohackathon's Public Sequence Resource project. You can use it to upload the genomes of SARS-CoV-2 samples to make them publicly and freely available to other researchers.

-# Run
+To get started, first [install the uploader](#installation), and use the `bh20-seq-uploader` command to [uplaod your data](#usage).

-Run the uploader with a FASTA file and accompanying metadata:
+# Installation

-    python3 bh20sequploader/main.py example/sequence.fasta example/metadata.json
+There are several ways to install the uploader. The most portable is with a [virtualenv](#installation-with-virtualenv).

-# Add a workflow
+## Installation with `virtualenv`

-get your SARS-CoV-2 sequences from GenBank in seqs.fa
+1. **Prepare your system.** You need to make sure you have Python, and the ability to install modules such as `pycurl` and `pyopenssl`. On Ubuntu 18.04, you can run:

 ```sh
-minimap2 -cx asm20 -X seqs.fa seqs.fa >seqs.paf
-seqwish -s seqs.fa -p seqs.paf -g seqs.gfa
-odgi build -g seqs.gfa -s -o seqs.odgi
-odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5
+sudo apt update
+sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
 ```

-from https://github.com/virtual-biohackathons/covid-19-bh20/wiki/Pangenome#pangenome-model-from-available-genomes
+2. **Create and enter your virtualenv.** Go to some memorable directory and make and enter a virtualenv:

-# Installation
+```sh
+virtualenv --python python3 venv
+. venv/bin/activate
+```
+
+Note that you will need to repeat the `. venv/bin/activate` step from this directory to enter your virtualenv whenever you want to use the installed tool.
+
+3. **Install the tool.** Once in your virtualenv, install this project:
+
+```sh
+pip3 install git+https://github.com/arvados/bh20-seq-resource.git@master
+```
+
+4. **Test the tool.** Try running:
+
+```sh
+bh20-seq-uploader --help
+```
+
+It should print some instructions about how to use the uploader.
+
+**Make sure you are in your virtualenv whenever you run the tool!** If you ever can't run the tool, and your prompt doesn't say `(venv)`, try going to the directory where you put the virtualenv and running `. venv/bin/activate`. It only works for the current terminal window; you will need to run it again if you open a new terminal.
+
+## Installation with `pip3 --user`
+
+If you don't want to have to enter a virtualenv every time you use the uploader, you can use the `--user` feature of `pip3` to install the tool for your user.
+
+1. **Prepare your system.** Just as for the `virtualenv` method, you need to install some dependencies. On Ubuntu 18.04, you can run:
+
+```sh
+sudo apt update
+sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
+```
+
+2. **Install the tool.** You can run:
+
+```sh
+pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master
+```
+
+3. **Make sure the tool is on your `PATH`.** THe `pip3` command will install the uploader in `.local/bin` inside your home directory. Your shell may not know to look for commands there by default. To fix this for the terminal you currently have open, run:
+
+```sh
+export PATH=$PATH:$HOME/.local/bin
+```
+
+To make this change permanent, assuming your shell is Bash, run:
+
+```sh
+echo 'export PATH=$PATH:$HOME/.local/bin' >>~/.bashrc
+```
+
+4. **Test the tool.** Try running:
+
+```sh
+bh20-seq-uploader --help
+```
+
+It should print some instructions about how to use the uploader.

-This tool requires the arvados Python module which can be installed
-using .deb or .rpm packages through
-https://doc.arvados.org/v2.0/sdk/python/sdk-python.html. The actual
-code lives [here](https://github.com/arvados/arvados/tree/master/sdk/python) and
-suggests a local install using
+## Installation from Source for Development

-    apt-get install libcurl4-openssl-dev libssl1.0-dev
-    pip3 install --user arvados-python-client
+If you plan to contribute to the project, you may want to install an editable copy from source. With this method, changes to the source code are automatically reflected in the installed copy of the tool.

-Next update
+1. **Prepare your system.** On Ubuntu 18.04, you can run:

-    export PATH=$PATH:$HOME/.local/bin
+```sh
+sudo apt update
+sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
+```
+
+2. **Clone and enter the repository.** You can run:
+
+```sh
+git clone https://github.com/arvados/bh20-seq-resource.git
+cd bh20-seq-resource
+```
+
+3. **Create and enter a virtualenv.** Go to some memorable directory and make and enter a virtualenv:
+
+```sh
+virtualenv --python python3 venv
+. venv/bin/activate
+```
+
+Note that you will need to repeat the `. venv/bin/activate` step from this directory to enter your virtualenv whenever you want to use the installed tool.
+
+4. **Install the checked-out repository in editable mode.** Once in your virtualenv, install with this special pip command:
+
+```sh
+pip3 install -e .
+```
+
+5. **Test the tool.** Try running:
+
+```sh
+bh20-seq-uploader --help
+```
+
+It should print some instructions about how to use the uploader.
+
+## Installation with GNU Guix

-## Install with GNU Guix
+Another way to install this tool is inside a [GNU Guix Environment](https://guix.gnu.org/manual/en/html_node/Invoking-guix-environment.html), which can handle installing dependencies for you even when you don't have root access on an Ubuntu system.

-Set up a container:
+1. **Set up and enter a container with the necessary dependencies.** After installing Guix as `~/opt/guix/bin/guix`, run:
+
+```sh
+~/opt/guix/bin/guix environment -C guix --ad-hoc git python openssl python-pycurl nss-certs
+```
+
+2. **Install the tool.** From there you can follow the [user installation instructions](#installation-with-pip3---user). In brief:
+
+```sh
+pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master
+```
+
+# Usage
+
+Run the uploader with a FASTA file and accompanying metadata file in [JSON-LD format](https://json-ld.org/):
+
+```sh
+bh20-seq-uploader example/sequence.fasta example/metadata.json
+```
+
+## Workflow for Generating a Pangenome
+
+All these uploaded sequences are being fed into a workflow to generate a [pangenome](https://academic.oup.com/bib/article/19/1/118/2566735) for the virus. You can replicate this workflow yourself.
+
+Get your SARS-CoV-2 sequences from GenBank in `seqs.fa`, and then run:
+
+```sh
+minimap2 -cx asm20 -X seqs.fa seqs.fa >seqs.paf
+seqwish -s seqs.fa -p seqs.paf -g seqs.gfa
+odgi build -g seqs.gfa -s -o seqs.odgi
+odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5
+```

-    ~/opt/guix/bin/guix environment -C guix --ad-hoc python openssl python-pycurl nss-certs
-    pip3 install --user arvados-python-client
+For more information on building pangenome models, [see this wiki page](https://github.com/virtual-biohackathons/covid-19-bh20/wiki/Pangenome#pangenome-model-from-available-genomes).

-Pip installed the following modules

-    arvados-python-client-2.0.1 ciso8601-2.1.3 future-0.18.2 google-api-python-client-1.6.7 httplib2-0.17.1 oauth2client-4.1.3 pyasn1-0.4.8 pyasn1-modules-0.2.8 rsa-4.0 ruamel.yaml-0.15.77 six-1.14.0 uritemplate-3.0.1 ws4py-0.5.1
-- 
cgit v1.2.3


From 14ff178ed7f77a996f47e2115e2a1429f6b69356 Mon Sep 17 00:00:00 2001
From: Adam Novak
Date: Wed, 8 Apr 2020 12:12:49 -0700
Subject: Spell correctly

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index a6fe052..1448f4c 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 This repository provides a sequence uploader for the COVID-19 Virtual Biohackathon's Public Sequence Resource project. You can use it to upload the genomes of SARS-CoV-2 samples to make them publicly and freely available to other researchers.

-To get started, first [install the uploader](#installation), and use the `bh20-seq-uploader` command to [uplaod your data](#usage).
+To get started, first [install the uploader](#installation), and use the `bh20-seq-uploader` command to [upload your data](#usage).

 # Installation

-- 
cgit v1.2.3


From bf93a6a2fec690eee4bff4891469cd5947102b3a Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Thu, 9 Apr 2020 17:02:38 -0500
Subject: Moved Guix documentation into separate file (as it confused people ;)

---
 README.md               | 21 +++++----------------
 bh20sequploader/main.py |  2 +-
 doc/INSTALL.md          | 31 +++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+), 17 deletions(-)
 create mode 100644 doc/INSTALL.md

(limited to 'README.md')

diff --git a/README.md b/README.md
index a6fe052..3a8e5f0 100644
--- a/README.md
+++ b/README.md
@@ -122,19 +122,7 @@ It should print some instructions about how to use the uploader.

 ## Installation with GNU Guix

-Another way to install this tool is inside a [GNU Guix Environment](https://guix.gnu.org/manual/en/html_node/Invoking-guix-environment.html), which can handle installing dependencies for you even when you don't have root access on an Ubuntu system.
-
-1. **Set up and enter a container with the necessary dependencies.** After installing Guix as `~/opt/guix/bin/guix`, run:
-
-```sh
-~/opt/guix/bin/guix environment -C guix --ad-hoc git python openssl python-pycurl nss-certs
-```
-
-2. **Install the tool.** From there you can follow the [user installation instructions](#installation-with-pip3---user). In brief:
-
-```sh
-pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master
-```
+For running/developing the uploader with GNU Guix see [INSTALL.md](./doc/INSTALL.md)

 # Usage

@@ -148,7 +136,7 @@ bh20-seq-uploader example/sequence.fasta example/metadata.json

 All these uploaded sequences are being fed into a workflow to generate a [pangenome](https://academic.oup.com/bib/article/19/1/118/2566735) for the virus. You can replicate this workflow yourself.

-Get your SARS-CoV-2 sequences from GenBank in `seqs.fa`, and then run:
+An example is to get your SARS-CoV-2 sequences from GenBank in `seqs.fa`, and then run a series of commands

 ```sh
 minimap2 -cx asm20 -X seqs.fa seqs.fa >seqs.paf
@@ -157,6 +145,7 @@ odgi build -g seqs.gfa -s -o seqs.odgi
 odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5
 ```

-For more information on building pangenome models, [see this wiki page](https://github.com/virtual-biohackathons/covid-19-bh20/wiki/Pangenome#pangenome-model-from-available-genomes).
-
+Here we convert such a pipeline into the Common Workflow Language (CWL) and
+sources can be found [here](https://github.com/hpobio-lab/viral-analysis/tree/master/cwl/pangenome-generate).
+For more information on building pangenome models, [see this wiki page](https://github.com/virtual-biohackathons/covid-19-bh20/wiki/Pangenome#pangenome-model-from-available-genomes).

diff --git a/bh20sequploader/main.py b/bh20sequploader/main.py
index 56cbe22..bf74ea5 100644
--- a/bh20sequploader/main.py
+++ b/bh20sequploader/main.py
@@ -6,7 +6,7 @@ import json
 import urllib.request
 import socket
 import getpass
-from .qc_metadata import qc_metadata
+import qc_metadata

 ARVADOS_API_HOST='lugli.arvadosapi.com'
 ARVADOS_API_TOKEN='2fbebpmbo3rw3x05ueu2i6nx70zhrsb1p22ycu3ry34m4x4462'
diff --git a/doc/INSTALL.md b/doc/INSTALL.md
new file mode 100644
index 0000000..c5c486c
--- /dev/null
+++ b/doc/INSTALL.md
@@ -0,0 +1,31 @@
+# INSTALLATION
+
+Other options for running this tool.
+
+## GNU Guix
+
+Another way to install this tool is inside a [GNU Guix Environment](https://guix.gnu.org/manual/en/html_node/Invoking-guix-environment.html), which can handle installing dependencies for you even when you don't have root access on an Ubuntu system.
+
+1. **Set up and enter a container with the necessary dependencies.** After installing Guix as `~/opt/guix/bin/guix`, run:
+
+```sh
+~/opt/guix/bin/guix environment -C guix --ad-hoc git python openssl python-pycurl nss-certs
+```
+
+2. **Install the tool.** From there you can follow the [user installation instructions](#installation-with-pip3---user). In brief:
+
+```sh
+pip3 install --user schema-salad arvados-python-client
+```
+
+Pip installed the following modules
+
+```
+arvados-python-client-2.0.1 ciso8601-2.1.3 future-0.18.2 google-api-python-client-1.6.7 httplib2-0.17.1 oauth2client-4.1.3 pyasn1-0.4.8 pyasn1-modules-0.2.8 rsa-4.0 ruamel.yaml-0.15.77 six-1.14.0 uritemplate-3.0.1 ws4py-0.5.1
+```
+
+3. Run the tool directly with
+
+```sh
+~/opt/guix/bin/guix environment guix --ad-hoc git python openssl python-pycurl nss-certs -- python3 bh20sequploader/main.py
+```
-- 
cgit v1.2.3


From 2cd6623aa0ddfe4e42b2d434e0523773bb3536ef Mon Sep 17 00:00:00 2001
From: Adam Novak
Date: Thu, 9 Apr 2020 15:52:23 -0700
Subject: Copy over/combine top-level project components

---
 Dockerfile              | 19 +++++++++++++++++++
 README.md               | 45 +++++++++++++++++++++++++++++++++++++++++++++
 bh20sequploader/main.py |  7 ++-----
 setup.py                |  6 +++++-
 4 files changed, 71 insertions(+), 6 deletions(-)
 create mode 100644 Dockerfile

(limited to 'README.md')

diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..43fa8f2
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,19 @@
+# Dockerfile for containerizing the web interface
+FROM python:3.6-jessie
+WORKDIR /app
+
+RUN pip3 install gunicorn
+
+ADD LICENSE /app/
+ADD gittaggers.py /app/
+ADD setup.py /app/
+ADD README.md /app/
+ADD example /app/example
+ADD bh20seqanalyzer /app/bh20simplewebuploader
+ADD bh20sequploader /app/bh20sequploader
+ADD bh20simplewebuploader /app/bh20simplewebuploader
+
+RUN pip3 install -e .
+
+ENV PORT 8080
+CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:8080", "bh20simplewebuploader.main:app"]
diff --git a/README.md b/README.md
index a6fe052..4667310 100644
--- a/README.md
+++ b/README.md
@@ -159,4 +159,49 @@ odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5

 For more information on building pangenome models, [see this wiki page](https://github.com/virtual-biohackathons/covid-19-bh20/wiki/Pangenome#pangenome-model-from-available-genomes).

+# Web Interface
+
+This project comes with a simple web server that lets you use the sequence uploader from a browser. It will work as long as you install the packager with the `web` extra.
+
+To run it locally:
+
+```
+virtualenv --python python3 venv
+. venv/bin/activate
+pip install -e .[web]
+env FLASK_APP=bh20simplewebuploader/main.py flask run
+```
+
+Then visit [http://127.0.0.1:5000/](http://127.0.0.1:5000/).
+
+## Production
+
+For production deployment, you can use [gunicorn](https://flask.palletsprojects.com/en/1.1.x/deploying/wsgi-standalone/#gunicorn):
+
+```
+pip3 install gunicorn
+gunicorn bh20simplewebuploader.main:app
+```
+
+This runs on [http://127.0.0.1:8000/](http://127.0.0.1:8000/) by default, but can be adjusted with various [gunicorn options](http://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments)
+
+## GNU Guix
+
+To run the web uploader in a GNU Guix environment
+
+```
+guix environment guix --ad-hoc git python python-flask python-pyyaml nss-certs --network openssl -- env FLASK_APP=bh20simplewebuploader/main.py flask run
+```
+
+The containerized version looks like
+
+```
+guix environment -C guix --ad-hoc git python python-flask python-pyyaml nss-certs --network openssl
+```
+
+and
+
+```
+env FLASK_APP=bh20simplewebuploader/main.py flask run
+```

diff --git a/bh20sequploader/main.py b/bh20sequploader/main.py
index 8b8fefe..d3ebc0c 100644
--- a/bh20sequploader/main.py
+++ b/bh20sequploader/main.py
@@ -6,7 +6,6 @@ import json
 import urllib.request
 import socket
 import getpass
-from .qc_metadata import qc_metadata

 ARVADOS_API_HOST='lugli.arvadosapi.com'
 ARVADOS_API_TOKEN='2fbebpmbo3rw3x05ueu2i6nx70zhrsb1p22ycu3ry34m4x4462'
@@ -20,8 +19,6 @@ def main():

     api = arvados.api(host=ARVADOS_API_HOST, token=ARVADOS_API_TOKEN, insecure=True)

-    qc_metadata(args.metadata.name)
-
     col = arvados.collection.Collection(api_client=api)

     print("Reading FASTA")
@@ -32,8 +29,8 @@ def main():
             f.write(r)
             r = args.sequence.read(65536)

-    print("Reading metadata")
-    with col.open("metadata.yaml", "w") as f:
+    print("Reading JSONLD")
+    with col.open("metadata.jsonld", "w") as f:
         r = args.metadata.read(65536)
         print(r[0:20])
         while r:
diff --git a/setup.py b/setup.py
index 48c25aa..41ace7b 100644
--- a/setup.py
+++ b/setup.py
@@ -16,6 +16,7 @@ except ImportError:
     tagger = egg_info_cmd.egg_info

 install_requires = ["arvados-python-client", "schema-salad"]
+web_requires = ["flask", "pyyaml"]

 needs_pytest = {"pytest", "test", "ptr"}.intersection(sys.argv)
 pytest_runner = ["pytest < 6", "pytest-runner < 5"] if needs_pytest else []
@@ -29,9 +30,12 @@ setup(
     author="Peter Amstutz",
     author_email="peter.amstutz@curii.com",
     license="Apache 2.0",
-    packages=["bh20sequploader", "bh20seqanalyzer"],
+    packages=["bh20sequploader", "bh20seqanalyzer", "bh20simplewebuploader"],
     package_data={"bh20sequploader": ["bh20seq-schema.yml"]},
     install_requires=install_requires,
+    extras_require={
+        'web': web_requires
+    },
     setup_requires=[] + pytest_runner,
     tests_require=["pytest<5"],
     entry_points={
-- 
cgit v1.2.3


From d53e1e98b800d7dc5720de0b3c14c94452159315 Mon Sep 17 00:00:00 2001
From: Adam Novak
Date: Thu, 9 Apr 2020 16:11:03 -0700
Subject: Move the web uploader GUIX instructions to the GUIX file

---
 README.md      | 18 ------------------
 doc/INSTALL.md | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 18 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 960472e..d83eaac 100644
--- a/README.md
+++ b/README.md
@@ -176,23 +176,5 @@ gunicorn bh20simplewebuploader.main:app

 This runs on [http://127.0.0.1:8000/](http://127.0.0.1:8000/) by default, but can be adjusted with various [gunicorn options](http://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments)

-## GNU Guix

-To run the web uploader in a GNU Guix environment
-
-```
-guix environment guix --ad-hoc git python python-flask python-pyyaml nss-certs --network openssl -- env FLASK_APP=bh20simplewebuploader/main.py flask run
-```
-
-The containerized version looks like
-
-```
-guix environment -C guix --ad-hoc git python python-flask python-pyyaml nss-certs --network openssl
-```
-
-and
-
-```
-env FLASK_APP=bh20simplewebuploader/main.py flask run
-```

diff --git a/doc/INSTALL.md b/doc/INSTALL.md
index c5c486c..f7fd811 100644
--- a/doc/INSTALL.md
+++ b/doc/INSTALL.md
@@ -29,3 +29,23 @@ arvados-python-client-2.0.1 ciso8601-2.1.3 future-0.18.2 google-api-python-clien
 ```sh
 ~/opt/guix/bin/guix environment guix --ad-hoc git python openssl python-pycurl nss-certs -- python3 bh20sequploader/main.py
 ```
+
+### Using the Web Uploader
+
+To run the web uploader in a GNU Guix environment
+
+```
+guix environment guix --ad-hoc git python python-flask python-pyyaml nss-certs --network openssl -- env FLASK_APP=bh20simplewebuploader/main.py flask run
+```
+
+The containerized version looks like
+
+```
+guix environment -C guix --ad-hoc git python python-flask python-pyyaml nss-certs --network openssl
+```
+
+and
+
+```
+env FLASK_APP=bh20simplewebuploader/main.py flask run
+```
-- 
cgit v1.2.3
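
Note on the series (not part of any patch): the `bh20sequploader/main.py` hunks show the uploader copying its inputs into the collection with repeated `read(65536)` calls until the read returns nothing. A minimal standalone sketch of that loop follows, assuming ordinary file-like objects stand in for the Arvados collection file; the helper name `copy_in_chunks` is hypothetical and does not appear in the repository.

```python
import io

CHUNK_SIZE = 65536  # same read size as the loops in bh20sequploader/main.py


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size reads and return how much was copied.

    Hypothetical helper for illustration only. src and dst are any
    file-like objects; the count is in characters for text streams
    and in bytes for binary streams.
    """
    total = 0
    chunk = src.read(chunk_size)
    while chunk:  # read() returns an empty value at end of input
        dst.write(chunk)
        total += len(chunk)
        chunk = src.read(chunk_size)
    return total


if __name__ == "__main__":
    # In-memory stand-ins for the FASTA input and the collection file.
    source = io.StringIO(">sample\n" + "ACGT" * 50000 + "\n")
    target = io.StringIO()
    copied = copy_in_chunks(source, target)
    print(copied, target.getvalue() == source.getvalue())
```

Reading in bounded chunks keeps memory use flat regardless of input size, which is presumably why the uploader does not call `read()` once on the whole file.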