COVID-19 PubSeq (part 2)

1. Finding output of workflows
2. Introduction
3. The Arvados file interface
4. Using the Arvados API

As part of the COVID-19 Biohackathon 2020 we formed a working group to create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for coronavirus sequences. The general idea is to create a repository with a low barrier to entry for uploading sequence data using best practices: data published under a Creative Commons 4.0 (CC-4.0) license, with metadata using state-of-the-art standards and, perhaps most importantly, standardised workflows that are triggered on upload, so that results are immediately available in standardised data formats.

1 Finding output of workflows

2 Introduction

We use Arvados to run Common Workflow Language (CWL) pipelines. The most recent output is displayed on a web page (with a time stamp) and a full list is generated here. That is a nice start, but most users need a dedicated, themed results page. People don't want to wade through thousands of output files!

3 The Arvados file interface

Arvados has a web server, but it also has a REST API and associated command line tools. We already use the API to upload data. If you install the Arvados API with pip, or by following the GNU Guix instructions in ../INSTALL.md, you will find the following command line tools (also documented here):

Command	Description
arv-ls	list files in Arvados
arv-put	upload a file to Arvados
arv-get	get a textual representation of Arvados objects from the command line. The output can be limited to a subset of the object’s fields. This command can be used with only the knowledge of an object’s UUID

Now, this is a public instance so we can use the tokens from the uploader.

export ARVADOS_API_HOST='lugli.arvadosapi.com'
export ARVADOS_API_TOKEN='2fbebpmbo3rw3x05ueu2i6nx70zhrsb1p22ycu3ry34m4x4462'
arv-ls lugli-4zz18-z513nlpqm03hpca

lists all files in the collection (we got the collection UUID from the Arvados results page). To get the UUIDs of the individual files, look up the anonymous user token and pass it to arv-get:

curl https://lugli.arvadosapi.com/arvados/v1/config | jq .Users.AnonymousUserToken
env ARVADOS_API_TOKEN=5o42qdxpxp5cj15jqjf7vnxx5xduhm4ret703suuoa3ivfglfh \
  arv-get lugli-4zz18-z513nlpqm03hpca
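The token lookup above can also be sketched in Python with only the standard library. This is a minimal sketch, not part of the PubSeq code: the helper names fetch_config and anonymous_token are hypothetical, and the only details taken from the text are the /arvados/v1/config endpoint and the Users.AnonymousUserToken field. The network call sits in its own function so the extraction step can be tried on any JSON document.

```python
import json
import urllib.request


def fetch_config(host):
    """Download the public cluster configuration as a dict (needs network access)."""
    url = f"https://{host}/arvados/v1/config"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def anonymous_token(config):
    """Return Users.AnonymousUserToken, the field selected by the jq filter."""
    return config["Users"]["AnonymousUserToken"]
```

Assuming the host is reachable, anonymous_token(fetch_config("lugli.arvadosapi.com")) should return the same string that the curl and jq pipeline prints, which can then be exported as ARVADOS_API_TOKEN.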

and fetch one listed JSON file chunk001_bin4000.schematic.json with its listed UUID:

arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5
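The identifier passed to arv-get above appears to follow the Keep block locator layout: an MD5 hash, the size in bytes, and optional hints separated by "+", where a hint starting with "A" carries a signature and a hexadecimal expiry time. The following sketch takes that layout as an assumption and splits the identifier into its parts; parse_locator is a hypothetical helper, not an Arvados SDK function.

```python
from datetime import datetime, timezone


def parse_locator(locator):
    """Split a Keep block locator into hash, size and optional permission hint."""
    digest, size, *hints = locator.split("+")
    parsed = {"md5": digest, "size": int(size), "signature": None, "expires": None}
    for hint in hints:
        # Assumed layout of a permission hint: A<signature>@<expiry as hex Unix time>.
        if hint.startswith("A") and "@" in hint:
            signature, expiry_hex = hint[1:].split("@", 1)
            parsed["signature"] = signature
            parsed["expires"] = datetime.fromtimestamp(
                int(expiry_hex, 16), tz=timezone.utc
            )
    return parsed


# The identifier used in the arv-get call above.
block = parse_locator(
    "2be6af7b4741f2a5c5f8ff2bc6152d73+1955623"
    "+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5"
)
print(block["size"], block["expires"].isoformat())
```

If the assumption holds, the expiry time explains why an identifier copied from an older listing may stop working: the signed part has to be refreshed by fetching the collection again.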
