COVID-19 PubSeq (part 2)

1. Finding output of workflows
2. Introduction
3. The Arvados file interface
4. Using the Arvados API

As part of the COVID-19 Biohackathon 2020 we formed a working group to create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for coronavirus sequences. The general idea is to create a repository with a low barrier to entry for uploading sequence data using best practices: data is published under a Creative Commons 4.0 (CC-4.0) license, with metadata that follows state-of-the-art standards. Perhaps most importantly, standardised workflows are triggered on upload, so that results are immediately available in standardised data formats.


1 Finding output of workflows


2 Introduction

We are using Arvados to run Common Workflow Language (CWL) pipelines. The most recent output is on display on a web page (with time stamp) and a full list is generated here. That is a nice start, but most users need a dedicated and themed results page. People don't want to wade through thousands of output files!


3 The Arvados file interface

Arvados has a web server, but it also has a REST API and associated command line tools. We are already using the API to upload data. If you follow the pip or GNU Guix instructions (../INSTALL.md) for installing the Arvados API, you'll find the following command line tools (also documented here):


Command	Description
arv-ls	list files in Arvados
arv-put	upload a file to Arvados
arv-get	get a textual representation of Arvados objects from the command line. The output can be limited to a subset of the object’s fields. This command can be used with only the knowledge of an object’s UUID
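Before going further it can help to confirm that the tools in the table are actually on your PATH. The following is a minimal sketch; the function name missing_tools is hypothetical and not part of Arvados, and it only relies on the standard shell built-in command -v.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arvados): print the name of every
# command given as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing when all three tools from the table are installed.
missing_tools arv-ls arv-put arv-get
```

If any name is printed, revisit the pip or GNU Guix installation step before continuing.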


Now, this is a public instance, so we can use the tokens from the uploader.


    export ARVADOS_API_HOST='lugli.arvadosapi.com'
    export ARVADOS_API_TOKEN='2fbebpmbo3rw3x05ueu2i6nx70zhrsb1p22ycu3ry34m4x4462'
    arv-ls lugli-4zz18-z513nlpqm03hpca


will list all files (we got the UUID from the Arvados results page). To get the UUIDs of the files, run


    curl https://lugli.arvadosapi.com/arvados/v1/config | jq .Users.AnonymousUserToken
    env ARVADOS_API_TOKEN=5o42qdxpxp5cj15jqjf7vnxx5xduhm4ret703suuoa3ivfglfh \
      arv-get lugli-4zz18-z513nlpqm03hpca


and fetch one of the listed JSON files, chunk001_bin4000.schematic.json, with its listed UUID:


    arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5
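The identifier passed to arv-get above looks like a Keep block locator: an MD5 hash, the size in bytes, and a permission hint of the form A<signature>@<expiry>, where the expiry is a Unix time written in hexadecimal. The sketch below assumes that layout and simply splits the string; parse_locator is a hypothetical helper, not an Arvados command, and it never contacts the server.

```shell
#!/bin/sh
# Hypothetical helper: split a locator of the assumed form
#   <md5>+<size>+A<signature>@<hex expiry>
# into its parts using plain parameter expansion.
parse_locator() {
    locator=$1
    hash=${locator%%+*}              # text before the first '+'
    rest=${locator#*+}
    size=${rest%%+*}                 # text between the first and second '+'
    case $locator in
        *@*)
            expiry_hex=${locator##*@}    # text after the last '@'
            expiry=$((0x$expiry_hex))    # hex Unix time -> decimal
            ;;
        *)
            expiry=none                  # no permission hint present
            ;;
    esac
    printf 'hash=%s\nsize=%s\nexpires=%s\n' "$hash" "$size" "$expiry"
}

parse_locator '2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5'
```

For the locator shown here the expiry decodes to 1591210949, which falls on 3 June 2020 (UTC). If the assumption about the layout holds, a locator copied from an old listing may no longer be accepted, and a fresh one has to be obtained by listing the collection again.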

