COVID-19 PubSeq (part 2)
As part of the COVID-19 Biohackathon 2020 we formed a working group to create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for coronavirus sequences. The general idea is to create a repository that has a low barrier to entry for uploading sequence data using best practices: data published under a Creative Commons 4.0 (CC-4.0) license, with metadata using state-of-the-art standards and, perhaps most importantly, standardised workflows that get triggered on upload, so that results are immediately available in standardised data formats.
1 Finding output of workflows
2 Introduction
We are using Arvados to run Common Workflow Language (CWL) pipelines. The most recent output is on display on a web page (with time stamp) and a full list is also generated. That is a nice start, but for most users we need a dedicated and themed results page. People don't want to wade through thousands of output files!
3 The Arvados file interface
Arvados has a web server, but it also has a REST API and associated command line tools. We are already using the API to upload data. If you follow the pip or GNU Guix instructions (../INSTALL.md) for installing the Arvados API, you will find the following command line tools (also covered in the Arvados documentation):
Command | Description |
---|---|
arv-ls | list files in Arvados |
arv-put | upload a file to Arvados |
arv-get | get a textual representation of Arvados objects from the command line. The output can be limited to a subset of the object’s fields. This command can be used with only the knowledge of an object’s UUID |
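
Before going further it can help to confirm that the tools from the table are actually on the PATH. The following is a minimal sketch, not part of the PubSeq code base; the helper name `missing_tools` is made up for illustration.

```shell
# Hypothetical helper (not part of Arvados or PubSeq):
# print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Calling `missing_tools arv-ls arv-put arv-get` prints nothing when all three tools are installed, and otherwise lists the ones that still need installing.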
Now, this is a public instance so we can use the tokens from the uploader.
    export ARVADOS_API_HOST='lugli.arvadosapi.com'
    export ARVADOS_API_TOKEN='2fbebpmbo3rw3x05ueu2i6nx70zhrsb1p22ycu3ry34m4x4462'
    arv-ls lugli-4zz18-z513nlpqm03hpca
This lists all files in the collection (we got the UUID from the Arvados results page). To get the UUIDs of the files, run
    curl https://lugli.arvadosapi.com/arvados/v1/config | jq .Users.AnonymousUserToken
    env ARVADOS_API_TOKEN=5o42qdxpxp5cj15jqjf7vnxx5xduhm4ret703suuoa3ivfglfh \
      arv-get lugli-4zz18-z513nlpqm03hpca
and then fetch one of the listed JSON files, chunk001_bin4000.schematic.json, using its listed UUID:
    arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5
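
Picking file names out of a long listing by eye gets tedious, so a small filter can help. The sketch below assumes that the listing for a collection is in the Arvados manifest layout, where each line holds a stream name, then block locators, then file tokens of the form position:size:name. The helper name `manifest_files` is hypothetical, and the layout should be checked against the real output before relying on it.

```shell
# Hypothetical helper: read a manifest from stdin and print
# "<size> <name>" for every file token it contains.
# Assumed line layout: <stream> <block locator>... <position>:<size>:<name>...
manifest_files() {
    awk '{
        for (i = 2; i <= NF; i++) {
            # File tokens start with two numeric fields, e.g. 0:1234:name.json.
            # Block locators contain no colons, so they never match.
            if ($i ~ /^[0-9]+:[0-9]+:/) {
                n = split($i, part, ":")
                name = part[3]
                # Re-join file names that themselves contain colons.
                for (j = 4; j <= n; j++) name = name ":" part[j]
                print part[2], name
            }
        }
    }'
}
```

Under that assumption, `arv-get lugli-4zz18-z513nlpqm03hpca | manifest_files` would give one line per output file with its size in bytes, which is easier to scan than the raw listing.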