From 264be797c55aaff6eb9639d5a15d9081e2256253 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 30 May 2020 18:13:48 -0500
Subject: BLOG

---
 doc/blog/using-covid-19-pubseq-part2.html | 394 ++++++++++++++++++++++++++++++
 1 file changed, 394 insertions(+)
 create mode 100644 doc/blog/using-covid-19-pubseq-part2.html

diff --git a/doc/blog/using-covid-19-pubseq-part2.html b/doc/blog/using-covid-19-pubseq-part2.html
new file mode 100644
index 0000000..c047441
--- /dev/null
+++ b/doc/blog/using-covid-19-pubseq-part2.html
@@ -0,0 +1,394 @@
+

COVID-19 PubSeq (part 2)

+
+


+ +
+

+As part of the COVID-19 Biohackathon 2020 we formed a working group to
+create a COVID-19 Public Sequence Resource (COVID-19 PubSeq) for
+coronavirus sequences. The general idea is to create a repository
+that has a low barrier to entry for uploading sequence data using best
+practices: data published under a Creative Commons 4.0 (CC-4.0)
+license, with metadata that follows state-of-the-art standards, and,
+perhaps most importantly, standardised workflows that are triggered
+on upload, so that results are immediately available in standardised
+data formats.

+ +
+

1 Finding output of workflows

+
+


+
+
+ +
+

2 Introduction

+
+

+We are using Arvados to run Common Workflow Language (CWL) pipelines.
+The most recent output is on display on a web page (with time stamp)
+and a full list is generated here. That is a fine starting point, but
+most users need a dedicated and themed results page. People don't
+want to wade through thousands of output files!

+
+
+ +
+

3 The Arvados file interface

+
+

+Arvados has a web server, but it also has a REST API and associated
+command line tools. We are already using the API to upload data. If
+you follow the pip or GNU Guix instructions (../INSTALL.md) for
+installing the Arvados API you'll find the following command line tools
+(also documented here):

| Command | Description |
|---------+-------------|
| arv-ls  | list files in Arvados |
| arv-put | upload a file to Arvados |
| arv-get | get a textual representation of Arvados objects from the command line. The output can be limited to a subset of the object’s fields. This command can be used with only the knowledge of an object’s UUID |
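Before going further it can help to confirm that the three tools are actually on the PATH. The following is a minimal sketch, not part of the original post; the helper name `missing_tools` is made up for illustration and only relies on the standard `command -v` builtin.

```shell
#!/bin/sh
# Hypothetical helper (not from the post): print every given tool name
# that cannot be found on PATH, one per line. Prints nothing if all exist.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three tools named in the table.
missing=$(missing_tools arv-ls arv-put arv-get)
if [ -n "$missing" ]; then
    echo "Not installed or not on PATH:" $missing
else
    echo "All Arvados command line tools found"
fi
```

If anything is reported missing, revisit the pip or GNU Guix installation steps.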

+Now, this is a public instance so we can use the tokens from +the uploader. +

+ +
+

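+# Point the Arvados tools at the public instance
+export ARVADOS_API_HOST='lugli.arvadosapi.com'
+export ARVADOS_API_TOKEN='2fbebpmbo3rw3x05ueu2i6nx70zhrsb1p22ycu3ry34m4x4462'
+# List the files in the collection with this UUID
+arv-ls lugli-4zz18-z513nlpqm03hpca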

+ +
+ +

+This lists all files in the collection (we took the collection UUID
+from the Arvados results page). To get the UUIDs of the individual files:

+ +
+

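+# Look up the anonymous user token in the public cluster configuration
+curl https://lugli.arvadosapi.com/arvados/v1/config | jq .Users.AnonymousUserToken
+# Use that token to print the collection's files and their UUIDs
+env ARVADOS_API_TOKEN=5o42qdxpxp5cj15jqjf7vnxx5xduhm4ret703suuoa3ivfglfh \
+  arv-get lugli-4zz18-z513nlpqm03hpca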

+ +
+ +
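The output of `arv-get` on a collection can be picked apart with standard tools. The sketch below assumes the output is Arvados manifest text, where each line holds a stream name, one or more block locators, and file tokens of the form `position:size:name`. The function names `manifest_files` and `manifest_blocks` are hypothetical, and the sample line is constructed from the locator and file name used in this post (the `0:1955623:` offsets are an assumption), not copied from real output.

```shell
#!/bin/sh
# Hypothetical helpers (not from the post) that read manifest text on stdin.

# Print "stream/filename<TAB>size" for every file token.
manifest_files() {
    awk '{
        stream = $1
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9]+:[0-9]+:/) {
                split($i, part, ":")
                # The name is everything after the second colon.
                name = substr($i, length(part[1]) + length(part[2]) + 3)
                # Manifests escape spaces in names as \040.
                gsub(/\\040/, " ", name)
                printf "%s/%s\t%s\n", stream, name, part[2]
            }
        }
    }'
}

# Print every block locator (any token after the stream name that is
# not a file token).
manifest_blocks() {
    awk '{ for (i = 2; i <= NF; i++) if ($i !~ /^[0-9]+:[0-9]+:/) print $i }'
}

# Constructed sample line; the file token offsets are assumed.
sample='. 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5 0:1955623:chunk001_bin4000.schematic.json'

printf '%s\n' "$sample" | manifest_files
printf '%s\n' "$sample" | manifest_blocks
```

With real data the same functions could be fed directly from a pipe, for example `arv-get lugli-4zz18-z513nlpqm03hpca | manifest_files`, provided the output has the format assumed here.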

+Next, fetch one of the listed JSON files, chunk001_bin4000.schematic.json,
+using its listed UUID:

+ +
+arv-get 2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5
+
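The long identifier passed to `arv-get` above has structure. As a rough sketch, assuming the usual Keep locator layout of `hash+size+A<signature>@<expiry>`, with the expiry as a hexadecimal Unix timestamp, it can be split with plain shell parameter expansion. The function name `locator_parts` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the post): split a signed block locator
# of the assumed form  <md5>+<size>+A<signature>@<hex expiry>.
locator_parts() {
    locator=$1
    hash=${locator%%+*}      # content hash: text before the first "+"
    rest=${locator#*+}
    size=${rest%%+*}         # block size in bytes: the second field
    case "$locator" in
        *@*)
            expiry_hex=${locator##*@}
            expires=$((0x$expiry_hex))   # hex timestamp to decimal seconds
            ;;
        *)
            expires=none                 # unsigned locator, no expiry hint
            ;;
    esac
    printf 'hash=%s\nsize=%s\nexpires=%s\n' "$hash" "$size" "$expires"
}

locator_parts '2be6af7b4741f2a5c5f8ff2bc6152d73+1955623+Ab9ad65d7fe958a053b3a57d545839de18290843a@5ed7f3c5'
```

The expiry matters in practice: a signed locator copied from an old listing stops working once that time has passed, so the listing has to be fetched again to get a fresh one.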
+
+
+ +
+

4 Using the Arvados API

+
+
+
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-05-30 Sat 11:50
. +