author | Pjotr Prins | 2020-07-12 12:25:24 +0100 |
---|---|---|
committer | Pjotr Prins | 2020-07-12 12:25:24 +0100 |
commit | 3dd94e87c25ff0b2942dc59c919a9e6e45fe45be | |
tree | e5bc7e6498457efc90668d7673a423e01275c9a0 /doc/blog | |
parent | fba4474b5e2e7c069bb9158089ecb873ff8e6c5c | |
Docs: started on metadata modification
Diffstat (limited to 'doc/blog')
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part4.html | 44 |
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part4.org | 21 |
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part5.html | 79 |
-rw-r--r-- | doc/blog/using-covid-19-pubseq-part5.org | 39 |
4 files changed, 141 insertions, 42 deletions
diff --git a/doc/blog/using-covid-19-pubseq-part4.html b/doc/blog/using-covid-19-pubseq-part4.html index 67d299e..b5a05ca 100644 --- a/doc/blog/using-covid-19-pubseq-part4.html +++ b/doc/blog/using-covid-19-pubseq-part4.html @@ -3,10 +3,10 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-05-30 Sat 11:52 --> +<!-- 2020-07-12 Sun 06:24 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> -<title>‎</title> +<title>COVID-19 PubSeq (part 4)</title> <meta name="generator" content="Org mode" /> <meta name="author" content="Pjotr Prins" /> <style type="text/css"> @@ -161,19 +161,6 @@ .footdef { margin-bottom: 1em; } .figure { padding: 1em; } .figure p { text-align: center; } - .equation-container { - display: table; - text-align: center; - width: 100%; - } - .equation { - vertical-align: middle; - } - .equation-label { - display: table-cell; - text-align: right; - vertical-align: middle; - } .inlinetask { padding: 10px; border: 2px solid gray; @@ -193,12 +180,13 @@ .org-svg { width: 90%; } /*]]>*/--> </style> +<link rel="Blog stylesheet" type="text/css" href="blog.css" /> <script type="text/javascript"> /* @licstart The following is the entire license notice for the JavaScript code in this tag. -Copyright (C) 2012-2020 Free Software Foundation, Inc. +Copyright (C) 2012-2018 Free Software Foundation, Inc. The JavaScript code in this tag is free software: you can redistribute it and/or modify it under the terms of the GNU @@ -242,25 +230,41 @@ for the JavaScript code in this tag. </head> <body> <div id="content"> +<h1 class="title">COVID-19 PubSeq (part 4)</h1> <div id="table-of-contents"> <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> -<li><a href="#orgda6f48c">1. Modify Workflow</a></li> +<li><a href="#org8f8b64a">1. 
What does this mean?</a></li> +<li><a href="#orgcc7a403">2. Modify Workflow</a></li> </ul> </div> </div> -<div id="outline-container-orgda6f48c" class="outline-2"> -<h2 id="orgda6f48c"><span class="section-number-2">1</span> Modify Workflow</h2> + + +<div id="outline-container-org8f8b64a" class="outline-2"> +<h2 id="org8f8b64a"><span class="section-number-2">1</span> What does this mean?</h2> <div class="outline-text-2" id="text-1"> <p> +This means that when someone uploads a SARS-CoV-2 sequence using one +of our tools (CLI or web-based) they add a sequence and some metadata +which triggers a rerun of our workflows. +</p> +</div> +</div> + + +<div id="outline-container-orgcc7a403" class="outline-2"> +<h2 id="orgcc7a403"><span class="section-number-2">2</span> Modify Workflow</h2> +<div class="outline-text-2" id="text-2"> +<p> <i>Work in progress!</i> </p> </div> </div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-05-30 Sat 11:52</small>. +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-12 Sun 06:24</small>. </div> </body> </html> diff --git a/doc/blog/using-covid-19-pubseq-part4.org b/doc/blog/using-covid-19-pubseq-part4.org index 58a1f56..5fe71d1 100644 --- a/doc/blog/using-covid-19-pubseq-part4.org +++ b/doc/blog/using-covid-19-pubseq-part4.org @@ -1,3 +1,24 @@ +#+TITLE: COVID-19 PubSeq (part 4) +#+AUTHOR: Pjotr Prins +# C-c C-e h h publish +# C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . 
for time) +# C-c C-t task rotate +# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png + +#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" /> + + +* Table of Contents :TOC:noexport: + - [[#what-does-this-mean][What does this mean?]] + - [[#modify-workflow][Modify Workflow]] + +* What does this mean? + +This means that when someone uploads a SARS-CoV-2 sequence using one +of our tools (CLI or web-based) they add a sequence and some metadata +which triggers a rerun of our workflows. + + * Modify Workflow /Work in progress!/ diff --git a/doc/blog/using-covid-19-pubseq-part5.html b/doc/blog/using-covid-19-pubseq-part5.html index 30a3f83..80bf559 100644 --- a/doc/blog/using-covid-19-pubseq-part5.html +++ b/doc/blog/using-covid-19-pubseq-part5.html @@ -3,10 +3,10 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> -<!-- 2020-05-30 Sat 11:59 --> +<!-- 2020-07-12 Sun 06:24 --> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> -<title>‎</title> +<title>COVID-19 PubSeq (part 4)</title> <meta name="generator" content="Org mode" /> <meta name="author" content="Pjotr Prins" /> <style type="text/css"> @@ -161,19 +161,6 @@ .footdef { margin-bottom: 1em; } .figure { padding: 1em; } .figure p { text-align: center; } - .equation-container { - display: table; - text-align: center; - width: 100%; - } - .equation { - vertical-align: middle; - } - .equation-label { - display: table-cell; - text-align: right; - vertical-align: middle; - } .inlinetask { padding: 10px; border: 2px solid gray; @@ -193,12 +180,13 @@ .org-svg { width: 90%; } /*]]>*/--> </style> +<link rel="Blog stylesheet" type="text/css" href="blog.css" /> <script type="text/javascript"> /* @licstart The following is the entire license notice for the JavaScript code in this tag. -Copyright (C) 2012-2020 Free Software Foundation, Inc. 
+Copyright (C) 2012-2018 Free Software Foundation, Inc. The JavaScript code in this tag is free software: you can redistribute it and/or modify it under the terms of the GNU @@ -242,16 +230,22 @@ for the JavaScript code in this tag. </head> <body> <div id="content"> +<h1 class="title">COVID-19 PubSeq (part 4)</h1> <div id="table-of-contents"> <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> -<li><a href="#org31c224e">1. Modify Metadata</a></li> +<li><a href="#org871ad58">1. Modify Metadata</a></li> +<li><a href="#org07e8755">2. What is the schema?</a></li> +<li><a href="#org4857280">3. How is the website generated?</a></li> +<li><a href="#orge709ae2">4. Modifying the schema</a></li> </ul> </div> </div> -<div id="outline-container-org31c224e" class="outline-2"> -<h2 id="org31c224e"><span class="section-number-2">1</span> Modify Metadata</h2> + + +<div id="outline-container-org871ad58" class="outline-2"> +<h2 id="org871ad58"><span class="section-number-2">1</span> Modify Metadata</h2> <div class="outline-text-2" id="text-1"> <p> The public sequence resource uses multiple data formats listed on the @@ -265,13 +259,56 @@ data are listed <a href="./blog?id=using-covid-19-pubseq-part1">here</a>. <p> In this BLOG we are going to look at the metadata entered on the -<a href="./">COVID-19 PubSeq</a> website (or command line client). +<a href="./">COVID-19 PubSeq</a> website (or command line client). It is important to +understand that anyone, including you, can change that information! +</p> +</div> +</div> + +<div id="outline-container-org07e8755" class="outline-2"> +<h2 id="org07e8755"><span class="section-number-2">2</span> What is the schema?</h2> +<div class="outline-text-2" id="text-2"> +<p> +The default metadata schema is listed <a href="https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/bh20seq-schema.yml">here</a>. 
+</p> +</div> +</div> + +<div id="outline-container-org4857280" class="outline-2"> +<h2 id="org4857280"><span class="section-number-2">3</span> How is the website generated?</h2> +<div class="outline-text-2" id="text-3"> +<p> +Using the schema we use <a href="https://pypi.org/project/PyShEx/">pyshex</a> shex expressions and <a href="https://github.com/common-workflow-language/schema_salad">schema salad</a> to +generate the <a href="https://github.com/arvados/bh20-seq-resource/blob/edb17e7f7caebfa1e76b21006b1772a33f4f7887/bh20simplewebuploader/templates/form.html#L47">input form</a>, <a href="https://github.com/arvados/bh20-seq-resource/blob/edb17e7f7caebfa1e76b21006b1772a33f4f7887/bh20sequploader/qc_metadata.py#L13">validate</a> the user input and to build <a href="https://github.com/arvados/bh20-seq-resource/blob/edb17e7f7caebfa1e76b21006b1772a33f4f7887/workflows/pangenome-generate/merge-metadata.py#L24">RDF</a>! +All from that one metadata schema. +</p> +</div> +</div> + +<div id="outline-container-orge709ae2" class="outline-2"> +<h2 id="orge709ae2"><span class="section-number-2">4</span> Modifying the schema</h2> +<div class="outline-text-2" id="text-4"> +<p> +One of the first things we wanted to do is to add a field for the data +license. Initially we only support CC-4.0 as a license by default, but +now we want to give uploaders the option to make it an even more +liberal CC0 license. The first step is to find a good ontology term +for the field. Searching for `creative commons cc0 rdf' rendered this +useful <a href="https://creativecommons.org/ns">page</a>. We also find an <a href="https://wiki.creativecommons.org/wiki/CC_License_Rdf_Overview">overview</a> where CC0 is represented as URI +<a href="https://creativecommons.org/publicdomain/zero/1.0/">https://creativecommons.org/publicdomain/zero/1.0/</a>. Meanwhile the +attribution license <a href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a>. 
+According to this <a href="https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf">document</a> we should really also add fields for +attributionName and attributionURL. +</p> + +<p> +<i>Note: work in progress</i> </p> </div> </div> </div> <div id="postamble" class="status"> -<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-05-30 Sat 11:59</small>. +<hr><small>Created by <a href="http://thebird.nl/">Pjotr Prins</a> (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!<br />Modified 2020-07-12 Sun 06:24</small>. </div> </body> </html> diff --git a/doc/blog/using-covid-19-pubseq-part5.org b/doc/blog/using-covid-19-pubseq-part5.org index 8d7504e..fe1908a 100644 --- a/doc/blog/using-covid-19-pubseq-part5.org +++ b/doc/blog/using-covid-19-pubseq-part5.org @@ -1,3 +1,19 @@ +#+TITLE: COVID-19 PubSeq (part 4) +#+AUTHOR: Pjotr Prins +# C-c C-e h h publish +# C-c ! insert date (use . for active agenda, C-u C-c ! for date, C-u C-c . for time) +# C-c C-t task rotate +# RSS_IMAGE_URL: http://xxxx.xxxx.free.fr/rss_icon.png + +#+HTML_HEAD: <link rel="Blog stylesheet" type="text/css" href="blog.css" /> + + +* Table of Contents :TOC:noexport: + - [[#modify-metadata][Modify Metadata]] + - [[#what-is-the-schema][What is the schema?]] + - [[#how-is-the-website-generated][How is the website generated?]] + - [[#modifying-the-schema][Modifying the schema]] + * Modify Metadata The public sequence resource uses multiple data formats listed on the @@ -10,8 +26,29 @@ data are listed [[./blog?id=using-covid-19-pubseq-part1][here]]. In this BLOG we are going to look at the metadata entered on the [[./][COVID-19 PubSeq]] website (or command line client). It is important to -understand that you and us can change that information. +understand that anyone, including you, can change that information! * What is the schema? 
+The default metadata schema is listed [[https://github.com/arvados/bh20-seq-resource/blob/master/bh20sequploader/bh20seq-schema.yml][here]]. + * How is the website generated? + +Using the schema we use [[https://pypi.org/project/PyShEx/][pyshex]] shex expressions and [[https://github.com/common-workflow-language/schema_salad][schema salad]] to +generate the [[https://github.com/arvados/bh20-seq-resource/blob/edb17e7f7caebfa1e76b21006b1772a33f4f7887/bh20simplewebuploader/templates/form.html#L47][input form]], [[https://github.com/arvados/bh20-seq-resource/blob/edb17e7f7caebfa1e76b21006b1772a33f4f7887/bh20sequploader/qc_metadata.py#L13][validate]] the user input and to build [[https://github.com/arvados/bh20-seq-resource/blob/edb17e7f7caebfa1e76b21006b1772a33f4f7887/workflows/pangenome-generate/merge-metadata.py#L24][RDF]]! +All from that one metadata schema. + +* Modifying the schema + +One of the first things we wanted to do is to add a field for the data +license. Initially we only support CC-4.0 as a license by default, but +now we want to give uploaders the option to make it an even more +liberal CC0 license. The first step is to find a good ontology term +for the field. Searching for `creative commons cc0 rdf' rendered this +useful [[https://creativecommons.org/ns][page]]. We also find an [[https://wiki.creativecommons.org/wiki/CC_License_Rdf_Overview][overview]] where CC0 is represented as URI +https://creativecommons.org/publicdomain/zero/1.0/. Meanwhile the +attribution license https://creativecommons.org/licenses/by/4.0/. +According to this [[https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf][document]] we should really also add fields for +attributionName and attributionURL. + +/Note: work in progress/ |
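
The "Modifying the schema" text added in this commit describes a license field that defaults to CC-BY 4.0, can optionally be CC0, and should probably carry attributionName and attributionURL as well. A minimal sketch of how such a field might be represented follows. It is not taken from bh20-seq-resource: the function name, the short license keys and the use of the `http://creativecommons.org/ns#` namespace for the predicates are assumptions made for illustration. Only the two license URIs and the two attribution field names come from the text above.

```python
# Hypothetical sketch: none of these names exist in bh20-seq-resource.
# Assumed ccREL namespace for the predicates (license, attributionName, attributionURL).
CC_NS = "http://creativecommons.org/ns#"

# License URIs as quoted in the blog text; the short keys are illustrative.
CC_LICENSES = {
    "CC0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def license_metadata(choice="CC-BY-4.0", attribution_name=None, attribution_url=None):
    """Return a predicate -> value mapping for the chosen license.

    CC-BY 4.0 is the default, mirroring the text; CC0 is the opt-in,
    more liberal choice. Attribution fields are added only when given.
    """
    if choice not in CC_LICENSES:
        raise ValueError(f"unsupported license: {choice!r}")

    record = {CC_NS + "license": CC_LICENSES[choice]}
    if attribution_name is not None:
        record[CC_NS + "attributionName"] = attribution_name
    if attribution_url is not None:
        record[CC_NS + "attributionURL"] = attribution_url
    return record


if __name__ == "__main__":
    # Example: an uploader opting in to CC0 with an attribution name.
    for predicate, value in license_metadata("CC0", "Example Lab").items():
        print(predicate, value)
```

Keeping the license as a URI rather than a free-text label is what would let the same value flow into the generated form, the validation step and the RDF output that the post describes, but how the real schema encodes it is left open by the commit ("work in progress").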