From c69046ee9a5e24eadcd8cb885633328b0fd88011 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Fri, 17 Jul 2020 11:06:33 +0100
Subject: Update generated docs

---
doc/blog/using-covid-19-pubseq-part5.html | 194 ++++++++++++++++++++++++++----
1 file changed, 172 insertions(+), 22 deletions(-)
(limited to 'doc/blog/using-covid-19-pubseq-part5.html')

diff --git a/doc/blog/using-covid-19-pubseq-part5.html b/doc/blog/using-covid-19-pubseq-part5.html
index 80bf559..4caa5ac 100644
--- a/doc/blog/using-covid-19-pubseq-part5.html
+++ b/doc/blog/using-covid-19-pubseq-part5.html
@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
COVID-19 PubSeq (part 4)
@@ -161,6 +161,19 @@
.footdef { margin-bottom: 1em; }
.figure { padding: 1em; }
.figure p { text-align: center; }
+ .equation-container {
+ display: table;
+ text-align: center;
+ width: 100%;
+ }
+ .equation {
+ vertical-align: middle;
+ }
+ .equation-label {
+ display: table-cell;
+ text-align: right;
+ vertical-align: middle;
+ }
.inlinetask {
padding: 10px;
border: 2px solid gray;
@@ -186,7 +199,7 @@
@licstart The following is the entire license notice for the
JavaScript code in this tag.
-Copyright (C) 2012-2018 Free Software Foundation, Inc.
+Copyright (C) 2012-2020 Free Software Foundation, Inc.
The JavaScript code in this tag is free software: you can
redistribute it and/or modify it under the terms of the GNU
@@ -235,38 +248,40 @@ for the JavaScript code in this tag.

Table of Contents

-
-

1 Modify Metadata

+
+

1 Modify Metadata

The public sequence resource uses multiple data formats listed on the
-DOWNLOAD page. One of the most exciting features is the full support
+download page. One of the most exciting features is the full support
for RDF and semantic web/linked data ontologies. This technology
allows for querying data in unprescribed ways - that is, you can
formulate your own queries without dealing with a preset model of that
data (so typical of CSV files and SQL tables). Examples of exploring
-data are listed here.
+data are listed here.

In this BLOG we are going to look at the metadata entered on the
-COVID-19 PubSeq website (or command line client). It is important to
+COVID-19 PubSeq website (or command line client). It is important to
understand that anyone, including you, can change that information!

-
-

2 What is the schema?

+
+

2 What is the schema?

The default metadata schema is listed here.
@@ -274,8 +289,8 @@ The default metadata schema is listed
-

3 How is the website generated?

+
+

3 How is the website generated?

Using the schema we use pyshex shex expressions and schema salad to
@@ -285,13 +300,13 @@ All from that one metadata schema.

-
-

4 Modifying the schema

+
+

4 Modifying the schema

-One of the first things we wanted to do is to add a field for the data
-license. Initially we only support CC-4.0 as a license by default, but
-now we want to give uploaders the option to make it an even more
+One of the first things we want to do is to add a field for the data
+license. Initially we only supported CC-4.0 as a license, but
+we wanted to give uploaders the option to use an even more
liberal CC0 license. The first step is to find a good ontology term for
the field. Searching for `creative commons cc0 rdf' rendered this useful
page. We also find an overview where CC0 is represented as URI
@@ -302,13 +317,148 @@ attributionName and attributionURL.

-Note: work in progress
+A minimal triple should be
+

+ +
+id  xhtml:license  <http://creativecommons.org/licenses/by/4.0/> .
+
+ + +

+Other suggestions are +

+ +
+id  dc:title "Description" .
+id  cc:attributionName "Your Name" .
+id  cc:attributionURL <http://resource.org/id> .
+
+ + +

+and 'dc:source' which indicates the original source of any modified
+work, specified as a URI.
+The prefix 'cc:' is an abbreviation for http://creativecommons.org/ns#.
+
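As a rough illustration of the triples suggested above, the following Python sketch renders them as Turtle-style lines for one record. The function name `license_triples` and the dictionary keys (`license`, `title`, `name`, `url`, `source`) are hypothetical and only the predicates come from the text. It does no escaping of literals, so treat it as a sketch rather than a serializer.

```python
def license_triples(subject, info):
    """Render license metadata as Turtle-style triples (illustrative only)."""
    # (hypothetical input key, predicate from the text, value is a URI?)
    predicates = [
        ("license", "xhtml:license", True),
        ("title", "dc:title", False),
        ("name", "cc:attributionName", False),
        ("url", "cc:attributionURL", True),
        ("source", "dc:source", True),
    ]
    lines = []
    for key, predicate, is_uri in predicates:
        value = info.get(key)
        if value is None:
            continue  # skip fields that were not supplied
        obj = f"<{value}>" if is_uri else f'"{value}"'
        lines.append(f"{subject}  {predicate}  {obj} .")
    return lines


triples = license_triples(
    "id",
    {
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "title": "Description",
        "name": "Your Name",
        "url": "http://resource.org/id",
    },
)
print("\n".join(triples))
```

Because `source` is not supplied in the usage example, no `dc:source` triple is emitted.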

+ +

+Going back to the schema, where does it fit? Under the host, sample,
+virus, technology or submitter block? It could fit under sample, but
+the license actually concerns the whole metadata block and sequence,
+so I think we can put it under its own license tag. For example
+

+ + +

+id: placeholder +

+ +
+license:
+    license_type: http://creativecommons.org/licenses/by/4.0/
+    attribution_title: "Sample ID"
+    attribution_name: "John Doe, Joe Boe, Jonny Oe"
+    attribution_url: http://covid19.genenetwork.org/id
+    attribution_source: https://www.ncbi.nlm.nih.gov/pubmed/323088888
+
+ + +

+So, let's update the example. Notice the license info is optional - if it is missing
+we just assume the default CC-BY-4.0.
+
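The fallback described above could be sketched in Python roughly as follows. This is not the project's code: `effective_license` and `DEFAULT_LICENSE` are hypothetical names, the metadata is assumed to be already parsed into a dictionary shaped like the YAML example, and the CC0 URI in the usage lines is the one published by Creative Commons rather than a value taken from this post.

```python
# Assumption: the default is the CC-BY-4.0 URI used in the YAML example.
DEFAULT_LICENSE = "http://creativecommons.org/licenses/by/4.0/"


def effective_license(metadata):
    """Return the license URI of a record, falling back to the default."""
    # A missing or empty license block is treated the same way.
    license_block = metadata.get("license") or {}
    return license_block.get("license_type") or DEFAULT_LICENSE


with_license = {
    "id": "placeholder",
    "license": {
        "license_type": "http://creativecommons.org/publicdomain/zero/1.0/"
    },
}
without_license = {"id": "placeholder"}

print(effective_license(with_license))
print(effective_license(without_license))
```

Resolving the default at read time, rather than writing it into every record, means existing uploads without a license block keep working unchanged.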

+ +

+One thing that is interesting is that in the namespace https://creativecommons.org/ns there
+is no mention of a title. I think it is useful, however, because we have no such field.
+So, we'll add it simply as a title field. Now the draft schema is

+ +
+
- name: licenseSchema
+  type: record
+  fields:
+    license_type:
+      doc: License types as defined in https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf
+      type: string?
+      jsonldPredicate:
+          _id: https://creativecommons.org/ns#License
+    title:
+      doc: Attribution title related to license
+      type: string?
+      jsonldPredicate:
+          _id: http://semanticscience.org/resource/SIO_001167
+    attribution_url:
+      doc: Attribution URL related to license
+      type: string?
+      jsonldPredicate:
+          _id: https://creativecommons.org/ns#Work
+    attribution_source:
+      doc: Attribution source URL
+      type: string?
+      jsonldPredicate:
+          _id: https://creativecommons.org/ns#Work
+
+
+ +
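To make the shape of the draft schema concrete, here is a small hand-written Python check. It does not use schema salad and is not how the project validates uploads. It only assumes what the draft shows: four fields, each declared `string?`. The name `check_license_record` is hypothetical.

```python
# Field names copied from the draft licenseSchema; each is declared `string?`.
LICENSE_FIELDS = ("license_type", "title", "attribution_url", "attribution_source")


def check_license_record(record):
    """Return a list of problems found in a license record (empty if none)."""
    problems = []
    for key, value in record.items():
        if key not in LICENSE_FIELDS:
            problems.append(f"unknown field: {key}")
        elif value is not None and not isinstance(value, str):
            problems.append(f"{key} must be a string or absent")
    return problems


example = {
    "license_type": "http://creativecommons.org/licenses/by/4.0/",
    "attribution_name": "John Doe",
    "title": 42,
}
print(check_license_record(example))
```

The usage example flags `attribution_name` because the draft schema does not declare that field, even though the earlier YAML example uses it.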

+Now, we are no ontology experts, right? So, next we submitted a patch to
+our source tree and asked for feedback before wiring it up in the data
+entry form. The pull request was submitted here, reviewed on the
+gitter channel, and then I merged it.
+

+
+ +
+

5 Adding fields to the form

+
+

+To add the new fields to the form we have to modify it a little. If we
+go to the upload form we need to add the license box. The schema is
+loaded in main.py in the 'generateform' function.
+

+ +

+With this patch the website adds the license input fields on the form. +

+ +

+Finally, to make RDF output work we need to add expressions to bh20seq-shex.rdf. This
+was done with this patch. In the end we decided to use the Dublin Core title,
+http://purl.org/metadata/dublin_core_elements#Title:
+

+ +
+
:licenseShape{
+    cc:License xsd:string;
+    dc:Title xsd:string ?;
+    cc:attributionName xsd:string ?;
+    cc:attributionURL xsd:string ?;
+    cc:attributionSource xsd:string ?;
+}
+
+
+ +

+Note that cc:attributionSource is not really defined in the cc standard.
+

+ +

+When pushing the license info we discovered the workflow broke because
+the existing data had no licensing info. So we changed the license
+field to be optional - a record without a license is assumed to be CC-BY-4.0.
+

+
+
+ +
+

6 TODO Testing the license fields

-
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-07-12 Sun 06:24
. +
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-07-16 Thu 03:27
.
-- cgit v1.2.3