From 246c516e4a8c98394c695dcb446995319d557e01 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sat, 22 Aug 2020 14:07:53 +0100
Subject: Generated
---
 doc/blog/using-covid-19-pubseq-part5.html | 130 +++++++++++++++++++++++-------
 1 file changed, 101 insertions(+), 29 deletions(-)

diff --git a/doc/blog/using-covid-19-pubseq-part5.html b/doc/blog/using-covid-19-pubseq-part5.html
index 4caa5ac..5d640f9 100644
--- a/doc/blog/using-covid-19-pubseq-part5.html
+++ b/doc/blog/using-covid-19-pubseq-part5.html
@@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
COVID-19 PubSeq (part 4)
@@ -248,19 +248,28 @@ for the JavaScript code in this tag.

Table of Contents

-
-

1 Modify Metadata

+
+

1 Modify Metadata

The public sequence resource uses multiple data formats listed on the
@@ -268,8 +277,8 @@ The public sequence resource uses multiple data formats listed on the
for RDF and semantic web/linked data ontologies. This technology allows for querying data in unprescribed ways - that is, you can formulate your own queries without dealing with a preset model of that
-data (so typical of CSV files and SQL tables). Examples of exploring
-data are listed here.
+data (which is how one has to approach CSV files and SQL
+tables). Examples of exploring data are listed here.
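This sketch illustrates "formulating your own queries" in miniature. It is not the PubSeq code. It matches triple patterns over an in-memory list of (subject, predicate, object) facts, with `None` acting as a wildcard. SPARQL expresses the same idea, using variables in place of the wildcards. The sample identifiers and the license value are made up.

```python
# Each fact is a (subject, predicate, object) tuple. The sample IRIs and the
# license value are made up for illustration.
TRIPLES = [
    ("sample:1", "obo:GAZ_00000448", "http://www.wikidata.org/entity/Q60"),
    ("sample:1", "cc:License", "CC-BY-4.0"),
    ("sample:2", "obo:GAZ_00000448", "http://www.wikidata.org/entity/Q1384"),
]


def match(s=None, p=None, o=None, triples=TRIPLES):
    """Return every triple that fits the pattern; None matches anything."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# "Which samples have a location?" asked without any predefined table layout.
located = [subj for subj, _, _ in match(p="obo:GAZ_00000448")]
```

No schema decides up front which questions are allowed. Any combination of fixed and open positions is a valid query.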

@@ -280,8 +289,8 @@ understand that anyone, including you, can change that information!

-
-

2 What is the schema?

+
+

2 What is the schema?

The default metadata schema is listed here.
@@ -289,8 +298,8 @@ The default metadata schema is listed
-

3 How is the website generated?

+
+

3 How is the website generated?

Using the schema we use pyshex shex expressions and schema salad to
@@ -300,9 +309,13 @@ All from that one metadata schema.

-
-

4 Modifying the schema

+
+

4 Changing the license field

+
+
+

4.1 Modifying the schema

+

One of the first things we want to do is to add a field for the data license. Initially we only supported CC-4.0 as a license, but
@@ -380,25 +393,25 @@ So, we'll add it simply as a title field. Now the draft schema is
 type: record
 fields:
   license_type:
-    doc: License types as refined in https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf
+    doc: License types as refined in https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf
     type: string?
     jsonldPredicate:
-      _id: https://creativecommons.org/ns#License
+      _id: https://creativecommons.org/ns#License
   title:
     doc: Attribution title related to license
     type: string?
     jsonldPredicate:
-      _id: http://semanticscience.org/resource/SIO_001167
+      _id: http://semanticscience.org/resource/SIO_001167
   attribution_url:
     doc: Attribution URL related to license
     type: string?
     jsonldPredicate:
-      _id: https://creativecommons.org/ns#Work
+      _id: https://creativecommons.org/ns#Work
   attribution_source:
     doc: Attribution source URL
     type: string?
     jsonldPredicate:
-      _id: https://creativecommons.org/ns#Work
+      _id: https://creativecommons.org/ns#Work
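The `jsonldPredicate` `_id` entries in the draft schema tie each field to an RDF predicate. The Python below is a rough illustration of that mapping, not what schema salad actually emits. It turns a license record into N-Triples using the predicate IRIs from the schema. Every field is typed `string?`, so missing fields are skipped and values become plain string literals. The subject IRI in the test is hypothetical.

```python
# Field -> predicate IRI, copied from the jsonldPredicate _id entries in the
# draft license schema.
FIELD_PREDICATES = {
    "license_type": "https://creativecommons.org/ns#License",
    "title": "http://semanticscience.org/resource/SIO_001167",
    "attribution_url": "https://creativecommons.org/ns#Work",
    "attribution_source": "https://creativecommons.org/ns#Work",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def license_to_ntriples(subject_iri: str, record: dict) -> list:
    """Emit one N-Triples line per license field present in the record.

    All fields are optional (string?), so absent or None values are skipped.
    """
    lines = []
    for field, predicate in FIELD_PREDICATES.items():
        value = record.get(field)
        if value is None:
            continue
        lines.append(f'<{subject_iri}> <{predicate}> "{escape_literal(value)}" .')
    return lines
```

Note that `attribution_url` and `attribution_source` share the same predicate. Once they are in RDF, the two fields can no longer be told apart by predicate alone.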

@@ -411,13 +424,13 @@ gitter channel and I merged it.
-
-

5 Adding fields to the form

-
+
+

4.2 Adding fields to the form

+

To add the new fields to the form we have to modify it a little. If we go to the upload form we need to add the license box. The schema is
-loaded in main.py in the 'generateform' function.
+loaded in main.py in the 'generate-form' function.
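Here is a hedged sketch of the idea behind generating form inputs from the schema. It walks the record's fields and treats a type ending in `?` (as in `string?`) as optional. This is not the actual code in main.py: `FormField` and `form_fields` are hypothetical names. The sketch also only handles plain string types, not schema salad's list or nested record types.

```python
from dataclasses import dataclass


@dataclass
class FormField:
    name: str       # input name, taken from the schema field name
    label: str      # human-readable label, taken from the field's doc string
    required: bool  # False when the schema type ends in '?'


def form_fields(record_schema: dict) -> list:
    """Turn a simplified schema-salad style record into form field descriptors.

    Assumes each field's 'type' is a plain string such as 'string' or 'string?'.
    """
    fields = []
    for name, spec in record_schema["fields"].items():
        type_ = spec.get("type", "string")
        fields.append(
            FormField(
                name=name,
                label=spec.get("doc", name),
                required=not type_.endswith("?"),
            )
        )
    return fields
```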

@@ -453,12 +466,71 @@ field to be optional - a missing license assumes it is CC-BY-4.0.
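The rule "a missing license assumes CC-BY-4.0" can be applied before validation. This sketch rests on several assumptions made for illustration only: the helper `with_default_license`, the `license` key in the metadata, and storing the Creative Commons deed URL as the value. The input dictionary is left untouched.

```python
# Assumption: the default is stored as the CC BY 4.0 deed URL. The post only
# says that a missing license means CC-BY-4.0.
DEFAULT_LICENSE = "http://creativecommons.org/licenses/by/4.0/"


def with_default_license(metadata: dict) -> dict:
    """Return a copy of metadata with license.license_type filled in if missing."""
    result = dict(metadata)
    # Copy the nested block too, so the caller's dict is never modified.
    license_block = dict(result.get("license") or {})
    if not license_block.get("license_type"):
        license_block["license_type"] = DEFAULT_LICENSE
    result["license"] = license_block
    return result
```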

-
-

6 TODO Testing the license fields

+
+

4.3 TODO Testing the license fields

+
+
+ +
+

5 Changing the GEO or location field

+
+

+When fetching information from GenBank and EBI/ENA we also translate
+the location into an unambiguous identifier. We opted for the Wikidata
+tag. E.g. for New York City it is https://www.wikidata.org/wiki/Q60
+and for New York State it is https://www.wikidata.org/wiki/Q1384. If
+everyone uses these metadata URIs it is easy to group when making
+queries. Note that we should be using
+http://www.wikidata.org/entity/Q60 in the dataset (http instead of
+https and entity instead of wiki).
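Rewriting the browser-style URL into the entity IRI is mechanical. This is a minimal sketch that assumes we only deal with Wikidata items (Q identifiers). `to_wikidata_entity` is a hypothetical helper name.

```python
import re

# Accepts both the browser form (https://www.wikidata.org/wiki/Q60) and the
# entity form (http://www.wikidata.org/entity/Q60) of a Wikidata item IRI.
_WIKIDATA_ITEM = re.compile(r"^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/(Q\d+)$")


def to_wikidata_entity(iri: str) -> str:
    """Rewrite a Wikidata item IRI to the http://www.wikidata.org/entity/ form."""
    m = _WIKIDATA_ITEM.match(iri.strip())
    if m is None:
        raise ValueError(f"not a Wikidata item IRI: {iri!r}")
    return f"http://www.wikidata.org/entity/{m.group(1)}"
```

For example, `to_wikidata_entity("https://www.wikidata.org/wiki/Q60")` gives `http://www.wikidata.org/entity/Q60`.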

+ +

+Unfortunately the main SARS-CoV-2 repositories use variable text
+strings for location and/or GPS coordinates. To support our schema
+we had to translate every variant, and this proved expensive.
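To see why translation gets expensive, consider a hand-curated lookup table. Every spelling found in the upstream metadata needs its own entry, and anything missing from the table falls through. The table entries and the `translate_location` helper below are hypothetical.

```python
from typing import Optional

# Hypothetical hand-curated table: every spelling seen in the upstream
# metadata needs its own entry, which is what makes this approach costly.
LOCATION_TO_WIKIDATA = {
    "usa: new york city": "http://www.wikidata.org/entity/Q60",
    "usa: new york, ny": "http://www.wikidata.org/entity/Q60",
    "usa: new york state": "http://www.wikidata.org/entity/Q1384",
}


def translate_location(raw: str) -> Optional[str]:
    """Map a raw location string to a Wikidata IRI, or None if it is unknown."""
    # Normalise case and collapse runs of whitespace before the lookup.
    key = " ".join(raw.lower().split())
    return LOCATION_TO_WIKIDATA.get(key)
```

Even with normalisation, a new spelling such as "USA: Manhattan" returns `None` until someone adds it by hand.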

+
+ +
+

5.1 Relaxing the ShEx constraint

+
+

+So we decided to relax the enforcement of this type of metadata and to
+allow for a free-form string.
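The difference between the strict rule and the relaxed rule can be sketched as two predicates. The real check is a ShEx expression run through pyshex. These Python functions are only a stand-in to show the intent, and their names are hypothetical.

```python
import re

_WIKIDATA_ENTITY = re.compile(r"^http://www\.wikidata\.org/entity/Q\d+$")


def location_ok_strict(value: str) -> bool:
    """Strict rule: only a Wikidata entity IRI is accepted."""
    return bool(_WIKIDATA_ENTITY.match(value))


def location_ok_relaxed(value: str) -> bool:
    """Relaxed rule: any non-blank free-form string is accepted."""
    return bool(value.strip())
```

Wikidata IRIs still pass the relaxed rule, so records that were already translated stay valid.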

+ +

+The schema already used http://purl.obolibrary.org/obo/GAZ_00000448
+which states:

+ +
+
+Class: geographic location
+Term IRI: http://purl.obolibrary.org/obo/GAZ_00000448
+Definition: A reference to a place on the Earth, by its name or by its geographical location.
+
+
+ +

+and when you check the count by location in the DEMO, it lists
+free-form location strings.
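A count-by-location query groups on the exact value. Free-form strings that may all describe New York therefore land in separate buckets, which is why shared Wikidata IRIs make grouping easier. This sketch mimics a SPARQL `GROUP BY` with `COUNT` using `collections.Counter`. The sample data is made up.

```python
from collections import Counter

# Made-up (sample, location) pairs, as a count-by-location query would see
# them once free-form strings are allowed.
SAMPLES = [
    ("sample:1", "USA: New York"),
    ("sample:2", "USA: New York, NY"),
    ("sample:3", "USA: New York"),
    ("sample:4", "http://www.wikidata.org/entity/Q60"),
]


def count_by_location(samples):
    """Group on the exact location value, like SPARQL GROUP BY plus COUNT."""
    return Counter(location for _sample, location in samples)
```

Here three distinct keys come back, even though a human reader might count all four samples as New York.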

+ +

+So why does the validation step balk when importing GenBank records?
+The problem was in the ShEx check for RDF generation.
+Removing the Wikidata requirement relaxed the imports with this
+patch.

+
+
-
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-07-16 Thu 03:27
.
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-08-22 Sat 07:42
.