From 246c516e4a8c98394c695dcb446995319d557e01 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Sat, 22 Aug 2020 14:07:53 +0100 Subject: Generated --- doc/blog/using-covid-19-pubseq-part5.html | 130 +++++++++++++++++++++++------- 1 file changed, 101 insertions(+), 29 deletions(-) (limited to 'doc/blog/using-covid-19-pubseq-part5.html') diff --git a/doc/blog/using-covid-19-pubseq-part5.html b/doc/blog/using-covid-19-pubseq-part5.html index 4caa5ac..5d640f9 100644 --- a/doc/blog/using-covid-19-pubseq-part5.html +++ b/doc/blog/using-covid-19-pubseq-part5.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- +The public sequence resource uses multiple data formats listed on the @@ -268,8 +277,8 @@ The public sequence resource uses multiple data formats listed on the for RDF and semantic web/linked data ontologies. This technology allows for querying data in unprescribed ways - that is, you can formulate your own queries without dealing with a preset model of that -data (so typical of CSV files and SQL tables). Examples of exploring -data are listed here. +data (which is how one has to approach CSV files and SQL +tables). Examples of exploring data are listed here.
@@ -280,8 +289,8 @@ understand that anyone, including you, can change that information!
The default metadata schema is listed here.
@@ -289,8 +298,8 @@ The default metadata schema is listed
-
Using the schema we use pyshex shex expressions and schema salad to
@@ -300,9 +309,13 @@ All from that one metadata schema.
One of the first things we want to do is to add a field for the data
license. Initially we only supported CC-4.0 as a license, but
@@ -380,25 +393,25 @@ So, we'll add it simply as a title field. Now the draft schema is
type: record
fields:
license_type:
- doc: License types as refined in https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf
+ doc: License types as refined in https://wiki.creativecommons.org/images/d/d6/Ccrel-1.0.pdf
type: string?
jsonldPredicate:
- _id: https://creativecommons.org/ns#License
+ _id: https://creativecommons.org/ns#License
title:
doc: Attribution title related to license
type: string?
jsonldPredicate:
- _id: http://semanticscience.org/resource/SIO_001167
+ _id: http://semanticscience.org/resource/SIO_001167
attribution_url:
doc: Attribution URL related to license
type: string?
jsonldPredicate:
- _id: https://creativecommons.org/ns#Work
+ _id: https://creativecommons.org/ns#Work
attribution_source:
doc: Attribution source URL
type: string?
jsonldPredicate:
- _id: https://creativecommons.org/ns#Work
+ _id: https://creativecommons.org/ns#Work
To add the new fields to the form we have to modify it a little. If we
go to the upload form we need to add the license box. The schema is
-loaded in main.py in the 'generateform' function.
+loaded in main.py in the 'generate-form' function.
@@ -453,12 +466,71 @@ field to be optional - a missing license assumes it is CC-BY-4.0.
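The optional-license rule above can be sketched in Python as follows. This is only an illustration, not the code in main.py: the `license` key, the `effective_license` helper, and the literal default string are assumptions made for this example.

```python
# Minimal sketch: fall back to CC-BY-4.0 when no license is given.
# ASSUMPTION: the metadata is a plain dict with an optional "license"
# record holding "license_type"; these names are hypothetical and only
# mirror the draft schema fields shown above.
DEFAULT_LICENSE = "CC-BY-4.0"


def effective_license(metadata):
    """Return the declared license type, or the default if it is missing."""
    license_block = metadata.get("license") or {}
    return license_block.get("license_type") or DEFAULT_LICENSE


print(effective_license({}))
print(effective_license({"license": {"license_type": "CC0"}}))
```

An empty or absent license record is treated the same as a missing `license_type`, which matches the "missing license" wording above.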
+When fetching information from GenBank and EBI/ENA we also translate
+the location into an unambiguous identifier. We opted for the wikidata
+tag. E.g. for New York city it is https://www.wikidata.org/wiki/Q60
+and for New York state it is https://www.wikidata.org/wiki/Q1384. If
+everyone uses these metadata URIs it is easy to group when making
+queries. Note that we should be using
+http://www.wikidata.org/entity/Q60 in the dataset (http instead of
+https and entity instead of wiki).
+
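The rewrite from the browser-facing Wikidata page URL to the entity URI described above could look roughly like this. It is a sketch under assumptions: the helper name `to_entity_uri` is hypothetical and is not taken from the PubSeq code base.

```python
import re

# Canonical prefix named in the text: http (not https) and /entity/ (not /wiki/).
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Accept either the page form (/wiki/Q60) or the entity form (/entity/Q60).
_WIKIDATA_URL = re.compile(
    r"^https?://www\.wikidata\.org/(?:wiki|entity)/(Q[1-9]\d*)$"
)


def to_entity_uri(url):
    """Return the canonical entity URI, or None if url is not a Wikidata item URL."""
    match = _WIKIDATA_URL.match(url.strip())
    if match is None:
        return None
    return WIKIDATA_ENTITY_PREFIX + match.group(1)


print(to_entity_uri("https://www.wikidata.org/wiki/Q60"))
print(to_entity_uri("https://www.wikidata.org/wiki/Q1384"))
```

Returning `None` for anything that is not a Wikidata item URL leaves the caller free to keep a free-form location string instead.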
+Unfortunately the main repositories of SARS-CoV-2 have variable
+strings of text for location and/or GPS coordinates. For us to support
+our schema we had to translate all options and this proved expensive.
+
+So we decided to relax the enforcement of this type of metadata and to
+allow for a free form string.
+
+The schema already used http://purl.obolibrary.org/obo/GAZ_00000448
+which states:
+
+and when you check the count by location in the DEMO it lists
+free-format strings.
+
+So, why does the validation step balk when importing GenBank?
+The problem was in the shex check for RDF generation.
+Removing the wikidata requirement relaxed the imports with this
+patch.
+3 How is the website generated?
+3 How is the website generated?
4 Modifying the schema
+4 Changing the license field
4.1 Modifying the schema
+5 Adding fields to the form
-4.2 Adding fields to the form
+6 TODO Testing the license fields
+4.3 TODO Testing the license fields
+5 Changing GEO or location field
+5.1 Relaxing the shex constraint
+Class: geographic
+ location
+ Term IRI: http://purl.obolibrary.org/obo/GAZ_00000448
+Definition: A reference to a place on
+ the Earth, by its name or by its geographical location.
+
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-07-16 Thu 03:27.
+
Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-08-22 Sat 07:42.