From 71820c89e06b7d028ffdf76ddf01141733c78388 Mon Sep 17 00:00:00 2001
From: lltommy
Date: Sun, 27 Sep 2020 12:56:27 +0200
Subject: Adding script supporting semantic enrichment

---
 scripts/db_enrichment/readme.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 scripts/db_enrichment/readme.md

diff --git a/scripts/db_enrichment/readme.md b/scripts/db_enrichment/readme.md
new file mode 100644
index 0000000..83297dc
--- /dev/null
+++ b/scripts/db_enrichment/readme.md
@@ -0,0 +1,20 @@
+We have two files in the folder *semantic_enrichment* that are used to enrich the identifiers in our triple store with additional information, e.g. human-readable labels and semantics (e.g. *which countries are grouped into a continent*). This document describes how to update these two files.
+
+### semantic_enrichment/labels.ttl
+Static labels for the ontology vocabulary terms we use. This file has to be updated manually. Use OLS (Ontology Lookup Service) or BioPortal to look up information about an ontology term.
+
+### semantic_enrichment/countries.ttl
+Contains information about the countries in our database, e.g. their labels or GPS coordinates. We enrich the country identifiers via Wikidata.
+
+#### Update process
+- Which countries (= Wikidata identifiers) do we have to enrich?
+This query retrieves all country IDs from our database that do not have a label yet:
+
+>SELECT DISTINCT ?geoLocation WHERE
+>{
+>?fasta ?x [ ?geoLocation] .
+>FILTER NOT EXISTS {?geoLocation ?geoLocation_tmp_label}
+>}
+
+- Use the list of identifiers returned by the query above as input for the update script *country_enrichment.py*. The script creates a temporary .ttl file in this folder.
+- Manually merge the script's output into semantic_enrichment/countries.ttl. (TODO: Improve the script output so that manual intervention is no longer needed. Currently the output contains duplicate entries for continents.)
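The identifier list that *country_enrichment.py* consumes could be produced from the query results with a small helper. The sketch below is hypothetical and not part of this patch: it assumes the SPARQL endpoint returns the standard SPARQL 1.1 Query Results JSON Format, that `?geoLocation` is bound to Wikidata entity IRIs such as `http://www.wikidata.org/entity/Q183`, and that the script reads one identifier per line. The file name `extract_ids.py`, the `--qid` flag, and all function names are illustrative only.

```python
"""Hypothetical helper (extract_ids.py): turn SPARQL results into an ID list.

Assumptions, not taken from the repository: the query was run against an
endpoint that returns the SPARQL 1.1 Query Results JSON Format, ?geoLocation
is bound to Wikidata entity IRIs, and country_enrichment.py reads one
identifier per line.
"""
import json
import sys
from typing import Iterable, List


def extract_identifiers(results_json: str, var: str = "geoLocation") -> List[str]:
    """Return the distinct IRIs bound to `var`, sorted for stable output."""
    data = json.loads(results_json)
    ids = set()
    for binding in data["results"]["bindings"]:
        term = binding.get(var)
        # Skip rows where the variable is unbound or bound to a non-IRI term.
        if term is not None and term.get("type") == "uri":
            ids.add(term["value"])
    return sorted(ids)


def wikidata_qid(iri: str) -> str:
    """Reduce an entity IRI such as .../entity/Q183 to its last path segment."""
    return iri.rstrip("/").rsplit("/", 1)[-1]


def format_identifier_list(ids: Iterable[str], bare_qids: bool = False) -> str:
    """Render one identifier per line, optionally as bare Q-identifiers."""
    return "".join((wikidata_qid(i) if bare_qids else i) + "\n" for i in ids)


# Only runs when a results file is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_ids.py results.json [--qid] > countries_to_enrich.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        found = extract_identifiers(fh.read())
    sys.stdout.write(format_identifier_list(found, bare_qids="--qid" in sys.argv[2:]))
```

Pass `--qid` if the enrichment script expects bare identifiers like `Q183` rather than full IRIs. Deduplicating and sorting here keeps the input list stable between runs, but it does not address the duplicate continent entries in the script's own output.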