path: root/scripts/db_enrichment/readme.md
author: lltommy, 2020-09-27 12:56:27 +0200
committer: lltommy, 2020-09-27 12:56:27 +0200
commit: 71820c89e06b7d028ffdf76ddf01141733c78388
tree: f7b99454ca5912c4edcafbf7fdbe445b214337a1 /scripts/db_enrichment/readme.md
parent: 7afd6b778b0deade1bf70062a10041e31a249af0
Adding script supporting semantic enrichment
Diffstat (limited to 'scripts/db_enrichment/readme.md')
-rw-r--r-- scripts/db_enrichment/readme.md 20
1 files changed, 20 insertions, 0 deletions
diff --git a/scripts/db_enrichment/readme.md b/scripts/db_enrichment/readme.md
new file mode 100644
index 0000000..83297dc
--- /dev/null
+++ b/scripts/db_enrichment/readme.md
@@ -0,0 +1,20 @@
+The folder *semantic_enrichment* contains two files that enrich the identifiers in our triple store with additional information, e.g. human-readable labels and semantics (e.g. *which countries are summarized as a continent*). This readme describes how to update these two files.
+
+### semantic_enrichment/labels.ttl
+Static labels for the ontology vocabulary terms we use. This file has to be updated manually. Use the OLS or BioPortal to find more information about an ontology term in use.
+
+### semantic_enrichment/countries.ttl
+File containing information about the countries in our database. Additional information about a country is e.g. its label or GPS coordinates. We enrich the country identifiers via Wikidata.
+
+#### Update process
+- Which countries (= Wikidata identifiers) do we have to enrich?
+This query retrieves all countries (ids) from our database that do not have a label yet:
+
+>SELECT DISTINCT ?geoLocation WHERE
+>{
+>?fasta ?x [<http://purl.obolibrary.org/obo/GAZ_00000448> ?geoLocation] .
+>FILTER NOT EXISTS {?geoLocation <http://www.w3.org/2000/01/rdf-schema#label> ?geoLocation_tmp_label}
+>}
+
+- Use the list of identifiers created with the query above as input for the update script *country_enrichment.py*. The script creates a temporary .ttl file in this folder.
+- Merge the output of the script above manually into the file semantic_enrichment/countries.ttl (TODO: Improve the script output so that manual intervention is no longer needed. Currently there are duplicate ("double") entries for continents in the output.)
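
The first step of the update process (collecting the country identifiers that still lack a label) could be scripted along the following lines. This is only a sketch and not part of the commit: the helper names `build_request`, `extract_identifiers` and `fetch_unlabelled_countries` are hypothetical, the endpoint URL has to be supplied by the caller because the readme does not name one, and the endpoint is assumed to speak the standard SPARQL 1.1 Protocol and return the SPARQL JSON results format. How *country_enrichment.py* expects to receive the identifier list is not described in the readme, so the sketch stops at producing the list.

```python
import json
import urllib.parse
import urllib.request

# The query from the readme, unchanged apart from indentation.
MISSING_LABEL_QUERY = """
SELECT DISTINCT ?geoLocation WHERE
{
  ?fasta ?x [<http://purl.obolibrary.org/obo/GAZ_00000448> ?geoLocation] .
  FILTER NOT EXISTS {?geoLocation <http://www.w3.org/2000/01/rdf-schema#label> ?geoLocation_tmp_label}
}
"""


def build_request(endpoint_url, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def extract_identifiers(results):
    """Return the distinct ?geoLocation values of a SPARQL JSON result, sorted."""
    bindings = results["results"]["bindings"]
    return sorted({b["geoLocation"]["value"] for b in bindings if "geoLocation" in b})


def fetch_unlabelled_countries(endpoint_url):
    """Run the query against the given endpoint (assumption: network access)."""
    request = build_request(endpoint_url, MISSING_LABEL_QUERY)
    with urllib.request.urlopen(request) as response:
        return extract_identifiers(json.load(response))
```

Calling `fetch_unlabelled_countries` with the URL of the project's SPARQL endpoint should yield the identifiers to enrich. Keeping the parsing in `extract_identifiers` separate from the network call makes it possible to check the parsing against a saved result file without contacting the endpoint.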