| field | value | date |
|---|---|---|
| author | lltommy | 2020-09-27 12:56:27 +0200 |
| committer | lltommy | 2020-09-27 12:56:27 +0200 |
| commit | 71820c89e06b7d028ffdf76ddf01141733c78388 | |
| tree | f7b99454ca5912c4edcafbf7fdbe445b214337a1 (scripts/db_enrichment/readme.md) | |
| parent | 7afd6b778b0deade1bf70062a10041e31a249af0 | |
Adding script supporting semantic enrichment
Diffstat (limited to 'scripts/db_enrichment/readme.md')
-rw-r--r-- | scripts/db_enrichment/readme.md | 20 |
1 files changed, 20 insertions, 0 deletions
diff --git a/scripts/db_enrichment/readme.md b/scripts/db_enrichment/readme.md
new file mode 100644
index 0000000..83297dc
--- /dev/null
+++ b/scripts/db_enrichment/readme.md
@@ -0,0 +1,20 @@
+We have two files in the folder *semantic_enrichment* that are used to enrich the identifiers in our triple store with additional information, e.g. human-readable labels and semantics (e.g. *which countries belong to which continent*). This readme describes how to update these two files.
+
+### semantic_enrichment/labels.ttl
+Static labels for the ontology vocabulary terms we use. This file has to be updated manually. Use the OLS (Ontology Lookup Service) or BioPortal to look up more information about an ontology term we use.
+
+### semantic_enrichment/countries.ttl
+File containing information about the countries in our database; the additional information includes, e.g., each country's label and GPS coordinates. We enrich the country identifiers via Wikidata.
+
+#### Update process
+- Which countries (= Wikidata identifiers) do we have to enrich?
+This query retrieves all countries (IDs) from our database that do not have a label yet:
+
+>SELECT DISTINCT ?geoLocation WHERE
+>{
+>?fasta ?x [<http://purl.obolibrary.org/obo/GAZ_00000448> ?geoLocation] .
+>FILTER NOT EXISTS {?geoLocation <http://www.w3.org/2000/01/rdf-schema#label> ?geoLocation_tmp_label}
+>}
+
+- Use the list of identifiers returned by the query above as input for the update script *country_enrichment.py*. The script creates a temporary .ttl file in this folder.
+- Merge the script's output manually into semantic_enrichment/countries.ttl. (TODO: improve the script output so that manual intervention is no longer needed; currently the output contains duplicate entries for continents.)
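The first two steps of the update process (run the query, feed the identifiers to *country_enrichment.py*) can be bridged with a small helper. The sketch below is not part of this commit and rests on assumptions: that the query is run against an endpoint returning the W3C SPARQL 1.1 JSON results format, that the `geoLocation` values are full Wikidata entity IRIs of the form `http://www.wikidata.org/entity/Q<number>`, and that *country_enrichment.py* accepts bare Q-IDs one per line. The function name `extract_wikidata_ids` is hypothetical.

```python
import json
from typing import List

# Assumption: geoLocation values are full Wikidata entity IRIs; the readme only
# says they are "wikidata identifiers".
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def extract_wikidata_ids(results_json: str, var: str = "geoLocation") -> List[str]:
    """Return the sorted, de-duplicated Wikidata IDs bound to `var`
    in a SPARQL 1.1 JSON results document (hypothetical helper)."""
    data = json.loads(results_json)
    ids = set()
    for binding in data["results"]["bindings"]:
        term = binding.get(var)
        # Skip rows where the variable is unbound or is not an IRI.
        if term is None or term.get("type") != "uri":
            continue
        value = term["value"]
        # Keep only Wikidata entity IRIs and strip the prefix to get the Q-ID.
        if value.startswith(WIKIDATA_ENTITY_PREFIX):
            ids.add(value[len(WIKIDATA_ENTITY_PREFIX):])
    # The query already uses DISTINCT; the set is only a safety net.
    return sorted(ids)


if __name__ == "__main__":
    sample = json.dumps({
        "head": {"vars": ["geoLocation"]},
        "results": {"bindings": [
            {"geoLocation": {"type": "uri", "value": "http://www.wikidata.org/entity/Q30"}},
            {"geoLocation": {"type": "uri", "value": "http://www.wikidata.org/entity/Q148"}},
        ]},
    })
    # One ID per line; whether country_enrichment.py reads this format is an assumption.
    print("\n".join(extract_wikidata_ids(sample)))
```

Sorting is lexicographic on the ID strings, which keeps the generated input file stable between runs and makes it easy to diff against the previous batch.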