COVID-19 PubSeq (part 4)

1 Modify Metadata

The public sequence resource uses multiple data formats listed on the DOWNLOAD page. One of the most exciting features is the full support for RDF and semantic web/linked data ontologies. This technology allows for querying data in unprescribed ways - that is, you can formulate your own queries without dealing with a preset model of that data (so typical of CSV files and SQL tables). Examples of exploring data are listed here.

In this blog post we look at the metadata entered through the COVID-19 PubSeq website (or the command line client). It is important to understand that anyone, including you, can change that information!

2 What is the schema?

The default metadata schema is listed here.

3 How is the website generated?

Starting from the schema, we use pyshex (ShEx expressions) and schema salad to generate the input form, to validate the user input and to build RDF! All from that one metadata schema.
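To make the "one schema drives everything" idea concrete, here is a toy sketch in plain Python. It does not use pyshex or schema salad and does not reflect the real PubSeq schema: the field names, the `http://example.org/pubseq/` predicate URIs and the helper functions are all hypothetical. It only shows how a single schema definition can yield form fields, validation errors and RDF (as N-Triples lines).

```python
# Toy illustration only: NOT the pyshex / schema salad API and NOT the
# real PubSeq schema. Field names and predicate URIs are hypothetical.
SCHEMA = {
    "sample_id": {
        "predicate": "http://example.org/pubseq/sample_id",
        "required": True,
    },
    "collection_date": {
        "predicate": "http://example.org/pubseq/collection_date",
        "required": True,
    },
    "collection_location": {
        "predicate": "http://example.org/pubseq/collection_location",
        "required": False,
    },
}


def form_fields(schema):
    """Return the labels an input form would show; required ones get ' *'."""
    return [name + (" *" if spec["required"] else "") for name, spec in schema.items()]


def validate(schema, record):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = [
        f"missing required field: {name}"
        for name, spec in schema.items()
        if spec["required"] and not record.get(name)
    ]
    errors += [f"unknown field: {name}" for name in record if name not in schema]
    return errors


def to_ntriples(schema, subject, record):
    """Turn a validated record into N-Triples lines with plain string literals."""
    lines = []
    for name, value in record.items():
        predicate = schema[name]["predicate"]
        # Escape backslashes and double quotes for an N-Triples literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return lines


if __name__ == "__main__":
    record = {"sample_id": "S1", "collection_date": "2020-07-12"}
    print(form_fields(SCHEMA))
    print(validate(SCHEMA, record))
    for line in to_ntriples(SCHEMA, "http://example.org/pubseq/submission/1", record):
        print(line)
```

Adding a field then means adding one entry to the schema; the form, the validation and the RDF output follow from it.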

4 Modifying the schema

One of the first things we wanted to do is to add a field for the data license. Initially we only supported CC BY 4.0 as the default license, but now we want to give uploaders the option to choose the even more liberal CC0 license. The first step is to find a good ontology term for the field. Searching for "creative commons cc0 rdf" turned up this useful page. We also find an overview where CC0 is represented as the URI https://creativecommons.org/publicdomain/zero/1.0/, while the attribution license is represented as https://creativecommons.org/licenses/by/4.0/. According to this document we should really also add fields for attributionName and attributionURL.
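As a rough sketch of what such a license field could produce in RDF, the snippet below emits N-Triples lines for a submission. It assumes the Creative Commons vocabulary namespace `http://creativecommons.org/ns#` with the terms `license`, `attributionName` and `attributionURL`; whether PubSeq ends up using exactly these terms is not settled here. The subject URI and the function name `license_triples` are hypothetical.

```python
# Assumption: predicates come from the Creative Commons vocabulary.
CC_NS = "http://creativecommons.org/ns#"

# The two license URIs mentioned in the text.
LICENSES = {
    "CC0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def license_triples(subject, license_key="CC-BY-4.0",
                    attribution_name=None, attribution_url=None):
    """Return N-Triples lines describing the license of `subject`."""
    if license_key not in LICENSES:
        raise ValueError(f"unsupported license: {license_key}")
    triples = [f"<{subject}> <{CC_NS}license> <{LICENSES[license_key]}> ."]
    if attribution_name:
        # The attribution name is a literal; escape backslashes and quotes.
        escaped = attribution_name.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'<{subject}> <{CC_NS}attributionName> "{escaped}" .')
    if attribution_url:
        # The attribution URL is written as a resource, not a literal.
        triples.append(f"<{subject}> <{CC_NS}attributionURL> <{attribution_url}> .")
    return triples


if __name__ == "__main__":
    # Hypothetical submission URI, for illustration only.
    for line in license_triples("http://example.org/pubseq/submission/1", "CC0"):
        print(line)
```

Keeping CC BY 4.0 as the default argument mirrors the current behaviour, so existing uploads stay unchanged while new ones can opt in to CC0.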

Note: work in progress


Created by Pjotr Prins (pjotr.public768 at thebird 'dot' nl) using Emacs org-mode and a healthy dose of Lisp!
Modified 2020-07-12 Sun 06:24