DEER: RDF Data Extraction and Enrichment Framework

In recent years, the Linked Data principles have been used across academia and industry to publish and consume structured data. Thanks to the fourth Linked Data principle, many of the RDF datasets used within these applications contain implicit and explicit references to more data. For example, music datasets such as Jamendo include references to the locations of record labels, places where artists were born or have been, etc. Datasets such as Drugbank contain references to drugs from DBpedia, where verbal descriptions of the drugs and their usage are explicitly available. The goal of the data extraction and enrichment component, dubbed DEER, is to retrieve this information, make it explicit, and integrate it into data sources according to the specifications of the user. To this end, DEER relies on a simple yet powerful pipeline system that consists of two main components: enrichment functions and operators.


Enrichment functions and operators

Enrichment functions implement functionality for processing the content of a dataset (e.g., applying named entity recognition to a particular property). Thus, they take a dataset as input and return a dataset as output. Enrichment operators work at a higher level of granularity and combine datasets. Thus, they take sets of datasets as input and return sets of datasets.
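The difference between the two can be illustrated with a minimal Python sketch. It assumes that a dataset is simply a set of (subject, predicate, object) triples. The names `Dataset`, `uppercase_labels`, and `merge_operator`, as well as the `ex:` terms, are hypothetical and chosen for illustration only; they are not DEER's actual API.

```python
from typing import Callable, FrozenSet, List, Tuple

# Assumption: a dataset is modelled as an immutable set of RDF-like triples.
Triple = Tuple[str, str, str]
Dataset = FrozenSet[Triple]

# An enrichment function maps one dataset to one dataset.
EnrichmentFunction = Callable[[Dataset], Dataset]
# An enrichment operator maps a list of datasets to a list of datasets.
EnrichmentOperator = Callable[[List[Dataset]], List[Dataset]]


def uppercase_labels(dataset: Dataset) -> Dataset:
    """Hypothetical enrichment function: add an upper-cased copy of each label."""
    extra = {
        (s, "ex:upperLabel", o.upper())
        for (s, p, o) in dataset
        if p == "rdfs:label"
    }
    return frozenset(dataset | extra)


def merge_operator(datasets: List[Dataset]) -> List[Dataset]:
    """Hypothetical enrichment operator: merge all inputs into one dataset."""
    merged: set = set()
    for d in datasets:
        merged |= d
    return [frozenset(merged)]


# Two toy input datasets (invented data).
artists: Dataset = frozenset({("ex:artist1", "rdfs:label", "Some Artist")})
places: Dataset = frozenset({("ex:place1", "rdfs:label", "Leipzig")})

# A tiny pipeline: enrich one dataset, then combine it with another.
enriched = uppercase_labels(artists)
(result,) = merge_operator([enriched, places])
```

Because a function has the signature dataset-to-dataset and an operator has the signature list-to-list, the two can be chained freely, which is what makes a pipeline of this kind composable.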

RDF specification paradigm

In the current version of DEER, we introduce our new RDF-based specification paradigm. The main idea behind this paradigm is to enable specifications to be processed and executed efficiently. To this end, we decided to use RDF as the specification language, for three reasons. First, it allows for creating specification repositories that can be queried easily to retrieve suitable specifications for the use case at hand. Second, extending the specification language does not require changing the language itself, thanks to the intrinsic extensibility of ontologies. Third, a specification can easily be checked for correctness with a reasoner, because the specification ontology allows for stating the restrictions that specifications must abide by.
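The idea of checking a specification against declared restrictions can be sketched in a few lines of Python. This is not how DEER does it: a hand-written check stands in for the reasoner, the specification is a plain set of triples, and every `ex:` term as well as the function `find_violations` is invented for illustration. The restriction checked here (exactly one input and one output per enrichment function) is an assumption derived from the description of enrichment functions, not from the DEER ontology.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical specification: one enrichment function that reads one
# dataset and writes another. All ex: terms are invented for illustration.
spec: Set[Triple] = {
    ("ex:step1", "rdf:type", "ex:EnrichmentFunction"),
    ("ex:step1", "ex:hasInput", "ex:dataset1"),
    ("ex:step1", "ex:hasOutput", "ex:dataset2"),
}


def find_violations(triples: Set[Triple]) -> List[str]:
    """Report enrichment functions without exactly one input and one output."""
    # Count how often each (subject, predicate) pair occurs.
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for s, p, _ in triples:
        counts[(s, p)] += 1

    # Collect every node typed as an enrichment function.
    functions = sorted(
        s for s, p, o in triples
        if p == "rdf:type" and o == "ex:EnrichmentFunction"
    )

    violations = []
    for node in functions:
        for prop in ("ex:hasInput", "ex:hasOutput"):
            n = counts[(node, prop)]
            if n != 1:
                violations.append(f"{node}: expected 1 {prop}, found {n}")
    return violations


# The complete specification passes; removing the output triple does not.
broken = spec - {("ex:step1", "ex:hasOutput", "ex:dataset2")}
ok_report = find_violations(spec)
broken_report = find_violations(broken)
```

In an RDF-based setting, the same restriction would be stated declaratively in the specification ontology rather than coded by hand, so that new restrictions can be added without touching the checking code.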

