RDFUnit: an RDF Unit-Testing suite

RDFUnit is a test-driven data-debugging framework that can run automatically generated (schema-based) as well as manually defined test cases against a SPARQL endpoint. All test cases are executed as SPARQL queries using a pattern-based transformation approach.


For more information on our methodology please refer to our report:

Test-driven evaluation of linked data quality. Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali J. Zaveri. In Proceedings of the 23rd International Conference on World Wide Web (WWW 2014).

RDFUnit in a Nutshell

  • Test case: a data constraint that involves one or more triples. We use SPARQL as a test definition language.
  • Test suite: a set of test cases for testing a dataset
  • Status: Success, Fail, Timeout (e.g. due to query complexity) or Error (e.g. a network failure). A Fail can be an actual error, a warning or a notice
  • Data Quality Test Pattern (DQTP): an abstract test case that can be instantiated into concrete test cases using pattern bindings
  • Pattern Bindings: valid replacements for a DQTP variable
  • Test Auto Generators (TAGs): generators that convert RDFS/OWL axioms into concrete test cases

As shown in the figure, there are two major sources for creating test cases. One source is stakeholder feedback from everyone involved in the usage of a dataset and the other source is the already existing RDFS/OWL schema of a dataset. Based on this, there are several ways in which test cases can be created:

  • Using RDFS/OWL constraints directly: In this case, test cases can be created automatically via TAGs.
  • Enriching the RDFS/OWL constraints: Since many datasets provide only limited schema information, we perform automatic schema enrichment. These schema enrichment methods can take an RDF/OWL dataset or a SPARQL endpoint as input and automatically suggest schema axioms with a certain confidence value by analysing the dataset. In our methodology, this is used to create further test cases via TAGs. It should be noted that test cases are explicitly labelled, such that the engineer knows that they are less reliable than manual test cases.
  • Re-using tests based on common vocabularies: Naturally, a major goal in the Semantic Web is to re-use existing vocabularies instead of creating them from scratch for each dataset. We detect the vocabularies used in a dataset, which allows us to re-use test cases from a test case pattern library.
  • Instantiate existing DQTPs: The aim of DQTPs is to be generic, such that they can be applied to different datasets. While this requires a high initial effort of compiling a pattern library, it pays off in the long run, since the patterns can be re-used. Instead of writing SPARQL templates from scratch, an engineer can select and instantiate an appropriate DQTP. This does not necessarily require SPARQL knowledge, since the choice can be guided by a DQTP's textual description, examples and intended usage.
  • Write own DQTPs: In some cases, test cases cannot be generated by any of the automatic and semi-automatic methods above and have to be written from scratch by an engineer. These DQTPs can then become part of a central library to facilitate later re-use.

