Tapioca: a search engine for topically similar RDF datasets

Tapioca is a search engine for finding topically similar linked data datasets.


The Web of Data is growing continuously in both the size and the number of published datasets. Porting these datasets to five-star Linked Data, however, requires data publishers to link each new dataset to Linked Data datasets that are already available. Given the size and growth of the Linked Data Cloud, the current, mostly manual approach to detecting relevant datasets for linking is no longer practical.

We present Tapioca, a search engine for linked datasets that automatically provides data publishers with existing datasets similar to their own. It uses a novel approach to determine the topical similarity of datasets: probabilistic topic modelling applied solely to the metadata of the datasets.
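The idea of ranking datasets by topical similarity can be illustrated with a minimal sketch. It assumes that a topic model has already turned each dataset's metadata into a topic distribution. The dataset names, the three-topic vectors, the function names, and the choice of cosine similarity are all assumptions made for illustration and are not taken from the Tapioca source code.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity of two equally long topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def rank_similar(query, indexed):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in indexed.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical topic distributions of already indexed datasets.
indexed = {
    "geo-dataset": [0.80, 0.15, 0.05],
    "bio-dataset": [0.05, 0.10, 0.85],
    "gov-dataset": [0.30, 0.60, 0.10],
}

# Hypothetical topic distribution inferred for a publisher's new dataset.
query = [0.70, 0.25, 0.05]

ranking = rank_similar(query, indexed)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example the dataset whose topic distribution is closest to the query is listed first, which is the kind of result a publisher would use to pick candidates for linking.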

The source code can be found on GitHub. The software is provided under a dual license. For non-commercial purposes, the terms of the LGPL 3.0 license apply. For commercial purposes, please contact us.

For our publication "Detecting Similar Linked Datasets Using Topic Modelling", we provide the following additional material:

  • For the first experiment, you can find the gold standard as well as the detailed F1 scores of Tapioca and of a second version of Tapioca that uses the Jensen-Shannon divergence in this folder.
  • For the second experiment, you can find the detailed values of the P(w|T) and the A measure in this folder.
  • For the third experiment, you can find the detailed values of the P(w|T) and the A measure as well as the F1 scores of our approach in this folder.
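
Two of the measures named in the list above, the Jensen-Shannon divergence and the F1 score, can be sketched with their textbook definitions. This is a generic illustration and not the evaluation code used for the publication; the function names and the example values are assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence of two distributions, in [0, 1] for log base 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def f1_score(retrieved, relevant):
    """Harmonic mean of precision and recall for two sets of dataset names."""
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical topic distributions of two datasets.
jsd = jensen_shannon_divergence([0.7, 0.2, 0.1], [0.1, 0.3, 0.6])

# Hypothetical search result compared against a hypothetical gold standard.
f1 = f1_score({"a", "b", "c"}, {"b", "c", "d", "e"})

print(f"JSD = {jsd:.3f}, F1 = {f1:.3f}")
```

A divergence is small for similar distributions, so a ranking based on it sorts in ascending order, whereas a similarity-based ranking sorts in descending order.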
