Tapioca

Tapioca is a search engine for finding topically similar Linked Data (RDF) datasets.

The Web of Data is growing continuously in both the size and the number of published datasets. Porting these datasets to five-star Linked Data, however, requires data publishers to link their new datasets to Linked Data sets that are already available. Given the size and growth of the Linked Data Cloud, the current, mostly manual approach to detecting relevant datasets for linking has become obsolete.

We present Tapioca, a search engine for linked datasets that automatically provides data publishers with existing datasets similar to their own. It uses a novel approach for determining the topical similarity of datasets: probabilistic topic modelling applied solely to the metadata of the datasets.

The source code is available on GitHub. The software is provided under a dual license. For non-commercial purposes, the terms of the LGPL 3.0 license apply. For commercial purposes, please contact us.

For our publication "Detecting Similar Linked Datasets Using Topic Modelling", we provide the following additional material:

  • For the first experiment, you can find the gold standard as well as the detailed F1 scores of Tapioca and of a second version of Tapioca that uses the Jensen-Shannon divergence in this folder.
  • For the second experiment, you can find the detailed values of the P(w|T) and the A measure in this folder.
  • For the third experiment, you can find the detailed values of the P(w|T) and the A measure as well as the F1 scores of our approach in this folder.
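The first experiment mentions a Tapioca variant based on the Jensen-Shannon divergence. As a rough illustration of how such a measure can compare datasets, the sketch below ranks candidate datasets by the divergence between their topic distributions and that of a query dataset. It is not taken from the Tapioca code base: the function names, dataset names and topic vectors are invented for this example, and it assumes that each dataset has already been reduced to a probability distribution over topics.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (0 = identical)."""
    # The mixture m is positive wherever p or q is, so KL is always defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    return sorted(candidates, key=lambda name: js_divergence(query, candidates[name]))


# Hypothetical topic distributions over three topics (illustration only).
topics = {
    "geo-dataset": [0.70, 0.20, 0.10],
    "bio-dataset": [0.05, 0.15, 0.80],
    "city-dataset": [0.50, 0.40, 0.10],
}
query = [0.65, 0.25, 0.10]

ranking = rank_by_similarity(query, topics)
print(ranking)
```

With base-2 logarithms the divergence lies between 0 and 1, so it can be turned into a similarity score as 1 minus the divergence if a "higher is better" ranking is preferred.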

