RDFSlice: Large-scale RDF Dataset Slicing

In recent years, an increasing amount of structured data has been published on the Web as Linked Open Data (LOD). Despite recent advances, consuming and using Linked Open Data within an organization is still a substantial challenge. Many LOD datasets are quite large and, despite progress in RDF data management, loading and querying them in a triple store is extremely time-consuming and resource-demanding. To overcome this consumption obstacle, we propose RDF dataset slicing, a process inspired by the classical Extract-Transform-Load (ETL) paradigm.

RDF dataset slicing focuses on the selection and extraction steps. It devises a fragment of SPARQL dubbed SliceSPARQL, which enables the selection of well-defined slices of datasets that fulfill typical information needs. SliceSPARQL supports graph patterns for which each connected subgraph pattern involves at most one variable or IRI in its join conditions. This restriction guarantees that the query can be processed efficiently against a sequential dataset dump stream. As a result, dataset slices can be generated an order of magnitude faster than with the conventional approach of loading the whole dataset into a triple store and retrieving the slice by executing the query against the triple store's SPARQL endpoint.
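To make the join restriction more concrete, the following is a minimal Python sketch of slicing a dump stream with a pattern whose triple patterns all join on a single subject variable. It is an illustration under stated assumptions, not the RDFSlice implementation: the function names (`parse_line`, `matches`, `slice_star`), the two-pass strategy, the simplified N-Triples parsing, and the `http://ex.org/` example data are all hypothetical.

```python
import re
from typing import Callable, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical pattern shape: (predicate, object); None stands for a variable.
# The subject is implicitly the single shared join variable.
Pattern = Tuple[Optional[str], Optional[str]]

# Simplified N-Triples line: subject, predicate, then the object up to the final " ."
_LINE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')


def parse_line(line: str) -> Optional[Triple]:
    """Parse one dump line; return None for blanks, comments, or unparsable lines."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    m = _LINE.match(line)
    return (m.group(1), m.group(2), m.group(3)) if m else None


def matches(triple: Triple, pattern: Pattern) -> bool:
    """Check predicate and object of a triple against one pattern."""
    _, p, o = triple
    want_p, want_o = pattern
    return (want_p is None or want_p == p) and (want_o is None or want_o == o)


def slice_star(read_dump: Callable[[], Iterable[str]],
               patterns: List[Pattern]) -> Iterator[Triple]:
    """Stream the dump twice; only candidate subjects are held in memory."""
    # Pass 1: per pattern, collect the subjects that have a matching triple.
    seen: List[Set[str]] = [set() for _ in patterns]
    for line in read_dump():
        t = parse_line(line)
        if t is None:
            continue
        for i, pat in enumerate(patterns):
            if matches(t, pat):
                seen[i].add(t[0])
    # A subject joins only if it satisfied every pattern.
    keep = set.intersection(*seen) if seen else set()
    # Pass 2: emit the matching triples of the joined subjects.
    for line in read_dump():
        t = parse_line(line)
        if t and t[0] in keep and any(matches(t, pat) for pat in patterns):
            yield t


RDF_TYPE = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'
DUMP = [
    f'<http://ex.org/CityA> {RDF_TYPE} <http://ex.org/City> .',
    '<http://ex.org/CityA> <http://ex.org/population> "1000" .',
    '<http://ex.org/CityA> <http://ex.org/mayor> <http://ex.org/PersonB> .',
    f'<http://ex.org/RiverC> {RDF_TYPE} <http://ex.org/River> .',
    '<http://ex.org/RiverC> <http://ex.org/population> "0" .',
]
# Roughly: ?s rdf:type ex:City . ?s ex:population ?o
PATTERNS = [(RDF_TYPE, '<http://ex.org/City>'), ('<http://ex.org/population>', None)]
result = list(slice_star(lambda: DUMP, PATTERNS))
```

Because the only join is on one variable, the sketch never needs random access to the data: it reads the dump sequentially and keeps just the sets of candidate subjects, which is the property the restriction above is meant to preserve.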

News

DBpedia Tutorial @ Knowledge Graph Conference 2021 (2021-04-09T13:20:50+02:00 by Julia Holze)

On May 4, 2021 we will organize a tutorial at the Knowledge Graph Conference (KGC) 2021.

DBpedia @ Google Summer of Code program 2021 (2021-03-15T09:41:22+01:00 by Julia Holze)

DBpedia, one of InfAI’s community projects, will participate in the Google Summer of Code (GSoC) program for the 10th time. The GSoC program aims to bring students from all over the globe into open source software development.

DBpedia’s New Website (2021-01-28T12:42:40+01:00 by Julia Holze)

We are proud to announce the completion of the new DBpedia website.

SANSA 0.7.1 (Semantic Analytics Stack) Released (2020-01-17T09:52:41+01:00 by Prof. Dr. Jens Lehmann)

We are happy to announce SANSA 0.7.1, the seventh release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink to provide scalable machine learning, inference and querying capabilities for large knowledge graphs.

More Complete Resultset Retrieval from Large Heterogeneous RDF Sources (2019-12-05T15:46:09+01:00 by Andre Valdestilhas)

Over recent years, the Web of Data has grown significantly. Various interfaces such as LOD Stats, LOD Laundromat and SPARQL endpoints provide access to hundreds of thousands of RDF datasets, representing billions of facts.