REX: Web-Scale Extension of RDF Knowledge Bases

REX is an RDF extraction framework for Web data that can learn XPath wrappers from unlabelled Web pages using knowledge from the Linked Open Data Cloud.


Introduction

The Web RDF Extraction Framework, REX, addresses the problem of extracting RDF data from templated websites. To this end, REX provides a generic architecture for learning XPath wrappers from unlabelled Web pages using knowledge from the Linked Open Data Cloud. REX is best regarded as a skeleton to be fleshed out for your purposes. Still, REX is also a working system, as it provides running implementations for all of its interfaces.
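
To make the idea of learning an XPath wrapper from known facts more concrete, here is a minimal sketch in Java. It is not REX's actual learner: the class `WrapperSketch`, its methods, and the two toy pages are assumptions made up for illustration. The sketch takes one value that is already known from a knowledge base, finds the element that contains it in a seed page, derives an absolute XPath to that element, and applies the same expression to a second page built from the same template. It assumes well-formed XHTML and uses only the JDK's `javax.xml` packages.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; none of these names come from the REX code base.
class WrapperSketch {

    // Parses a well-formed XHTML string into a DOM tree.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse page", e);
        }
    }

    // 1-based position of an element among its preceding siblings of the same name.
    static int position(Node element) {
        int pos = 1;
        for (Node s = element.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(element.getNodeName())) {
                pos++;
            }
        }
        return pos;
    }

    // Builds an absolute XPath such as /html[1]/body[1]/div[1] for an element.
    static String pathTo(Node element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName() + "[" + position(n) + "]");
        }
        return path.toString();
    }

    // "Learns" a wrapper: depth-first search for the first element with a text
    // child equal to the known label; returns an XPath to that element's text,
    // or null if the label does not occur in the page.
    static String learn(Node node, String knownLabel) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE
                    && c.getNodeValue().trim().equals(knownLabel)) {
                return pathTo(node) + "/text()";
            }
            String found = learn(c, knownLabel);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Applies a learned XPath to another page and returns the extracted string.
    static String apply(String xpath, Document page) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, page).trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Seed page: its director is assumed to be known from the knowledge base.
        Document seed = parse("<html><body><h1>Inception</h1><div>"
                + "<span>Director</span><span>Christopher Nolan</span></div></body></html>");
        // Unseen page that follows the same template.
        Document target = parse("<html><body><h1>Alien</h1><div>"
                + "<span>Director</span><span>Ridley Scott</span></div></body></html>");
        String xpath = learn(seed, "Christopher Nolan");
        System.out.println(xpath);
        System.out.println(apply(xpath, target));
    }
}
```

A wrapper learned from a single page is brittle. A realistic learner would generalise over many seed pages and rank candidate expressions, which this sketch deliberately leaves out.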

In contrast to existing frameworks for RDF extraction using XPath wrappers, REX provides a consistency layer which ensures that newly extracted knowledge is logically consistent with the knowledge already available in the input knowledge base. This website gives an overview of the framework. All technical details can be found on the wiki of the GitHub page. There you will also find:

  • The Java documentation for the coders out there.
  • A manual to help you run the framework before you customize it for your purposes.
  • A ticket system in case you find bugs or have feature requests.

Architecture

Figure 1: The REX architecture

To facilitate the implementation of extraction processes, the framework provides the four-layer architecture shown in Figure 1. The data for the extraction is first gathered from the Web (or any other source of your choice). Each module in each layer is specified as an interface, and an initial implementation of each interface is provided (see the Java docs).

  • The extraction layer gathers data from the Web and consists of two modules: the crawler gathers website content from the Web, while the domain identifier helps detect website domains that contain information pertaining to a given property.
  • The storage layer provides interfaces for managing and storing structured data as well as unstructured data.
  • The induction layer contains all modules for learning XPath expressions. The core module here is the XPath Learner.
  • The generation layer allows integrating approaches for generating and validating RDF data. The default generator relies on AGDISTIS and ORE.
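
The way the four layers could fit together can be sketched as a small Java pipeline. Every name below (`LayerSketch`, `Crawler`, `PageStore`, `WrapperLearner`, `Extractor`, `Validator`, `functionalCheck`) is hypothetical and chosen for illustration only; REX's real interfaces are described in its Java docs. The validator is a deliberately simple stand-in that treats every property as functional. It does not reproduce what the AGDISTIS- and ORE-based default generator does, and the learner is a trivial placeholder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these are NOT REX's real interfaces.
class LayerSketch {

    // A (subject, predicate, object) statement.
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
        @Override public String toString() {
            return "<" + subject + "> <" + predicate + "> \"" + object + "\"";
        }
    }

    interface Crawler { String fetch(String url); }              // extraction layer
    interface PageStore {                                        // storage layer
        void put(String url, String content);
        Map<String, String> pages();
    }
    interface Extractor { String extract(String content); }      // a learned wrapper
    interface WrapperLearner {                                   // induction layer
        Extractor learn(Map<String, String> pages, List<Triple> knowledgeBase);
    }
    interface Validator {                                        // generation layer
        boolean accepts(Triple candidate, List<Triple> knowledgeBase);
    }

    // Stand-in consistency check (an assumption): reject a candidate if the
    // knowledge base already holds a different object for the same subject
    // and predicate, i.e. treat every property as functional.
    static boolean functionalCheck(Triple candidate, List<Triple> knowledgeBase) {
        for (Triple t : knowledgeBase) {
            if (t.subject.equals(candidate.subject)
                    && t.predicate.equals(candidate.predicate)
                    && !t.object.equals(candidate.object)) {
                return false;
            }
        }
        return true;
    }

    // Wires the layers together: crawl, store, learn a wrapper, generate, validate.
    static List<Triple> run(List<String> urls, String property, Crawler crawler,
                            PageStore store, WrapperLearner learner,
                            Validator validator, List<Triple> knowledgeBase) {
        for (String url : urls) {
            store.put(url, crawler.fetch(url));
        }
        Extractor wrapper = learner.learn(store.pages(), knowledgeBase);
        List<Triple> accepted = new ArrayList<>();
        for (Map.Entry<String, String> page : store.pages().entrySet()) {
            String value = wrapper.extract(page.getValue());
            if (value == null) {
                continue;
            }
            Triple candidate = new Triple(page.getKey(), property, value);
            if (validator.accepts(candidate, knowledgeBase)) {
                accepted.add(candidate);
            }
        }
        return accepted;
    }

    // Demo with in-memory stubs; the "Web" is a map from URL to page text.
    static List<Triple> demo() {
        Map<String, String> web = new LinkedHashMap<>();
        web.put("http://example.org/Inception", "Christopher Nolan");
        web.put("http://example.org/Alien", "Ridley Scott");
        web.put("http://example.org/Memento", "Someone Else");
        List<Triple> knowledgeBase = List.of(
                new Triple("http://example.org/Memento", "director", "Christopher Nolan"));
        Map<String, String> stored = new LinkedHashMap<>();
        PageStore store = new PageStore() {
            public void put(String url, String content) { stored.put(url, content); }
            public Map<String, String> pages() { return stored; }
        };
        // Placeholder learner: the "wrapper" simply returns the trimmed page text.
        WrapperLearner learner = (pages, known) -> content -> content.trim();
        return run(new ArrayList<>(web.keySet()), "director", web::get, store,
                learner, LayerSketch::functionalCheck, knowledgeBase);
    }
}
```

In this demo the candidate for the Memento page contradicts the knowledge base and is dropped, while the other two candidates are kept. Because each layer is an interface, any one stub can be swapped for a real implementation without touching `run`.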

Evaluation

With REX, we also aimed to provide a baseline system for the extraction of RDF from templated websites. Thus, in addition to providing at least one implementation for all the interfaces, we also evaluated the basic REX. The data we used for the evaluation can be found here.

What next?

There are several things you can do.

  1. Run REX: Simply follow the steps in the manual.
  2. Extend REX: Please check out the installation instructions.
  3. Point out bugs: Please use the issue tracker.

Now it's your turn. Please extend REX and help improve the extraction of RDF from the Web.

