DEQA: Deep Web Extraction for Question Answering

Despite decades of effort, intelligent object search remains elusive. Neither search engine nor semantic web technologies alone have managed to provide usable systems for simple questions such as “Find me a flat with a garden and more than two bedrooms near a supermarket.” We introduce DEQA, a conceptual framework that achieves this goal by combining state-of-the-art semantic technologies with effective data extraction. We apply DEQA to the UK real estate domain and show that it can answer the majority of such questions correctly. DEQA achieves this by mapping natural language questions to SPARQL patterns, which are then evaluated on an RDF database of current real estate offers. The offers are obtained by running OXPath, a state-of-the-art data extraction system, on the sites of the major agencies in the Oxford area, and are linked through LIMES to background knowledge such as the location of supermarkets.

Demo

AutoSPARQL prototype user interface: http://autosparql-tbsl.dl-learner.org

General Approach

DEQA provides a conceptual framework for enhancing classic information retrieval and search techniques using recent advances in web extraction, data integration and question answering. The overall approach is illustrated in the figure above: Given a particular domain, such as real estate, the first step consists of identifying relevant websites and extracting data from them. This previously tedious task can now be reduced to the rapid creation of OXPath wrappers. In DEQA, data integration is performed through a triple store using a common base ontology. Hence, the first phase may combine the extraction of unstructured and structured data. For instance, websites may already expose data as RDFa, which can then be transformed to the target schema, e.g. using R2R, if necessary. This basic RDF data is then enriched, e.g. via linking, schema enrichment, geo-coding or post-processing steps on the extracted data. Enrichment is particularly interesting, since the LOD cloud contains a wealth of information across different domains, which allows users to formulate queries in a more natural way (e.g., using landmarks rather than postcodes or coordinates). For instance, in our analysis of the real estate domain, over 100k triples for 2,400 properties were extracted and enriched by over 100k links to the LOD cloud. Finally, question answering or semantic search systems can be deployed on top of the created knowledge. One of the most promising research areas in question answering in recent years is the conversion of natural language to SPARQL queries, which allows such systems to be deployed directly on top of a triple store. DEQA first attempts to convert a natural language query to SPARQL, and falls back to standard information retrieval where this conversion fails.
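The control flow of the last step (try a SPARQL translation first, fall back to keyword retrieval otherwise) can be sketched as follows. This is a minimal illustration, not the DEQA implementation: the single regular-expression template, the `ex:` namespace, and the functions `question_to_sparql`, `keyword_search` and `answer` are all assumptions made for this example, and the query engine is passed in as a plain callable rather than tied to a particular triple store.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from number words to integers, for the toy template.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def question_to_sparql(question: str) -> Optional[str]:
    """Toy translator: handles only 'more than N bedrooms' questions.

    Returns a SPARQL query string, or None if the template does not apply.
    The ex: vocabulary is made up for illustration.
    """
    match = re.search(r"more than (\w+) bedrooms", question.lower())
    if match is None:
        return None
    word = match.group(1)
    count = NUMBER_WORDS.get(word)
    if count is None and word.isdigit():
        count = int(word)
    if count is None:
        return None
    return (
        "PREFIX ex: <http://example.org/realestate#>\n"
        "SELECT ?offer WHERE {\n"
        "  ?offer ex:bedrooms ?b .\n"
        f"  FILTER(?b > {count})\n"
        "}"
    )


def keyword_search(question: str, documents: Dict[str, str]) -> List[str]:
    """Fallback retrieval: rank documents by the number of shared tokens."""
    terms = set(re.findall(r"\w+", question.lower()))
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(re.findall(r"\w+", text.lower())))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


def answer(
    question: str,
    run_sparql: Callable[[str], List[str]],
    documents: Dict[str, str],
) -> Tuple[str, List[str]]:
    """Try the SPARQL route first; use keyword search if translation fails."""
    query = question_to_sparql(question)
    if query is not None:
        return "sparql", run_sparql(query)
    return "keyword", keyword_search(question, documents)
```

Here `run_sparql` stands for whatever function submits a query to the triple store and returns the matching offer identifiers. Keeping it as a parameter means the fallback logic can be exercised without a running store.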

Use Case: Application to the Real Estate Domain

The domain-specific implementation of the conceptual framework, which we used for the real estate domain, is depicted in the figure above. It covers the steps described above by employing state-of-the-art tools in the respective areas: OXPath for data extraction to RDF, LIMES for linking to the linked data cloud, and TBSL for translating natural language questions to SPARQL queries. Below are the configuration files necessary to set up the system and a pointer to a user interface for testing it:
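Separately from those configuration files, the effect of the linking step can be illustrated with a small sketch. It connects each property to every landmark (for example a supermarket) within a distance threshold, which is what makes a question such as "near a supermarket" answerable by a graph pattern. This is not LIMES's configuration format or its algorithm: it is a brute-force comparison, and the `ex:near` predicate, the identifiers, the coordinates and the 1 km threshold are all assumptions chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def link_nearby(
    properties: Dict[str, Coord],
    landmarks: Dict[str, Coord],
    max_km: float = 1.0,
) -> List[Tuple[str, str, str]]:
    """Return (property, 'ex:near', landmark) triples for pairs within max_km.

    Compares every property with every landmark, which is fine for a sketch
    but would not scale to large datasets.
    """
    links = []
    for prop_id, (plat, plon) in properties.items():
        for lm_id, (llat, llon) in landmarks.items():
            if haversine_km(plat, plon, llat, llon) <= max_km:
                links.append((prop_id, "ex:near", lm_id))
    return links


# Hypothetical data: one offer and two shops, with made-up coordinates.
offers = {"ex:offer1": (51.7520, -1.2577)}
shops = {"ex:shopA": (51.7530, -1.2570), "ex:shopB": (51.8500, -1.3000)}
links = link_nearby(offers, shops, max_km=1.0)
```

Once such triples are stored next to the extracted offers, a query can use the link directly instead of comparing coordinates at query time.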

Members

DIADEM

  • Dr. Tim Furche, http://furche.net
  • Dr. Giovanni Grasso, http://www.giovannigrasso.it/
  • Dr. Christian Schallhart, http://www.cs.ox.ac.uk/people/christian.schallhart/
  • Dr. Andrew Sellers, http://www.cs.ox.ac.uk/people/andrew.sellers/
  • David Liu

CITEC

  • Dr. Christina Unger, http://www.sc.cit-ec.uni-bielefeld.de/people/cunger/
