HAWK: Hybrid Question Answering over Linked Data

HAWK aims to advance the OKBQA vision of hybrid question answering over Linked Data and full-text information. Its performance is benchmarked on the hybrid task (task 3) of QALD-4.


Introduction

Recent advances in question answering (QA) over Linked Data provide end users with increasingly sophisticated tools for querying Linked Data by expressing their information need in natural language. This opens the wealth of structured data available on the Semantic Web to non-experts as well. However, a lot of information is still available only in textual form, both on the Document Web and in the form of labels and abstracts in Linked Data sources. Therefore, a considerable number of questions can only be answered by hybrid question answering approaches, which find and combine information stored in both structured and textual data sources.

Architecture

The HAWK Architecture

We present HAWK, to the best of our knowledge the first full-fledged hybrid QA framework for entity search over Linked Data and textual data.

Given an input query, HAWK implements an 8-step pipeline, which comprises 1) part-of-speech tagging, 2) detecting entities in the query, 3) dependency parsing and 4) applying linguistic pruning heuristics for an in-depth analysis of the natural language input. The result of these first four steps is a predicate-argument graph annotated with resources from the Linked Data Web. HAWK then 5) assigns semantic meaning to nodes and 6) generates basic triple patterns for each component of the input query with respect to a multitude of features. This deductive linking of triples results in a set of SPARQL queries containing text operators as well as triple patterns. In order to reduce operational costs, 7) HAWK discards queries using several rules, e.g., by discarding query graphs that are not connected. Finally, 8) the remaining queries are ranked using extensible feature vectors and cosine similarity.
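Steps 7 and 8 can be illustrated with a minimal Python sketch. It is not HAWK's actual implementation: the candidate representation (a dict with `name`, `triples` and `features`), the function names, the example feature vectors and the reference vector are all assumptions made for illustration. The sketch drops candidate queries whose triple patterns do not form a single connected graph and ranks the rest by cosine similarity against a reference feature vector.

```python
import math
from collections import defaultdict


def is_connected(triples):
    """Return True if the (subject, predicate, object) patterns form one connected graph."""
    if not triples:
        return False
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    # Depth-first traversal from an arbitrary node.
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(adjacency)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_and_rank(candidates, reference_vector):
    """Discard disconnected query graphs, then sort by similarity (best first)."""
    kept = [c for c in candidates if is_connected(c["triples"])]
    return sorted(
        kept,
        key=lambda c: cosine_similarity(c["features"], reference_vector),
        reverse=True,
    )


# Hypothetical candidates; the text operator is modelled as an ordinary triple.
candidates = [
    {"name": "q1",
     "triples": [("?proj", "dbo:author", "?uri"),
                 ("?proj", "text:query", "'first novel'")],
     "features": [1.0, 0.0, 1.0]},
    {"name": "q2",  # two unrelated patterns: not connected, so it is discarded
     "triples": [("?a", "dbo:author", "?b"),
                 ("?c", "dbo:birthPlace", "?d")],
     "features": [1.0, 1.0, 1.0]},
    {"name": "q3",
     "triples": [("?proj", "rdf:type", "dbo:Book"),
                 ("?proj", "dbo:author", "?uri")],
     "features": [1.0, 1.0, 0.0]},
]
reference = [1.0, 1.0, 0.0]  # assumed reference feature vector

ranked = prune_and_rank(candidates, reference)
print([c["name"] for c in ranked])
```

Filtering before ranking means the similarity computation only runs on queries that could still be executed meaningfully, which mirrors the stated goal of reducing operational costs.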

Supplementary material concerning the evaluation and implementation of HAWK can be found here

