Recent advances in question answering (QA) over Linked Data provide end users with increasingly sophisticated tools for querying linked data by expressing their information need in natural language. This makes the wealth of structured data available on the Semantic Web accessible to non-experts as well. However, a great deal of information is still available only in textual form, both on the Document Web and in the form of labels and abstracts in Linked Data sources. Therefore, a considerable number of questions can only be answered by hybrid question answering approaches, which find and combine information stored in both structured and textual data sources.
We present HAWK, the (to the best of our knowledge) first full-fledged hybrid QA framework for entity search over Linked Data and textual data.
Given an input query, HAWK implements an 8-step pipeline, which comprises 1) part-of-speech tagging, 2) detecting entities in the query, 3) dependency parsing and 4) applying linguistic pruning heuristics for an in-depth analysis of the natural language input. The result of these first four steps is a predicate-argument graph annotated with resources from the Linked Data Web. HAWK then 5) assigns semantic meaning to nodes and 6) generates basic triple patterns for each component of the input query with respect to a multitude of features. This deductive linking of triples results in a set of SPARQL queries containing text operators as well as triple patterns. In order to reduce operational costs, 7) HAWK discards queries using several rules, e.g., by discarding disconnected query graphs. Finally, 8) queries are ranked using extensible feature vectors and cosine similarity.
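The final ranking step (8) can be illustrated with a minimal sketch: each candidate SPARQL query is paired with a feature vector, and candidates are ordered by cosine similarity to a reference vector. The candidate strings, feature values, and the `rank_queries` helper below are hypothetical illustrations, not HAWK's actual implementation or feature set.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_queries(candidates, reference):
    """Order candidate queries by cosine similarity of their feature
    vectors to a reference vector (highest similarity first)."""
    return sorted(candidates,
                  key=lambda c: cosine(c["features"], reference),
                  reverse=True)

# Hypothetical candidates: each pairs a hybrid SPARQL query string
# (triple patterns plus a text operator) with a toy feature vector.
candidates = [
    {"sparql": "SELECT ?uri WHERE { ?uri a dbo:Person . "
               "?uri text:query 'anti-apartheid' }",
     "features": [1.0, 0.0, 1.0]},
    {"sparql": "SELECT ?uri WHERE { ?uri a dbo:Place }",
     "features": [0.0, 1.0, 0.0]},
]
ranked = rank_queries(candidates, reference=[1.0, 0.0, 1.0])
```

Because ranking only needs a total order over candidates, any extension of the feature vector (e.g., new query-shape features) slots in without changing the ranking code itself.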
Supplementary material concerning the evaluation and implementation of HAWK can be found here.