Navigation-induced Knowledge Engineering by Example: a light-weight methodology for low-cost knowledge engineering by a massive user base

Although structured data is becoming widely available, no other methodology – to the best of our knowledge – is currently able to scale up and provide light-weight knowledge engineering for a massive user base. Using NKE, data providers can publish flat data on the Web without extensively engineering structure upfront, but rather observe how structure is created on the fly by interested users, who navigate the knowledge base and at the same time also benefit from using it. The vision of NKE is to produce ontologies as a result of users navigating through a system. This way, NKE reduces the costs for creating expressive knowledge by disguising it as navigation.

Introduction

Whenever a consumer enters a store selling groceries, the way the shelves are sorted and arranged has been predefined by the store owner. This order is useful in some cases (the toothbrush is next to the toothpaste), but not in others. A consumer searching for protein-rich food would ideally look on a single shelf holding food that is rich in protein. Now imagine another consumer who searched for the same thing the day before and who took the effort to go around the store, collect all matching food products and put them on one shelf. Wouldn't this be useful?

In a brick-and-mortar store such a reordering is impossible for lack of space alone. In a digital store, however, the result would not be a shelf but just another category in the category tree, and reordering does not require lifting heavy things. With NKE, the complexity of creating such categories is disguised as navigation and reduced to selecting examples.

A walkthrough example

A user enters the site http://amazon.com intending to find a “2.5 inch external hard drive with 500 GB”. She might proceed as follows:

  1. Run a string search for “hard drive” in the Amazon search.
  2. Click on the category “external harddrive” in the hierarchy on the left.
  3. Instead of reformulating the string search, switch into an “Active Learning” mode, in which she can create one list of products she is searching for and another of products she is not.

Based on this input, an algorithm can produce a much more precise recommendation than by analyzing click behavior, because the user can now model the search query directly by stating examples. If the recommendation matches what the user has in mind, she can save it and give it a name, e.g. “2.5'' external hard drives with 500 GB”. The category created this way is now available to be incorporated into Amazon's hierarchy and can be shown to the next user with the same search intention.

If Web 3.0 is about “structured data created by a massive user base”, then on the one hand it might mean that facts are collected, but on the other hand it might just as well mean that users are able to structure data by navigating, searching and using it. Thus a reciprocal relation is formed between the information need of users and the gain in structure through the created taxonomy.

Vision (defining and elaborating the concept of NKE)

Most of the text is taken from our as yet unpublished WWW 2011 submission.

Formal Definition

Navigational Knowledge Engineering is the manifestation of labeled examples by interpreting user navigation, combined with the active correction and refinement of these examples by the user to create an ontology of user interests through supervised active machine learning.

The NKE methodology consists of three distinct yet interrelated steps:

  1. Navigation: NKE starts by interpreting navigational behavior of users to infer an initial (seed) set of positive and negative examples.
  2. Iterative Feedback: NKE supports users in interactively refining the seed set of examples such that the final set of objects satisfies the users’ intent.
  3. Retention: NKE allows users to retain previously explored sets of objects by grouping them and saving them for later retrieval.
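
The three steps above can be sketched as a small piece of session state. This is only an illustration under stated assumptions: the class `NkeSession`, its method names and the product identifiers are hypothetical and not taken from HANNE, and treating "displayed but not clicked" as a negative example is just one possible way to interpret navigation.

```python
from dataclasses import dataclass, field


@dataclass
class NkeSession:
    """Hypothetical container for one user's example sets (not HANNE's API)."""
    positives: set = field(default_factory=set)
    negatives: set = field(default_factory=set)
    saved: dict = field(default_factory=dict)  # concept name -> frozenset of objects

    # Step 1, Navigation: infer a seed set from what the user did.
    def seed_from_navigation(self, shown, clicked):
        self.positives |= set(clicked)
        # Assumption: objects that were displayed but ignored count as negatives.
        self.negatives |= set(shown) - set(clicked)

    # Step 2, Iterative Feedback: the user corrects a single label.
    def mark(self, obj, relevant):
        if relevant:
            self.negatives.discard(obj)
            self.positives.add(obj)
        else:
            self.positives.discard(obj)
            self.negatives.add(obj)

    # Step 3, Retention: store the current positives under a user-chosen name.
    def save(self, name):
        self.saved[name] = frozenset(self.positives)
        return self.saved[name]


session = NkeSession()
session.seed_from_navigation(shown=["hd1", "hd2", "hd3", "usb1"], clicked=["hd1", "hd2"])
session.mark("hd3", relevant=True)    # user corrects a wrongly inferred negative
session.mark("hd2", relevant=False)   # user rejects a wrongly inferred positive
concept = session.save("2.5 inch external hard drives with 500 GB")
print(sorted(concept))  # ['hd1', 'hd3']
```

In a real system the feedback step would be driven by a learning algorithm that proposes new candidates after each correction; here the user's corrections are simply applied by hand.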

The vision of NKE is to enable low-cost knowledge engineering on the largest possible scale – the Web. The most fundamental consequence of the paradigm is that value is added to data just by navigating and using it. A reciprocal relation is formed between the information need of users and the information gain through the created taxonomy. Although structured data is becoming widely available, no other methodology – to the best of our knowledge (Nov 2010) – is currently able to scale up and provide light-weight knowledge engineering for a massive user base. Using NKE, data providers can publish flat data on the Web without creating any structure upfront, and instead observe how structure is created on the fly by interested users, who navigate the knowledge base and at the same time benefit from using it.

Because the ontology is created by user navigation, it can be used directly to improve further navigation. For other purposes the ontology might not be used directly, but can be seen as raw material that needs to be refined and curated by an ontology expert, system admin or moderator employed by the data provider. Two users on Amazon.com searching for 2.5 inch external hard drives might save two intensionally different concepts with a similar extension: “hard drives without an extra power cord” and “hard drives measuring 2.5 inches”. An easy improvement by a knowledge curator would be to define equivalence between these concepts (e.g. via an owl:equivalentClass axiom). The knowledge curator could also review the user-generated ontology at regular intervals and select good concepts to be included in a domain ontology.
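
As a rough illustration of how a curator might be pointed at such pairs, the sketch below compares the extensions of saved concepts with the Jaccard index and prints a candidate owl:equivalentClass axiom in Turtle syntax for pairs above a threshold. The concept names, the `ex:` prefix, the product identifiers and the threshold of 0.8 are all assumptions made for the example; NKE itself does not prescribe this heuristic.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two extensions: |a ∩ b| / |a ∪ b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def equivalence_candidates(concepts, threshold=0.8):
    """Yield pairs of saved concepts whose extensions overlap at least `threshold`."""
    for name_a, name_b in combinations(sorted(concepts), 2):
        score = jaccard(concepts[name_a], concepts[name_b])
        if score >= threshold:
            yield name_a, name_b, score


def as_turtle(name_a, name_b, prefix="ex"):
    """Render one candidate axiom; the curator still decides whether to assert it."""
    return f"{prefix}:{name_a} owl:equivalentClass {prefix}:{name_b} ."


# Hypothetical saved concepts: concept name -> set of member products.
concepts = {
    "NoExtraPowerCord": {"hd1", "hd2", "hd3", "hd4", "hd5"},
    "Measuring2_5Inches": {"hd1", "hd2", "hd3", "hd4", "hd5", "hd6"},
    "SolidStateDrive": {"ssd1", "hd5"},
}

for name_a, name_b, score in equivalence_candidates(concepts):
    print(f"{score:.2f}  {as_turtle(name_a, name_b)}")
# prints: 0.83  ex:Measuring2_5Inches owl:equivalentClass ex:NoExtraPowerCord .
```

A similar extension is only a hint: the two concepts remain intensionally different, which is why the output is a suggestion for a human curator rather than an automatically asserted axiom.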

Prototype (a description of our prototypical implementation)

The methodology is demonstrated with HANNE (Holistic Application for Navigational KNowledge Engineering), a Semantic Web system that implements the NKE method and enables users and domain experts to navigate over knowledge bases by selecting examples. From these examples, formal OWL class expressions are created and refined by a scalable Iterative Learning approach. When saved by users, these class expressions form an expressive OWL ontology, which can be exploited in numerous ways: as navigation suggestions, as a hierarchy for browsing and as input for a team of ontology editors. In HANNE, users can search for resources and mark them as relevant (+) or irrelevant (-). Machine learning techniques use these labels to identify the underlying concept a user is looking for. The demo is available at http://hanne.aksw.org, the source code can be downloaded from the Google Code project and a screenshot is shown below:

How to use NKE (A tutorial on how to integrate it into a custom system)

NKE is a methodology that can be implemented with existing technology and integrated easily with existing solutions. We will explain the three steps necessary to get a working NKE system and then show some mockups of how existing portals can be upgraded with NKE. Please tell us if you find an application that employs NKE.

3 steps to NKE

  1. Your data has to be in a resource-feature scheme. RDF, for example, provides this by design. The resources are the objects of interest to the users, such as articles in Wikipedia or products on Amazon.com. These objects are described by features, which can be basically any additional data describing the objects: tags, key-value pairs, any kind of metadata, or concrete values such as title, birthdate, salary, price or weight. This is necessary so that a learning algorithm can be provided with the input it needs.
  2. Choose an entry point for NKE. Before users can actively engage in NKE, they need to find initial seed examples to start the iterative feedback loop. A non-exhaustive list of ways to display objects of interest is given here:

    • Browse down a hierarchy of categories and display the members: UKAT
    • Choose facets: Faceted Wikipedia Search
    • Anything Solr has to offer
    • A string or keyword search
  3. Implement some way for users to choose positive and negative examples and give them the means to manage such lists. The main incentive for users is that they can manage and order their own dataspace or save those lists to come back to later. Allow users to name the lists they create. These names are important, as they will become the concept names of the resulting ontology. Use supervised active machine learning to find new matching examples so that the user can refine the list. There are many off-the-shelf machine learning frameworks that can be used. For HANNE we used the DL-Learner, which works on arbitrary RDF/OWL knowledge bases of arbitrary size. RapidMiner is also an option.
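
To make steps 1 and 3 more concrete, here is a minimal sketch of a resource-feature table and a deliberately naive learner. It is a toy stand-in, not DL-Learner or RapidMiner: it only keeps the feature-value pairs shared by all positive examples and rejects the result if it still covers a negative one. The product data and the function names `learn_concept`, `matches` and `suggest` are made up for illustration.

```python
# Step 1: resources described by features (here: plain key-value pairs).
products = {
    "hd1": {"type": "external_hdd", "size_inch": 2.5, "capacity_gb": 500},
    "hd2": {"type": "external_hdd", "size_inch": 2.5, "capacity_gb": 500},
    "hd3": {"type": "external_hdd", "size_inch": 3.5, "capacity_gb": 500},
    "hd4": {"type": "external_hdd", "size_inch": 2.5, "capacity_gb": 1000},
    "hd5": {"type": "external_hdd", "size_inch": 2.5, "capacity_gb": 500},
    "usb1": {"type": "usb_stick", "size_inch": 0.5, "capacity_gb": 64},
}


def matches(features, concept):
    """True if the resource has every feature-value pair required by the concept."""
    return all(features.get(key) == value for key, value in concept.items())


def learn_concept(data, positives, negatives):
    """Toy learner: conjunction of the feature-value pairs common to all positives.

    Returns None if there are no positives or if the conjunction covers a negative.
    """
    if not positives:
        return None
    common = set.intersection(*(set(data[p].items()) for p in positives))
    concept = dict(common)
    if any(matches(data[n], concept) for n in negatives):
        return None
    return concept


def suggest(data, concept, labeled):
    """Step 3: propose unlabeled resources that match the learned concept."""
    return sorted(r for r, f in data.items() if r not in labeled and matches(f, concept))


positives, negatives = ["hd1", "hd2"], ["hd3", "usb1"]
concept = learn_concept(products, positives, negatives)
print(suggest(products, concept, set(positives) | set(negatives)))  # ['hd5']
```

The user would then accept or reject the suggested resources, the learner would run again on the extended example lists, and the loop would continue until the list matches the user's intent and can be saved under a name.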

That's it. As soon as users start to save the learned concepts and give them names, an ontology of user interests will start growing. Now it is your choice what to do. Here are three suggestions:

  • Show them back to the users as Navigation Suggestions
  • Integrate them into your category system for Browsing
  • Use them to create and extend your Domain Ontology

Mockups
