BOA

BOotstrapping linked datA



General Overview

Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass only a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, an iterative bootstrapping strategy for extracting RDF from unstructured data. The idea behind BOA is to use the Data Web as background knowledge for the extraction of natural language patterns that represent predicates found on the Data Web. These patterns are then used to extract instance knowledge from natural language text. Finally, this knowledge is fed back into the Data Web, thereby closing the loop. We evaluate our approach on two data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy. Moreover, we provide the first repository of natural language representations of predicates found on the Data Web.
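
The loop described above can be illustrated with a small, heavily simplified sketch. It is not the BOA implementation: the toy triples and sentences, the function names (learn_patterns, apply_patterns, bootstrap), the min_support threshold, and the rule that a pattern is simply the text between a subject label and an object label are all assumptions made for illustration. The actual approach uses DBpedia as background knowledge and applies its own pattern filtering and scoring.

```python
import re
from collections import defaultdict

# Toy background knowledge as (subject label, predicate, object label) triples.
# Assumption: in BOA this role is played by DBpedia; these triples are made up.
BACKGROUND = [
    ("Leipzig", "locatedIn", "Germany"),
    ("Vienna", "locatedIn", "Austria"),
]

# Toy text corpus (assumption; BOA was run on corpora such as the English Wikipedia).
CORPUS = [
    "Leipzig is a city in Germany .",
    "Vienna is a city in Austria .",
    "Bonn is a city in Germany .",
    "Lyon is a city in France .",
]


def learn_patterns(triples, sentences):
    """Count, per predicate, the text found between a known subject and object."""
    patterns = defaultdict(lambda: defaultdict(int))
    for subj, pred, obj in triples:
        regex = re.compile(re.escape(subj) + r"\s+(.+?)\s+" + re.escape(obj))
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                patterns[pred][match.group(1)] += 1
    return patterns


def apply_patterns(patterns, sentences, min_support=2):
    """Extract new triples with every pattern seen at least min_support times."""
    facts = set()
    for pred, counts in patterns.items():
        for text, support in counts.items():
            if support < min_support:
                continue  # hypothetical filter: drop weakly supported patterns
            # Naive entity detection: a single capitalised word on each side.
            regex = re.compile(r"\b([A-Z]\w*)\s+" + re.escape(text) + r"\s+([A-Z]\w*)")
            for sentence in sentences:
                for subj, obj in regex.findall(sentence):
                    facts.add((subj, pred, obj))
    return facts


def bootstrap(triples, sentences, iterations=1):
    """Feed extracted triples back into the background knowledge each iteration."""
    known = set(triples)
    for _ in range(iterations):
        patterns = learn_patterns(known, sentences)
        known |= apply_patterns(patterns, sentences)
    return known


new_facts = bootstrap(BACKGROUND, CORPUS) - set(BACKGROUND)
print(sorted(new_facts))
```

On this toy input, the pattern "is a city in" is learned from the two known triples and then yields two new locatedIn triples, for Bonn and Lyon. Feeding the result back into the known set is what makes further iterations possible.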

Presentation

The following presentation was given at WeKEx at ISWC 2011 in Bonn:

Paper

You can find additional information on how BOA works in this paper (PDF), presented at the Web Scale Knowledge Extraction Workshop @ ISWC 2011.

Architecture

http://aksw.org/Projects/BOA/files?get=architecture_new.png

Source Code

The source code can be found at the Mercurial Google Code Repository.

Background Knowledge

Generated Knowledge

The generated knowledge can be accessed at the BOA dydra repository.

Library of Natural-Language Representations of Formal Relations

The results of the BOA approach can be downloaded in the form of a Lucene index. The patterns in this index were derived by applying DBpedia background knowledge to the English Wikipedia. The index was created as follows:

You can query the index like this:

You can download this index here. Keep in mind that you need Lucene version 3.0 or later. We applied very strict rules during pattern filtering, so very few patterns were actually generated. In addition, no score constraints were applied to the patterns, so the index also contains very weak patterns.


Contact

Dr. Axel-C. Ngonga Ngomo
Johannisgasse 26, Zimmer 5-22
04103 Leipzig

Tel.: +49 341 97-32341
E-Mail, Workpage

Daniel Gerber
Johannisgasse 26, Zimmer 5-21
04103 Leipzig

Tel.: +49 341 97 32322
E-Mail, Research Group, Workpage


 

Information

Last Modification: 2011-11-21 12:10:03 by Daniel Gerber