DBpedia NIF Dataset: Open, Large-Scale and Multilingual Knowledge Extraction Corpus

DBpedia NIF is a large-scale, multilingual knowledge extraction corpus. The aim of the dataset is two-fold: to dramatically broaden and deepen the amount of structured information in DBpedia, and to provide a large-scale, multilingual language resource for the development of various NLP and IR tasks. The dataset provides the content of all articles for 128 Wikipedia languages.

Overview

The DBpedia community has put a significant amount of effort into developing the technical infrastructure and methods for efficient extraction of structured information from Wikipedia. These efforts have primarily focused on harvesting, refining and publishing semi-structured information found in Wikipedia articles, such as information from infoboxes, categorization information, images, wikilinks and citations. Nevertheless, a vast amount of valuable information is still contained in the unstructured Wikipedia article texts. DBpedia NIF aims to fill this gap by extracting valuable information from Wikipedia article texts. At its core, DBpedia NIF is a large-scale, multilingual knowledge extraction corpus. The purpose of this project is two-fold: to dramatically broaden and deepen the amount of structured information in DBpedia, and to provide a large-scale, multilingual language resource for the development of various NLP and IR tasks. The dataset provides the content of all articles for 128 Wikipedia languages. It captures the content as it is found in Wikipedia: the structure (sections and paragraphs) and the annotations provided by the Wikipedia editors.
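As a rough illustration of how editor-provided annotations can be tied back to article text, the sketch below models a link annotation as a pair of character offsets into a context string, in the spirit of the NIF properties `nif:beginIndex`, `nif:endIndex` and `itsrdf:taIdentRef`. The sample sentence, the `LinkAnnotation` class and the `anchor_text` helper are hypothetical and only serve the example; the actual dataset files may be organized differently.

```python
from dataclasses import dataclass

# Namespaces of the NIF core ontology and ITS RDF vocabulary (shown for reference).
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"


@dataclass(frozen=True)
class LinkAnnotation:
    """Hypothetical in-memory view of one link annotation."""
    begin: int   # corresponds to nif:beginIndex
    end: int     # corresponds to nif:endIndex (exclusive)
    target: str  # corresponds to itsrdf:taIdentRef


def anchor_text(context: str, ann: LinkAnnotation) -> str:
    """Recover the annotated surface form from the context string.

    Assumption: offsets count characters of the context string.
    """
    return context[ann.begin:ann.end]


# Made-up sample data, not taken from the dataset.
context = "Leipzig is a city in Saxony, Germany."
links = [
    LinkAnnotation(21, 27, "http://dbpedia.org/resource/Saxony"),
    LinkAnnotation(29, 36, "http://dbpedia.org/resource/Germany"),
]

for ann in links:
    print(anchor_text(context, ann), "->", ann.target)
```

Keeping annotations as offsets rather than inline markup leaves the article text untouched, so several annotation layers can point into the same string.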

Key Features and Facts

  • content in 128 Wikipedia languages
  • over 9 billion RDF triples, which is almost 40% of DBpedia
  • selected partitions published as Linked Data
  • exploited within the TextExt - DBpedia Open Extraction challenge
  • available for large-scale training of NLP and IR methods

TextExt - DBpedia Open Extraction challenge

The DBpedia Open Text Extraction Challenge differs significantly from other challenges in language technology and other areas in that it is not a one-time call, but a continuously growing and expanding challenge with a focus on sustainably advancing the state of the art and transcending boundaries in a systematic way. The DBpedia Association and the people behind this challenge are committed to providing the necessary infrastructure and driving the challenge for an indefinite time, as well as potentially extending the challenge beyond Wikipedia. We provide data from the DBpedia NIF datasets in 9 different languages, and your task is to run your NLP tool on the data and extract valuable information such as facts, relations, events, terminology or ontologies as RDF triples, or useful NLP annotations such as POS tags, dependencies or co-reference.
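To make the "facts as RDF triples" part of the task concrete, here is a minimal sketch that serializes an extracted fact as an N-Triples line. The helper names (`iri`, `literal`, `triple`) and the sample fact are hypothetical, and the submission format the challenge actually expects is not described here, so treat this only as one possible way to write triples out.

```python
from typing import Optional


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, lang: Optional[str] = None) -> str:
    """Format a plain or language-tagged literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    suffix = f"@{lang}" if lang else ""
    return f'"{escaped}"{suffix}'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join three already-formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


# Illustrative fact, as a tool might extract it from article text.
fact = triple(
    iri("http://dbpedia.org/resource/Leipzig"),
    iri("http://dbpedia.org/ontology/country"),
    iri("http://dbpedia.org/resource/Germany"),
)
label = triple(
    iri("http://dbpedia.org/resource/Leipzig"),
    iri("http://www.w3.org/2000/01/rdf-schema#label"),
    literal("Leipzig", lang="en"),
)
print(fact)
print(label)
```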

Join the challenge at any time; there are no strict deadlines!

News

AKSW is organizing the 6th Leipzig Semantic Web Day (LSWT2018) ( 2018-04-17T14:14:17+02:00 by Natanael Arndt)

On June 18th, 2018, we will hold the 6th Leipzig Semantic Web Day (LSWT2018), a platform for regional actors to get in touch with each other regarding Semantic Web topics.

SANSA 0.3 (Semantic Analytics Stack) Released ( 2017-12-18T11:15:38+01:00 by Simon Bin)

Dear all, we are happy to announce SANSA 0.3, the third release of the Scalable Semantic Analytics Stack.

DBpedia @ SEMANTiCS 2017 ( 2017-09-04T15:25:14+02:00 by Sandra Bartsch)

We are happy to invite you to the 10th DBpedia Community Meeting, which will be held in Amsterdam. During SEMANTiCS 2017, Sep 11-14, the DBpedia Community will get together on the 14th of September for the DBpedia Day.

PRESS RELEASE: Amsterdam - this year's hotspot on Linked Data Strategies & Practices (2017-09-04T11:58:06+02:00 by Sandra Bartsch)

September 11-14, 2017: international experts from science and industry demonstrate the business value of smart data services at SEMANTiCS 2017. Experts from science and industry meet at Europe's biggest Linked Data and Semantic Web event to present and discuss latest …

AKSW Colloquium, 01.09.2017, IDOL: Comprehensive & Complete LOD Insights (2017-08-28T17:24:03+02:00 by Gustavo Publio)

At the AKSW Colloquium on Friday, 1st of September, at 10:40 AM, there will be a paper presentation by Gustavo Publio.