N3 - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format
Abstract:
Extracting Linked Data following the Semantic Web paradigm from unstructured sources has become a key driver for scientific research as well as new business models.
Named Entity Recognition and Disambiguation are two basic steps in this extraction process.
A key prerequisite for realizing the vision of the Semantic Web and developing highly accurate tools is the availability of data for performance validation.
In this article, we present N3, a collection of three novel, manually curated and annotated corpora.
Furthermore, we release them under a free licence and store them in the NLP Interchange Format (NIF) for interoperability.
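
To give a sense of what an annotation in the NLP Interchange Format can look like, the following minimal sketch builds the triples for a single disambiguated entity mention. The document URI, the sentence, the DBpedia link, and the helper names `nif_mention` and `to_ntriples` are hypothetical and chosen only for illustration; the property names come from the NIF core and ITS RDF vocabularies, and the sketch should not be read as the exact serialization used in the N3 corpora.

```python
# Illustrative sketch only: the document URI, sentence, and entity link are
# hypothetical and are not taken from the N3 corpora.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Serialize a literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def nif_mention(doc_uri, text, surface, entity_uri):
    """Return triples describing the first occurrence of `surface` in `text`.

    Offsets are counted in characters from 0, and the end offset is exclusive.
    """
    begin = text.index(surface)
    end = begin + len(surface)
    # The whole text is the context; the mention is a substring of it.
    context = uri(f"{doc_uri}#char=0,{len(text)}")
    mention = uri(f"{doc_uri}#char={begin},{end}")
    return [
        (context, uri(RDF_TYPE), uri(NIF + "Context")),
        (context, uri(NIF + "isString"), literal(text)),
        (mention, uri(RDF_TYPE), uri(NIF + "RFC5147String")),
        (mention, uri(NIF + "referenceContext"), context),
        (mention, uri(NIF + "anchorOf"), literal(surface)),
        (mention, uri(NIF + "beginIndex"), literal(begin, XSD_INT)),
        (mention, uri(NIF + "endIndex"), literal(end, XSD_INT)),
        # Disambiguation: link the mention to a knowledge base resource.
        (mention, uri(ITSRDF + "taIdentRef"), uri(entity_uri)),
    ]


def to_ntriples(triples):
    """Join (subject, predicate, object) tuples into N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    example = nif_mention(
        "http://example.org/doc1",  # hypothetical document URI
        "Leipzig is a city in Germany.",
        "Germany",
        "http://dbpedia.org/resource/Germany",
    )
    print(to_ntriples(example))
```

Recognition corresponds to the mention's offsets and anchor text, while disambiguation corresponds to the `taIdentRef` link, so a single mention resource carries the output of both steps.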