Projects
Funded Projects
AKSW is currently funded through the following regional, national, and European research projects:
DIESEL

What we want to do: Develop a generic keyword search and question answering infrastructure for distributed and structured enterprise data. Our framework will exploit the distribution of the data to improve both the interpretation and the federated execution of user queries while abiding by the users’ access restrictions. We will deliver an open-source version of the framework that implements keyword search functionality as well as prototypical extensions of the partners’ product suites and use case studies. Read more about DIESEL
DINOBBIO
The chemical diversity of the flora and fauna of the Brazilian biomes is revealed by its multiplicity of compound classes and structural types of secondary metabolites from plants, fungi, insects, marine organisms, and bacteria. This extraordinary bio- and chemical diversity can be explored for developing new bioproducts, including pharmaceuticals, cosmetics, food supplements, and agricultural pesticides. DINOBBIO aims to investigate the challenges of building, managing, and consuming biochemical knowledge graphs through Semantic Web technologies and machine learning. DINOBBIO focuses on researching new nature-inspired products based on Brazilian biodiversity. Read more about DINOBBIO
FREME
The general objective of the FREME innovation action is to build an open, innovative, commercial-grade framework of e-services for the multilingual and semantic enrichment of digital content. FREME will empower digital content managers with the advantages and benefits it brings to the market. FREME addresses the general systemic and technological challenges to validate that multilingual and semantic technologies are ready for integration into real-life business cases in innovative ways. These technologies are capable of processing (harvesting and analysing) content, capturing datasets, and adding value throughout content and data value chains across sectors, countries, and languages. Read more about FREME
GEISER

GEISER develops an open cloud-based platform for integrating geospatial data with sensor data from cyberphysical systems based on semantic and Big Data technologies. Read more about GEISER
GeoKnow

GeoKnow addresses a bold challenge in the area of intelligent information management: the exploitation of the Web as a platform for geospatial knowledge integration as well as for exploration of geographic information. Read more about GeoKnow
HOBBIT

HOBBIT is a European project that develops a holistic open-source platform and industry-grade benchmarks for benchmarking big linked data. Read more about HOBBIT
KupferDigital
The KupferDigital project researches methods and concepts for digitally capturing the life cycle of a material, using copper as an example, from ore extraction to recycling. The goal is to create a demonstrator for a digital data ecosystem that can serve as a future-proof platform for the digitization of materials research and the metal-processing industry. Read more about KupferDigital
LIDER

We aim to establish a new Linked Open Data (LOD) based ecosystem of free, interlinked, and semantically interoperable language resources (corpora, dictionaries, lexical and syntactic metadata, etc.) and media resources (image and video metadata, etc.) that will allow for the free and open exploitation of such resources in multilingual, cross-media content analytics across the EU and beyond, with specific use cases in industries related to social media, financial services, localization, and other multimedia content providers and consumers. In some cases, we will explore new business models and hybrid licensing schemes for the use of Linguistic Linked Data in commercial settings for resources that are free but not open. Read more about LIDER
LIMBO

Open Data holds enormous potential, but the offerings in Germany are spread across many portals and are structured and described in different ways. With the mCLOUD and the MDM, the BMVI provides a nationwide offering for the mobility sector. The central goal of the LIMBO project is to improve these data and thereby ease access to them. Through a Mobility Data Space, the data of the Federal Ministry of Transport and Digital Infrastructure are to be made available to the broad public. LIMBO unifies and refines the data, laying the groundwork for many new, innovative apps and applications. Read more about LIMBO
mCLIENT
Numerous municipal and state institutions hold highly relevant data treasures from the mobility sector that they would like to make available to a broad public. The circle of potential users is large: companies and public authorities, but also journalists, scientists, and citizens are interested in using the data. Often, however, no tool exists locally for feeding the data directly into nationwide data portals. The project partners therefore plan to create a software solution for the automated publication of data: the mCLIENT. The mCLIENT is intended to check data collections for quality, describe them semantically, and transfer them to central data portals in push mode. Read more about mCLIENT
PCP on Web

Development of a scientific method for research on online available and distributed research databases of academic history. Read more about PCP on Web
QAMEL

In QAMEL, we will develop a resource-aware and generic multimodal question answering (QA) framework for mobile devices. QAMEL will support speech, text, and gestures as input. Moreover, QAMEL will use the distribution of the data to improve the execution of queries in both offline and online modes. We will deliver an open-source framework that implements (1) multimodal QA, (2) feedback processing functionality, as well as (3) prototypical extensions of the partners' product suites and use case studies. Read more about QAMEL
SAGE
Data integration is one of the main barriers to harnessing the full power of data in companies. SAGE addresses this challenge by aiming to develop dedicated algorithms for the management of big geospatial data. SAGE's main result will be a set of interoperable solutions that implement time-efficient geospatial analytics and can be integrated into high-performance solutions. Read more about SAGE
SAKE
The increasing use of automation in machine and plant construction has inevitably led to a large growth in the number of industrial production processes being recorded and monitored by sensors. If the vast quantities of data this generates could be centrally evaluated in real time, the results could be used to optimize internal processes and drastically reduce production costs. Unfortunately, the most commonly used data analysis tools are simply not designed to handle such enormous amounts of real-time data. The SAKE project has been set up to resolve this problem by developing a framework specifically designed to analyze these vast streams of data. By implementing prefabricated modules, it will also be possible to use individual applications in a number of different roles. The modules will be evaluated in real-life industrial environments by the project's industrial partners. Read more about SAKE
SlideWiki.eu

The SlideWiki EU project started in 2016 with seventeen partners from Europe and Brazil, using the award-winning open-source SlideWiki platform from Germany as a basis. All project goals aim at creating a large-scale, accessible learning and teaching platform using educational technology, skill recognition, and global collaboration. Read more about SlideWiki.eu
SLIPO
POIs are the core content of any application, service, and product even remotely related to our physical surroundings. From navigation applications to social networks, tourism, and logistics, we use POIs to search, communicate, decide, and plan our actions. The Big Data assets for POIs and the evolved POI value chain introduced opportunities for growth, but also complexity, intensifying the challenges relating to their quality-assured integration, enrichment, and data sharing. POI data are by nature semantically diverse and spatiotemporally evolving, representing different entities and associations depending on their geographical, temporal, and thematic context. Pioneered by the FP7 project GeoKnow, linked data technologies have been applied to effectively extract the maximum possible value from open, crowdsourced, and proprietary Big Data sources. Validated in the domains of tourism and logistics, these technologies have proven their benefit as a cost-effective and scalable foundation for the quality-assured integration, enrichment, and sharing of general-purpose geospatial data. In SLIPO, we argue that linked data technologies can address the limitations, gaps, and challenges of the current landscape in integrating, enriching, and sharing POI data. Our goal is to transfer the research output generated by our work in project GeoKnow to the specific challenge of POI data, introducing validated and cost-effective innovations across their value chain. Read more about SLIPO
Smart Data Web

The Smart Data Web project has the goal of creating an industry knowledge base for German industry.
Read more about Smart Data Web
AKSW was formerly funded with the following regional, national, and European research projects:
- ALIGNED – Aligned, Quality-centric Software and Data Engineering
- amsl.technology – Electronic Resource Management for Heterogeneous Data in Libraries
- ART-e-FACT – Media continuity artefact management
- BDE – Big Data Europe
- BIG – Big Data Public Private Forum
- BioASQ – a challenge on large-scale biomedical semantic indexing and question answering
- Digital Agenda Scoreboard – A Statistical Anatomy of Europe's way into the Information Age
- GOLD – Generating Ontologies from Linked Data
- LATC – LOD Around-the-Clock
- LE4SW – Regional Technology Platform of Social Semantic Collaboration
- LEDS – Linked Enterprise Data Services
- LinkingLOD – interlinking knowledge bases
- LOD2 – Creating Knowledge out of Interlinked Data
- OntoWiki.eu – Social Semantic Collaboration for EKM, E-Learning & E-Tourism
- SCMS – Semantic Content Management Systems
- SoftWiki – Semantics- and Community-Based Requirements Engineering
Open Source Projects
AKSW has launched a number of high-impact R&D open-source projects.
AGDISTIS

AGDISTIS is an open-source named entity disambiguation framework able to link entities against any Linked Data knowledge base. Read more about AGDISTIS
AutoSlides
AutoSlides is a tool that automatically creates PowerPoint-based slideshows on chosen topics. It uses several online resources, such as DBpedia, Wikipedia, and Flickr, to search for relevant content and produce meaningful slideshows. Read more about AutoSlides
CubeQA
As an increasing amount of statistical data is published as RDF, intuitive ways of satisfying information needs and getting new insights out of this type of data become increasingly important. Question answering systems provide intuitive access to data by translating natural language queries into SPARQL, the native query language of RDF knowledge bases. Existing approaches, however, perform poorly on statistical data because of its different structure. Based on a question corpus compiled in previous work, we created a benchmark for evaluating statistical question answering systems and stimulating further research. Building upon a previously established algorithm outline, we detail a question answering algorithm for statistical Linked Data which covers a wide range of question types, evaluate it using the benchmark, and discuss future challenges in this field. To our knowledge, this is the first question answering approach for statistical RDF data, and it could open up a new research area. Read more about CubeQA
CubeViz

CubeViz is a faceted browser for statistical data utilizing the RDF Data Cube vocabulary, the state of the art for representing statistical data in RDF. This vocabulary is compatible with SDMX and increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options that can be selected by users. Read more about CubeViz
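As an illustration of the Data Cube structure such a browser consumes, the sketch below emits a single statistical observation as N-Triples in plain Python. Only `qb:Observation` and `qb:dataSet` are real vocabulary terms; the dimension and measure URIs (`refArea`, `population`) are invented for illustration.

```python
# Sketch of a single RDF Data Cube observation, serialized as
# N-Triples. qb:Observation and qb:dataSet are real vocabulary
# terms; the dimension/measure URIs used below are invented.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def observation(obs_uri, dataset_uri, components):
    # components maps dimension/measure property URIs to values.
    triples = [
        f"<{obs_uri}> <{RDF_TYPE}> <{QB}Observation> .",
        f"<{obs_uri}> <{QB}dataSet> <{dataset_uri}> .",
    ]
    for prop, value in components.items():
        # URIs become resource objects, everything else a plain literal.
        obj = f"<{value}>" if str(value).startswith("http") else f'"{value}"'
        triples.append(f"<{obs_uri}> <{prop}> {obj} .")
    return triples
```

Each observation thus points to its dataset and carries one triple per dimension and measure, which is exactly the structure a faceted browser can filter on.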
DBtrends
Many ranking methods have been proposed for RDF data. These methods often use the structure behind the data to measure its importance. Recently, some of these methods have started to explore information from other sources, such as the Wikipedia page graph, to better rank RDF data. In this work, we extensively evaluate the application of different ranking functions for entities, classes, and properties across two different countries, as well as their combination. Read more about DBtrends
DEER

Over the last years, the Linked Data principles have been used across academia and industry to publish and consume structured data. Thanks to the fourth Linked Data principle, many of the RDF datasets used within these applications contain implicit and explicit references to more data. For example, music datasets such as Jamendo include references to locations of record labels, places where artists were born or have been, etc. Datasets such as Drugbank contain references to drugs from DBpedia, where verbal descriptions of the drugs and their usage are explicitly available. The goal of the mapping component, dubbed DEER, is to retrieve this information, make it explicit, and integrate it into data sources according to the specifications of the user. To this end, DEER relies on a simple yet powerful pipeline system that consists of two main components: enrichment functions and operators. Read more about DEER
DL-Learner

DL-Learner is a tool for learning concepts in Description Logics (DLs) from user-provided examples. Equivalently, it can be used to learn classes in OWL ontologies from selected objects. The goal of DL-Learner is to support knowledge engineers in constructing knowledge and learning about the data they created. Read more about DL-Learner
DockerConverter
DockerConverter is an approach and a software to map a Docker configuration to various mature systems and to reverse engineer any available Docker image in order to increase confidence (or trust) in it. Read more about DockerConverter
FOX

FOX is a framework for RDF extraction from text based on Ensemble Learning. It makes use of the diversity of NLP algorithms to extract entities with a high precision and a high recall. Moreover, it provides functionality for Keyword Extraction and Relation Extraction. Read more about FOX
HAWK
HAWK drives forward the OKBQA vision of hybrid question answering using Linked Data and full-text information. Performance benchmarks are carried out on the hybrid task 3 of QALD-4. Read more about HAWK
Jassa
Jassa comprises a set of layered modules, ranging from a (low-level) RDF API over a service abstraction layer and a SPARQL-JSON mapping layer up to a faceted browsing layer. Furthermore, there is a module with a set of reusable AngularJS directives (widgets) for user interface components (Jassa-UI-Angular). Read more about Jassa
KBox
KBox allows users to have a single place to share resources and knowledge among different applications. Working on top of the RDF model, KBox is a natural extension of the Web onto your computer. Read more about KBox
LIMES

LIMES is a link discovery framework for the Web of Data. It implements time-efficient approaches for large-scale link discovery based on the characteristics of metric spaces. It is easily configurable via a configuration file as well as through a graphical user interface. LIMES can be downloaded as a standalone tool for carrying out link discovery or as a Java library. Read more about LIMES
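The metric-space trick behind such approaches can be sketched in a few lines: by the triangle inequality, the distance between a source and a target is bounded from below by the difference of their precomputed distances to a fixed exemplar point, so many pairs can be discarded without evaluating the real distance. This is a minimal illustration with invented names, not the LIMES implementation.

```python
# Minimal illustration of metric-space filtering for link discovery:
# the triangle inequality gives dist(s, t) >= |dist(s, e) - dist(t, e)|
# for any exemplar e, so pairs whose lower bound already exceeds the
# threshold are skipped without computing the real (possibly
# expensive) distance. Not the actual LIMES API.
def link_candidates(sources, targets, dist, exemplar, threshold):
    t_dists = [(t, dist(t, exemplar)) for t in targets]
    links = []
    for s in sources:
        d_s = dist(s, exemplar)
        for t, d_t in t_dists:
            if abs(d_s - d_t) > threshold:
                continue  # lower bound exceeds threshold: safe to skip
            if dist(s, t) <= threshold:
                links.append((s, t))
    return links
```

With a metric distance function, the filter never discards a true link, since the skipped pairs provably lie above the threshold.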
LSQ

LSQ is a Linked Dataset describing SPARQL queries extracted from the logs of a variety of prominent public SPARQL endpoints. Read more about LSQ
MEX Vocabulary
The MEX Vocabulary is a light-weight interchange format for machine learning experiments. Read more about MEX Vocabulary
MMoOn

In recent years, lexical resources have rapidly emerged in the Semantic Web. Whereas most linguistic information is already machine-readable, we found that morphological information is either absent or only contained in semi-structured strings. While a plethora of linguistic resources for the lexical domain already exist and are highly reused, there is still a great gap for equivalent morphological datasets and ontologies. In order to enable capturing the semantics of expressions beneath the word level, the Multilingual Morpheme Ontology (MMoOn) has been developed. It is designed for the creation of machine-processable and interoperable morpheme inventories of a given natural language. As such, any MMoOn dataset contains not only semantic information on whole words and word forms but also information on the meaningful parts of which they consist, including inflectional and derivational affixes, stems, and bases, as well as a wide range of their underlying meanings. Read more about MMoOn
Neural SPARQL Machines

Neural SPARQL Machines translate natural language expressions into sequences encoding SPARQL queries. A generator module builds the training set from manually or automatically created templates and a knowledge base. The tasks of entity recognition and query construction are entirely assigned to an LSTM-based recurrent neural network. The support for external word embeddings helps tackle the vocabulary mismatch problem, while curriculum learning is employed to learn graph pattern compositions. Read more about Neural SPARQL Machines
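The idea of encoding a SPARQL query as a flat token sequence a sequence-to-sequence model can emit might be sketched as follows. The concrete token names (`brack_open`, the `var_` prefix) are assumptions for illustration, not necessarily the project's actual encoding.

```python
# Illustrative sketch of flattening SPARQL into plain tokens so a
# seq2seq model can produce them. Token names are invented for this
# example, not necessarily the encoding used by the project.
PUNCTUATION = {"{": "brack_open", "}": "brack_close", ".": "sep_dot"}

def encode(sparql):
    tokens = []
    for tok in sparql.split():
        if tok.startswith("?"):
            tokens.append("var_" + tok[1:])      # variables
        elif tok in PUNCTUATION:
            tokens.append(PUNCTUATION[tok])      # structural symbols
        else:
            tokens.append(tok.lower())           # keywords, IRIs
    return " ".join(tokens)
```

The reverse mapping is deterministic, so a generated token sequence can be decoded back into an executable query.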
OntoWiki

OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. Read more about OntoWiki
openQA

The use of Semantic Web technologies has led to an increasing amount of structured data being published on the Web. Despite the advances in question answering systems, retrieving the desired information from structured sources is still a substantial challenge. Users and researchers still face difficulties in integrating and comparing their results. openQA is an open-source question answering framework that unifies approaches from several domain experts. The aim of openQA is to provide a common platform that can be used to promote advances through the easy integration and comparison of different approaches. Read more about openQA
ORE
The ORE (Ontology Repair and Enrichment) tool allows for the enrichment, repair and validation of OWL based knowledge bases. Read more about ORE
Palmetto

Palmetto is a quality measuring tool for topics based on coherence calculations. Read more about Palmetto
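A much simplified coherence calculation in the spirit of such tools: average pointwise mutual information (PMI) over the word pairs of a topic, estimated from document co-occurrence counts with add-one smoothing. Palmetto itself provides several more refined coherence measures; this sketch only illustrates the basic idea.

```python
import math

# Simplified, UCI-style topic coherence: average PMI over all word
# pairs of a topic, estimated from document co-occurrence with
# add-one smoothing. Only an illustration of the idea behind
# coherence-based topic quality measures.
def pmi_coherence(topic_words, documents):
    n = len(documents)
    def docs_with(*words):
        return sum(all(w in d for w in words) for d in documents)
    scores = []
    for i, w1 in enumerate(topic_words):
        for w2 in topic_words[i + 1:]:
            p_joint = (docs_with(w1, w2) + 1) / n
            p1 = (docs_with(w1) + 1) / n
            p2 = (docs_with(w2) + 1) / n
            scores.append(math.log(p_joint / (p1 * p2)))
    return sum(scores) / len(scores)
```

Words that tend to co-occur in the reference documents score higher than unrelated word mixtures, which is what makes such a measure a useful proxy for topic quality.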
QUETSAL

QUETSAL is a SPARQL endpoint federation engine for federated SPARQL query processing. Read more about QUETSAL
Quit

The Quit Store is a triple store capable of performing version control on RDF knowledge graphs. It especially supports distributed collaboration setups for evolving RDF knowledge bases. Read more about Quit
REX

REX is an RDF extraction framework for Web data that can learn XPath wrappers from unlabelled Web pages using knowledge from the Linked Open Data Cloud. Read more about REX
Rocker

Rocker is a refinement-operator-based approach for the extraction of keys and almost-keys from RDF datasets. Its refinement operator is finite, proper, and non-redundant, and Rocker is a state-of-the-art tool that scales to very large knowledge bases. Read more about Rocker
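The notion of a key can be illustrated with a brute-force sketch: a set of properties is a key if no two entities share values on all of those properties. Rocker's refinement operator prunes this exponential search space far more cleverly; the code below is only a naive illustration with invented property names.

```python
from itertools import combinations

# Naive key discovery sketch: a property set is a key if no two
# entities agree on all its properties. Entities are plain dicts
# with invented property names; Rocker's refinement operator
# explores this search space much more efficiently.
def find_keys(entities, properties, max_size=2):
    keys = []
    for size in range(1, max_size + 1):
        for props in combinations(properties, size):
            values = [tuple(e[p] for p in props) for e in entities]
            if len(set(values)) == len(values):  # all distinct -> key
                keys.append(props)
    return keys
```

In the example below, neither name nor city alone identifies an entity, but the pair of both does.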
SlideWiki

SlideWiki's goal is to revolutionise how educational materials can be authored, shared and reused. By enabling authors and students to create and share slide decks as HTML in an open platform, communities around the world can benefit from materials created by world-leading educators on a wide range of topics. Read more about SlideWiki
SML-Bench
The ultimate goal of SML-Bench is to foster research in machine learning from structured data as well as to increase the reproducibility and comparability of algorithms in that area. This is important since a) the preparation of machine learning tasks in that area involves a significant amount of work, and b) there are hardly any cross-comparisons across languages, as this requires data conversion processes. Read more about SML-Bench
Tapioca
Tapioca is a search engine for finding topically similar linked data datasets. Read more about Tapioca
Triplify

Triplify tackles the chicken-and-egg problem of the Semantic Web by providing a building block for the “semantification” of Web applications. Triplify provides small, light-weight plugins for database-backed Web applications and exposes semantics as RDF, Linked Data and JSON. Read more about Triplify
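The mapping idea can be sketched with Python's built-in sqlite3 module: each row of a database table becomes a set of triples, with the first column serving as the subject identifier. The table, column names, and URI layout here are invented for illustration; Triplify itself works as light-weight PHP plugins with configurable SQL-to-RDF mappings.

```python
import sqlite3

# Sketch of the Triplify idea: expose rows of a database-backed web
# application as RDF triples. Table, columns, and URI layout are
# invented for illustration.
def rows_to_triples(conn, table, base_uri):
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [c[0] for c in cur.description]
    triples = []
    for row in cur:
        subject = f"<{base_uri}{table}/{row[0]}>"  # first column = id
        for col, val in zip(cols[1:], row[1:]):
            triples.append(f'{subject} <{base_uri}vocab/{col}> "{val}" .')
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
triples = rows_to_triples(conn, "users", "http://example.org/")
```

Each non-key column of each row yields one triple, which is the essence of "semantifying" a relational web application.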
Community Projects
AKSW has launched a number of high-impact R&D community projects.
Cofundos
Cofundos helps to realize open-source software ideas, by providing a platform for their discussion & enrichment and by establishing a process for organizing the contributions and interests of different stakeholders in the idea. Read more about Cofundos
NLP2RDF
NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF). NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources, and annotations. The output of NLP tools can be converted into RDF and used in the LOD2 Stack. The latest version and up-to-date information can be found on the NLP2RDF landing page.
Read more about NLP2RDF
Dataset Projects
AKSW is a publisher of or contributor to the following dataset projects.
Catalogus Professorum
An adapted OntoWiki with accompanying vocabularies for managing historic information related to the professors working at the University of Leipzig in its 600-year history. Read more about Catalogus Professorum
DBpedia

DBpedia is a community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data. Read more about DBpedia
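As a small illustration of such a query, the snippet below builds a SPARQL query for German cities with more than a million inhabitants and URL-encodes it for DBpedia's public endpoint; issuing the HTTP request itself is left to any HTTP client.

```python
from urllib.parse import urlencode

# Sketch of asking DBpedia a "sophisticated query": cities in Germany
# with more than a million inhabitants. Only the request URL is built
# here; no network request is made.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Germany ;
        dbo:populationTotal ?population .
  FILTER (?population > 1000000)
}
"""
params = urlencode({"query": query,
                    "format": "application/sparql-results+json"})
request_url = "https://dbpedia.org/sparql?" + params
```

Fetching `request_url` returns the matching cities as SPARQL JSON results, linking Wikipedia-derived facts to structured query answers.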
DBpedia NIF Dataset
DBpedia NIF is a large-scale and multilingual knowledge extraction corpus. The aim of the dataset is two-fold: to dramatically broaden and deepen the amount of structured information in DBpedia, and to provide a large-scale, multilingual language resource for the development of various NLP and IR tasks. The dataset provides the content of all articles for 128 Wikipedia languages. Read more about DBpedia NIF Dataset
FTS
The Financial Transparency System (FTS) of the European Commission contains information about grants for European Union projects starting from 2007. It allows users to get an overview of EU funding, including information on beneficiaries as well as the amount and type of expenditure and information on the responsible EU department. The original dataset is freely available on the European Commission website, where users can query the data using an HTML form and download it in CSV and, most recently, XML format. The result of the conversion allows interesting queries over the data which were very difficult without it. The main benefit of the dataset is an increased financial transparency of EU project funding. The RDF version of the FTS dataset will become part of the EU Open Data Portal and eventually be hosted and maintained by the European Union itself. Read more about FTS
GHO
The improvement of public health is one of the main indicators of societal progress. Statistical data for monitoring public health is highly relevant for a number of sectors, such as research (e.g. in the life sciences or economics), policy making, health care, the pharmaceutical industry, insurances, etc. Such data is meanwhile available even on a global scale, e.g. in the Global Health Observatory (GHO) of the United Nations' World Health Organization (WHO). GHO comprises more than 50 different datasets; it covers all 198 WHO member countries and is updated as more recent or revised data becomes available, or when there are changes to the methodology being used. However, this data is only accessible via complex spreadsheets, and therefore queries over the 50 different datasets as well as combinations with other datasets are very tedious and require a significant amount of manual work. By making the data available as RDF, we lower the barrier for data re-use and integration. Read more about GHO
Linked History
The idea of the project arose during our research on prosopographical knowledge bases. Together with historians of the University of Leipzig, we set up the Catalogus Professorum Lipsiensis project. This project contains detailed information about more than 2,000 professors in the university's history. To enable collaborative research in the field of prosopography, existing databases have to be interlinked. Within this project, the AKSW research group and the professors catalogue group are developing services to support this idea.
Read more about Linked History
Linked TCGA
The Cancer Genome Atlas Database aims to characterize the changes that occur in genes due to cancer. Knowledge about such changes can be of central importance when aiming to predict the life expectancy as well as the medication or sequence of medications that should be administered to a patient to ensure his/her survival. So far, experts who needed this data had to wait in long data queues and write dedicated tools to analyze it. Read more about Linked TCGA
LinkedGeoData

LinkedGeoData is an effort to add a spatial dimension to the Web of Data / Semantic Web. LinkedGeoData uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles. It interlinks this data with other knowledge bases in the Linking Open Data initiative. Read more about LinkedGeoData
LinkedIdioms
The LinkedIdioms dataset is a multilingual RDF representation of idioms covering five different languages. The dataset was crawled and integrated from various sources. To assure the quality of the presented dataset, all idioms were evaluated by at least two native speakers. We designed the dataset to be easily usable in natural-language processing applications, with the goal of facilitating content translation tasks. In particular, the dataset follows best practices in accordance with the Linguistic Linked Open Data (LLOD) community. Read more about LinkedIdioms
LinkedSpending

Transparency into government spending is in high demand from the public. Open spending data has the power to reduce corruption by increasing accountability, and it strengthens democracy because voters can make better-informed decisions. An informed and trusting public also strengthens the government itself, because it is more likely to commit to large projects. LinkedSpending provides more than 2 million planned and carried out financial transactions from all over the world from 2005 to 2035 as Linked Open Data. This data is represented in the RDF Data Cube format and is freely available and openly licensed. Read more about LinkedSpending
Pfarrerbuch
"Pfarrerbuch" ("pastor book") is a joint project of the Arbeitsgemeinschaft für Sächsische Kirchengeschichte, the Institute for Church History and the Institute for Computer Science of the University of Leipzig, and the Institute for Church History at the Evangelical Lutheran Theological University in Budapest. Read more about Pfarrerbuch
SemanticQuran
The Semantic Quran dataset is a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured sources. The data were aligned to an ontology designed to represent multilingual data from sources with a hierarchical structure. The resulting RDF data encompasses 43 different languages, which belong to the most underrepresented languages in Linked Data, including Arabic, Amharic, and Amazigh. We designed the dataset to be easily usable in natural-language processing applications, with the goal of facilitating the development of knowledge extraction tools for these languages. In particular, the Semantic Quran is compatible with the NLP Interchange Format (NIF) and contains explicit morpho-syntactic information on the utilized terms. Read more about SemanticQuran
USPatents
A patent is a set of exclusive rights granted to an inventor by a sovereign state for a solution, be it a product or a process, to a particular technological problem. The United States Patent and Trademark Office (USPTO) is the part of the US Department of Commerce that grants patents to businesses and inventors for their inventions, in addition to registering products and identifying intellectual property. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. As of December 2011, 8,743,423 patents had been issued and 16,020,302 applications had been received. USPTO patents are accepted in electronic form and are filed as PDF documents. However, the indexing is not perfect, and it is cumbersome to search through the PDF documents. Additionally, Google has made all the patents available for download in XML format, albeit only for the years 2002 to 2015. Thus, we converted this bulk of data (spanning 13 years) from XML to RDF to conform to the Linked Data principles. Read more about USPatents
Incubator Projects
- AgriNepalData – Ontology Based Data Access and Integration for Improving the Effectiveness of Farming in Nepal
- aksw.org – a linked data driven web page rendered by OntoWiki site extension
- ALOE – Assisted Linked Data Consumption Engine
- Analyzing Cognitive Evolution using Linked Data – Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
- AskNow – AskNow is a Question Answering (QA) system for RDF datasets.
- ASSESS – Automatic Self Assessment
- AutoSPARQL – Convert a natural language expression to a SPARQL query
- BOA – BOotstrapping linked datA
- conTEXT – Lightweight Text Analytics using Linked Data
- CSVImport – Representing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary
- DBpediaDQ – User-driven quality evaluation of DBpedia
- DBpediaDQCrowd – Crowdsourcing DBpedia Quality Assessment
- DBtrends – Evaluating Ranking functions on RDF data sets
- DEER – RDF Data Extraction and Enrichment Framework
- DeFacto – Deep Fact Validation
- DEQA – Deep Web Extraction for Question Answering
- Dockerizing Linked Data – Knowledge Base Shipping to the Linked Open Data Cloud
- DSSN – towards a global Distributed Semantic Social Network
- Erfurt – PHP5 / Zend based Semantic Web API for Social Semantic Software
- Facete – JavaScript SPARQL-based Faceted Search Library and Browsing Widgets
- GeoLift – Spatial mapping framework for enriching RDF datasets with Geo-spatial information
- GERBIL – General Entity Annotation Benchmark Framework
- HERObservatory – Using Linked Data to Build an Observatory of Societal Progress Leveraging on Data Quality
- IGUANA – Intelligent Suite for Benchmarking SPARQL with Updates
- jena-sparql-api – A Java library featuring tools for transparently boosting SPARQL query execution.
- KBox – Distributing Ready-to-Query RDF Knowledge Graphs
- KeyNode.js – Next level web presentations
- LDWPO – the Linked Data Workflow Project ontology
- Linked Data Quality Survey – Quality Assessment for Linked Data: A Survey
- LODStats – a statement-stream-based approach for gathering comprehensive statistics about RDF datasets
- Mosquito – SPARQL benchmark
- N3 - Collection – N3 - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format
- Neural SPARQL Machines – Translating natural language into machine language for data access.
- NIF4OGGD – Natural Language Interchange Format for Open German Governmental Data
- NLP Interchange Format (NIF) – an RDF/OWL-based format that allows to combine and chain several NLP tools in a flexible, light-weight way
- openQA – Open Question Answering Framework
- OpenResearch – Semantic Wiki for the Sciences
- QualisBrasil – Linked Open Data for supporting scientometric studies
- Query Cache – Adaptive SPARQL Query Cache
- RDFaCE – RDFa Content Editor
- RDFauthor – an editing solution for distributed and syndicated structured content on the World Wide Web
- RDFSlice – Large-scale RDF Dataset Slicing
- RDFUnit – an RDF Unit-Testing suite
- ReDD-Observatory – Using the Web of Data for Evaluating the Research-Disease Disparity
- Relation Annotation in GENIA
- SAIM – (Semi-)Automatic Instance Matcher
- SANSA-Stack – Open source platform for distributed data processing for RDF large-scale datasets
- SCRS – Semantic Clinical Registry System for Rare Diseases
- Semantic Pingback – Adding a social dimension to the Linked Data Web
- SINA – Semantically INterpreting user query towards question-Answering
- SMART – A Semantic Search Engine
- SPARQL2NL – converting SPARQL queries to natural language
- SparqlAnalytics – I Know What You Did Last Query
- Sparqlify – a SPARQL-SQL rewriter
- SparqlMap – a SPARQL-to-SQL rewriter
- TripleCheckMate – Crowdsourcing the evaluation of Linked Data
- VeriLinks – verifying links in an arbitrary linkset
- Xodx – A basic DSSN node implementation
- Xturtle – an eclipse / Xtext2 based editor for RDF/Turtle files
Project Alumni
Some projects have reached a stable state, but are currently not actively maintained and further developed.
- BorderFlow – a general-purpose graph clustering tool
- DBpedia SPARQL Benchmark – a pure RDF benchmark based on actually posed queries
- IGUANA – Intelligent Suite for Benchmarking SPARQL with Updates
- LDAP 2 SPARQL – Accessing RDF Knowledge Bases via LDAP Clients
- LESS – Syndicate Linked Data Content
- Mobile Social Semantic Web – weaving a distributed, semantic social network for mobile users
- Navigation-induced Knowledge Engineering by Example – a light-weight methodology for low-cost knowledge engineering by a massive user base
- OD@FMI – Open Data for the University of Leipzig's Math and Computer Science Faculty
- OntoWiki Mobile – Knowledge Management in your Pocket
- Powl – Semantic Web Development Platform
- R2D2 – PHP implementation of the D2RQ Mapping Language
- RDFAPI-JS – Use JavaScript RDFa Widgets for Model/View Separation inside Read/Write Websites
- re:publish – Light-weight linked data publishing with node.js
- Semantic LDAP – Bringing together LDAP and the Semantic Web
- SPARQL Trainer – learn to query the semantic web
- SPARQR – SPARQL Query Recommender Web Service
- XML2OWL XSLT – Configurable XSLT stylesheet, which transforms XML documents into OWL
- xOperator – combines advantages of social network websites with instant messaging