Table of Contents

    Funded Projects

    AKSW is currently funded through the following regional, national, and European research projects:


    The chemical diversity of the flora and fauna of the Brazilian biomes is revealed by its multiplicity of compound classes and structural types of secondary metabolites from plants, fungi, insects, marine organisms, and bacteria. This extraordinary bio- and chemical diversity can be explored for developing new bioproducts, including pharmaceuticals, cosmetics, food supplements, and agricultural pesticides. DINOBBIO aims to investigate the challenges of building, managing, and consuming Biochemical Knowledge Graphs through Semantic Web technologies and machine learning. DINOBBIO focuses on researching new natural products inspired by Brazilian biodiversity. Read more about DINOBBIO


    The KupferDigital project investigates methods and concepts for digitally capturing the life cycle of a material, using copper as an example, from ore extraction to recycling. The goal is to build a demonstrator for a digital data ecosystem that can serve as a sustainable platform for the digitalization of materials research and the metal-processing industry. Read more about KupferDigital


    Numerous municipal and state institutions hold highly relevant troves of mobility data that they would like to make available to a broad public. The potential user base is large: companies and public authorities, but also journalists, scientists, and citizens are interested in using the data. Often, however, no on-site tool exists for feeding the data directly into nationwide data portals. The project partners therefore plan to build a software solution for the automated publication of data: the mCLIENT. The mCLIENT is intended to check data collections for quality, describe them semantically, and transfer them to central data portals in PUSH mode. Read more about mCLIENT


    Data integration is one of the main barriers to harnessing the full power of data in companies. SAGE addresses this challenge by aiming to develop dedicated algorithms for the management of big geospatial data. SAGE's main result will be a set of interoperable solutions that implement time-efficient geospatial analytics and can be integrated into high-performance solutions. Read more about SAGE


    The increasing use of automation in machine and plant construction has inevitably led to a large growth in the number of industrial production processes being recorded and monitored by sensors. If the vast quantities of data this generates could be centrally evaluated in real time, the results could be used to optimize internal processes and drastically reduce production costs. Unfortunately, the most commonly used data analysis tools are simply not designed to handle such enormous amounts of real-time data. The SAKE project has been set up to resolve this problem by developing a framework specifically designed to analyze these vast streams of data. By implementing prefabricated modules, it will also be possible to use individual applications in a number of different roles. The modules will be evaluated in real-life industrial environments by the project's industrial partners. Read more about SAKE


    POIs are the content of any application, service, and product even remotely related to our physical surroundings. From navigation applications, to social networks, to tourism, and logistics, we use POIs to search, communicate, decide, and plan our actions. The Big Data assets for POIs and the evolved POI value chain introduced opportunities for growth, but also complexity, intensifying the challenges relating to their quality-assured integration, enrichment, and data sharing. POI data are by nature semantically diverse and spatiotemporally evolving, representing different entities and associations depending on their geographical, temporal, and thematic context. Pioneered by the FP7 project GeoKnow, linked data technologies have been applied to effectively extract the maximum possible value from open, crowdsourced, and proprietary Big Data sources. Validated in the domains of tourism and logistics, these technologies have proven their benefit as a cost-effective and scalable foundation for the quality-assured integration, enrichment, and sharing of general-purpose geospatial data. In SLIPO, we argue that linked data technologies can address the limitations, gaps, and challenges of the current landscape in integrating, enriching, and sharing POI data. Our goal is to transfer the research output generated by our work in the GeoKnow project to the specific challenge of POI data, introducing validated and cost-effective innovations across their value chain. Read more about SLIPO

    AKSW was formerly funded with the following regional, national and European research projects:

    • ALIGNED: Aligned, Quality-centric Software and Data Engineering
    • amsl.technology: Electronic Resource Management for Heterogeneous Data in Libraries
    • ART-e-FACT: Media continuity artefact management
    • BDE: Big Data Europe
    • BIG: Big Data Public Private Forum
    • BioASQ: a challenge on large-scale biomedical semantic indexing and question answering
    • Digital Agenda Scoreboard: A Statistical Anatomy of Europe's way into the Information Age
    • GOLD: Generating Ontologies from Linked Data
    • LATC: LOD Around-the-Clock
    • LE4SW: Regional Technology Platform of Social Semantic Collaboration
    • LEDS: Linked Enterprise Data Services
    • LinkingLOD: interlinking knowledge bases
    • LOD2: Creating Knowledge out of Interlinked Data
    • OntoWiki.eu: Social Semantic Collaboration for EKM, E-Learning & E-Tourism
    • SCMS: Semantic Content Management Systems
    • SoftWiki: Semantics- and Community-Based Requirements Engineering

    Open Source Projects

    AKSW has launched a number of high-impact R&D open-source projects.


    AutoSlides is a tool that automatically creates PowerPoint-based slideshows on chosen topics. It uses several online resources, such as DBpedia, Wikipedia, and Flickr, to search for relevant content and produce meaningful slideshows. Read more about AutoSlides


    As an increasing amount of statistical data is published as RDF, intuitive ways of satisfying information needs and gaining new insights from this type of data become increasingly important. Question answering systems provide intuitive access to data by translating natural language queries into SPARQL, the native query language of RDF knowledge bases. Existing approaches, however, perform poorly on statistical data because of its different structure. Based on a question corpus compiled in previous work, we created a benchmark for evaluating statistical question answering systems and for stimulating further research. Building upon a previously established algorithm outline, we detail a question answering algorithm for statistical Linked Data that covers a wide range of question types, evaluate it using the benchmark, and discuss future challenges in this field. To our knowledge, this is the first question answering approach for statistical RDF data, and it could open up a new research area. Read more about CubeQA
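    The core idea, translating a natural-language question over statistical Linked Data into a SPARQL query against an RDF Data Cube, can be sketched roughly as follows. The question pattern, prefixes, and property URIs below are illustrative assumptions, not CubeQA's actual implementation:

```python
import re

# Hypothetical template for an RDF Data Cube lookup. A real system would
# disambiguate the measure against the cube's data structure definition;
# the example.org properties are placeholders.
TEMPLATE = """PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?obs ?value WHERE {{
  ?obs a qb:Observation ;
       <http://example.org/refArea> "{area}" ;
       <http://example.org/measure> ?value .
}}"""

def question_to_sparql(question):
    """Translate questions of the form 'What was the <measure> of <area>?'"""
    m = re.match(r"What was the (.+) of (.+)\?", question)
    if m is None:
        return None
    measure, area = m.group(1), m.group(2)
    # Only the area is substituted here; mapping 'measure' to a concrete
    # property is the hard part a real QA system must solve.
    return TEMPLATE.format(area=area)

query = question_to_sparql("What was the population of Leipzig?")
```

    In practice the pattern matching would be replaced by parsing and entity linking; the sketch only shows the final query-construction step.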


    Many ranking methods have been proposed for RDF data. These methods often use the structure behind the data to measure its importance. Recently, some of these methods have started to explore information from other sources, such as the Wikipedia page graph, to better rank RDF data. In this work, we extensively evaluate the application of different ranking functions for entities, classes, and properties across two different countries, as well as their combination. Read more about DBtrends


    DockerConverter is an approach and a software tool to map a Docker configuration to various mature systems and to reverse-engineer any available Docker image in order to increase the confidence (or trust) in it. Read more about DockerConverter


    HAWK drives forward the OKBQA vision of hybrid question answering using Linked Data and full-text information. Performance benchmarks are run on the hybrid task 3 of QALD-4. Read more about HAWK


    Jassa comprises a set of layered modules, ranging from a (low-level) RDF API over a service abstraction layer and a SPARQL-JSON mapping layer up to a faceted browsing layer. There is also a module with a set of reusable AngularJS directives (widgets) for user interface components (Jassa-UI-Angular). Read more about Jassa


    KBox gives users a single place to share resources and knowledge among different applications. Working on top of the RDF model, KBox is a natural extension of the Web on your computer. Read more about KBox

    MEX Vocabulary

    MEX Vocabulary: A Light-Weight Interchange Format for Machine Learning Experiments Read more about MEX Vocabulary


    The ORE (Ontology Repair and Enrichment) tool allows for the enrichment, repair, and validation of OWL-based knowledge bases. Read more about ORE


    The ultimate goal of SML-Bench is to foster research in machine learning from structured data as well as to increase the reproducibility and comparability of algorithms in that area. This is important, since a) the preparation of machine learning tasks in that area involves a significant amount of work, and b) there are hardly any cross-language comparisons, as these require data conversion processes. Read more about SML-Bench


    Tapioca is a search engine for finding topically similar linked data datasets. Read more about Tapioca

    Community Projects

    AKSW has launched a number of high-impact R&D Community projects.


    Cofundos helps to realize open-source software ideas, by providing a platform for their discussion & enrichment and by establishing a process for organizing the contributions and interests of different stakeholders in the idea. Read more about Cofundos


    NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF). NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The output of NLP tools can be converted into RDF and used in the LOD2 Stack. The latest version and up-to-date information can be found on the NLP2RDF landing page.

    Read more about NLP2RDF
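    As a rough illustration of the kind of output NIF targets, the snippet below builds a small NIF-style Turtle fragment by hand. The URIs follow the NIF 2.0 core ontology, but treat the exact terms and the document URI scheme as assumptions of this sketch; a real converter would use an RDF library and typed literals:

```python
# Assumed NIF 2.0 core ontology namespace.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"

def annotate(text, surface):
    """Emit Turtle for one string annotation anchored in a context."""
    begin = text.index(surface)
    end = begin + len(surface)
    # NIF commonly identifies substrings by character offsets in the URI.
    context = f"<http://example.org/doc#char=0,{len(text)}>"
    mention = f"<http://example.org/doc#char={begin},{end}>"
    return "\n".join([
        f'{context} a <{NIF}Context> ;',
        f'    <{NIF}isString> "{text}" .',
        f'{mention} a <{NIF}String> ;',
        f'    <{NIF}anchorOf> "{surface}" ;',
        f'    <{NIF}beginIndex> {begin} ;',
        f'    <{NIF}endIndex> {end} ;',
        f'    <{NIF}referenceContext> {context} .',
    ])

turtle = annotate("Leipzig is a city in Germany.", "Leipzig")
```

    The point of the format is that any NLP tool emitting such offset-anchored RDF can be chained with any other, since the annotations share the same context URIs.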

    Dataset Projects

    AKSW is a publisher of or contributor to the following dataset projects.

    Catalogus Professorum

    An adapted OntoWiki with accompanying vocabularies for managing historic information related to the professors working at the University of Leipzig in its 600-year history. Read more about Catalogus Professorum

    DBpedia NIF Dataset

    DBpedia NIF - a large-scale and multilingual knowledge extraction corpus. The aim of the dataset is two-fold: to dramatically broaden and deepen the amount of structured information in DBpedia, and to provide a large-scale and multilingual language resource for the development of various NLP and IR tasks. The dataset provides the content of all articles for 128 Wikipedia languages. Read more about DBpedia NIF Dataset


    The Financial Transparency System (FTS) of the European Commission contains information about grants for European Union projects starting from 2007. It allows users to get an overview of EU funding, including information on beneficiaries as well as the amount and type of expenditure and information on the responsible EU department. The original dataset is freely available on the European Commission website, where users can query the data using an HTML form and download it in CSV and, most recently, XML format. The result of the conversion enables interesting queries over the data that were very difficult before. The main benefit of the dataset is increased financial transparency of EU project funding. The RDF version of the FTS dataset will become part of the EU Open Data Portal and eventually be hosted and maintained by the European Union itself. Read more about FTS
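    The conversion described above, from tabular CSV into RDF triples, can be sketched in a few lines. The column names, sample row, and vocabulary URIs here are invented for illustration; the real FTS dataset uses its own vocabulary and URI scheme:

```python
import csv
import io

# A tiny stand-in for one row of an FTS-style CSV export (invented data).
sample_csv = """beneficiary,amount,year
ACME Research,150000,2007
"""

def rows_to_ntriples(text):
    """Turn each CSV row into N-Triples, one subject per row."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        s = f"<http://example.org/commitment/{i}>"
        triples.append(f'{s} <http://example.org/beneficiary> "{row["beneficiary"]}" .')
        triples.append(f'{s} <http://example.org/amount> "{row["amount"]}" .')
        triples.append(f'{s} <http://example.org/year> "{row["year"]}" .')
    return triples

ntriples = rows_to_ntriples(sample_csv)
```

    Once the rows are triples, the "interesting queries" mentioned above become ordinary SPARQL joins and aggregations over beneficiaries, amounts, and years.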


    The improvement of public health is one of the main indicators of societal progress. Statistical data for monitoring public health is highly relevant for a number of sectors, such as research (e.g. in the life sciences or economics), policy making, health care, the pharmaceutical industry, insurances, etc. Such data is meanwhile available even on a global scale, e.g. in the Global Health Observatory (GHO) of the United Nations' World Health Organization (WHO). GHO comprises more than 50 different datasets, covers all 198 WHO member countries, and is updated as more recent or revised data becomes available or when there are changes to the methodology being used. However, this data is only accessible via complex spreadsheets; therefore, queries over the 50 different datasets as well as combinations with other datasets are very tedious and require a significant amount of manual work. By making the data available as RDF, we lower the barrier for data re-use and integration. Read more about GHO

    Linked History

    The idea for the project arose during our research on prosopographical knowledge bases. Together with historians at the University of Leipzig, we set up the Catalogus Professorum Lipsiensis project, which contains detailed information about more than 2000 professors in the university's history. To enable collaborative research in the field of prosopography, existing databases have to be interlinked. Within this project, the AKSW research group and the professors' catalogue group are developing services to support this idea.

    Read more about Linked History

    Linked TCGA

    The Cancer Genome Atlas Database aims to characterize the changes that occur in genes due to cancer. Knowledge about such changes can be central when aiming to predict life expectancy as well as the medication, or sequence of medications, that should be administered to a patient to ensure his/her survival. So far, experts who needed this data had to wait in long data queues and write dedicated tools to analyze it. Read more about Linked TCGA


    The LinkedIdioms dataset is a multilingual RDF representation of idioms covering five different languages. The dataset was crawled and integrated from various sources. To assure the quality of the presented dataset, all idioms were evaluated by at least two native speakers. We designed the dataset to be easily usable in natural-language processing applications, with the goal of facilitating translation tasks. In particular, the dataset follows the best practices of the Linguistic Linked Open Data (LLOD) community. Read more about LinkedIdioms


    "Pfarrerbuch" is a joint project of the Arbeitsgemeinschaft für Sächsische Kirchengeschichte, the Institute of Church History and the Institute of Computer Science at the University of Leipzig, and the Institute of Church History at the Evangelical Lutheran Theological University in Budapest. Read more about Pfarrerbuch


    The Semantic Quran dataset is a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured sources and was aligned to an ontology designed to represent multilingual data from sources with a hierarchical structure. The resulting RDF data encompasses 43 different languages, which are among the most underrepresented languages in Linked Data, including Arabic, Amharic, and Amazigh. We designed the dataset to be easily usable in natural-language processing applications, with the goal of facilitating the development of knowledge extraction tools for these languages. In particular, the Semantic Quran is compatible with the NLP Interchange Format (NIF) and contains explicit morpho-syntactic information on the utilized terms. Read more about SemanticQuran


    A patent is a set of exclusive rights granted to an inventor by a sovereign state for a solution, be it a product or a process, to a particular technological problem. The United States Patent and Trademark Office (USPTO) is part of the US Department of Commerce and provides patents to businesses and inventors for their inventions, in addition to the registration of products and intellectual property identification. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. As of December 2011, 8,743,423 patents had been issued and 16,020,302 applications had been received. The USPTO accepts patents in electronic form, filed as PDF documents. However, the indexing is not perfect, and it is cumbersome to search through the PDF documents. Additionally, Google has made all the patents available for download in XML format, albeit only for the years 2002 to 2015. We therefore converted this bulk of data (spanning 13 years) from XML to RDF to conform to the Linked Data principles. Read more about USPatents
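    An XML-to-RDF conversion of this kind can be sketched as below. The XML layout, element names, and vocabulary URIs are assumptions made for the example; the actual USPTO/Google XML schema is considerably richer:

```python
import xml.etree.ElementTree as ET

# A minimal, invented stand-in for one patent record in XML form.
patent_xml = """<patent>
  <doc-number>8743423</doc-number>
  <title>Example invention</title>
</patent>"""

def patent_to_ntriples(xml_text):
    """Extract a couple of fields and emit them as N-Triples."""
    root = ET.fromstring(xml_text)
    number = root.findtext("doc-number")
    title = root.findtext("title")
    # Mint a subject URI from the patent number; reuse Dublin Core terms
    # for the properties (a common choice, assumed here for simplicity).
    s = f"<http://example.org/patent/{number}>"
    return [
        f'{s} <http://purl.org/dc/terms/identifier> "{number}" .',
        f'{s} <http://purl.org/dc/terms/title> "{title}" .',
    ]

triples = patent_to_ntriples(patent_xml)
```

    Repeating this over the full 13-year bulk download, with a complete mapping of the schema, yields the Linked Data version of the corpus described above.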

    Incubator Projects

    • AgriNepalData: Ontology Based Data Access and Integration for Improving the Effectiveness of Farming in Nepal
    • aksw.org: a linked data driven web page rendered by OntoWiki site extension
    • ALOE: Assisted Linked Data Consumption Engine
    • Analyzing Cognitive Evolution using Linked Data: Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
    • AskNow: AskNow is a Question Answering (QA) system for RDF datasets.
    • ASSESS: Automatic Self Assessment
    • AutoSPARQL: Convert a natural language expression to a SPARQL query
    • BOA: BOotstrapping linked datA
    • conTEXT: Lightweight Text Analytics using Linked Data
    • CSVImport: Representing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary
    • DBpediaDQ: User-driven quality evaluation of DBpedia
    • DBpediaDQCrowd: Crowdsourcing DBpedia Quality Assessment
    • DBtrends: Evaluating Ranking functions on RDF data sets
    • DEER: RDF Data Extraction and Enrichment Framework
    • DeFacto: Deep Fact Validation
    • DEQA: Deep Web Extraction for Question Answering
    • Dockerizing Linked Data: Knowledge Base Shipping to the Linked Open Data Cloud
    • DSSN: towards a global Distributed Semantic Social Network
    • Erfurt: PHP5 / Zend based Semantic Web API for Social Semantic Software
    • Facete: JavaScript SPARQL-based Faceted Search Library and Browsing Widgets
    • GeoLift: Spatial mapping framework for enriching RDF datasets with Geo-spatial information
    • GERBIL: General Entity Annotation Benchmark Framework
    • HERObservatory: Using Linked Data to Build an Observatory of Societal Progress Leveraging on Data Quality
    • IGUANA: Intelligent Suite for Benchmarking SPARQL with Updates
    • jena-sparql-api: A Java library featuring tools for transparently boosting SPARQL query execution.
    • KBox: Distributing Ready-to-Query RDF Knowledge Graphs
    • KeyNode.js: Next level web presentations
    • LDWPO: the Linked Data Workflow Project ontology
    • Linked Data Quality Survey: Quality Assessment for Linked Data: A Survey
    • LODStats: a statement-stream-based approach for gathering comprehensive statistics about RDF datasets
    • Mosquito: SPARQL benchmark
    • N3 - Collection: N3 - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format
    • Neural SPARQL Machines: Translating natural language into machine language for data access.
    • NIF4OGGD: Natural Language Interchange Format for Open German Governmental Data
    • NLP Interchange Format (NIF): an RDF/OWL-based format that allows to combine and chain several NLP tools in a flexible, light-weight way
    • openQA: Open Question Answering Framework
    • OpenResearch: Semantic Wiki for the Sciences
    • QualisBrasil: Linked Open Data for supporting scientometric studies
    • Query Cache: Adaptive SPARQL Query Cache
    • RDFaCE: RDFa Content Editor
    • RDFauthor: an editing solution for distributed and syndicated structured content on the World Wide Web
    • RDFSlice: Large-scale RDF Dataset Slicing
    • RDFUnit: an RDF Unit-Testing suite
    • ReDD-Observatory: Using the Web of Data for Evaluating the Research-Disease Disparity
    • Relation Annotation in GENIA
    • SAIM: (Semi-)Automatic Instance Matcher
    • SANSA-Stack: Open source platform for distributed data processing for RDF large-scale datasets
    • SCRS: Semantic Clinical Registry System for Rare Diseases
    • Semantic Pingback: Adding a social dimension to the Linked Data Web
    • SINA: Semantically INterpreting user query towards question-Answering
    • SMART: A Semantic Search Engine
    • SPARQL2NL: converting SPARQL queries to natural language
    • SparqlAnalytics: I Know What You Did Last Query
    • Sparqlify: a SPARQL-SQL rewriter
    • SparqlMap: a SPARQL-to-SQL rewriter
    • TripleCheckMate: Crowdsourcing the evaluation of Linked Data
    • VeriLinks: verifying links in an arbitrary linkset
    • Xodx: A basic DSSN node implementation
    • Xturtle: an eclipse / Xtext2 based editor for RDF/Turtle files

    Project Alumni

    Some projects have reached a stable state, but are currently not actively maintained and further developed.

    • BorderFlow: a general-purpose graph clustering tool
    • DBpedia SPARQL Benchmark: a pure RDF benchmark based on actually posed queries
    • IGUANA: Intelligent Suite for Benchmarking SPARQL with Updates
    • LDAP 2 SPARQL: Accessing RDF Knowledge Bases via LDAP Clients
    • LESS: Syndicate Linked Data Content
    • Mobile Social Semantic Web: weaving a distributed, semantic social network for mobile users
    • Navigation-induced Knowledge Engineering by Example: a light-weight methodology for low-cost knowledge engineering by a massive user base
    • OD@FMI: Open Data for the University of Leipzig's Math and Computer Science Faculty
    • OntoWiki Mobile: Knowledge Management in your Pocket
    • Powl: Semantic Web Development Platform
    • R2D2: PHP implementation of the D2RQ Mapping Language
    • RDFAPI-JS: Use JavaScript RDFa Widgets for Model/View Separation inside Read/Write Websites
    • re:publish: Light-weight linked data publishing with node.js
    • Semantic LDAP: Bringing together LDAP and the Semantic Web
    • SPARQL Trainer: learn to query the semantic web
    • SPARQR: SPARQL Query Recommender Web Service
    • XML2OWL XSLT: Configurable XSLT stylesheet, which transforms XML documents into OWL
    • xOperator: combines advantages of social network websites with instant messaging


    Can we create better links by playing games? ( 2012-06-20T20:51:46+02:00 by Prof. Dr. Jens Lehmann)


    Most of you will agree that links are an important element of the Web of Data. Read more about "Can we create better links by playing games?"

    AKSW at TU Dresden PLT ( 2011-06-17T16:34:54+02:00 by Prof. Dr. Jens Lehmann)


    On June 8, I (Jens) visited the process control engineering research group (PLT) of Leon Urbas at the Dresden University of Technology. Read more about "AKSW at TU Dresden PLT"

    May 4-5: Leipziger Semantic Web Tag 2011 and Local Media Conferenz ( 2011-04-16T18:39:51+02:00 by Prof. Dr. Sören Auer)


    As in the past two years, we are again organizing a Leipzig Semantic Web Day on May 5th at the marvelous Mediencampus Villa Ida. This year's theme is "Linked Data for the Masses", particularly focusing on its use in enterprises. Read more about "May 4-5: Leipziger Semantic Web Tag 2011 and Local Media Conferenz"

    AKSW takes part in EU-funded LATC project ( 2010-11-30T15:44:33+01:00 by Prof. Dr. Jens Lehmann)


    The AKSW group is a member of the recently started LATC (Linked Open Data Around-The-Clock) project funded by the European Union. LATC aims to improve the quality and quantity of Linked Data on the Web, e.g. by developing a 24/7 interlinking engine. Read more about "AKSW takes part in EU-funded LATC project"