Projects

    Funded Projects

    AKSW is currently funded through the following regional, national and European research projects:

    SAGE


    Data integration is one of the main barriers to harnessing the full power of data in companies. Business-relevant data can be distributed across thousands of data silos in different formats in large companies. For companies driven by geospatial data (e.g., disaster management, automobile), the knowledge to be managed accounts for billions of rapidly changing facts (Big Data). Developing dedicated solutions for managing such large amounts of geospatial data is of central importance to improve the efficiency and effectiveness of data delivery for business-critical applications. However, dealing with geospatial data demands specific solutions for dealing with their intrinsic complexity (up to 5 dimensions) and the rapid changes in the data.

    SAGE addresses exactly this challenge by aiming to develop dedicated algorithms for the management of big geospatial data. We will develop time-efficient storage and querying strategies for geospatial data by extending the GeoSPARQL standard so as to deal with continuous queries. A time-efficient knowledge extraction framework dedicated to recognizing geospatial entities will also be developed. In addition, we will focus on developing scalable link discovery approaches for streams of RDF data that will interplay with the storage solution while running on distributed solutions such as FLINK.

    SAGE’s main result will be a set of interoperable solutions that implement time-efficient geospatial analytics and can be integrated into high-performance solutions. These procedures will enable the fast deployment of SAGE-driven solutions such as geospatial triple-store benchmarking, geography-based marketing, disaster management and the continuous delivery of big interlinked geospatial data. Using SAGE in data-driven companies promises to increase the reuse of company-internal knowledge and the productivity of employees, to reduce parallel development, and to make better use of company-internal resources.
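    To illustrate the GeoSPARQL-based querying described above, the sketch below assembles a query of the kind SAGE's storage layer would need to evaluate continuously as the data changes. This is an illustration only, not part of SAGE itself: the coordinates and the distance threshold are made up, while the prefixes and functions follow the standard GeoSPARQL vocabulary.

```python
# Illustrative sketch only: a GeoSPARQL query selecting features near a
# fixed point. Coordinates and threshold are made up for this example.

GEOSPARQL_QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  # Keep only features within 1 km of a fixed point.
  FILTER (geof:distance(?wkt,
                        "POINT(12.3731 51.3397)"^^geo:wktLiteral,
                        <http://www.opengis.net/def/uom/OGC/1.0/metre>) < 1000)
}
"""

def uses_geosparql(query: str) -> bool:
    """Crude check that a query relies on GeoSPARQL vocabulary."""
    return "geo:asWKT" in query and "geof:distance" in query

print(uses_geosparql(GEOSPARQL_QUERY))  # True
```

    In a continuous-query setting, the same filter would be re-evaluated as new geometries stream in, rather than once per request.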

    Read more about SAGE

    SLIPO

    POIs are the content of any application, service, and product even remotely related to our physical surroundings. From navigation applications to social networks, tourism, and logistics, we use POIs to search, communicate, decide, and plan our actions. The Big Data assets for POIs and the evolved POI value chain have introduced opportunities for growth, but also complexity, intensifying the challenges relating to their quality-assured integration, enrichment, and data sharing. POI data are by nature semantically diverse and spatiotemporally evolving, representing different entities and associations depending on their geographical, temporal, and thematic context. Pioneered by the FP7 project GeoKnow, linked data technologies have been applied to effectively extract the maximum possible value from open, crowdsourced and proprietary Big Data sources. Validated in the domains of tourism and logistics, these technologies have proven their benefit as a cost-effective and scalable foundation for the quality-assured integration, enrichment, and sharing of general-purpose geospatial data. In SLIPO, we argue that linked data technologies can address the limitations, gaps and challenges of the current landscape in integrating, enriching, and sharing POI data. Our goal is to transfer the research output of project GeoKnow to the specific challenge of POI data, introducing validated and cost-effective innovations across their value chain. Read more about SLIPO

    AKSW was formerly funded through the following regional, national and European research projects:

    • ART-e-FACT: Media continuity artefact management
    • BIG: Big Data Public Private Forum
    • BioASQ: a challenge on large-scale biomedical semantic indexing and question answering
    • Digital Agenda Scoreboard: A Statistical Anatomy of Europe's way into the Information Age
    • GOLD: Generating Ontologies from Linked Data
    • LATC: LOD Around-the-Clock
    • LE4SW: Regional Technology Platform of Social Semantic Collaboration
    • LinkingLOD: interlinking knowledge bases
    • LOD2: Creating Knowledge out of Interlinked Data
    • OntoWiki.eu: Social Semantic Collaboration for EKM, E-Learning & E-Tourism
    • SCMS: Semantic Content Management Systems
    • SoftWiki: Semantics- and Community-Based Requirements Engineering

    Open Source Projects

    AKSW has launched a number of high-impact open-source R&D projects.

    AutoSlides

    AutoSlides is a tool that automatically creates PowerPoint-based slideshows on chosen topics. It uses several online resources, such as DBpedia, Wikipedia and Flickr, to search for relevant content and produce meaningful slideshows. Read more about AutoSlides

    CubeQA

    As an increasing amount of statistical data is published as RDF, intuitive ways of satisfying information needs and gaining new insights from this type of data become increasingly important. Question answering systems provide intuitive access to data by translating natural language queries into SPARQL, the native query language of RDF knowledge bases. Existing approaches, however, perform poorly on statistical data because of its different structure. Based on a question corpus compiled in previous work, we created a benchmark for evaluating statistical question answering systems and stimulating further research. Building upon a previously established algorithm outline, we detail a question answering algorithm for statistical Linked Data which covers a wide range of question types, evaluate it using the benchmark, and discuss future challenges in this field. To our knowledge, this is the first question answering approach for statistical RDF data, and it could open up a new research area. Read more about CubeQA
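    The core idea, translating a natural-language question about statistical data into a SPARQL aggregate over the RDF Data Cube vocabulary, can be sketched as follows. This toy mapper is not CubeQA's actual algorithm: the lexical cues and the query shape are deliberate simplifications, and a real system would also resolve the dataset, dimensions, and filters mentioned in the question.

```python
# Toy sketch (not CubeQA's actual pipeline): pick an aggregate function
# from lexical cues in the question and build a SPARQL query over
# observations modelled with the RDF Data Cube vocabulary.

def question_to_sparql(question: str) -> str:
    q = question.lower()
    # Choose an aggregate from simple lexical cues.
    if "average" in q or "mean" in q:
        agg = "AVG"
    elif "total" in q or "sum" in q:
        agg = "SUM"
    else:
        agg = "MAX"
    return f"""
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>

SELECT ({agg}(?value) AS ?answer) WHERE {{
  ?obs a qb:Observation ;
       sdmx-measure:obsValue ?value .
}}
"""

query = question_to_sparql("What was the average unemployment rate?")
print("AVG" in query)  # True
```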

    DBtrends

    Many ranking methods have been proposed for RDF data. These methods often use the structure behind the data to measure its importance. Recently, some of them have started to exploit information from other sources, such as the Wikipedia page graph, to better rank RDF data. In this work, we extensively evaluate the application of different ranking functions for entities, classes, and properties across two different countries, as well as their combination. Read more about DBtrends

    DockerConverter

    DockerConverter is an approach and a software tool to map a Docker configuration to various mature systems, and to reverse engineer any available Docker image in order to increase confidence (or trust) in it. Read more about DockerConverter

    HAWK

    HAWK drives forward the OKBQA vision of hybrid question answering using Linked Data and full-text information. Performance benchmarks are run on the QALD-4 task 3 hybrid benchmark. Read more about HAWK

    Jassa

    Jassa comprises a set of layered modules, ranging from a (low-level) RDF API over a service abstraction layer and a SPARQL-JSON mapping layer up to a faceted browsing layer. There is also a module with a set of reusable AngularJS directives (widgets) for user-interface components (Jassa-UI-Angular). Read more about Jassa

    KBox

    KBox allows users to have a single place to share resources and knowledge among different applications. Built on top of the RDF model, KBox is a natural extension of the Web on your computer. Read more about KBox

    MEX Vocabulary

    MEX Vocabulary: A Light-Weight Interchange Format for Machine Learning Experiments. Read more about MEX Vocabulary

    ORE

    The ORE (Ontology Repair and Enrichment) tool allows for the enrichment, repair and validation of OWL based knowledge bases. Read more about ORE

    Quit

    Quit supports collaboration in a setup of distributed, evolving RDF knowledge bases. Read more about Quit

    SML-Bench

    The ultimate goal of SML-Bench is to foster research in machine learning from structured data, as well as to increase the reproducibility and comparability of algorithms in that area. This is important since a) the preparation of machine learning tasks in that area involves a significant amount of work, and b) there are hardly any cross-comparisons across languages, as this requires data conversion processes. Read more about SML-Bench

    Tapioca

    Tapioca is a search engine for finding topically similar linked data datasets. Read more about Tapioca

    Community Projects

    AKSW has launched a number of high-impact R&D community projects.

    Cofundos

    Cofundos helps to realize open-source software ideas, by providing a platform for their discussion & enrichment and by establishing a process for organizing the contributions and interests of different stakeholders in the idea. Read more about Cofundos

    NLP2RDF

    NLP2RDF is a LOD2 community project that is developing the NLP Interchange Format (NIF). NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The output of NLP tools can be converted into RDF and used in the LOD2 Stack. The latest version and up-to-date information can be found on the NLP2RDF landing page.
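    The central idea of NIF, identifying substrings of a document by character offsets and describing them in RDF, can be sketched in a few lines. The document URI and example sentence below are hypothetical; the property names follow the NIF Core vocabulary.

```python
# Minimal sketch of a NIF annotation as N-Triples: a substring is
# identified by character offsets within its document ("context").
# The document URI is hypothetical; properties follow NIF Core.

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"

def nif_annotation(doc_uri: str, text: str, begin: int, end: int) -> str:
    anchor = text[begin:end]
    subject = f"<{doc_uri}#char={begin},{end}>"
    return "\n".join([
        f'{subject} <{NIF}anchorOf> "{anchor}" .',
        f'{subject} <{NIF}beginIndex> "{begin}" .',
        f'{subject} <{NIF}endIndex> "{end}" .',
        f'{subject} <{NIF}referenceContext> <{doc_uri}#char=0,{len(text)}> .',
    ])

triples = nif_annotation("http://example.org/doc1",
                         "Leipzig is a city in Saxony.", 0, 7)
print(triples.splitlines()[0])
```

    Because the annotation is plain RDF, the output of one NLP tool (e.g. a named entity recognizer) can be merged with that of another (e.g. a POS tagger) simply by combining their triples.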

    Read more about NLP2RDF

    Dataset Projects

    AKSW is publisher of or contributor to the following dataset projects.

    Catalogus Professorum

    An adapted OntoWiki with accompanying vocabularies for managing historic information related to the professors working at the University of Leipzig in its 600-year history. Read more about Catalogus Professorum

    FTS

    The Financial Transparency System (FTS) of the European Commission contains information about grants for European Union projects starting from 2007. It allows users to get an overview of EU funding, including information on beneficiaries as well as the amount and type of expenditure, and information on the responsible EU department. The original dataset is freely available on the European Commission website, where users can query the data using an HTML form and download it in CSV and, most recently, XML format. The RDF conversion enables interesting queries over the data that were very difficult before. The main benefit of the dataset is increased financial transparency of EU project funding. The RDF version of the FTS dataset will become part of the EU Open Data Portal and eventually be hosted and maintained by the European Union itself. Read more about FTS
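    One example of a query that the RDF version makes easy is summing funding per beneficiary. The sketch below shows the query shape alongside the same aggregation over made-up in-memory rows; the fts: property names are hypothetical placeholders, not the dataset's actual vocabulary.

```python
# Illustrative only: aggregate total funding per beneficiary.
# The fts: properties below are hypothetical placeholders.

FTS_QUERY = """
PREFIX fts: <http://example.org/fts#>

SELECT ?beneficiary (SUM(?amount) AS ?total) WHERE {
  ?commitment fts:beneficiary ?beneficiary ;
              fts:amount ?amount .
}
GROUP BY ?beneficiary
ORDER BY DESC(?total)
"""

# The same aggregation over made-up in-memory rows:
from collections import defaultdict

rows = [("ACME", 100.0), ("ACME", 50.0), ("Umbrella", 75.0)]
totals = defaultdict(float)
for beneficiary, amount in rows:
    totals[beneficiary] += amount

print(totals["ACME"])  # 150.0
```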

    GHO

    The improvement of public health is one of the main indicators of societal progress. Statistical data for monitoring public health is highly relevant for a number of sectors, such as research (e.g. in the life sciences or economics), policy making, health care, the pharmaceutical industry, and insurance. Such data is meanwhile available even on a global scale, e.g. in the Global Health Observatory (GHO) of the United Nations' World Health Organization (WHO). The GHO comprises more than 50 different datasets; it covers all 198 WHO member countries and is updated as more recent or revised data becomes available, or when there are changes to the methodology being used. However, this data is only accessible via complex spreadsheets, and therefore queries over the 50 different datasets, as well as combinations with other datasets, are very tedious and require a significant amount of manual work. By making the data available as RDF, we lower the barrier for data re-use and integration. Read more about GHO

    Linked History

    The idea for the project arose during our research on prosopographical knowledge bases. Together with historians of the University of Leipzig, we set up the Catalogus Professorum Lipsiensis project, which contains detailed information about more than 2,000 professors in the university's history. To enable collaborative research in the field of prosopography, existing databases have to be interlinked. Within this project, the AKSW research group and the professors' catalogue group are developing services to support this idea.

    Read more about Linked History

    Linked TCGA

    The Cancer Genome Atlas database aims to characterize the changes that occur in genes due to cancer. Knowledge about such changes can be of central importance when aiming to predict life expectancy as well as the medication, or sequence of medications, that should be administered to a patient to ensure their survival. So far, experts who needed this data had to wait in long data queues and write dedicated tools to analyze it. Read more about Linked TCGA

    LinkedIdioms

    The LinkedIdioms dataset is a multilingual RDF representation of idioms covering five different languages. The dataset was crawled and integrated from various sources. To assure its quality, all idioms were evaluated by at least two native speakers. We designed the dataset to be easily usable in natural-language processing applications with the goal of facilitating content translation tasks. In particular, the dataset follows the best practices of the Linguistic Linked Open Data (LLOD) community. Read more about LinkedIdioms

    Pfarrerbuch

    "Pfarrerbuch" (pastors' book) is a project of the Working Group for Saxon Church History, the Institute of Church History and the Institute of Computer Science at Leipzig University, and the Institute of Church History at the Evangelical Lutheran Theological University in Budapest. Read more about Pfarrerbuch

    SemanticQuran

    The Semantic Quran dataset is a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured sources. It was aligned to an ontology designed to represent multilingual data from sources with a hierarchical structure. The resulting RDF data encompasses 43 different languages, which are among the most underrepresented languages in Linked Data, including Arabic, Amharic and Amazigh. We designed the dataset to be easily usable in natural-language processing applications with the goal of facilitating the development of knowledge extraction tools for these languages. In particular, the Semantic Quran is compatible with the NLP Interchange Format (NIF) and contains explicit morpho-syntactic information on the utilized terms. Read more about SemanticQuran

    USPatents

    A patent is a set of exclusive rights granted to an inventor by a sovereign state for a solution, be it a product or a process, to a particular technological problem. The United States Patent and Trademark Office (USPTO) is part of the US Department of Commerce that grants patents to businesses and inventors for their inventions, in addition to registering products and identifying intellectual property. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. As of December 2011, 8,743,423 patents had been issued and 16,020,302 applications had been received. USPTO patents are accepted in electronic form and are filed as PDF documents. However, the indexing is not perfect, and it is cumbersome to search through the PDF documents. Google has also made all the patents available for download in XML format, albeit only for the years 2002 to 2015. Thus, we converted this bulk of data (spanning 13 years) from XML to RDF to conform to the Linked Data principles. Read more about USPatents
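    The XML-to-RDF conversion step can be sketched on a made-up record using only the standard library; the real USPTO XML schema and the target vocabulary are far richer than this, and the patent number and title below are invented for illustration.

```python
# Minimal sketch of XML-to-RDF conversion on a made-up patent record;
# the real USPTO schema and target vocabulary are far richer.
import xml.etree.ElementTree as ET

PATENT_XML = """
<patent>
  <doc-number>1234567</doc-number>
  <invention-title>Widget frobnicator</invention-title>
</patent>
"""

def patent_to_ntriples(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    number = root.findtext("doc-number")
    title = root.findtext("invention-title")
    # Mint a subject URI per patent and emit Dublin Core triples.
    subject = f"<http://example.org/patent/{number}>"
    return "\n".join([
        f'{subject} <http://purl.org/dc/terms/identifier> "{number}" .',
        f'{subject} <http://purl.org/dc/terms/title> "{title}" .',
    ])

print(patent_to_ntriples(PATENT_XML))
```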

    Incubator Projects

    • AgriNepalData: Ontology Based Data Access and Integration for Improving the Effectiveness of Farming in Nepal
    • aksw.org: a linked-data-driven web page rendered by the OntoWiki site extension
    • ALOE: Assisted Linked Data Consumption Engine
    • Analyzing Cognitive Evolution using Linked Data: Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
    • AskNow: a Question Answering (QA) system for RDF datasets
    • ASSESS: Automatic Self Assessment
    • AutoSPARQL: Convert a natural language expression to a SPARQL query
    • BOA: BOotstrapping linked datA
    • conTEXT: Lightweight Text Analytics using Linked Data
    • CSVImport: Representing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary
    • DBpediaDQ: User-driven quality evaluation of DBpedia
    • DBpediaDQCrowd: Crowdsourcing DBpedia Quality Assessment
    • DBtrends: Evaluating ranking functions on RDF data sets
    • DEER: RDF Data Extraction and Enrichment Framework
    • DeFacto: Deep Fact Validation
    • DEQA: Deep Web Extraction for Question Answering
    • Dockerizing Linked Data: Knowledge Base Shipping to the Linked Open Data Cloud
    • DSSN: towards a global Distributed Semantic Social Network
    • Erfurt: PHP5 / Zend based Semantic Web API for Social Semantic Software
    • Facete: JavaScript SPARQL-based Faceted Search Library and Browsing Widgets
    • GeoLift: Spatial mapping framework for enriching RDF datasets with geospatial information
    • GERBIL: General Entity Annotation Benchmark Framework
    • HERObservatory: Using Linked Data to Build an Observatory of Societal Progress Leveraging on Data Quality
    • IGUANA: Intelligent Suite for Benchmarking SPARQL with Updates
    • jena-sparql-api: A Java library featuring tools for transparently boosting SPARQL query execution
    • KBox: Distributing Ready-to-Query RDF Knowledge Graphs
    • KeyNode.js: Next level web presentations
    • LDWPO: the Linked Data Workflow Project ontology
    • Linked Data Quality Survey: Quality Assessment for Linked Data: A Survey
    • LODStats: a statement-stream-based approach for gathering comprehensive statistics about RDF datasets
    • Mosquito: SPARQL benchmark
    • N3 Collection: A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format
    • NIF4OGGD: Natural Language Interchange Format for Open German Governmental Data
    • NLP Interchange Format (NIF): an RDF/OWL-based format that allows combining and chaining several NLP tools in a flexible, light-weight way
    • openQA: Open Question Answering Framework
    • OpenResearch: Semantic Wiki for the Sciences
    • QualisBrasil: Linked Open Data for supporting scientometric studies
    • Query Cache: Adaptive SPARQL Query Cache
    • RDFaCE: RDFa Content Editor
    • RDFauthor: an editing solution for distributed and syndicated structured content on the World Wide Web
    • RDFSlice: Large-scale RDF Dataset Slicing
    • RDFUnit: an RDF Unit-Testing suite
    • ReDD-Observatory: Using the Web of Data for Evaluating the Research-Disease Disparity
    • Relation Annotation in GENIA
    • SAIM: (Semi-)Automatic Instance Matcher
    • SANSA-Stack: Open source platform for distributed data processing of large-scale RDF datasets
    • SCRS: Semantic Clinical Registry System for Rare Diseases
    • Semantic Pingback: Adding a social dimension to the Linked Data Web
    • SINA: Semantically INterpreting user query towards question-Answering
    • SMART: A Semantic Search Engine
    • SPARQL2NL: converting SPARQL queries to natural language
    • SparqlAnalytics: I Know What You Did Last Query
    • Sparqlify: a SPARQL-SQL rewriter
    • SparqlMap: a SPARQL-to-SQL rewriter
    • TripleCheckMate: Crowdsourcing the evaluation of Linked Data
    • VeriLinks: verifying links in an arbitrary linkset
    • Xodx: A basic DSSN node implementation
    • Xturtle: an Eclipse / Xtext2 based editor for RDF/Turtle files

    Project Alumni

    Some projects have reached a stable state, but are currently not actively maintained and further developed.

    • BorderFlow: a general-purpose graph clustering tool
    • DBpedia SPARQL Benchmark: a pure RDF benchmark based on actually posed queries
    • IGUANA: Intelligent Suite for Benchmarking SPARQL with Updates
    • LDAP 2 SPARQL: Accessing RDF Knowledge Bases via LDAP Clients
    • LESS: Syndicate Linked Data Content
    • Mobile Social Semantic Web: weaving a distributed, semantic social network for mobile users
    • Navigation-induced Knowledge Engineering by Example: a light-weight methodology for low-cost knowledge engineering by a massive user base
    • OD@FMI: Open Data for the University of Leipzig's Math and Computer Science Faculty
    • OntoWiki Mobile: Knowledge Management in your Pocket
    • Powl: Semantic Web Development Platform
    • R2D2: PHP implementation of the D2RQ Mapping Language
    • RDFAPI-JS: Use JavaScript RDFa Widgets for Model/View Separation inside Read/Write Websites
    • re:publish: Light-weight linked data publishing with node.js
    • Semantic LDAP: Bringing together LDAP and the Semantic Web
    • SPARQL Trainer: learn to query the semantic web
    • SPARQR: SPARQL Query Recommender Web Service
    • XML2OWL XSLT: Configurable XSLT stylesheet, which transforms XML documents into OWL
    • xOperator: combines advantages of social network websites with instant messaging

    News

    More than 20 European Union Datasets Converted to RDF by LATC Project ( 2012-07-09T15:13:52+02:00 by Prof. Dr. Jens Lehmann)

    Over the past two years, the LATC project (Linked Open Data Around-The-Clock) has worked on converting more than 20 EU datasets to RDF, making them available as Linked Data and via SPARQL, and linking them to other datasets. Read more about "More than 20 European Union Datasets Converted to RDF by LATC Project"

    Can we create better links by playing games? ( 2012-06-20T20:51:46+02:00 by Prof. Dr. Jens Lehmann)

    Most of you will agree that links are an important element of the Web of Data. Read more about "Can we create better links by playing games?"

    AKSW at TU Dresden PLT ( 2011-06-17T16:34:54+02:00 by Prof. Dr. Jens Lehmann)

    On June 8, I (Jens) visited the process control engineering research group (PLT) of Leon Urbas at the Dresden University of Technology. Read more about "AKSW at TU Dresden PLT"

    May 4-5: Leipziger Semantic Web Tag 2011 and Local Media Conferenz ( 2011-04-16T18:39:51+02:00 by Prof. Dr. Sören Auer)

    As in the past two years, we are again organizing a Leipzig Semantic Web Day on May 5th at the marvelous Mediencampus Villa Ida. This year’s theme is “Linked Data for the Masses”, particularly focusing on its use in enterprises. Read more about "May 4-5: Leipziger Semantic Web Tag 2011 and Local Media Conferenz"

    AKSW takes part in EU-funded LATC project ( 2010-11-30T15:44:33+01:00 by Prof. Dr. Jens Lehmann)

    The AKSW group is a member of the recently started LATC (Linked Open Data Around-The-Clock) project funded by the European Union. LATC aims to improve the quality and quantity of Linked Data on the Web, e.g. by developing a 24/7 interlinking engine. Read more about "AKSW takes part in EU-funded LATC project"