LIMES implements novel time-efficient approaches for link discovery in metric spaces. Our approaches utilize the mathematical characteristics of metric spaces to compute estimates of the similarity between instances. These estimates are then used to filter out a large number of instance pairs that cannot satisfy the mapping conditions. By these means, LIMES can reduce the number of comparisons needed during the mapping process by several orders of magnitude.
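The text above does not name the property of metric spaces that is exploited. A natural candidate, and the one assumed in this sketch, is the triangle inequality. The symbols are notation introduced here for illustration: s is a source instance, t a target instance, e a reference point whose distance to t is already known, d the metric and θ the distance threshold.

```latex
% Triangle inequality in a metric space (M, d):
d(s, e) \le d(s, t) + d(t, e)
% Rearranged (using symmetry, d(t, e) = d(e, t)) into a lower bound on d(s, t):
d(s, t) \ge d(s, e) - d(e, t)
% Filtering rule: the pair (s, t) cannot match whenever the bound exceeds the threshold:
d(s, e) - d(e, t) > \theta \;\Longrightarrow\; d(s, t) > \theta
```

Under this assumption, a pair can be discarded without ever computing d(s, t), since d(e, t) is known in advance and d(s, e) is computed once per reference point rather than once per target instance.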
The general workflow implemented by the LIMES framework comprises four steps. Given a source, a target and a threshold, LIMES first computes a set of exemplars for the target data source (step 1). This process is concluded by matching each target instance to the exemplar closest to it. In steps 2 and 3, the matching is carried out. In the filtering step, the distances between all source instances and target instances are approximated via the exemplars computed previously (step 2), and obvious non-matches are filtered out. Subsequently, the real distances between the remaining source and target instances are computed (step 3). Finally, the matching instances are serialized, i.e., written to a user-defined output stream in a user-specified format, e.g., ((http://www.w3.org/2001/sw/RDFCore/ntriples/ NTriples)) (step 4).
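Steps 1 to 3 of this workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LIMES implementation: instances are modelled as numeric tuples compared with the Euclidean distance, the threshold is treated as a maximum distance, exemplars are chosen with a simple farthest-first heuristic, and all function names (`pick_exemplars`, `assign_to_exemplars`, `link`) are hypothetical.

```python
import math


def pick_exemplars(targets, k):
    """Step 1: choose k exemplars (farthest-first, an illustrative heuristic)."""
    exemplars = [targets[0]]
    while len(exemplars) < min(k, len(targets)):
        # Add the target that lies farthest from its nearest exemplar so far.
        exemplars.append(
            max(targets, key=lambda t: min(math.dist(t, e) for e in exemplars))
        )
    return exemplars


def assign_to_exemplars(targets, exemplars):
    """Step 1 (end): map each exemplar to [(target, distance to exemplar)]."""
    clusters = {e: [] for e in exemplars}
    for t in targets:
        closest = min(exemplars, key=lambda e: math.dist(t, e))
        clusters[closest].append((t, math.dist(t, closest)))
    return clusters


def link(sources, targets, theta, k=2):
    """Return (matches, number of real distance computations)."""
    clusters = assign_to_exemplars(targets, pick_exemplars(targets, k))
    matches, comparisons = [], 0
    for s in sources:
        for e, members in clusters.items():
            d_se = math.dist(s, e)  # one distance per exemplar, reused below
            for t, d_et in members:
                # Step 2: the triangle inequality gives d(s, t) >= d(s, e) - d(e, t),
                # so the pair is an obvious non-match if this bound exceeds theta.
                if d_se - d_et > theta:
                    continue
                # Step 3: compute the real distance for the remaining pairs only.
                comparisons += 1
                if math.dist(s, t) <= theta:
                    matches.append((s, t))
    return matches, comparisons
```

Because the filter only discards pairs whose lower bound already exceeds the threshold, the result is identical to a brute-force comparison of all pairs; only the number of real distance computations shrinks.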
The LIMES framework consists of seven main modules, each of which can be extended to accommodate new or improved functionality. The central modules of LIMES are the controller module, which coordinates the matching process, and the data module, which contains all the classes necessary to store data. The matching process is carried out as follows: First, the controller calls the I/O module, which reads the configuration file and extracts all the information necessary to carry out the comparison of instances, including the URLs of the SPARQL endpoints of the knowledge bases, the restrictions on the instances to map (e.g., their type), the expression of the metric to be used and the threshold to be used. Examples of configuration files can be found in the distribution.
Provided that the configuration file is valid w.r.t. the LIMES Specification Language (LSL), the query module is called. This module uses the configuration for the source and target knowledge bases to retrieve, from their SPARQL endpoints, the instances and properties that adhere to the restrictions specified in the configuration file. The query module writes its output into a cache, which can be a file (for large numbers of instances, not implemented yet) or main memory. Once all instances have been stored in the cache, the controller calls the organizer module. This module carries out two tasks: first, it computes the exemplars of the source knowledge base. Then, it uses the exemplars to compute the matchings from the source to the target knowledge base. Finally, the I/O module is called to serialize the results.
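The final serialization step can be illustrated with a short sketch that writes links as N-Triples statements, one `<subject> <predicate> <object> .` line per link. The function name `write_ntriples`, the example URIs and the choice of `owl:sameAs` as the link predicate are assumptions made for this illustration; they are not taken from the LIMES code base, and URIs are assumed to need no escaping.

```python
import io

# Assumed link predicate for this sketch; a real configuration may use another.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def write_ntriples(links, out, predicate=OWL_SAME_AS):
    """Write each (source_uri, target_uri) pair as one N-Triples statement."""
    for source_uri, target_uri in links:
        out.write(f"<{source_uri}> <{predicate}> <{target_uri}> .\n")


# Usage: any writable text stream works, e.g. an open file or an in-memory buffer.
buffer = io.StringIO()
write_ntriples([("http://example.org/a", "http://example.org/b")], buffer)
```

Passing the stream in as a parameter mirrors the idea of a user-defined output stream: the same function can write to a file, a network socket wrapper or a buffer used for testing.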
LIMES can be run in one of three ways:
- Use our hosted Linking Service,
- Download the LIMES package and run it locally on your server (CAUTION: version 0.4 will be replaced soon; please contact Axel Ngonga if you are interested in an alpha of the new kernel), or
- Use the LIMES webservice programmatically at the LIMES Linking Server. A client for tests can be found here. The short description (the manual will be out soon) can be found here.