Adaptive SPARQL Query Cache
In order to get closer to the performance of relational database-backed Web applications, we developed an approach for improving the performance of triple stores by caching query results and even complete application objects. The selective invalidation of cache objects, following updates of the underlying knowledge bases, is based on analysing the graph patterns of cached SPARQL queries in order to obtain information about what kind of updates will change the query result.
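The idea of selective invalidation can be illustrated with a minimal sketch. It assumes each cached query has been reduced to a flat list of triple patterns, with None standing for a variable; the names pattern_matches and affected_queries are hypothetical and not taken from either implementation. The actual graph pattern analysis also has to deal with constructs such as OPTIONAL and FILTER, which this sketch ignores.

```python
from typing import Dict, List, Optional, Tuple

Term = Optional[str]               # None stands for a variable (matches anything)
Pattern = Tuple[Term, Term, Term]  # (subject, predicate, object)
Triple = Tuple[str, str, str]


def pattern_matches(pattern: Pattern, triple: Triple) -> bool:
    """True if every constant position of the pattern equals the triple's term."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def affected_queries(cached: Dict[str, List[Pattern]], changed: Triple) -> List[str]:
    """Keys of cached queries with at least one pattern matching the changed triple."""
    return [key for key, patterns in cached.items()
            if any(pattern_matches(p, changed) for p in patterns)]


cached = {
    "q_names": [(None, "foaf:name", None)],
    "q_alice_friends": [("ex:alice", "foaf:knows", None)],
}
changed = ("ex:bob", "foaf:name", "Bob")
print(affected_queries(cached, changed))  # ['q_names']
```

Only the query whose pattern can match the inserted or deleted triple is invalidated; the other cached result stays valid.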
Overview
We implemented our approach as a small proxy layer, which resides between the Semantic Web application and the SPARQL/SPARUL endpoint. All SPARQL queries and SPARUL updates are routed through this proxy. Once the proxy receives a query, it checks whether a result for this query is cached in its local store.

If that is the case, the result is delivered directly to the client without accessing the triple store. If no result is cached and the query is not excluded from caching by user-supplied rules, the query is routed to the triple store and the results are stored in the cache's local result store before they are returned to the client.
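The lookup flow described above could be sketched roughly as follows. This is an illustration under assumptions, not the code of either implementation: the class CachingProxy, the callable endpoint, and the exclusion rules given as predicates are all hypothetical, and the cache key is simply the whitespace-normalised query string. Excluded queries are assumed to be forwarded to the store without being cached.

```python
from typing import Callable, Dict, Iterable


class CachingProxy:
    """Toy proxy: answers from a local store, else forwards to the endpoint."""

    def __init__(self, endpoint: Callable[[str], str],
                 exclude: Iterable[Callable[[str], bool]] = ()):
        self.endpoint = endpoint          # stands in for the triple store
        self.exclude = list(exclude)      # user-supplied exclusion rules
        self.store: Dict[str, str] = {}   # local result store

    def query(self, sparql: str) -> str:
        key = " ".join(sparql.split())    # whitespace normalisation only
        if key in self.store:             # cache hit: no store access
            return self.store[key]
        result = self.endpoint(sparql)    # cache miss: ask the triple store
        if not any(rule(sparql) for rule in self.exclude):
            self.store[key] = result      # cache before returning
        return result


calls = []


def fake_endpoint(q: str) -> str:
    calls.append(q)
    return f"result-{len(calls)}"


proxy = CachingProxy(fake_endpoint, exclude=[lambda q: "RAND()" in q])
first = proxy.query("SELECT * WHERE { ?s ?p ?o }")
second = proxy.query("SELECT *   WHERE { ?s ?p ?o }")
print(first == second, len(calls))  # True 1
```

The second call is served from the local store, so the fake endpoint is contacted only once.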
Implementation
We developed two implementations of the SPARQL cache:
PHP Implementation
The PHP implementation is integrated into the Erfurt layer of OntoWiki. The caching component is part of the latest release and is also used by other web applications built on the Erfurt middleware. This implementation furthermore supports application-specific object caching with SPARQL dependencies.
Java Implementation
The Java implementation is a web application which runs within a servlet container. It provides a SPARQL/SPARUL endpoint proxy: requests to this endpoint are forwarded to the original SPARQL/SPARUL endpoint and the results are cached.
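From a client's point of view, such a proxy is addressed like any other SPARQL endpoint. The following sketch builds a SPARQL Protocol GET request using the standard query parameter, without sending it. The path /sparql and the helper build_query_request are assumptions made for illustration; the actual endpoint path of the prototype is not stated here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical proxy address; the real endpoint path may differ.
PROXY_ENDPOINT = "http://localhost:8080/sparql"


def build_query_request(sparql: str, endpoint: str = PROXY_ENDPOINT) -> Request:
    """Build a SPARQL Protocol GET request (query passed as 'query' parameter)."""
    url = endpoint + "?" + urlencode({"query": sparql})
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


req = build_query_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
print(req.full_url)
```

Passing the request to urllib.request.urlopen would send it; the application only needs to swap the original endpoint URL for the proxy's.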
The caching servlet is backed by the popular ehcache. For query parsing we use Jena.
The prototype can be downloaded here. The archive consists of the Java sources and a Maven project descriptor. Running mvn jetty:run should be enough to start it; then point your browser to localhost:8080. The cache is configured via a simple, self-explanatory web interface.

Evaluation
We evaluated our approach by extending the BSBM triple store benchmark with an update dimension as well as in typical Semantic Web application scenarios. We present here a short overview of the results acquired using the BSBM, which are discussed in greater detail in the paper M. Martin, J. Unbehauen, S. Auer: Improving the Performance of Semantic Web Applications with SPARQL Query Result Caching.

Using the BSBM with Pareto-distributed queries, we varied query repetition, benchmarked the implementation against various store sizes, and observed significant performance improvements in most cases.
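To give an impression of what Pareto-distributed query selection means, the sketch below draws query indices from a fixed pool so that a few queries recur very often while most occur rarely. It is not the mechanism used in the extended BSBM; the function sample_query_ids, the pool size, and the shape parameter alpha are assumptions chosen for illustration. A larger alpha concentrates the draws on fewer queries and thus increases repetition.

```python
import random
from collections import Counter
from typing import List


def sample_query_ids(n_queries: int, n_draws: int, alpha: float,
                     seed: int = 0) -> List[int]:
    """Draw query indices in [0, n_queries) with Pareto-skewed popularity."""
    rng = random.Random(seed)
    ids: List[int] = []
    while len(ids) < n_draws:
        # paretovariate returns values >= 1; shift to a 0-based rank
        rank = int(rng.paretovariate(alpha)) - 1
        if rank < n_queries:   # reject draws beyond the query pool
            ids.append(rank)
    return ids


draws = sample_query_ids(n_queries=100, n_draws=10_000, alpha=1.0)
counts = Counter(draws)
print(counts.most_common(3))   # the lowest ranks dominate the mix
```

Feeding such a sequence through a caching proxy yields a high hit rate for the popular queries, which is the situation in which result caching pays off most.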

In addition, we measured the impact of updates on the cache and found that only a high update frequency (i.e. more than one update per 25 queries) reduces overall performance.
Contributions and Support
The following people of AKSW created this project and provide support:
- Sören Auer
- Michael Martin supports the PHP implementation
- Jörg Unbehauen supports the Java implementation
Information
Last Modification:
2010-01-23 10:53:36 by Soeren Auer