CAVES - A Configurable Application
View Storage System

 
Introduction
Project Sponsor
Project Members
Overview
Publications

Introduction
CAVES is a fully configurable storage system that allows a client application to register for caching services. Applications can register methods to re-use stored applications views and also register rules that govern how the storage of these views should be managed. We test different storage management protocols and rules using a simulation model. This simulation model can also be used to tune a CAVES installation as usage characteristics change.

Sponsor
Funded by the National Science Foundation:

Project Members Overview

Project Overview Presentation [ Powerpoint ]

The CAVES system is designed as a general-purpose storage management system that can be located either at a middleware server or at a client machine. Its purpose is to store views (i.e., query results) from multiple applications and re-use them whenever possible to reduce turnaround time between applications and data servers. The CAVES system is fully programmable, making it possible for applications programmers to specify various storage management protocols that are best suited for a given application. The CAVES system also works as a fast prototyping facility for view caching. It constantly monitors the general system behavior, collects statistics and measures the effectiveness of the predefined storage management rules. This is accomplished with the help of a simulation model of the actual system that can be run periodically to perform specific system optimizations.

The architecture of the CAVES system is as follows:

The View Use Methods component of the architecture includes several types of methods for managing and reusing views.  These types include:

These methods are created by the application developers and are tailored to specific application systems using CAVES.

The View Management Methods component of the architecture includes rules that are used to define the priority of a cached view based on a set of statistics that maintained for each view.  These statistics are updated as the views are reused and as the system runs over a period of time.  The priorities are used to determine which views enter the cache and which views are removed when a new higher priority view must be entered into the cache.  Again, the statistics and rules are defined by the developers of the applications using the CAVES system in order to tailor them to that application.

The Runtime System in the architecture manages a fixed-sized cache for views.  A priority queue orders the views in the cache to make removal of the views with lowest priorities efficient when new views must be added to a full cache.  System statistics are used for tuning the operation of the cache.

The Simulation Model component of the cache models the runtime system and rules for computing the priorities of the views.  It simulates the response of the runtime system to evaluate effectiveness of specific rules.  Newly discovered rules and rule refinements are fed back to the runtime system to improve its performance.

CAVES maintains a set of statistics for views that are tailored to capture the important properties of views for particular applications.  Each statistic, with associated initialization and update methods, is defined by the application developers of an application.  Sample statistics that might be used for the views of an application are:

CAVES computes the priority view based on the view's statistics.  Order formulas define this computation.  Order formulas are tailored to specific applications and are specified by the application programmers that develop an application.  Example order formulas include:

The views of possibly many applications share a particular CAVES installation.  Each application may have its own tailored order formulas.  The effect of these order formulas must be integrated into a simple comprehensive management policy for the cache as a whole.  One way to do this is to use a weighted sum of several order formulas, using the weights to tune the performance of the cache.  Over time the integrated cache  management policy must adapt to changing conditions (e.g., a slow network connection) and to changes in the mix of queries from the applications currently using CAVES.  This can be done by adjusting the weights in the weighted sum of the order formulas.  More significant changes may also require changes in the order formulas themselves.

Consequently, to produce an integrated management policy for the cache and to allow the policy to adapt to changes, the priority of a view in the cache is computed as a weighted sum of order formulas.  Let OF = <f1, f2, ..., fm> be a set of order formulas and W = <w1, w2, ..., wm> be a set of weights, such that w1 + w2 + ... + wm = 1.  Then the priority of view V is defined as follows:

The overall algorithm for the CAVES system is shown below.  This algorithm shows what happens when a new query is sent from an application system to CAVES for processing.




CAVES constantly monitors general system behavior and collects and updates statistics to measure the effectiveness of the storage management rules.  This is accomplished through a simulation model of the actual system.  As mentioned above, dynamic tuning of CAVES is done by changing the weight vector and/or the order formulas used in the computation of view priorities.  Weight vector changes are triggered by weight change rules of the following form:

where Ai is a Boolean expression defined using the statistics and relational operators.  This rule changes the weight for order formula f by either an additive factor or a multiplicative factor.

There are a variety of potential applications for the CAVES system.  One application is a geographic information system performing zooming and panning operations over a possibly multi-layered map.  CAVES is also useful for application involving data mining and data cube operations. Other applications include mobile applications where connections to servers are intermittent, applications in which  prefetching of data is possible based on known rules, and distributed applications running over slow or highly congested networks.
 

Selected Publications

Software

Caves Simulation Package
README file [TXT]
The complete package [TAR/GZ]

Last updated August, 2005.