|
View Storage System |
|
|
|
|
|
|
|
Introduction
CAVES is a fully configurable storage
system that allows a client application to register for caching services.
Applications can register methods to re-use stored applications views and
also register rules that govern how the storage of these views should be
managed. We test different storage management protocols and rules using
a simulation model. This simulation model can also be used to tune a CAVES
installation as usage characteristics change.
Sponsor
Funded by the National Science Foundation:
Project Overview Presentation [ Powerpoint ]
The CAVES system is designed as a general-purpose storage management system that can be located either at a middleware server or at a client machine. Its purpose is to store views (i.e., query results) from multiple applications and re-use them whenever possible to reduce turnaround time between applications and data servers. The CAVES system is fully programmable, making it possible for applications programmers to specify various storage management protocols that are best suited for a given application. The CAVES system also works as a fast prototyping facility for view caching. It constantly monitors the general system behavior, collects statistics and measures the effectiveness of the predefined storage management rules. This is accomplished with the help of a simulation model of the actual system that can be run periodically to perform specific system optimizations.
The architecture of the CAVES system is as follows:

The View Use Methods component of the architecture includes several types of methods for managing and reusing views. These types include:
The View Management Methods component of the architecture includes rules that are used to define the priority of a cached view based on a set of statistics that maintained for each view. These statistics are updated as the views are reused and as the system runs over a period of time. The priorities are used to determine which views enter the cache and which views are removed when a new higher priority view must be entered into the cache. Again, the statistics and rules are defined by the developers of the applications using the CAVES system in order to tailor them to that application.
The Runtime System in the architecture manages a fixed-sized cache for views. A priority queue orders the views in the cache to make removal of the views with lowest priorities efficient when new views must be added to a full cache. System statistics are used for tuning the operation of the cache.
The Simulation Model component of the cache models the runtime system and rules for computing the priorities of the views. It simulates the response of the runtime system to evaluate effectiveness of specific rules. Newly discovered rules and rule refinements are fed back to the runtime system to improve its performance.
CAVES maintains a set of statistics for views that are tailored to capture the important properties of views for particular applications. Each statistic, with associated initialization and update methods, is defined by the application developers of an application. Sample statistics that might be used for the views of an application are:

The views of possibly many applications share a particular CAVES installation. Each application may have its own tailored order formulas. The effect of these order formulas must be integrated into a simple comprehensive management policy for the cache as a whole. One way to do this is to use a weighted sum of several order formulas, using the weights to tune the performance of the cache. Over time the integrated cache management policy must adapt to changing conditions (e.g., a slow network connection) and to changes in the mix of queries from the applications currently using CAVES. This can be done by adjusting the weights in the weighted sum of the order formulas. More significant changes may also require changes in the order formulas themselves.
Consequently, to produce an integrated management policy for the cache and to allow the policy to adapt to changes, the priority of a view in the cache is computed as a weighted sum of order formulas. Let OF = <f1, f2, ..., fm> be a set of order formulas and W = <w1, w2, ..., wm> be a set of weights, such that w1 + w2 + ... + wm = 1. Then the priority of view V is defined as follows:

The overall algorithm for the CAVES system is shown below. This algorithm shows what happens when a new query is sent from an application system to CAVES for processing.

CAVES constantly monitors general system behavior and collects and updates statistics to measure the effectiveness of the storage management rules. This is accomplished through a simulation model of the actual system. As mentioned above, dynamic tuning of CAVES is done by changing the weight vector and/or the order formulas used in the computation of view priorities. Weight vector changes are triggered by weight change rules of the following form:

where Ai is a Boolean expression defined using the statistics and relational operators. This rule changes the weight for order formula f by either an additive factor or a multiplicative factor.
There are a variety of potential applications
for the CAVES system. One application is a geographic information
system performing zooming and panning operations over a possibly multi-layered
map. CAVES is also useful for application involving data mining and
data cube operations. Other applications include mobile applications where
connections to servers are intermittent, applications in which prefetching
of data is possible based on known rules, and distributed applications
running over slow or highly congested networks.