Ph.D. Theses

Decentralized Data Management Framework for Data Grids

By Houda Lamehamedi
Advisor: Boleslaw K. Szymanski
November 30, 2005

The emergence of a new generation of data-intensive applications for universal cooperation and collaboration has led to increased demand for highly efficient and cost-effective resource sharing and problem solving. Data Grids provide an environment and a framework that supports and coordinates access by large numbers of users to widely distributed storage and compute resources. In Data Grids, data and data management utilities are treated as first-class citizens. The main focus of Data Grids is to provide users with an infrastructure that enables reliable access to and sharing of data, access to storage resources, and data transfer services that scale across widely distributed locations. Yet providing efficient access to huge, widely distributed data sets remains a considerable challenge.

Most existing and deployed Grid systems and platforms are centrally managed and difficult to set up and maintain. Proper access to software and hardware resources requires meticulous installation, configuration, and testing of different components across all participating Grid nodes. In such systems, control of the resources is centralized and usually handled by system administrators. These configurations hinder dynamic and scalable expansion of the Grid infrastructure and its resources. The tremendous growth in data requirements of both scientific and commercial applications over the last few years underscores the need for new data placement algorithms and access tools that can break administrative and geographical barriers. This new generation of universal cooperation and collaboration applications requires new approaches that ensure efficient access to, and distribution of, data and resources based on real-time user and application demand.

In this thesis we propose a new lightweight, distributed, adaptive, and scalable middleware that provides transparent, fast, and reliable access to data and storage resources in distributed resource-sharing environments such as Data Grids. Strategically placing data near the user and her application offers considerable benefits and is key to our solution. At the core of our approach are dynamic data placement and replica location techniques that adapt replica creation and location to continuously changing network connectivity and user behavior. The corresponding framework is fully distributed, self-configuring, scalable to large numbers of users, and supportive of dynamic growth of the underlying infrastructure.
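The abstract does not specify the placement algorithm itself. As a rough illustration only, the sketch below shows one generic form of demand-driven replica creation: a node counts remote requests for each file and creates a local replica once demand reaches a threshold. The `Node` class, the `threshold` parameter, and the counting rule are hypothetical assumptions chosen for clarity, not the thesis's actual technique.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical Grid node that tracks local demand for each file."""
    name: str
    threshold: int  # hypothetical: remote accesses needed before replicating locally
    replicas: set = field(default_factory=set)
    access_counts: dict = field(default_factory=lambda: defaultdict(int))

    def request(self, file_id: str) -> str:
        """Serve a request; return 'local' if a replica is here, else 'remote'.

        Each remote access increments the file's demand counter. Once the
        counter reaches the threshold, the node creates a local replica so
        that subsequent requests are served nearby.
        """
        if file_id in self.replicas:
            return "local"
        self.access_counts[file_id] += 1
        if self.access_counts[file_id] >= self.threshold:
            self.replicas.add(file_id)  # replicate in response to observed demand
        return "remote"


# Usage: five requests for the same (hypothetical) data set at one node.
node = Node("site-A", threshold=3)
results = [node.request("dataset-42") for _ in range(5)]
print(results)  # ['remote', 'remote', 'remote', 'local', 'local']
```

A fixed threshold is the simplest possible trigger. An adaptive scheme of the kind the abstract describes would presumably also age out stale counts and remove replicas as user behavior and network connectivity change.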

We evaluate the benefits and applicability of our proposed solution through analytical models, simulation, and emulation. Results from simulations and from the deployment of our middleware prototype under widely observed and popular data access patterns show that our solution provides better data access performance with lower resource consumption than static approaches.

* Return to main PhD Theses page