A metadata service for an infrastructure of large scale distributed scientific datasets

Date

2014-06-12

Authors

Adeleke, Oluwalani Aeoluwa

Abstract

In this constantly growing, information-technology-driven era, data migration and replication pose a serious bottleneck in distributed database infrastructure environments. For large heterogeneous environments in domains such as geospatial science and high energy physics, where large arrays of scientific data are involved, diverse challenges are encountered with respect to dataset identification, location services, and efficient retrieval of information. These challenges include locating data sources, identifying effective transfer routes, and replication, to mention a few. As distributed systems aimed at constant delivery of data to the point of query origination continue to expand in size and functionality, efficient replication and data retrieval systems have become increasingly important and relevant. One such system is an infrastructure for large-scale distributed scientific data management.

Several data management systems have been developed to help manage these fast-growing datasets and their metadata. However, little work has been done on allowing cross-communication and data sharing between these different dataset management systems in a distributed, heterogeneous environment. This dissertation addresses this problem, focusing particularly on metadata and the provenance service associated with it. We present the Virtual Unified Metadata architecture to establish communication between remote sites within a distributed heterogeneous environment using a client-server model. The system provides a framework that allows heterogeneous metadata services to communicate and share metadata and datasets through the implementation of a communication interface. It allows for metadata discovery and dataset identification by enabling remote queries between heterogeneous metadata repositories.

The significant contributions of this system include:
- the design and implementation of a client/server-based remote metadata query system for scientific datasets within distributed heterogeneous dataset repositories;
- the implementation of a caching mechanism for optimizing system performance;
- an analysis of the quality of service with respect to correct dataset identification, estimation of migration and replication time frames, and cache performance.
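As an illustration of how a cache can sit in front of a remote metadata query, the following is a minimal sketch and not the dissertation's implementation. The abstract does not state the caching policy, so least-recently-used eviction is an assumption here, and every name (`CachedMetadataClient`, `remote_query`, `hit_ratio`) is hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, List

# Hypothetical shape of a remote query: takes a query string and returns
# a list of metadata records. A real system would issue a network request.
Records = List[Dict[str, str]]
RemoteQuery = Callable[[str], Records]


class CachedMetadataClient:
    """Client-side LRU cache in front of a remote metadata query (assumed policy)."""

    def __init__(self, remote_query: RemoteQuery, capacity: int = 128) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._remote_query = remote_query
        self._capacity = capacity
        self._cache: "OrderedDict[str, Records]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def query(self, q: str) -> Records:
        if q in self._cache:
            # Cache hit: mark the entry as most recently used.
            self._cache.move_to_end(q)
            self.hits += 1
            return self._cache[q]
        # Cache miss: forward the query to the remote repository.
        self.misses += 1
        records = self._remote_query(q)
        self._cache[q] = records
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return records

    def hit_ratio(self) -> float:
        """Fraction of queries served from the cache so far."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A caller would wrap whatever function performs the remote lookup, for example `client = CachedMetadataClient(my_remote_lookup, capacity=256)`, and could read `client.hit_ratio()` as one simple measure of cache performance.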
