Scientific Data Services – Autonomous Data Management on Exascale Infrastructure

DOE mission-critical applications are expected to generate exabytes of data within a few years. Capturing, storing, and accessing the most relevant features of these massive data collections are the core challenges of scientific data management research. We plan to meet these challenges with a scalable data management system that enables efficient analysis. More specifically, we focus on improving the efficiency of data storage, data access, and in situ analysis to accelerate scientific discovery. Over the past few years, we have been (1) designing a new paradigm for querying array data stored in popular formats such as HDF5 and NetCDF; (2) creating novel data management and analysis algorithms that exploit emerging many-core architectures and expand the capabilities of in situ data processing systems; and (3) developing fundamental algorithms and data structures, tailored to dominant access patterns, for efficiently selecting and accessing large on-disk data sets. Together, these activities form a coherent Scientific Data Services (SDS) framework.
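Item (3) concerns data structures for efficient selection over large data sets. As an illustration only, the sketch below shows a binned bitmap index, one widely used structure for this kind of range selection. The text above does not say which structures SDS actually uses, and the names `BinnedBitmapIndex` and `range_query` are hypothetical. Bins that lie entirely inside the query range are answered from the bitmaps alone. Only elements in partially overlapping bins are checked against the raw values, which is the step that would require reading raw data from disk in a real system.

```python
from bisect import bisect_right
from typing import List, Sequence


class BinnedBitmapIndex:
    """Hypothetical equality-encoded, binned bitmap index (illustration only).

    Bin i covers [edges[i], edges[i+1]). Each bin keeps a bitset (a Python int)
    whose bit j is set when element j of the array falls into that bin.
    """

    def __init__(self, values: Sequence[float], edges: Sequence[float]) -> None:
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps: List[int] = [0] * (len(self.edges) - 1)
        for j, v in enumerate(self.values):
            self.bitmaps[self._bin_of(v)] |= 1 << j

    def _bin_of(self, v: float) -> int:
        # bisect_right gives the index of the first edge > v, so the bin is one less.
        i = bisect_right(self.edges, v) - 1
        if not 0 <= i < len(self.bitmaps):
            raise ValueError(f"value {v} lies outside the indexed range")
        return i

    def range_query(self, lo: float, hi: float) -> List[int]:
        """Return positions j with lo <= values[j] < hi, in ascending order."""
        hits = 0        # positions known to satisfy the condition
        candidates = 0  # positions that need a check against raw values
        for i, bitmap in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[i], self.edges[i + 1]
            if b_hi <= lo or b_lo >= hi:
                continue            # bin entirely outside the query range
            if lo <= b_lo and b_hi <= hi:
                hits |= bitmap      # bin entirely inside: answered by the index
            else:
                candidates |= bitmap  # partial overlap: must inspect raw data
        # Candidate check: read only the raw values flagged as candidates.
        j = 0
        while candidates:
            if candidates & 1 and lo <= self.values[j] < hi:
                hits |= 1 << j
            candidates >>= 1
            j += 1
        return [j for j in range(len(self.values)) if (hits >> j) & 1]


if __name__ == "__main__":
    idx = BinnedBitmapIndex([0.5, 2.5, 1.2, 3.9, 2.0, 0.1], edges=[0, 1, 2, 3, 4])
    print(idx.range_query(1.0, 3.0))  # bins [1,2) and [2,3) fully inside: [1, 2, 4]
    print(idx.range_query(0.3, 2.2))  # partial bins trigger candidate checks: [0, 2, 4]
```

The bin boundaries control the trade-off: finer bins mean more bitmaps to store but fewer candidates to verify against the raw data.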