[Banner image: SDM. The squares on the left grow smaller, representing distributed reduction of data to information and knowledge. The bar at the bottom represents data resources distributed over wide area networks, or data grids.]
NCSU
Scientific Data Management Center
at NC State

About the Project(*)



Welcome to the NC State Scientific Data Management Center, a SciDAC Center of the Department of Energy.

Managing scientific data has been identified by the scientific community as one of its most important emerging needs because of the sheer volume and increasing complexity of the data being collected. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses all stages, from initial data acquisition to final analysis. Based on community input, we have identified three significant requirements. First, more efficient access to storage systems is needed. In particular, parallel file system improvements are needed to write and read large volumes of data without slowing a simulation, analysis, or visualization engine. Second, scientists require technologies that facilitate better understanding of their data, in particular the ability to perform complex data analysis and searches over large data sets effectively. Specialized feature discovery, parallel statistical analysis, and efficient indexing are needed before the data can be understood or visualized. Finally, generating the data, collecting and storing the results, post-processing the data, and analyzing the results is a tedious, fragmented process. Workflow tools that automate this process in a robust, tractable, and recoverable fashion are required to enhance scientific exploration.

We have organized our activities into three layers that abstract the end-to-end data flow described above: Storage Efficient Access (SEA), Data Mining and Analytics (DMA), and Scientific Process Automation (SPA). The SEA layer sits immediately on top of hardware, operating systems, file systems, and mass storage systems, and provides parallel data access technology and transparent access to archival storage. The DMA layer, which builds on the functionality of the SEA layer, consists of indexing, feature selection, and parallel statistical analysis technology. The SPA layer, which sits on top of the DMA layer, provides the ability to compose scientific workflows from the components in the DMA layer as well as from application-specific modules. Figure 1 shows this organization and the components developed by the center and applied to various scientific applications.
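The layering described above can be pictured with a minimal Python sketch. This is only an illustration of the idea that each layer builds on the one beneath it; every class, method, and dataset name here is hypothetical and does not correspond to the center's actual software or interfaces.

```python
from typing import Callable, Dict, List


class StorageEfficientAccess:
    """SEA layer (hypothetical): hides where the data lives behind read()."""

    def __init__(self, datasets: Dict[str, List[float]]) -> None:
        # An in-memory dict stands in for parallel file systems and archives.
        self._datasets = datasets

    def read(self, name: str) -> List[float]:
        return list(self._datasets[name])


class DataMiningAnalytics:
    """DMA layer (hypothetical): analysis components built on SEA reads."""

    def __init__(self, sea: StorageEfficientAccess) -> None:
        self._sea = sea

    def select(self, name: str, threshold: float) -> List[float]:
        # Toy stand-in for feature selection: keep values >= threshold.
        return [v for v in self._sea.read(name) if v >= threshold]

    def mean(self, values: List[float]) -> float:
        # Toy stand-in for statistical analysis.
        return sum(values) / len(values)


class ScientificProcessAutomation:
    """SPA layer (hypothetical): chains components into a workflow."""

    def __init__(self) -> None:
        self._steps: List[Callable] = []

    def add_step(self, step: Callable) -> "ScientificProcessAutomation":
        self._steps.append(step)
        return self

    def run(self, initial):
        # Feed the output of each step into the next one.
        result = initial
        for step in self._steps:
            result = step(result)
        return result


# Made-up data, used only to exercise the sketch.
sea = StorageEfficientAccess({"run42": [0.5, 2.0, 3.0, 0.1]})
dma = DataMiningAnalytics(sea)

workflow = (
    ScientificProcessAutomation()
    .add_step(lambda name: dma.select(name, threshold=1.0))  # DMA component
    .add_step(dma.mean)                                      # DMA component
    .add_step(lambda m: round(m, 2))                         # application-specific step
)
result = workflow.run("run42")
```

In this sketch the workflow never touches storage directly: it only composes DMA components and an application-specific step, and the DMA components in turn reach data only through the SEA layer.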

The NC State team is primarily concerned with Scientific Process Automation.

(*) This project is part of the Scientific Data Management Center funded by the U.S. Department of Energy under the SciDAC program, grant numbers DE-FC02-01ER25484 and ER25809. Further information about the SDM structure, goals, and plans can be found at the main SDM site.


SDM Center (Version 1.0, last modified 12/23/07)

Copyright © 2001-2007 by North Carolina State University and others, All Rights Reserved.

Contact: M. Vouk, E-mail: vouk@csc.ncsu.edu, Tel: 919-515-7886, Fax: 919-515-6497, Department of Computer Science, Box 8206, North Carolina State University, Raleigh, NC 27695, USA