Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems
Essawy, Bakinam T.; Goodall, Jonathan L.; Xu, Hao; Rajasekar, Arcot; Myers, James D.; Kugler, Tracy A.; Billah, Mirza M.; Whitton, Mary C.; Moore, Reagan W.
2016-04
Citation
Essawy, Bakinam T.; Goodall, Jonathan L.; Xu, Hao; Rajasekar, Arcot; Myers, James D.; Kugler, Tracy A.; Billah, Mirza M.; Whitton, Mary C.; Moore, Reagan W. (2016). "Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems." Earth and Space Science 3(4): 163-175.
Abstract
Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule‐Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community‐driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment‐Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.Plain Language SummaryExecuting computational workflows in the geosciences can be challenging, especially when dealing with large, distributed, and heterogeneous data sets and computational tools. We present a methodology for addressing this challenge using the Integrated Rule‐Oriented Data System (iRODS) Workflow Structured Object (WSO). We demonstrate the approach through an end‐to‐end application of data access, processing, and publication of digital assets for a scientific study analyzing drought in the Carolinas region of the United States.Key PointsReproducibility of data‐intensive analyses remains a significant challengeData grids are useful for reproducibility of workflows requiring large, distributed data setsData and computations should be co‐located on servers to create executable Web‐resourcesPublisher
Wiley Periodicals, Inc. Univ. of Minnesota
ISSN
2333-5084 2333-5084
Other DOIs
Types
Article
Metadata
Show full item recordCollections
Remediation of Harmful Language
The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.
Accessibility
If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.