Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems
dc.contributor.author | Essawy, Bakinam T. | |
dc.contributor.author | Goodall, Jonathan L. | |
dc.contributor.author | Xu, Hao | |
dc.contributor.author | Rajasekar, Arcot | |
dc.contributor.author | Myers, James D. | |
dc.contributor.author | Kugler, Tracy A. | |
dc.contributor.author | Billah, Mirza M. | |
dc.contributor.author | Whitton, Mary C. | |
dc.contributor.author | Moore, Reagan W. | |
dc.date.accessioned | 2017-06-16T20:14:57Z | |
dc.date.available | 2017-06-16T20:14:57Z | |
dc.date.issued | 2016-04 | |
dc.identifier.citation | Essawy, Bakinam T.; Goodall, Jonathan L.; Xu, Hao; Rajasekar, Arcot; Myers, James D.; Kugler, Tracy A.; Billah, Mirza M.; Whitton, Mary C.; Moore, Reagan W. (2016). "Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems." Earth and Space Science 3(4): 163-175. | |
dc.identifier.issn | 2333-5084 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/137520 | |
dc.description.abstract | Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule‐Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community‐driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment‐Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities. Plain Language Summary: Executing computational workflows in the geosciences can be challenging, especially when dealing with large, distributed, and heterogeneous data sets and computational tools. We present a methodology for addressing this challenge using the Integrated Rule‐Oriented Data System (iRODS) Workflow Structured Object (WSO).
We demonstrate the approach through an end‐to‐end application of data access, processing, and publication of digital assets for a scientific study analyzing drought in the Carolinas region of the United States. Key Points: Reproducibility of data‐intensive analyses remains a significant challenge; Data grids are useful for reproducibility of workflows requiring large, distributed data sets; Data and computations should be co‐located on servers to create executable Web‐resources. | |
dc.publisher | Wiley Periodicals, Inc. | |
dc.publisher | Univ. of Minnesota | |
dc.subject.other | workflows | |
dc.subject.other | iRODS | |
dc.subject.other | hydrologic modeling | |
dc.subject.other | federation | |
dc.subject.other | reproducibility | |
dc.title | Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems | |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | |
dc.subject.hlbsecondlevel | Geological Sciences | |
dc.subject.hlbsecondlevel | Space Sciences | |
dc.subject.hlbsecondlevel | Atmospheric and Oceanic Sciences | |
dc.subject.hlbtoplevel | Science | |
dc.description.peerreviewed | Peer Reviewed | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/137520/1/ess271_am.pdf | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/137520/2/ess271.pdf | |
dc.identifier.doi | 10.1002/2015EA000139 | |
dc.identifier.source | Earth and Space Science | |
dc.identifier.citedreference | Oinn, T., et al. ( 2004 ), Taverna: A tool for the composition and enactment of bioinformatics workflows, Bioinformatics, 20, 3045 – 3054, doi: 10.1093/bioinformatics/bth361. | |
dc.identifier.citedreference | Altintas, I., C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock ( 2004 ), Kepler: An extensible system for design and execution of scientific workflows, 16th International Conference On. IEEE, 2004. pp. 423–424. doi: 10.1109/SSDM.2004.1311241. | |
dc.identifier.citedreference | Amazon EC2 Instances [WWW Document] ( 2015 ), [Available at http://aws.amazon.com/ec2/instance‐types/, accessed 6.7.15. ] | |
dc.identifier.citedreference | Anderson, S. P., R. C. Bales, and C. J. Duffy ( 2008 ), Critical zone observatories: Building a network to advance interdisciplinary study of Earth surface processes, Mineral. Mag., 72, 7 – 10, doi: 10.1180/minmag.2008.072.1.7. | |
dc.identifier.citedreference | Billah, M. M., J. L. Goodall, U. Narayan, J. T. Reager, V. Lakshmi, and J. S. Famiglietti ( 2015 ), A methodology for evaluating evapotranspiration estimates at the watershed‐scale using GRACE, J. Hydrol., 523, 574 – 586, doi: 10.1016/j.jhydrol.2015.01.066. | |
dc.identifier.citedreference | Cornillon, P., J. Gallagher, and T. Sgouros ( 2003 ), OPeNDAP: Accessing data in a distributed, heterogeneous environment, Data Sci. J., 2, 164 – 174, doi: 10.2481/dsj.2.164. | |
dc.identifier.citedreference | Cowles, T., J. Delaney, J. Orcutt, and R. Weller ( 2010 ), The Ocean Observatories Initiative: Sustained ocean observing across a range of spatial scales, Mar. Technol. Soc. J., 44 ( 6 ), 54 – 64, doi: 10.4031/MTSJ.44.6.21. | |
dc.identifier.citedreference | De Roure, D., C. Goble, and R. Stevens ( 2009 ), The design and realisation of the virtual research environment for social sharing of workflows, Futur. Gener. Comput. Syst., 25, 561 – 567, doi: 10.1016/j.future.2008.06.010. | |
dc.identifier.citedreference | Deelman, E., G. Singh, M. Su, J. Blythe, Y. Gil, and C. Kesselman ( 2005 ), Pegasus: A framework for mapping complex scientific workflows onto distributed systems, Sci. Program., 13, 219 – 237. | |
dc.identifier.citedreference | Dunlap, R., L. Mark, S. Rugaber, V. Balaji, J. Chastang, L. Cinquini, C. DeLuca, D. Middleton, and S. Murphy ( 2008 ), Earth system curator: Metadata infrastructure for climate modeling, Earth Sci. Inf., 1, 131 – 149, doi: 10.1007/s12145-008-0016-1. | |
dc.identifier.citedreference | Foster, I. ( 2011 ), Globus online: Accelerating and democratizing science through cloud‐based services, IEEE Comput. Soc., 15, 70 – 73, doi: 10.1109/MIC.2011.64. | |
dc.identifier.citedreference | Harrison, A., et al. ( 2008 ), WS‐RF workflow in Triana, Int. J. High Perform. Comput. Appl., 22, 268 – 283, doi: 10.1177/1094342007086226. | |
dc.identifier.citedreference | Horsburgh, J. S., M. M. Morsy, A. M. Castronova, J. L. Goodall, T. Gan, H. Yi, M. J. Stealey, and D. G. Tarboton ( 2015 ), Hydroshare: Sharing diverse environmental data types and models as social objects with application to the hydrology domain, J. Am. Water Resour. Assoc., doi: 10.1111/1752-1688.12363. | |
dc.identifier.citedreference | Introduction to Workflow as Objects [WWW Document] ( 2012 ), [Available at https://wiki.irods.org/index.php/Introduction_to_Workflow_as_Objects, accessed 6.7.2015. ] | |
dc.identifier.citedreference | Keller, M., D. S. Schimel, W. W. Hargrove, and F. M. Hoffman ( 2008 ), A continental strategy for the National Ecological Observatory Network, Front. Ecol. Environ., 6, 282 – 284, doi: 10.1890/1540-9295(2008)6[282:ACSFTN]2.0.CO;2. | |
dc.identifier.citedreference | Kyriazis, D., K. Tserpes, G. Kousiouris, A. Menychtas, and T. Varvarigou ( 2008 ), Data aggregation and analysis: A grid‐based approach for medicine and biology. Int. Symp. on. IEEE 841–848. | |
dc.identifier.citedreference | Liang, X., and D. P. Lettenmaier ( 1994 ), A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99, 14,415 – 14,428, doi: 10.1029/94JD00483. | |
dc.identifier.citedreference | Maidment, D. R. ( 2008 ), Bringing water data together, J. Water Resour. Plann. Manage., 134, 95 – 96. | |
dc.identifier.citedreference | Michener, W. K., S. Allard, A. Budden, R. B. Cook, K. Douglass, M. Frame, S. Kelling, R. Koskela, C. Tenopir, and D. A. Vieglais ( 2012 ), Participatory design of DataONE—Enabling cyberinfrastructure for the biological and environmental sciences, Ecol. Inf., 11, 5 – 15, doi: 10.1016/j.ecoinf.2011.08.007. | |
dc.identifier.citedreference | Minnesota Population Center ( 2013 ), Terra Populus: Beta Version [Machine‐Readable Database], Univ. of Minnesota, Minneapolis. | |
dc.identifier.citedreference | Morsy, M. M., J. L. Goodall, C. Bandaragoda, A. M. Castronova, and J. Greenberg ( 2014 ), Metadata for describing water models International Environmental Modelling and Software Society (iEMSs) 7th International Congress on Environmental Modelling and Software, doi:10.13140/2.1.1314.6561. | |
dc.identifier.citedreference | Myers, J., et al. ( 2015 ), Towards sustainable curation and preservation: The SEAD Project’s data services approach. Proc. IEEE 11th Int. e‐Science Conf. Munich, Ger., doi: 10.1109/eScience.2015.56. | |
dc.identifier.citedreference | Rajasekar, A. ( 2014 ), Workflows [WWW document]. 6th Annu. iRODS User Gr. Meet. June 2014 Inst. Quant. Soc. Sci. MA. [Available at http://irods.org/wp‐content/uploads/2014/06/Workflows‐iRUGM‐2014.pdf, accessed 8.12.15.] | |
dc.identifier.citedreference | Rajasekar, A., et al. ( 2010 ), iRODS Primer: Integrated rule‐oriented data system. Synthesis lectures on information concepts, retrieval, and services. doi: 10.2200/S00233ED1V01Y200912ICR012. | |
dc.identifier.citedreference | Tarboton, D. G., et al. ( 2014 ), HydroShare: Advancing collaboration through hydrologic data and model sharing, in International Environmental Modelling and Software Society (iEMSs) 7th International Congress on Environmental Modelling and Software, San Diego, Calif., edited by D. P. Ames, N. W. T. Quinn, and A. E. Rizzoli, doi:978‐88‐9035‐744‐2. | |
dc.identifier.citedreference | Vahi, K., et al. ( 2013 ), A general approach to real‐time workflow monitoring. In High Performance Computing, Networking, Storage and Analysis (SCC). pp. 108–118. doi: 10.1109/SC.Companion.2012.26. | |
dc.identifier.citedreference | Weise, A., M. Wan, W. Schroeder, and A. Hasan ( 2008 ), Managing groups of files in a Rule Oriented Data Management System (iRODS), Comput. Sci., 5103, 321 – 330, doi: 10.1007/978-3-540-69389-5_37. | |
dc.identifier.citedreference | Williams, D. N., et al. ( 2008 ), Data management and analysis for the Earth System Grid, J. Phys. Conf. Ser., 125, 012072, doi: 10.1088/1742-6596/125/1/012072. | |
dc.identifier.citedreference | Williams, D. N., B. N. Lawrence, M. Lautenschlager, D. Middleton, and V. Balaji ( 2011 ), The Earth System Grid Federation: Delivering globally accessible petascale data for CMIP5, Proceedings of the 32nd Asia‐Pacific Advanced Network Meeting. pp. 121–130. doi: 10.7125/APAN.32.15. | |
dc.identifier.citedreference | Workflow Objects (WSO) [WWW Document] ( 2013 ), [Available at https://wiki.irods.org/index.php/Workflow_Objects_(WSO), accessed 6.7.15. ] | |
dc.identifier.citedreference | Acharya, A., M. Uysal, and J. Saltz ( 1998 ), Active disks: Programming model, algorithms and evaluation, ACM SIGPLAN Not., 33, 81 – 91, doi: 10.1145/291006.291026. | |
dc.identifier.citedreference | Allcock, B., J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke ( 2002 ), Data management and transfer in high‐performance computational grid environments, Parallel Comput., 28, 749 – 771, doi: 10.1016/S0167-8191(02)00094-7. | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |