
Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems

dc.contributor.author: Essawy, Bakinam T.
dc.contributor.author: Goodall, Jonathan L.
dc.contributor.author: Xu, Hao
dc.contributor.author: Rajasekar, Arcot
dc.contributor.author: Myers, James D.
dc.contributor.author: Kugler, Tracy A.
dc.contributor.author: Billah, Mirza M.
dc.contributor.author: Whitton, Mary C.
dc.contributor.author: Moore, Reagan W.
dc.date.accessioned: 2017-06-16T20:14:57Z
dc.date.available: 2017-06-16T20:14:57Z
dc.date.issued: 2016-04
dc.identifier.citation: Essawy, Bakinam T.; Goodall, Jonathan L.; Xu, Hao; Rajasekar, Arcot; Myers, James D.; Kugler, Tracy A.; Billah, Mirza M.; Whitton, Mary C.; Moore, Reagan W. (2016). "Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems." Earth and Space Science 3(4): 163-175.
dc.identifier.issn: 2333-5084
dc.identifier.uri: https://hdl.handle.net/2027.42/137520
dc.description.abstract: Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule‐Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community‐driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment‐Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

Plain Language Summary: Executing computational workflows in the geosciences can be challenging, especially when dealing with large, distributed, and heterogeneous data sets and computational tools. We present a methodology for addressing this challenge using the Integrated Rule‐Oriented Data System (iRODS) Workflow Structured Object (WSO). We demonstrate the approach through an end‐to‐end application of data access, processing, and publication of digital assets for a scientific study analyzing drought in the Carolinas region of the United States.

Key Points: (1) Reproducibility of data‐intensive analyses remains a significant challenge. (2) Data grids are useful for reproducibility of workflows requiring large, distributed data sets. (3) Data and computations should be co‐located on servers to create executable Web‐resources.
dc.publisher: Wiley Periodicals, Inc.
dc.publisher: Univ. of Minnesota
dc.subject.other: workflows
dc.subject.other: iRODS
dc.subject.other: hydrologic modeling
dc.subject.other: federation
dc.subject.other: reproducibility
dc.title: Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems
dc.type: Article (language: en_US)
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Geological Sciences
dc.subject.hlbsecondlevel: Space Sciences
dc.subject.hlbsecondlevel: Atmospheric and Oceanic Sciences
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/137520/1/ess271_am.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/137520/2/ess271.pdf
dc.identifier.doi: 10.1002/2015EA000139
dc.identifier.source: Earth and Space Science
dc.owningcollname: Interdisciplinary and Peer-Reviewed

