Work Description

Title: Simple Physics CAM6 Dataset for Training Machine Learning Algorithms (Open Access, Deposited)

Attribute Value
Methodology
  • This data is intended for use in developing machine learning emulators for simplified physical parameterizations within the Community Atmosphere Model, version 6 (CAM6). The data was generated on the Cheyenne machine, operated by the National Center for Atmospheric Research (NCAR). Machine learning processing, development, and testing were done on NCAR's Casper machine.

  • The three data sets differ in their physical parameterization scheme. The simplest scheme, referred to as the dry scheme, is based on Held-Suarez physics and is forced primarily by a temperature relaxation scheme. The moist data set uses a moist version of the dry case: it begins with a similar temperature relaxation scheme but adds simplified terms to represent sensible and latent heat fluxes at the surface, boundary layer mixing, and heating/cooling due to large-scale precipitation. Lastly, the third data set uses the same moist physics scheme coupled with an additional convection scheme, adding a third layer of complexity.
Description
  • The data represents weekly output from three 60-year CAM6 model runs. The output includes state (.h0. files) and tendency (.h1. files) fields for three different model configurations of increasing complexity. State fields include temperature, surface pressure, and specific humidity, among others, while tendency fields include temperature tendencies, specific humidity tendencies, and precipitation rates. Using the state variables at a given time step, machine learning techniques can be trained to predict the tendency field that follows, which can then be applied to the state variables to provide the state at the next physics time step of the model.
Creator
Depositor
  • glimon@umich.edu
Contact information
Discipline
Funding agency
  • National Science Foundation (NSF)
Keyword
Citations to related material
  • Limon, G. C., Jablonowski, C. (2022) Probing the Skill of Random Forest Emulators for Physical Parameterizations via a Hierarchy of Simple CAM6 Configurations [Preprint]. ESSOAr. https://doi.org/10.1002/essoar.10512353.1
Resource type
Last modified
  • 11/20/2022
Published
  • 09/20/2022
Language
DOI
  • https://doi.org/10.7302/r6v3-s064
License
To Cite this Work:
Limon, G. C. (2022). Simple Physics CAM6 Dataset for Training Machine Learning Algorithms [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/r6v3-s064

Files (Count: 2; Size: 84.8 GB)

# Simple Physics CAM6 Dataset for Training Machine Learning Algorithms

## Creator

Garrett Limon (glimon@umich.edu)

## Funding

NSF GRFP

## Research and Description

This is a collection of CAM6 output to be used for training machine learning emulators of physical parameterization schemes. Three configurations of physical parameterizations in CAM6 are included: dry Held-Suarez (HS), moist Held-Suarez (TJ), and the TJ scheme coupled with a Betts-Miller convection scheme. The data is intended for use with machine learning techniques that emulate the tendency calculations. An example workflow that uses this data is provided here: https://github.com/gclimon/simplePhysicsML.
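
To illustrate what emulating the tendency calculation means, the sketch below applies a predicted tendency to a state with a simple forward update, new state = state + dt × tendency. It is a minimal illustration, not the workflow from the repository above: `step_state`, `toy_emulator`, the 1800 s time step, and the relaxation constants are all hypothetical choices made for this example, and the dynamical core of the model is not represented.

```python
from typing import Callable, List, Sequence


def step_state(
    state: Sequence[float],
    predict_tendency: Callable[[Sequence[float]], Sequence[float]],
    dt_seconds: float,
) -> List[float]:
    """Advance a state vector by one step using a predicted tendency.

    Assumes a plain forward update: new_state = state + dt * tendency.
    """
    tendency = predict_tendency(state)
    if len(tendency) != len(state):
        raise ValueError("tendency and state must have the same length")
    return [s + dt_seconds * t for s, t in zip(state, tendency)]


def toy_emulator(temps: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained emulator.

    Relaxes temperature (K) toward 250 K on a 40-day time scale and
    returns a temperature tendency in K/s. The constants are illustrative.
    """
    tau_seconds = 40 * 86400.0
    return [-(t - 250.0) / tau_seconds for t in temps]


# Illustrative temperatures in kelvin; the time step is an assumed value.
temps = [300.0, 250.0, 200.0]
next_temps = step_state(temps, toy_emulator, dt_seconds=1800.0)
print(next_temps)
```

In practice `predict_tendency` would be a model trained on pairs of state and tendency fields, with the toy relaxation function replaced by that model's prediction call.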

## Format

The directory is organized as follows:

- Dry case: ml_data/HS_19L/
- Moist case: ml_data/TJ_19L/
- Convection case: ml_data/TJ_convection_19L/

Within each directory there is a collection of CAM6 output files. The 'h0' files contain the state variables (T, ps, q, etc.), while the 'h1' files contain the tendencies (dT/dt, dq/dt) and precipitation rates.
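
As a small sketch of how the two file types might be separated before training, the helper below splits a list of file names into state and tendency files by looking for the `.h0.` and `.h1.` markers. The function name `split_cam_output` and the example file names are hypothetical; the actual file names in the archive may differ, so check them before relying on this.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def split_cam_output(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (state_files, tendency_files).

    A file counts as a state file if its base name contains '.h0.' and as
    a tendency file if it contains '.h1.'. Other files are ignored.
    """
    state_files: List[str] = []
    tendency_files: List[str] = []
    for name in filenames:
        base = PurePosixPath(name).name  # ignore directory components
        if ".h0." in base:
            state_files.append(name)
        elif ".h1." in base:
            tendency_files.append(name)
    return sorted(state_files), sorted(tendency_files)


# Hypothetical file names, used only to exercise the helper.
example_files = [
    "ml_data/HS_19L/example.h1.0002.nc",
    "ml_data/HS_19L/example.h0.0002.nc",
    "ml_data/HS_19L/example.h0.0001.nc",
    "ml_data/HS_19L/example.h1.0001.nc",
    "ml_data/HS_19L/notes.txt",
]
state_files, tendency_files = split_cam_output(example_files)
print(state_files)
print(tendency_files)
```

Sorting both lists keeps state and tendency files in matching order when their names differ only in the `h0`/`h1` marker, which is an assumption about the naming scheme rather than something the README states.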

## Data Citation

Limon, G. C. (2022) Simple Physics CAM6 Dataset for Training Machine Learning Algorithms [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/r6v3-s064

## Associated Work

Limon, G. C., Jablonowski, C. (2022) Probing the Skill of Random Forest Emulators for Physical Parameterizations via a Hierarchy of Simple CAM6 Configurations [Preprint], ESSOAr. https://doi.org/10.1002/essoar.10512353.1

