Data associated with the manuscript “Influence of vertical heterogeneities in the canopy microenvironment on inter-annual variability of carbon uptake in temperate deciduous forests”, published in Journal of Geophysical Research – Biogeosciences (Wozniak et al., in press 2020). The data are organized into output from simulations by three variants of the Community Land Model (CLM), namely CLM4.5, CLM5, and CLM-ml, and the restructured FLUXNET site data used for model evaluation. In every folder, the data are further split by site and model experiment name. The following is a general description of the data files in each folder.

In folder CLM4.5, the data are in NetCDF format (https://www.unidata.ucar.edu/software/netcdf/). Each file is labeled by the actual year to which it corresponds. Time-varying variables are hourly, even though the time units are given as “days since” a specified reference time. A MATLAB script (*.m) for reading these data is provided in the code folder at Code/Read/read_clm45_data. The full metadata for the variables and the datasets are contained in the files themselves, as is standard for the NetCDF format.

In folder CLM5, the data are in NetCDF format (https://www.unidata.ucar.edu/software/netcdf/). The subfolders are labeled by both site name and model experiment, where *_bberry corresponds to the “CLM5-noPHS” experiment and *_medlyn corresponds to the “CLM5-PHS” experiment. Each data file (*h0*) is labeled by the relative year to which it corresponds (years since the beginning of the simulation). Time-varying variables are hourly, even though the time units are given as “days since” a specified reference time. Users must combine the values of the time variable with its units to determine the timestamp of each data point. A MATLAB script (*.m) for reading these data is provided in the code folder at Code/Read/read_clm5_data; it already handles the timestamping.
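For users not working in MATLAB, the conversion from a “days since” time value to a timestamp can be sketched in Python as follows. This is a minimal illustration and not the method used in the provided MATLAB scripts. The function names, the example units string, and the calendar handling are assumptions: the calendar actually used should be checked in the attributes of the time variable of each file, and only a standard calendar and a 365-day "noleap" calendar are handled here. Values are rounded to the nearest second, which is adequate for hourly data.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400
# Month lengths in a 365-day ("noleap") year.
NOLEAP_MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def parse_reference(units):
    """Return the reference datetime from a string like
    'days since 2003-01-01 00:00:00' (hypothetical example units)."""
    prefix = "days since "
    if not units.startswith(prefix):
        raise ValueError(f"unsupported time units: {units!r}")
    text = units[len(prefix):].strip()
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"cannot parse reference time: {text!r}")


def _noleap_to_datetime(total_seconds):
    """Convert seconds counted on a 365-day calendar to a datetime."""
    days, seconds = divmod(total_seconds, SECONDS_PER_DAY)
    year, day_of_year = divmod(days, 365)
    month = 1
    for length in NOLEAP_MONTH_DAYS:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return datetime(year, month, day_of_year + 1) + timedelta(seconds=seconds)


def decode_days_since(values, units, calendar="standard"):
    """Convert fractional 'days since' values to datetimes.

    calendar is an assumption supplied by the caller: 'standard' uses the
    ordinary Gregorian calendar, 'noleap' treats every year as 365 days.
    """
    ref = parse_reference(units)
    offsets = [int(round(float(v) * SECONDS_PER_DAY)) for v in values]
    if calendar == "standard":
        return [ref + timedelta(seconds=s) for s in offsets]
    if calendar == "noleap":
        ref_days = (ref.year * 365
                    + sum(NOLEAP_MONTH_DAYS[:ref.month - 1])
                    + ref.day - 1)
        ref_seconds = (ref_days * SECONDS_PER_DAY
                       + ref.hour * 3600 + ref.minute * 60 + ref.second)
        return [_noleap_to_datetime(ref_seconds + s) for s in offsets]
    raise ValueError(f"unsupported calendar: {calendar!r}")


if __name__ == "__main__":
    # First three hourly steps of a hypothetical file.
    for stamp in decode_days_since([0.0, 1 / 24, 2 / 24],
                                   "days since 2003-01-01 00:00:00"):
        print(stamp.isoformat())
```

The two calendars diverge only after a leap day, so a mismatch may go unnoticed in short tests; comparing the last decoded timestamp of a file against the year in its file name is a simple check.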
The full metadata for the variables and the datasets are contained in the files themselves, as is standard for the NetCDF format. Files without the *h0* label (the *r* and *rs* files) are model restart files. They are left for advanced users familiar with the Community Land Model version 5 and are not needed to explore the data directly related to the manuscript.

In folder CLM-ml, the data are in comma-separated value (CSV) format. Each file is labeled by the year to which it corresponds. The data are column-oriented: each column corresponds to a particular variable. The *flux* files contain scalar variables such as GPP and canopy temperature, and each row corresponds to one hourly timestamp. The *profile* files contain variables with one spatial dimension (the vertical); each row corresponds to one vertical level at one timestamp, with the level varying fastest. For example, if there are 45 vertical levels, the first 45 rows hold the 45 levels at the 1st simulation timestamp, rows 46-90 hold the 45 levels at the 2nd timestamp, and so on. A MATLAB script (*.m) for reading these data is provided in the code folder at Code/Read/read_CLMml_output_sunShade. In this script, all variables are defined and indexed by their positions within the *.csv files.

For details about the simulations used to create these data, please see the methods described in the full manuscript.
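The row layout of the *profile* files described above can be unpacked as in the following Python sketch, offered only as an illustration alongside the provided MATLAB script. The function names are hypothetical. The sketch assumes that every field is numeric and that the number of vertical levels is known to the user (45 in the example above). Whether the files carry a header row, and whether levels run from the bottom or the top of the canopy, are not stated here, so both should be checked against Code/Read/read_CLMml_output_sunShade.

```python
import csv
import io


def read_profile_rows(handle, skip_header=False):
    """Read numeric rows from an open CSV file or file-like object.

    skip_header is an assumption left to the caller: set it to True only
    if the file turns out to start with a header row.
    """
    reader = csv.reader(handle)
    if skip_header:
        next(reader, None)
    return [[float(field) for field in row] for row in reader if row]


def split_by_timestamp(rows, n_levels):
    """Group consecutive rows into one block of n_levels rows per timestamp."""
    if n_levels <= 0:
        raise ValueError("n_levels must be positive")
    if len(rows) % n_levels != 0:
        raise ValueError("row count is not a multiple of n_levels")
    return [rows[i:i + n_levels] for i in range(0, len(rows), n_levels)]


def row_index(timestep, level, n_levels):
    """Zero-based row of a given (timestep, level) pair in the flat file."""
    return timestep * n_levels + level


if __name__ == "__main__":
    # Tiny stand-in for a profile file: 2 timestamps x 3 levels, 2 variables.
    text = "1,10\n2,20\n3,30\n4,40\n5,50\n6,60\n"
    blocks = split_by_timestamp(read_profile_rows(io.StringIO(text)), n_levels=3)
    # blocks[t][k][c] is variable column c at level k and timestamp t.
    print(len(blocks), blocks[1][0])
```

With 45 levels, `row_index(1, 0, 45)` gives 45, which is row 46 in the one-based counting used in the description above.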