Work Description

Title: Big Ship Data: Pre- and Post-Processed Spatiotemporal Data for 2006-2014 for Great Lakes Air Temperature, Dew Point, Surface Water Temperature, and Wind Speed (Open Access; Deposited)

  • Pre-processed data were acquired from the NDFD for air temperature, dew point, and wind speed, and from the GLCFS for lake surface temperature. These data were re-gridded to a 0.1-degree grid at hourly resolution (3-hourly for lake surface temperature). Ship observations were acquired from NOAA's Volunteer Observing Ships program. Using Gaussian Process Regression, the ship observations were assimilated into the modeled data to generate new spatiotemporal estimates of the four study variables, along with uncertainty estimates.
  • These data support the WRR paper by Fries and Kerkez, 'Big Ship Data: Using Vessel Measurements to Improve Estimates of Temperature and Wind Speed on the Great Lakes'. Code is also provided.
Last modified
  • 03/22/2022
  • 03/30/2017
To Cite this Work:
Fries, K. J. (2017). Big Ship Data: Pre- and Post-Processed Spatiotemporal Data for 2006-2014 for Great Lakes Air Temperature, Dew Point, Surface Water Temperature, and Wind Speed [Data set], University of Michigan - Deep Blue Data.



Files (Count: 174; Size: 12.2 GB)

This is a repository for the results discussed in 'Using Vessel Measurements to Improve Hydrometeorological Estimates Across Large Water Systems: Big Ship Data on the Great Lakes'.
IMPORTANT: The GPML toolbox for MATLAB/Octave is required to run any of the .m files here. Please download this toolbox at

There are two different types of .mat files. The first type contains only the co-located ship and model data, which are all the data needed to run the Gaussian Process Regression code.
Each file is a .mat file. Files named 'pre_lakeyear.mat' hold Lat, Lon, Time, and Value vectors for air temperature, dew point, and wind speed; files named 'pre_lakeSSTyear.mat' hold Lat, Lon, Time, and SST vectors for surface temperature.
The model data were retrieved from NOAA GLERL's public repository and resampled to a 1/10-degree grid to match the resolution of the ship reports.
The ship reports are also retrieved from NOAA GLERL, though to get data going back further than the current calendar year you must contact Greg Lang (

The second type of file has 1/10-degree hourly data for both the GLCFS (initial estimate) and the GP-regressed data, with 3-hourly data for SST.
Each file is a .mat file named 'post_lakeyear.mat' containing Lat, Lon, and Time vectors, followed by matrices (Lat/Lon x Time in size) for the inputs to the GP regression (GLCFS_variable), the estimates from the regression (GP_variable), and the variance of these estimates (s2_variable).
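
As a rough illustration of how the variance fields might be used, the sketch below turns one estimate and its variance into an approximate 95% interval. It assumes the estimate error is roughly Gaussian and that an s2_variable entry is a plain variance in squared units of the variable; the function name and the sample numbers are hypothetical and are not taken from the data set.

```python
import math

def approx_interval(estimate, variance, z=1.96):
    """Return (low, high) for an approximate interval around a GP estimate.

    Assumes a Gaussian error model; z=1.96 gives roughly 95% coverage.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    half_width = z * math.sqrt(variance)  # standard deviation scaled by z
    return estimate - half_width, estimate + half_width

# Hypothetical values: a 10.0 degree C estimate with variance 4.0 (deg C)^2.
low, high = approx_interval(10.0, 4.0)
print(f"approximate 95% interval: {low:.2f} to {high:.2f} deg C")
```

Applied across a whole matrix, the same arithmetic would give a map of interval widths, which is one way to see where the ship observations reduce uncertainty.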

-Dew point, air temperature, and surface temperature are in degrees C
-Wind speeds are in m/s
-Time is in MATLAB datenum format (days since 0-Jan-0000)
-Latitude is in degrees N, Longitude in degrees E (i.e. Great Lakes longitudes are negative values)
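
For readers working outside MATLAB, the Time values can be converted with a short helper. The sketch below uses only the Python standard library and relies on the fact that MATLAB datenum 1 corresponds to 1-Jan-0000, while Python's proleptic Gregorian ordinal 1 corresponds to 1-Jan-0001, a difference of 366 days. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Days between MATLAB's datenum origin and Python's ordinal origin.
MATLAB_ORDINAL_OFFSET = 366

def datenum_to_datetime(datenum):
    """Convert a MATLAB datenum (fractional days) to a naive datetime."""
    whole_days = int(datenum)        # integer part: calendar day
    fraction = datenum - whole_days  # fractional part: time of day
    day = datetime.fromordinal(whole_days - MATLAB_ORDINAL_OFFSET)
    return day + timedelta(days=fraction)

# MATLAB datenum 730486 is 1-Jan-2000, so 730486.5 is noon on that day.
print(datenum_to_datetime(730486.5))
```

Hourly fractions are not always exactly representable in floating point, so rounding the result to the nearest second may be useful before comparing timestamps.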

'covSEard_nosf.m' is the squared exponential kernel from the GPML toolbox, modified so that no scale factor is applied to the kernel. This makes comparison between model runs easier. Place this file in the 'cov' folder of your GPML toolbox to use it.
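
To make the idea of a kernel without a scale factor concrete, here is a minimal sketch of a squared exponential kernel with one length scale per input dimension and the signal variance fixed at 1. It is an assumption about what the modification amounts to, not a port of 'covSEard_nosf.m'; the function name is hypothetical, and the log length-scale parameterization follows the GPML convention for covSEard.

```python
import math

def se_ard_unit_scale(x, z, log_length_scales):
    """Squared exponential ARD kernel with no signal-variance factor.

    k(x, z) = exp(-0.5 * sum_d ((x_d - z_d) / ell_d)^2), with ell_d given
    as log(ell_d). The maximum value is 1, reached when x == z.
    """
    if not (len(x) == len(z) == len(log_length_scales)):
        raise ValueError("x, z and log_length_scales must have equal length")
    scaled_sq_dist = 0.0
    for x_d, z_d, log_ell in zip(x, z, log_length_scales):
        ell = math.exp(log_ell)  # recover the length scale
        scaled_sq_dist += ((x_d - z_d) / ell) ** 2
    return math.exp(-0.5 * scaled_sq_dist)

# Hypothetical inputs: (lat, lon, time) points with per-dimension length scales.
a = (45.0, -83.0, 732678.0)
b = (45.1, -83.2, 732678.5)
print(se_ard_unit_scale(a, b, (0.0, 0.0, 0.0)))
```

With the kernel peak fixed at 1, the fitted length scales are the only kernel hyperparameters that differ between runs, which may be why the comparison becomes easier.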

'example_GP_script.m' demonstrates how to use the data in the 'collocated ship and model data' folder to learn a Gaussian Process Regression model.
'generate_estimates_example.m' demonstrates how to use a learned model to get updated estimates as well as uncertainty estimates.
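
The .m scripts depend on GPML and are not reproduced here. As a toolbox-free illustration of what "updated estimates as well as uncertainty estimates" means in Gaussian Process Regression, the sketch below computes the textbook posterior mean and variance at one new point. It is a one-dimensional, zero-mean toy with fixed hyperparameters and a unit-scale squared exponential kernel; all names and numbers are hypothetical, and it makes no claim to match the authors' model.

```python
import math

def se_kernel(a, b, length_scale):
    """Unit-scale squared exponential kernel for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(x_obs, y_obs, x_new, length_scale, noise_var):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    n = len(x_obs)
    K = [[se_kernel(x_obs[i], x_obs[j], length_scale)
          + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [se_kernel(x_new, x_i, length_scale) for x_i in x_obs]
    alpha = solve(K, y_obs)   # (K + noise I)^-1 y
    v = solve(K, k_star)      # (K + noise I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = se_kernel(x_new, x_new, length_scale) - sum(
        k * w for k, w in zip(k_star, v))
    return mean, var

# Hypothetical observations at three locations, queried in between.
mean, var = gp_posterior([0.0, 1.0, 2.0], [1.0, -0.5, 0.3], 1.5,
                         length_scale=1.0, noise_var=0.01)
print(mean, var)
```

Near an observation the posterior variance shrinks toward the noise level, and far from all observations it returns to the prior variance of 1.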

The total work file size of 12.2 GB is too large to download directly. Consider using Globus, the platform Deep Blue Data uses to make large data sets available (best for data sets > 3 GB).