CHANGES Project - Lake Summary Curated Data

King, Katelyn; Schell, Justin; Alofs, Karen; Thomer, Andrea; Wehrly, Kevin; Lenard, Michael; Lopez-Fernandez, Hernan

Work Description

Title: CHANGES Project - Lake Summary Curated Data

Attribute	Value
Methodology	The Michigan Department of Natural Resources (MDNR) historically recorded lake survey data on index cards. We used the Zooniverse crowdsourcing platform for volunteer transcription of these records, with separate workflows capturing different data. To be included in the dataset, each card had to be transcribed by at least three volunteers. Zooniverse transcriptions require substantial cleaning and curation before the data are usable. We used code to aggregate each volunteer's transcription into a consensus-based “final answer” for each data field, along with a confidence score reflecting how closely the volunteers' entries agreed. We then standardized the data by, for example, converting all text to lowercase, trimming excess whitespace, and converting fractions to decimals. We separated numeric and alphabetic values into different data columns. Next, we standardized each variable to a single unit and, where applicable, converted it to metric units (e.g., inches to millimeters). We checked numeric values by plotting them, identifying outliers, and reviewing the original cards. To combine multiple sampling events for the same lake, and to connect the transcribed data to more contemporary MDNR survey data, we matched the records to the corresponding MDNR unique lake identifiers. The transcribed data included each lake’s name and county and, in some instances, a geographic reference in the form of Township, Range, and Section from the United States Public Land Survey System (TRS). We joined records on lake name, county, and TRS when available. Experts on the research team manually matched the remaining lakes, which went unmatched because of issues such as lakes crossing county lines or changing names over time. Some historical records could not be matched at all because their geographic information was insufficient.
Description	Michigan lakes are an important resource; however, their ecosystems are declining and are projected to face further impacts from future land use and climate change. Understanding how lake ecosystems respond to environmental stressors and management actions is critical for identifying resilient lakes and developing adaptation strategies, yet lake management is hampered by a lack of historical information. Historical lake data in Michigan were originally archived as index cards at the Michigan Department of Natural Resources. Images of all of these cards are stored in this collection, Collections, Heterogeneous data, and Next Generation Ecological Studies (CHANGES) - Michigan Lake Surveys; the images for this specific dataset are stored in the CHANGES Project - Lake Summary (SUMM) dataset. The CHANGES project used a crowdsourcing platform called Zooniverse to transcribe at least basic information (i.e., dates and who collected the data) from all of these cards. Some card types, such as the one in this dataset, were prioritized for transcription to produce a usable (i.e., machine-readable, uniform, and standardized) historical dataset. The lake summary cards that we transcribed and curated include habitat information for each lake as well as observed fish species (summ_data.csv). These variables include anthropogenic lake characteristics such as fishing intensity, shoreline structures, and dams; lake morphometric characteristics such as depth and area; and in situ measurements of temperature, dissolved oxygen, and Secchi depth. Many characteristics were recorded as a range and therefore have separate minimum and maximum columns in the data file (e.g., temp_surface_min_c and temp_surface_max_c).
The lake summary cards also listed the fish species present, so the CSV file includes one column per species, named with the species' common name (see summ_species_table.csv), with a value of ‘1’ for presence or ‘0’ for absence. For a full description of all fields in this data table, see summ_datadictionary.csv.
Creator	King, Katelyn; Schell, Justin; Alofs, Karen; Thomer, Andrea; Wehrly, Kevin; Lenard, Michael; and Lopez-Fernandez, Hernan
Depositor	[email protected]
Contact information	[email protected]
Discipline	Science
Funding agency	Other Funding Agency
Other Funding agency	Michigan Institute For Data & AI In Society (MIDAS) Propelling Original Data Science Grant
Keyword	lake fish Secchi temperature nutrients oxygen shoreline habitat dams lake depth lake area fishing intensity
Date coverage	1926 to 1995
Citations to related material	King, K.B.S., Schell, J., Wehrly, K.E., Lenard, M., Singer, R., López-Fernández, H., Thomer, A.K., & Alofs, K.M. Community science helps digitize 78 years of fish and habitat data for thousands of lakes in Michigan, USA. Under review.
Resource type	Dataset
Last modified	05/05/2025
Published	05/05/2025
Language	English
DOI	https://doi.org/10.7302/72e8-ka38
License	http://creativecommons.org/publicdomain/zero/1.0/
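The Methodology entry in the table above describes two cleaning steps that are easy to misread in prose: a consensus "final answer" with a confidence score, and text/number standardization. The sketch below is a minimal illustration under stated assumptions, not the project's actual aggregation code: it assumes each field's volunteer entries arrive as a list of strings, uses a simple majority vote with confidence defined as the share of volunteers who gave the winning answer, and the function names (`standardize_text`, `to_decimal`, `split_numeric_alpha`, `consensus`) are hypothetical.

```python
import re
from collections import Counter
from fractions import Fraction
from typing import List, Optional, Tuple

INCHES_TO_MM = 25.4  # exact conversion factor (1 in = 25.4 mm)


def standardize_text(raw: str) -> str:
    """Lowercase, trim, and collapse internal runs of whitespace."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def to_decimal(raw: str) -> Optional[float]:
    """Convert '12', '2.5', '1/4', or a mixed number such as '3 1/2' to a float."""
    text = standardize_text(raw)
    mixed = re.fullmatch(r"(\d+) (\d+)/(\d+)", text)
    if mixed:
        whole, num, den = (int(g) for g in mixed.groups())
        return whole + num / den if den else None
    try:
        return float(Fraction(text))
    except (ValueError, ZeroDivisionError):
        return None  # not a recognizable number; leave for manual review


def split_numeric_alpha(raw: str) -> Tuple[Optional[float], str]:
    """Split a leading number from trailing text, e.g. '3 1/2 in' -> (3.5, 'in')."""
    text = standardize_text(raw)
    match = re.match(r"(\d+(?: \d+/\d+|/\d+|\.\d+)?)\s*(.*)", text)
    if match is None:
        return None, text
    return to_decimal(match.group(1)), match.group(2)


def consensus(entries: List[str]) -> Tuple[str, float]:
    """Majority vote over standardized volunteer entries.

    Confidence is the share of all volunteers whose entry equals the winning
    answer; ties go to the answer encountered first.
    """
    answers = [standardize_text(e) for e in entries]
    votes = Counter(a for a in answers if a)  # blank submissions cast no vote
    if not votes:
        return "", 0.0
    value, count = votes.most_common(1)[0]
    return value, count / len(entries)


if __name__ == "__main__":
    value, confidence = consensus(["3 1/2 in", "3 1/2  in", "3.5 in"])
    number, unit = split_numeric_alpha(value)  # (3.5, "in")
    if number is not None and unit == "in":
        print(value, round(confidence, 2), number * INCHES_TO_MM, "mm")
```

Standardizing before voting means entries that differ only in case or spacing count as agreement. Real transcriptions may need fuzzier comparison (e.g., "blue gill" vs "bluegill"), which this sketch deliberately does not attempt.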

To Cite this Work:
King, K., Schell, J., Alofs, K., Thomer, A., Wehrly, K., Lenard, M., Lopez-Fernandez, H. (2025). CHANGES Project - Lake Summary Curated Data [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/72e8-ka38

Relationships

In Collection:

Collections, Heterogeneous data, and Next Generation Ecological Studies (CHANGES) - Michigan Lake Surveys

Files (Count: 4; Size: 818 KB)

Title	Original Upload	Last Modified	File Size	Access
summ_data.csv	2025-05-05	2025-05-05	799 KB	Open Access
summ_species_table.csv	2025-05-05	2025-05-05	1.77 KB	Open Access
_summ_data_Readme.txt	2025-05-05	2025-05-05	4.24 KB	Open Access
summ_datadictionary.csv	2025-05-05	2025-05-05	13 KB	Open Access

Date: 05 Feb, 2025

Dataset Title: CHANGES Project - Lake Summary Curated Data

Dataset Creators: King, Katelyn; Schell, Justin; Alofs, Karen; Thomer, Andrea; Wehrly, Kevin; Lenard, Michael; and Lopez-Fernandez, Hernan

Dataset Contact: Katelyn King [email protected]

Funding: Michigan Institute For Data & AI In Society (MIDAS) Propelling Original Data Science Grant

Research Overview:
Archives at the Institute for Fisheries Research (IFR) hold records of thousands of lake surveys from the University of Michigan and the Michigan Department of Natural Resources.

The lake summary cards that we transcribed and curated include habitat information for each lake as well as observed fish species (summ_data.csv). These variables include anthropogenic lake characteristics such as fishing intensity, shoreline structures, and dams; lake morphometric characteristics such as depth and area; and in situ measurements of temperature, dissolved oxygen, and Secchi depth. Many characteristics were recorded as a range and therefore have separate minimum and maximum columns in the data file (e.g., temp_surface_min_c and temp_surface_max_c). The lake summary cards also listed the fish species present, so the CSV file includes one column per species, named with the species' common name (see summ_species_table.csv), with a value of ‘1’ for presence or ‘0’ for absence. For a full description of all fields in this data table, see summ_datadictionary.csv.
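The paragraph above implies a wide table layout: min/max column pairs for ranged variables and one 0/1 column per species. A minimal sketch of reading such rows with the standard `csv` module follows. Only `temp_surface_min_c` and `temp_surface_max_c` come from this README; the `lake_id` column and the species column names `bluegill` and `walleye` are hypothetical placeholders, and `SAMPLE_CSV` is a toy example, not real data. In practice the species column names would come from summ_species_table.csv, whose exact layout is not described here.

```python
import csv
import io
from typing import Dict, List, Optional

# Toy rows with hypothetical column names; not taken from summ_data.csv.
SAMPLE_CSV = (
    "lake_id,temp_surface_min_c,temp_surface_max_c,bluegill,walleye\n"
    "lake_a,20.5,23.5,1,0\n"
    "lake_b,,18,0,1\n"
)


def species_present(row: Dict[str, str], species_columns: List[str]) -> List[str]:
    """Return the species whose presence/absence column holds '1'."""
    return [name for name in species_columns if (row.get(name) or "").strip() == "1"]


def range_midpoint(row: Dict[str, str], variable: str, unit: str) -> Optional[float]:
    """Midpoint of a <variable>_min_<unit> / <variable>_max_<unit> pair.

    Falls back to whichever bound is recorded; returns None if neither is.
    """
    bounds = []
    for bound in ("min", "max"):
        raw = (row.get(f"{variable}_{bound}_{unit}") or "").strip()
        if raw:
            bounds.append(float(raw))
    if not bounds:
        return None
    return sum(bounds) / len(bounds)


if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        print(row["lake_id"],
              species_present(row, ["bluegill", "walleye"]),
              range_midpoint(row, "temp_surface", "c"))
```

Whether a midpoint is an appropriate summary of a recorded range depends on the analysis; keeping both bounds is often safer.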

Methodology:
The Michigan Department of Natural Resources (MDNR) historically recorded lake survey data on index cards. We used the Zooniverse crowdsourcing platform for volunteer transcription of these records, with separate workflows capturing different data. To be included in the dataset, each card had to be transcribed by at least three volunteers. Zooniverse transcriptions require substantial cleaning and curation before the data are usable. We used code to aggregate each volunteer's transcription into a consensus-based “final answer” for each data field, along with a confidence score reflecting how closely the volunteers' entries agreed. We then standardized the data by, for example, converting all text to lowercase, trimming excess whitespace, and converting fractions to decimals. We separated numeric and alphabetic values into different data columns. Next, we standardized each variable to a single unit and, where applicable, converted it to metric units (e.g., inches to millimeters). We checked numeric values by plotting them, identifying outliers, and reviewing the original cards. To combine multiple sampling events for the same lake, and to connect the transcribed data to more contemporary MDNR survey data, we matched the records to the corresponding MDNR unique lake identifiers. The transcribed data included each lake’s name and county and, in some instances, a geographic reference in the form of Township, Range, and Section from the United States Public Land Survey System (TRS). We joined records on lake name, county, and TRS when available. Experts on the research team manually matched the remaining lakes, which went unmatched because of issues such as lakes crossing county lines or changing names over time. Some historical records could not be matched at all because their geographic information was insufficient.
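The lake-matching step in the Methodology above (join on name, county, and TRS when available, with leftovers sent to experts) can be sketched as a two-pass join. This is an illustrative sketch, not the project's matching code: it assumes a reference table of MDNR lakes with hypothetical keys `lake_id`, `name`, `county`, and `trs`, transcribed records with a hypothetical `card_id`, and it only normalizes case and whitespace before comparing.

```python
from typing import Dict, List, Optional, Tuple


def norm(text: Optional[str]) -> str:
    """Lowercase and collapse whitespace so trivial formatting noise does not block a join."""
    return " ".join((text or "").lower().split())


def match_lakes(records: List[dict], reference: List[dict]) -> Tuple[Dict[str, str], List[str]]:
    """Assign reference lake IDs to transcribed records.

    Pass 1 joins on (name, county, TRS) when the record has TRS.
    Pass 2 accepts a (name, county) join only if exactly one reference lake fits.
    Everything else is returned as unmatched for manual (expert) review.
    """
    by_full: Dict[Tuple[str, str, str], str] = {}
    by_name_county: Dict[Tuple[str, str], List[str]] = {}
    for ref in reference:
        key = (norm(ref["name"]), norm(ref["county"]))
        by_name_county.setdefault(key, []).append(ref["lake_id"])
        ref_trs = norm(ref.get("trs"))
        if ref_trs:
            by_full[key + (ref_trs,)] = ref["lake_id"]

    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for rec in records:
        key = (norm(rec["name"]), norm(rec["county"]))
        trs = norm(rec.get("trs"))
        if trs and key + (trs,) in by_full:
            matched[rec["card_id"]] = by_full[key + (trs,)]
            continue
        candidates = by_name_county.get(key, [])
        if len(candidates) == 1:
            matched[rec["card_id"]] = candidates[0]
        else:
            unmatched.append(rec["card_id"])  # ambiguous or absent: needs an expert
    return matched, unmatched
```

Refusing ambiguous (name, county) matches rather than guessing mirrors the README's point that duplicated names, county-line lakes, and renamed lakes required manual matching.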

Instrument and/or Software specifications: NA

Files contained here:
summ_data.csv
summ_species_table.csv
summ_datadictionary.csv

Where possible, all data fields except identifiers, dates, and comments follow a standard naming convention: [variable name]_[min or max]_[unit] (e.g., temp_surface_min_c).
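Because the naming convention is regular, column names can be parsed mechanically, for example to pair each minimum column with its maximum. The sketch below is a minimal illustration that assumes units never contain underscores and that the `_min`/`_max` part may be absent for single-valued fields; `secchi_m` in the usage is a hypothetical column name, and `parse_field` and `pair_ranges` are hypothetical helpers. Identifiers, dates, and comments do not follow the convention, so a name such as `lake_name` would be misparsed; check summ_datadictionary.csv before relying on the result.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# [variable name]_[min or max]_[unit]; the min/max part is optional here (an assumption).
FIELD_PATTERN = re.compile(r"^(?P<variable>.+?)(?:_(?P<bound>min|max))?_(?P<unit>[^_]+)$")


def parse_field(name: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Split a column name into (variable, bound, unit), or None if it does not fit."""
    match = FIELD_PATTERN.match(name)
    if match is None:
        return None
    return match.group("variable"), match.group("bound"), match.group("unit")


def pair_ranges(columns: Iterable[str]) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Group min/max columns by (variable, unit)."""
    pairs: Dict[Tuple[str, str], Dict[str, str]] = {}
    for column in columns:
        parsed = parse_field(column)
        if parsed is None or parsed[1] is None:
            continue  # not a ranged field under this convention
        variable, bound, unit = parsed
        pairs.setdefault((variable, unit), {})[bound] = column
    return pairs


if __name__ == "__main__":
    print(pair_ranges(["temp_surface_min_c", "temp_surface_max_c", "secchi_m", "comments"]))
```

The lazy `.+?` for the variable name lets names that contain underscores (such as `temp_surface`) parse correctly, since the regex takes the shortest prefix for which the remainder still matches `_min`/`_max` plus a unit.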

Related publication(s):
King, K.B.S., Schell, J., Wehrly, K.E., Lenard, M., Singer, R., López-Fernández, H., Thomer, A.K., & Alofs, K.M. Community science helps digitize 78 years of fish and habitat data for thousands of lakes in Michigan, USA. Under review.

Use and Access:
This dataset is made available under the Creative Commons CC0 1.0 Universal Public Domain Dedication.

To Cite Data:
King, K.B.S., K.M. Alofs, J. Schell, A. Thomer, K. Wehrly, M. Lenard, & H. Lopez-Fernandez (2025). CHANGES Project - Lake Summary Curated Data [Data set]. University of Michigan - Deep Blue Data. https://doi.org/10.7302/72e8-ka38
