Work Description

Title: Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis (Data set)

Methodology
  • Batch-Mask utilizes a customized region-based convolutional neural network (R-CNN) model to generate masks of snakes in photographs. This neural network uses the training process to fine-tune mask weights from the pre-trained weights provided with Mask R-CNN. On Google Colab, we set the GPU count to 1 and the number of images per GPU to 1. Our learning rate was 0.0001. All other parameters in the configuration file were left at their default values. The number of validation steps must equal the number of tiles in the validation set so that loss is calculated on the full validation set for every epoch. Mask R-CNN suggests using twice as many training steps as validation steps. The number of training and validation steps in an epoch does not affect model accuracy, but if training and validation loss values converge after a single epoch, decreasing the number of training steps will reveal the progression of loss values. Decreasing training steps should be accompanied by decreasing validation steps, such that a roughly 2:1 ratio is maintained. If the loss values take more than 12 hours to converge, the number of training steps can be increased. If both the training and validation loss plateau at non-zero values, Section 4.1 of the related publication discusses how model settings can be adjusted to increase accuracy. The training that resulted in the best masks used 450 training steps and 50 validation steps for each epoch. We trained for 20 epochs, each lasting 1.21 hours. The training and validation losses plateaued at 16 epochs, after which the validation losses began increasing (likely due to overfitting). The weight values at epoch 16 were used for inference. Our training process lasted 24.2 hours in total.
Description
  • Efficient comparisons of biological color patterns are critical for understanding the mechanisms by which organisms evolve in ecosystems, including sexual selection, predator-prey interactions, and thermoregulation. However, elongate or spiral-shaped organisms do not conform to the standard orientation and photographic techniques required for automated analysis. Currently, large-scale color analysis of elongate animals requires time-consuming manual landmarking, which reduces their representation in coloration research despite their ecological importance. We present Batch-Mask: an automated and customizable workflow to facilitate the analysis of large photographic data sets of non-standard biological subjects. First, we present a user guide to run an open-source region-based convolutional neural network with fine-tuned weights for identifying and isolating a biological subject from a background (masking). Then, we demonstrate how to combine masking with existing manual visual analysis tools into a single streamlined, automated workflow for comparing color patterns across images. Batch-Mask was 60x faster than manual landmarking, produced masks that correctly identified 96% of all snake pixels, and produced pattern energy results that were not significantly different from the manually landmarked data set. The fine-tuned weights for the masking neural network, user guide, and automated workflow substantially decrease the amount of time and attention required to quantitatively analyze non-standard biological subjects. By using these tools, biologists will be able to compare color, pattern, and shape differences in large data sets that include significant morphological variation in elongate body forms. This advance will be especially valuable for comparative analyses of natural history collections, and through automation can greatly expand the scale of space, time, or taxonomic breadth across which color variation can be quantitatively examined.
Creator
Depositor
  • taliaym@umich.edu
Contact information
Discipline
Funding agency
  • Other Funding Agency
Other Funding agency
  • University of Michigan MCubed Grant
Keyword
Date coverage
  • 2018-08-10 to 2021-10-09
Citations to related material
  • Curlis, Renney, Davis Rabosky, Moore (submitted) Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis.
Related items in Deep Blue Documents
  • Davis Rabosky, A., Larson, J., Moore, T., Curlis, J. Neotropical snake photographs [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/qta3-xs67
Resource type
Last modified
  • 11/22/2022
Published
  • 04/07/2022
Language
DOI
  • https://doi.org/10.7302/3xwv-7n71
License
To Cite this Work:
Curlis, J., Renney, T., Davis Rabosky, A., Moore, T. (2022). Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis (Data set) [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/3xwv-7n71

Relationships

This work is not a member of any user collections.

Files (Count: 3; Size: 3.63 GB)

Date: 9 October, 2021

Dataset Title: Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis (Dataset)

Dataset Creators: J.D. Curlis, T. Renney, A.R. Davis Rabosky, T.Y. Moore

Dataset Contact: Talia Moore taliaym@umich.edu, Timothy Renney renneytj@umich.edu

Funding: MCubed award

Research Overview:

1: Efficient comparisons of biological color patterns are critical for understanding the mechanisms by which organisms evolve in ecosystems, including sexual selection, predator-prey interactions, and thermoregulation. However, elongate or spiral-shaped organisms do not conform to the standard orientation and photographic techniques required for automated analysis. Currently, large-scale color analysis of elongate animals requires time-consuming manual landmarking, which reduces their representation in coloration research despite their ecological importance.

2: We present Batch-Mask: an automated and customizable workflow to facilitate the analysis of large photographic datasets of non-standard biological subjects. First, we present a user guide to run an open-source region-based convolutional neural network with fine-tuned weights for identifying and isolating a biological subject from a background (masking). Then, we demonstrate how to combine masking with existing manual visual analysis tools into a single streamlined, automated workflow for comparing color patterns across images.

3: Batch-Mask was 60x faster than manual landmarking, produced masks that correctly identified 96% of all snake pixels, and produced pattern energy results that were not significantly different from the manually landmarked dataset.
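The pixel-accuracy figure above is a mask-overlap measure. As an illustrative sketch (not the paper's evaluation code), the fraction of true snake pixels recovered by a predicted mask can be computed as pixel-level recall:

```python
# Illustrative sketch of pixel-level recall: the fraction of true snake
# pixels (1s in the reference mask) that the predicted mask also marks
# as snake. The tiny example masks below are made up for demonstration.

def pixel_recall(true_mask, pred_mask):
    """Masks are equal-sized 2D grids of 0/1; returns TP / (TP + FN)."""
    tp = fn = 0
    for t_row, p_row in zip(true_mask, pred_mask):
        for t, p in zip(t_row, p_row):
            if t == 1:
                if p == 1:
                    tp += 1
                else:
                    fn += 1
    return tp / (tp + fn)

true_m = [[1, 1, 0], [1, 0, 0]]
pred_m = [[1, 0, 0], [1, 0, 1]]
# 2 of the 3 true snake pixels are recovered
```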

4: The fine-tuned weights for the masking neural network, user guide, and automated workflow substantially decrease the amount of time and attention required to quantitatively analyze non-standard biological subjects. By using these tools, biologists will be able to compare color, pattern, and shape differences in large datasets that include significant morphological variation in elongate body forms. This advance will be especially valuable for comparative analyses of natural history collections, and through automation can greatly expand the scale of space, time, or taxonomic breadth across which color variation can be quantitatively examined.

Methodology:
Batch-Mask utilizes a customized region-based convolutional neural network (R-CNN) model to generate masks of snakes in photographs. This neural network uses the training process to fine-tune mask weights from the pre-trained weights provided with Mask R-CNN. On Google Colab, we set the GPU count to 1 and the number of images per GPU to 1. Our learning rate was 0.0001. All other parameters in the configuration file were left at their default values. The number of validation steps must equal the number of tiles in the validation set so that loss is calculated on the full validation set for every epoch. Mask R-CNN suggests using twice as many training steps as validation steps. The number of training and validation steps in an epoch does not affect model accuracy, but if training and validation loss values converge after a single epoch, decreasing the number of training steps will reveal the progression of loss values. Decreasing training steps should be accompanied by decreasing validation steps, such that a roughly 2:1 ratio is maintained. If the loss values take more than 12 hours to converge, the number of training steps can be increased. If both the training and validation loss plateau at non-zero values, model settings can be adjusted to increase accuracy. The training that resulted in the best masks used 450 training steps and 50 validation steps for each epoch. We trained for 20 epochs, each lasting 1.21 hours. The training and validation losses plateaued at 16 epochs, after which the validation losses began increasing (likely due to overfitting). The weight values at epoch 16 were used for inference. Our training process lasted 24.2 hours in total.
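The step-count guidance and reported settings above can be sketched as follows. The dict keys mirror attribute names used by the Matterport Mask R-CNN Config class, but this is a minimal illustrative sketch, not the project's actual configuration file:

```python
# Minimal sketch of the step-count guidance above; key names are
# assumptions mirroring Matterport Mask R-CNN Config attributes.

def step_schedule(n_validation_tiles, ratio=2):
    """Validation steps must equal the number of tiles in the validation
    set so loss is computed over the full validation set each epoch;
    training steps follow the suggested ~2:1 training:validation ratio."""
    validation_steps = n_validation_tiles
    training_steps = ratio * validation_steps
    return training_steps, validation_steps

# Settings reported for the best-performing training run:
config = {
    "GPU_COUNT": 1,
    "IMAGES_PER_GPU": 1,
    "LEARNING_RATE": 1e-4,
    "STEPS_PER_EPOCH": 450,    # training steps per epoch
    "VALIDATION_STEPS": 50,    # validation steps per epoch
}

# 20 epochs at 1.21 hours each gives the reported 24.2-hour training run.
total_hours = 20 * 1.21
```

Note that the best-performing run used a 9:1 step ratio (450:50); the 2:1 ratio is the suggested starting point when tuning step counts.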

Instrument and/or Software specifications:

Follow the README at https://github.com/EMBiRLab/batch-mask.git to download and use the software associated with these files.

Files contained here:

imagej:
training_masks: The json files with human-labeled masks that can be used for micaToolbox pattern analysis.
inference_masks: The json files with masks generated by the neural network that can be used for micaToolbox pattern analysis.
non color_corrected images: Contains all training images without color correction. These are used for micaToolbox pattern analysis. Some images may look similar but have small pixel-wise differences.

mask-rcnn:
weights: This folder contains the epoch 16 weights file (.h5) for Mask R-CNN trained on the combined dorsal and ventral dataset. These fine-tuned weights are ready to use for inference.
test_set: A set of color corrected images that can be used to test the neural network qualitatively.
test_set_output: Example images overlaid with masks generated by the neural network.
training_sets:
dorsal_set: All dorsal images with their respective json file with human-labeled masks used for the training process.
ventral_set: All ventral images with their respective json file with human-labeled masks used for the training process.

landmarks:
tps_files: The original labels in the form of tps files. The name of the corresponding image is included in the text of the tps file. A tutorial for our tpsdig protocol is here: https://docs.google.com/document/d/1BjQ3EBL_E9ZnubEANbEuoD6T2zMF9Np1I7cy1fKOUGc/edit?usp=sharing
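The TPS files above follow the standard tpsDig layout: an "LM=n" count line, n "x y" coordinate lines, then metadata lines such as "IMAGE=...". As a minimal sketch assuming that generic layout (this is not the project's own parsing code, and the example file content is invented):

```python
# Minimal sketch of reading one record from a generic tpsDig-format file.
# Assumes the standard "LM=n" / coordinate / "IMAGE=..." layout; the
# example record below is invented, not taken from this dataset.

def parse_tps_record(lines):
    """Return (landmarks, image_name) from one TPS record."""
    it = iter(lines)
    landmarks, image_name = [], None
    for line in it:
        line = line.strip()
        if line.upper().startswith("LM="):
            n = int(line.split("=")[1])
            for _ in range(n):  # read exactly n coordinate pairs
                x, y = next(it).split()
                landmarks.append((float(x), float(y)))
        elif line.upper().startswith("IMAGE="):
            image_name = line.split("=", 1)[1]
    return landmarks, image_name

record = ["LM=2", "10.5 20.0", "30.0 40.5", "IMAGE=snake_001.jpg", "ID=0"]
```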

batch-mask-v1.0.2.zip:
A zip file containing all of the code and software corresponding to the GitHub v1.0.2 release, along with the necessary datasets to reproduce the outputs.

batch-mask-v1.0.0.zip:
Deprecated version of batch-mask; not recommended for use. Corresponds to the GitHub v1.0.0 release.

Related publication(s): Curlis, Renney, Davis Rabosky, Moore (submitted) Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis.

Use and Access: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

To Cite Data: Curlis, J., Renney, T., Davis Rabosky, A., Moore, T. (2022). Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis [Dataset], University of Michigan - Deep Blue Data. https://doi.org/10.7302/3xwv-7n71

