Work Description
Title: Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis (Data set) Open Access Deposited
(2022). Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/3xwv-7n71
Relationships
- This work is not a member of any user collections.
Files (Count: 3; Size: 3.63 GB)
Title | Original Upload | Last Modified | File Size | Access
---|---|---|---|---
README.txt | 2021-11-10 | 2022-05-20 | 5.74 KB | Open Access
batch-mask-v1.0.0.zip | 2022-04-07 | 2022-04-08 | 1.82 GB | Open Access
batch-mask-v1.0.2.zip | 2022-05-20 | 2022-07-09 | 1.82 GB | Open Access
Date: 9 October, 2021
Dataset Title: Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis (Dataset)
Dataset Creators: J.D. Curlis, T. Renney, A.R. Davis Rabosky, T.Y. Moore
Dataset Contact: Talia Moore taliaym@umich.edu, Timothy Renney renneytj@umich.edu
Funding: MCubed award
Research Overview:
1: Efficient comparisons of biological color patterns are critical for understanding the mechanisms by which organisms evolve in ecosystems, including sexual selection, predator-prey interactions, and thermoregulation. However, elongate or spiral-shaped organisms do not conform to the standard orientation and photographic techniques required for automated analysis. Currently, large-scale color analysis of elongate animals requires time-consuming manual landmarking, which reduces their representation in coloration research despite their ecological importance.
2: We present Batch-Mask: an automated and customizable workflow to facilitate the analysis of large photographic datasets of non-standard biological subjects. First, we present a user guide to run an open-source region-based convolutional neural network with fine-tuned weights for identifying and isolating a biological subject from a background (masking). Then, we demonstrate how to combine masking with existing manual visual analysis tools into a single streamlined, automated workflow for comparing color patterns across images.
3: Batch-Mask was 60x faster than manual landmarking, produced masks that correctly identified 96% of all snake pixels, and produced pattern energy results that were not significantly different from the manually landmarked dataset.
4: The fine-tuned weights for the masking neural network, user guide, and automated workflow substantially decrease the amount of time and attention required to quantitatively analyze non-standard biological subjects. By using these tools, biologists will be able to compare color, pattern, and shape differences in large datasets that include significant morphological variation in elongate body forms. This advance will be especially valuable for comparative analyses of natural history collections, and through automation can greatly expand the scale of space, time, or taxonomic breadth across which color variation can be quantitatively examined.
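The pixel-level accuracy reported above (96% of snake pixels correctly identified) can be computed by comparing a generated mask against a human-labeled reference mask. A minimal sketch in pure Python; the function name and the boolean-mask representation are illustrative, not taken from the Batch-Mask codebase:

```python
def mask_pixel_accuracy(predicted, reference):
    """Fraction of reference (subject) pixels that the predicted mask also marks.

    Both masks are 2D lists of booleans with the same shape: True = subject pixel.
    """
    true_positives = 0
    reference_positives = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if r:
                reference_positives += 1
                if p:
                    true_positives += 1
    return true_positives / reference_positives

# Toy 3x3 example: reference has 4 subject pixels, prediction recovers 3 of them.
ref = [[True, True, False],
       [True, True, False],
       [False, False, False]]
pred = [[True, True, False],
        [True, False, False],
        [False, False, False]]
print(mask_pixel_accuracy(pred, ref))  # 0.75
```

The same loop structure extends naturally to other mask metrics (e.g. intersection-over-union) by also counting predicted positives.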
Methodology:
Batch-Mask utilizes a customized region-based convolutional neural network (Mask R-CNN) model to generate masks of snakes in photographs. The training process fine-tunes the mask weights starting from the pre-trained weights provided with Mask R-CNN. On Google Colab, we set the GPU count to 1 and the number of images per GPU to 1, with a learning rate of 0.0001; all other parameters in the configuration file were left at their default values.

The number of validation steps must equal the number of tiles in the validation set so that loss is calculated on the full validation set for every epoch. The Mask R-CNN documentation suggests using twice as many training steps as validation steps. The number of training and validation steps in an epoch does not affect model accuracy, but if the training and validation loss values converge after a single epoch, decreasing the number of training steps will reveal the progression of loss values. Any decrease in training steps should be accompanied by a matching decrease in validation steps, so that a roughly 2:1 ratio is maintained. If the loss values take more than 12 hours to converge, the number of training steps can be increased. If both the training and validation loss plateau at non-zero values, model settings can be adjusted to increase accuracy.

The training run that produced the best masks used 450 training steps and 50 validation steps per epoch. We trained for 20 epochs, each lasting 1.21 hours, for a total training duration of 24.2 hours. The training and validation losses plateaued at 16 epochs, after which the validation loss began increasing (likely due to overfitting), so the weight values at epoch 16 were used for inference.
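In the Matterport Mask R-CNN implementation, settings like these live as class attributes on a subclass of `mrcnn.config.Config`. The sketch below collects the values described above into a plain class (so it runs without the `mrcnn` package installed); the class and `NAME` are illustrative, not taken from the Batch-Mask code:

```python
class SnakeMaskConfig:
    """Training settings from the Methodology section, written in the
    attribute-name style of the Matterport Mask R-CNN Config class.
    A real run would subclass mrcnn.config.Config instead."""
    NAME = "snake_mask"      # illustrative name, not from the dataset
    GPU_COUNT = 1            # single GPU on Google Colab
    IMAGES_PER_GPU = 1
    LEARNING_RATE = 0.0001
    # Validation steps should equal the number of validation tiles, and
    # training steps should stay at roughly twice the validation steps.
    STEPS_PER_EPOCH = 450
    VALIDATION_STEPS = 50
    EPOCHS = 20              # best weights came from epoch 16
```

Keeping the 2:1 step ratio when shrinking `STEPS_PER_EPOCH` preserves the relative weight of validation loss in monitoring convergence.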
Instrument and/or Software specifications:
Follow the README at https://github.com/EMBiRLab/batch-mask.git to download and use the software associated with these files.
Files contained here:
imagej:
training_masks: The json files with human-labeled masks that can be used for micaToolbox pattern analysis.
inference_masks: The json files with masks generated by the neural network that can be used for micaToolbox pattern analysis.
non color_corrected images: Contains all training images without color correction. These are used for micaToolbox pattern analysis. Some images may look similar but have small pixel-wise differences.
mask-rcnn:
weights: This folder contains the epoch 16 weights file (.h5) for Mask R-CNN trained on the combined dorsal and ventral dataset. These are the fine-tuned weights that are ready to use for inference.
test_set: A set of color corrected images that can be used to test the neural network qualitatively.
test_set_output: Example images overlaid with masks generated by the neural network.
training_sets:
dorsal_set: All dorsal images with their respective json file with human-labeled masks used for the training process.
ventral_set: All ventral images with their respective json file with human-labeled masks used for the training process.
landmarks:
tps_files: The original labels in the form of tps files. The name of the corresponding image is included in the text of the tps file. A tutorial for our tpsdig protocol is here: https://docs.google.com/document/d/1BjQ3EBL_E9ZnubEANbEuoD6T2zMF9Np1I7cy1fKOUGc/edit?usp=sharing
batch-mask-v1.0.2.zip:
A zip file that contains all of the code and software corresponding to the GitHub v1.0.2 release, with the necessary datasets to reproduce the outputs.
batch-mask-v1.0.0.zip:
Deprecated version of batch-mask. Not recommended for use. Corresponds to the GitHub v1.0.0 release.
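The mask JSON files above store polygon outlines of each subject. The exact schema is not described here, so the sketch below assumes a simple polygon-per-subject layout (a common convention for Mask R-CNN annotation files) and shows, in pure Python, how such an outline can be rasterized into a boolean pixel mask via ray casting:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygon_to_mask(polygon, width, height):
    """Rasterize a polygon outline into a 2D boolean mask (True = subject).

    Each pixel is tested at its center (x + 0.5, y + 0.5)."""
    return [[point_in_polygon(x + 0.5, y + 0.5, polygon) for x in range(width)]
            for y in range(height)]

# Toy example: a square subject occupying pixels (1..4, 1..4) of a 6x6 image.
square = [(1, 1), (5, 1), (5, 5), (1, 5)]
mask = polygon_to_mask(square, 6, 6)
```

In practice a library rasterizer (e.g. from an image-processing package) would replace this loop for full-size photographs; the sketch only illustrates the geometry.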
Related publication(s): Curlis, Renney, Davis Rabosky, Moore (submitted) Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis.
Use and Access: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
To Cite Data: Curlis, J., Renney, T., Davis Rabosky, A., Moore, T. Batch-Mask: An automated Mask R-CNN workflow to isolate non-standard biological specimens for color pattern analysis [Dataset], University of Michigan - Deep Blue Data. https://doi.org/10.7302/3xwv-7n71