Work Description

Title: Cell-morphodynamic phenotype classification with application to cancer metastasis using cell magnetorotation and machine-learning

Methodology
  • Raw images of four cell lines (HR-14, PC3, MDA-MB-231, and MCF-7) were acquired in a microwell array by fluorescence microscopy. Each image file shows single cells in microwells. Subfolders named by area correspond only to different imaged positions of the microwell array. A computer-controlled microscope ensured that no area was imaged twice. Each area was imaged once per minute for approximately 60 minutes.

  • The array images can be cropped into images of individual cells using the Python script provided here (getPos6.2.py) together with ImageJ. The cropped single-cell images can then be characterized and processed with CellProfiler before machine-learning classification. The file 'remys pipeline.cp' is the CellProfiler pipeline we used to process our images; CellProfiler version 11710 was used for this analysis.

  • The output of the CellProfiler pipeline is the input data for Principal Component Analysis (PCA) and machine-learning-based classification. These results are contained in the '.csv' files, where the name of each file corresponds to a single cell line. This raw data can be used with the Python analysis scripts, which perform PCA, supervised classification (AdaBoost), and unsupervised classification (k-means clustering).

  • Please see our publication in PLOS ONE, titled "Cell-morphodynamic phenotype classification with application to cancer metastasis using cell magnetorotation and machine-learning" for a full explanation of the methodology.
Description
  • This data set includes four archive (.tar) files, each containing the unprocessed raw cell images for a single cell line, together with the scripts used to process these images and tabular files containing the processed data outputs. This data set supports the PLOS ONE publication, "Cell-morphodynamic phenotype classification with application to cancer metastasis using cell magnetorotation and machine-learning."
Creator
Depositor
  • folzja@umich.edu
Contact information
Discipline
Funding agency
  • National Institutes of Health (NIH)
Resource type
Last modified
  • 11/02/2021
Published
  • 11/02/2021
DOI
  • https://doi.org/10.7302/513f-1h23
License
To Cite this Work:
Folz, J. (2021). Cell-morphodynamic phenotype classification with application to cancer metastasis using cell magnetorotation and machine-learning [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/513f-1h23


Files (Count: 15; Size: 108 GB)

Date: 25 October, 2021

Dataset Title: Cell-morphodynamic phenotype classification with application to cancer metastasis using cell magnetorotation and machine-learning [dataset]

Dataset Creators: R Elbez, J Folz, A McLean, H Roca, JM Labuz, KJ Pienta, S Takayama, R Kopelman

Dataset Contact: Jeff Folz folzja@umich.edu

Funding: R21 CA160157 (NIH), CA136829 (NIH), R01CA186769 (NIH), 1R01CA250499 (NIH), T32-DE007057 (NIH, Tissue Engineering and Regenerative Medicine Training Program) & T32 ED005582-05 (NIH, Microfluidics in Biomedical Sciences Training Program)

Key Points:
- We collected fluorescence images of magnetically activated cells loaded into a microfluidic device
- Images were cropped into single cell images, which were then processed using CellProfiler (v11710)
- After processing, we employed machine-learning algorithms to cluster and identify cell phenotypes

Research Overview:
We define cell morphodynamics as the cell's time-dependent morphology; it could also be called the cell's shape-shifting ability. To measure it, we use a biomarker-free, dynamic histology method based on multiplexed cell magnetorotation and machine learning. Standard studies of cells immobilized on microscope slides cannot reveal this shape shifting, any more than pinned butterfly collections can reveal flight patterns. Using cell magnetorotation, with the aid of magnetic nanoparticles embedded in the cells, our method allows each cell to move freely in three dimensions while its deformations in all three dimensions are followed rapidly, so that each cell can be identified and classified by its dynamic morphology. Using object recognition and machine-learning algorithms, we continuously measure the real-time shape dynamics of each cell, from which we resolve the inherent broad heterogeneity of the morphological phenotypes found in a given cancer cell population. In three illustrative experiments we achieved clustering, differentiation, and identification of (A) cells from two distinct cell lines, (B) cells that had undergone the epithelial-to-mesenchymal transition, and (C) cells differing only in their motility. This microfluidic method may enable fast screening and identification of invasive cells, e.g., metastatic cancer cells, even in the absence of biomarkers, thus providing a rapid diagnostic and assessment protocol for effective personalized cancer therapy.

Instrument and/or Software specifications: CellProfiler v11710 (https://cellprofiler.org/), Python 2.7

Files contained here:
The four .tar files can be extracted into directories containing the unprocessed cell images collected as raw data. Each directory pertains to a single cell line: MCF-7, MDA-MB-231, HR-14, or PC-3. Each directory contains 2-3 subfolders named after its cell line. Within these subfolders are a set of area folders and one positions folder. Each area folder represents a single imaged area of the microfluidic device. There are typically 10-20 area folders per cell line, each containing 60+ unprocessed cell images (.tiff files). The positions folder contains the pixel locations of the cells imaged in each corresponding area, stored as a number of .txt files, each matching a single imaged area.
To recrop the images, place the Python script getPos6.2.py in the same directory as the area and positions folders, and run the relevant Python command (see the comments and instructions in getPos6.2.py).
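The cropping itself is done by getPos6.2.py. The short sketch below only illustrates the idea: cut a fixed-size window around each listed pixel position. It is written in Python 3 and is not the deposited script. The positions-file format (one whitespace-separated "x y" pair per line), the window size, and the helper names read_positions and crop_cells are all hypothetical. Reading the .tiff frames from disk is left out; any TIFF reader that returns a NumPy array would do.

```python
import numpy as np


def read_positions(lines):
    """Parse cell positions, ASSUMING one whitespace-separated 'x y' pair per line."""
    positions = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            positions.append((float(parts[0]), float(parts[1])))
    return positions


def crop_cells(frame, positions, half_width=32):
    """Cut a (2*half_width x 2*half_width) window around each (x, y) position.

    Windows that would cross the image border are shifted inward, so every crop
    has the same shape. half_width=32 is an illustrative default only.
    """
    height, width = frame.shape[:2]
    size = 2 * half_width
    if size > height or size > width:
        raise ValueError("crop window is larger than the frame")
    crops = []
    for x, y in positions:
        # x indexes columns and y indexes rows; clamp the top-left corner.
        top = min(max(int(y) - half_width, 0), height - size)
        left = min(max(int(x) - half_width, 0), width - size)
        crops.append(frame[top:top + size, left:left + size])
    return crops


# Synthetic 100 x 100 frame whose pixel values encode their own coordinates.
frame = np.arange(100 * 100).reshape(100, 100)
positions = read_positions(["50 50", "", "2 97"])
crops = crop_cells(frame, positions, half_width=10)
```

Clamping the window instead of padding it keeps every crop the same size. This is convenient for downstream tools that expect uniform images. The trade-off is that cells near the border are not centered in their crop.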

Files:
10_22_2014 – Raw data for MCF-7 cell line
10_24_2104 - Raw data for MDA-MB-231 cell line
10_25_2104 - Raw data for HR14 cell line
10_26_2104 - Raw data for PC3 cell line

CellProfiler File
The file ‘remys pipeline.cp’ is the CellProfiler (v11710) pipeline used to process the cropped cell images in CellProfiler (https://cellprofiler.org/). Once loaded into CellProfiler, it uses the cropped cell images to characterize each cell.

Files:

‘remys pipeline.cp’ – The CellProfiler Pipeline used for image processing. Requires CellProfiler v11710

.csv Files:
Investigators do not need to recrop the images: the output of the CellProfiler pipeline for each cell line is provided in a series of .csv files, named after the cell line they describe. Unless otherwise specified, all measured variables are expressed in pixels ('pixel units').

If a row (representing one cell at one time point) is missing its data, that cell should be ignored and removed from the analysis.
For cells missing only a few parameters (the row is full but lacks a few columns or contains NaNs), the mean value of the parameter in question is inserted in place of the NaN or missing value.
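Below is a minimal pandas sketch of these two rules, written in Python 3. The text does not define how much missing data makes a row unusable. The max_missing_fraction threshold of 0.5 is therefore an assumption, as is the function name clean_cellprofiler_table. Column means are taken over the retained rows of the same file.

```python
import numpy as np
import pandas as pd


def clean_cellprofiler_table(df, max_missing_fraction=0.5):
    """Drop mostly-empty rows, then mean-impute the remaining NaNs.

    Only numeric columns are kept. A row (one cell at one time point) is
    dropped if more than max_missing_fraction of its values are missing;
    this 0.5 default is an assumption, not a value stated by the authors.
    """
    numeric = df.select_dtypes(include=[np.number])
    missing_fraction = numeric.isna().mean(axis=1)
    kept = numeric.loc[missing_fraction <= max_missing_fraction]
    # Replace each remaining NaN with the mean of its column (retained rows only).
    return kept.fillna(kept.mean())


# Toy table: row 1 is empty, rows 2 and 3 each lack one value.
raw = pd.DataFrame({
    "Area": [100.0, np.nan, 120.0, np.nan],
    "Perimeter": [40.0, np.nan, 44.0, 42.0],
    "Eccentricity": [0.5, np.nan, np.nan, 0.7],
})
clean = clean_cellprofiler_table(raw)
```

On a real file, the call would look like `clean_cellprofiler_table(pd.read_csv("data_all_clean_hr14.csv"))`. Any non-numeric metadata columns in the .csv are dropped by `select_dtypes`.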

Files:

‘data_all_clean_hr14.csv’ – processed data for the HR14 cell line
‘data_all_clean_pc3.csv’ - processed data for the PC3 cell line
‘data_image_mda_06’ - processed data for the MDA-MB-231 cell line before Boyden chamber migration
‘data_image_mda_09.csv’ - processed data for the MDA-MB-231 cell line post Boyden chamber migration
‘data_image_mcf_1.csv’ - processed data for the MCF-7 cell line

Python Files
For analysis and machine learning on the processed data, we used the scikit-learn library (https://scikit-learn.org/stable/) for Python. Note that all Python files are written in Python 2.7.
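The deposited pca_feature_selection.py is not reproduced here. As a rough illustration of using scikit-learn to ask which features drive the variance, the Python 3 sketch below standardizes the features, fits PCA, and scores each feature by its squared loadings weighted by explained variance. The scoring rule and the function name rank_features_by_pca are our own assumptions. They are not necessarily what the Python 2.7 scripts do.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def rank_features_by_pca(X, feature_names, n_components=1):
    """Rank features by how strongly they load on the leading principal components.

    X is an (n_cells, n_features) array of CellProfiler measurements. Features are
    standardized first because the measurements mix scales (pixel areas, pixel
    lengths, dimensionless shape ratios).
    """
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components).fit(X_std)
    # Squared loadings weighted by each component's explained-variance ratio,
    # summed over the retained components: one score per feature.
    weights = pca.explained_variance_ratio_[:, np.newaxis]
    scores = (weights * pca.components_ ** 2).sum(axis=0)
    order = np.argsort(scores)[::-1]
    ranking = [(feature_names[i], float(scores[i])) for i in order]
    return pca.explained_variance_ratio_, ranking


# Synthetic demo (not data from this deposit): "area" and "perimeter" are
# strongly correlated, "eccentricity" is independent noise.
rng = np.random.default_rng(0)
area = rng.normal(size=500)
perimeter = area + 0.1 * rng.normal(size=500)
eccentricity = rng.normal(size=500)
X = np.column_stack([area, perimeter, eccentricity])

ratios, ranking = rank_features_by_pca(X, ["area", "perimeter", "eccentricity"])
```

Here the first component captures the correlated area/perimeter pair, so those two features rank highest. For a real .csv, load the numeric columns with pandas and apply the missing-value rules described above first, because PCA does not accept NaNs. Then pass `df.to_numpy()` and `list(df.columns)`.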

Files:
‘getPos6.2.py’ – This Python script is used for preprocessing, specifically for cropping the images. To use it, place it in the same directory as the area and positions folders for each cell line, and run the command ‘python getPos6.2.py &.tiff 00’ from a terminal. This automatically crops the images.

‘pca_feature_selection.py’ – This program performs Principal Component Analysis and can be used to determine which features in the processed data contribute to the overall variance. Users will need to edit the script to select the dataset they want to analyze.

‘estimators_adaboost.py’ – Uses the AdaBoost machine-learning algorithm to differentiate and identify the cells. Relevant to Figure 2 in the associated PLOS ONE publication.

‘classifier.py’ – Uses k-means clustering to cluster cells and identify subphenotypes. Relevant to Figures 3 and 4 in the associated PLOS ONE publication.
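For orientation only, the Python 3 sketch below shows the two kinds of analysis these scripts perform. It uses scikit-learn's AdaBoostClassifier for supervised classification (its default weak learner is a depth-1 decision tree) and KMeans for unsupervised clustering. The data are synthetic. The hyperparameters (50 estimators, 5-fold cross-validation, two clusters) and the cluster-agreement measure are assumptions for illustration. The settings used in the publication are those in estimators_adaboost.py and classifier.py.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for two cell lines' feature tables (not real data).
line_a = rng.normal(loc=0.0, scale=1.0, size=(100, 4))
line_b = rng.normal(loc=2.0, scale=1.0, size=(100, 4))
X = StandardScaler().fit_transform(np.vstack([line_a, line_b]))
y = np.array([0] * 100 + [1] * 100)  # known cell-line labels

# Supervised: cross-validated accuracy of AdaBoost at telling the lines apart.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
accuracy = cross_val_score(clf, X, y, cv=5).mean()

# Unsupervised: k-means never sees y; afterwards, compare clusters with labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)
# Cluster ids are arbitrary, so take the better of the two possible matchings.
agreement = max(np.mean(clusters == y), np.mean(clusters != y))
```

The supervised score answers "can known phenotypes be told apart?". The unsupervised agreement answers "do the phenotypes separate without labels?". This distinction mirrors the roles the two scripts play in the analysis.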


The total size of this work (108 GB) is too large to download directly. Use Globus, the platform Deep Blue Data uses to make large data sets available; it is recommended for data sets larger than 3 GB.
