Work Description

Title: Estimating Student Capital of Community College Student Populations
  • The dataset here is simulated, designed to mimic information about a cohort of degree-seeking community college students who aimed to transfer to a 4-year college. It contains information about the number of credits each simulated student earned and whether they dropped out, graduated with an associate's degree, or transferred to a four-year college. The dataset can be used as a template for running the associated R files.
  • Student capital is the set of skills, traits, and resources that an individual can draw upon to be successful in school. With dropout rates around 50%, community college students often don't have enough student capital to achieve their goals. The R code in this deposit estimates the average student capital of a group of community college students from data on their total credits and academic outcomes. It also contains R code to create figures like those in the paper "The Shape of Educational Inequality" by Quarles, Budak & Resnick.
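The following minimal Python sketch illustrates the estimation idea under deliberately simplified assumptions. It is not the repository's mlecens function and is not expected to reproduce the paper's estimates.

The sketch makes four assumptions:
- Student capital is measured in credits.
- Capital follows a geometric distribution on {1, 2, ...} with P(capital = k) = (1 - q) q^(k-1).
- A dropout's credits are an exact observation of their capital.
- A graduate's or transfer's credits are right-censored: their capital is assumed to exceed the credits they earned.

Under these assumptions the maximum likelihood estimate has a closed form: estimated mean capital = total credits / number of dropouts. The function name estimate_capital is hypothetical.

```python
from typing import Sequence, Tuple


def estimate_capital(credits: Sequence[int], dropped_out: Sequence[bool]) -> Tuple[float, float]:
    """Right-censored MLE for an assumed geometric model of student capital.

    Assumed model: capital X ~ Geometric on {1, 2, ...}, P(X = k) = (1 - q) * q**(k - 1).
    - Dropout with c credits: X = c observed exactly, likelihood (1 - q) * q**(c - 1).
    - Graduate/transfer with c credits: right-censored, likelihood P(X > c) = q**c.
    Returns (q_hat, estimated mean capital in credits).
    """
    if len(credits) != len(dropped_out):
        raise ValueError("credits and dropped_out must have the same length")
    n_dropouts = sum(1 for d in dropped_out if d)
    if n_dropouts == 0:
        # Every observation is censored, so the likelihood keeps increasing toward q = 1.
        raise ValueError("no dropouts: the MLE does not exist")
    if any(c < 1 for c, d in zip(credits, dropped_out) if d):
        raise ValueError("dropouts must have at least 1 credit under this model")
    total_credits = sum(credits)
    # Log-likelihood = D*log(1 - q) + (total_credits - D)*log(q), with D = n_dropouts.
    # Setting its derivative to zero gives q = (total_credits - D) / total_credits.
    q_hat = (total_credits - n_dropouts) / total_credits
    # The mean of this geometric distribution is 1 / (1 - q) = total_credits / D.
    mean_capital = total_credits / n_dropouts
    return q_hat, mean_capital
```

Consider a cohort of three students: two dropouts with 10 and 20 credits, and one transfer with 30 credits. It gives q ≈ 0.967 and an estimated mean capital of 60 / 2 = 30 credits. The number of dropouts sits in the denominator, so a cohort with very few dropouts yields an unstable estimate.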
Last modified
  • 07/15/2020
  • 05/19/2020
To Cite this Work:
Quarles, C. L. (2020). Estimating Student Capital of Community College Student Populations [Data set], University of Michigan - Deep Blue Data.


Files (Count: 4; Size: 52.2 KB)

DATE: May 13, 2020
TITLE: Code and sample data to accompany "The Shape of Educational Inequality" by Quarles, Budak & Resnick, published in Science Advances
AUTHOR: Christopher L. Quarles

This repository contains four files:
-- readme.txt: The file you're reading now.
-- mlecens.R: This file contains the R code for estimating student capital in a population of students. It contains one function, mlecens, which performs right-censored maximum likelihood estimation to fit a distribution to a data set.
-- code from QBR paper.R: This contains the R code used to make most of the images and tables in the paper. Because our data are not publicly available, all of the code runs on sample_data.csv. To make an image from the paper with your own data, format your data like sample_data.csv and then run the code in this file.
-- sample_data.csv: For privacy reasons, the data used in the paper is not available to the public. This dataset mimics the type of data used for the analysis. The dataset has 4 variables:
 - credits_earned = number of credits earned by a given student, rounded to the nearest positive integer
- droppedout = FALSE if the student graduated or transferred, TRUE otherwise
- transferred = TRUE iff the student transferred to a 4-year college
- transnograd = TRUE iff the student transferred but didn't graduate
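The variable definitions above imply a few logical constraints that are worth checking before analysis. The Python sketch below encodes only constraints that follow directly from those definitions:
- A student who transferred is not a dropout.
- transnograd implies transferred.
- credits_earned is a positive integer.

StudentRecord and consistency_problems are hypothetical helpers, not part of the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudentRecord:
    """One row of a sample_data.csv-style file (hypothetical container)."""
    credits_earned: int
    droppedout: bool
    transferred: bool
    transnograd: bool


def consistency_problems(row: StudentRecord) -> List[str]:
    """List conflicts with the variable definitions; an empty list means none were found."""
    problems: List[str] = []
    if row.credits_earned < 1:
        problems.append("credits_earned should be a positive integer")
    if row.transferred and row.droppedout:
        # droppedout is FALSE for anyone who graduated or transferred.
        problems.append("transferred = TRUE requires droppedout = FALSE")
    if row.transnograd and not row.transferred:
        # transnograd means the student transferred without graduating.
        problems.append("transnograd = TRUE requires transferred = TRUE")
    return problems
```

For example, consistency_problems(StudentRecord(45, False, True, True)) returns an empty list. A row with transferred = TRUE and droppedout = TRUE is flagged.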

To estimate the average student capital of your own cohort:
- Make sure that your cohort is large enough. In simulations based on real data, the standard error of the estimated average student capital was roughly SE = 150/sqrt(n), where n is the number of students in the cohort.
- Also, make sure that a moderate share of your students dropped out. Otherwise, you won't observe enough students' capital to make an accurate inference. I don't have a good rule of thumb here, but a dropout rate of 20% or less probably won't work, and neither will a rate above 90%.
- Save your data in the same format as sample_data.csv, or paste it over the sample data. Only two variables are required: credits_earned and droppedout. droppedout can be coded as either TRUE/FALSE or 1/0.
- Make sure all the files are in the same directory.
- Make sure that you have the VGAM package installed. You can run install.packages("VGAM") or install it through the Tools menu in RStudio.
- Run the following lines in R. You'll have to change the file name to match your file. (The sample file should give q=.9917 and mu_s=120.2.)
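Before running the R code, you can invert the sample-size rule of thumb (SE ≈ 150/sqrt(n)) to plan a cohort size. The Python sketch below is not part of the deposit. It simply rearranges the rule into n ≈ (150/SE)^2. The constant 150 comes from the authors' simulations based on their data and may differ for other student populations. The function names are hypothetical.

```python
import math


def approx_se(n: int) -> float:
    """Rule-of-thumb standard error of the estimated average student capital."""
    if n <= 0:
        raise ValueError("cohort size must be positive")
    return 150 / math.sqrt(n)


def cohort_size_for_se(target_se: float) -> int:
    """Cohort size at which the rule-of-thumb SE falls to target_se (rounded up)."""
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    return math.ceil((150 / target_se) ** 2)
```

For example, approx_se(900) gives 5.0 and cohort_size_for_se(10) gives 225. By this rule, a cohort of about 225 students yields a standard error of roughly 10.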
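You can also check the data-format and dropout-share steps outside R before running the R code. The Python sketch below, which is not part of the deposit, makes these assumptions:
- The file has a header row with column names, like sample_data.csv.
- droppedout may be coded as TRUE/FALSE or 1/0; matching is case-insensitive.
- Any other columns are ignored (for example, a row-name column added by R's write.csv).

The sketch flags dropout shares outside the rough range given above (more than 20% and at most 90%). Those thresholds are the README's rough guidance, not hard limits. The function names load_cohort and dropout_rate_warning are hypothetical.

```python
import csv
from typing import Iterable, List, Sequence, Tuple

TRUE_VALUES = {"TRUE", "1"}
FALSE_VALUES = {"FALSE", "0"}


def load_cohort(lines: Iterable[str]) -> Tuple[List[int], List[bool]]:
    """Read credits_earned and droppedout from CSV lines that start with a header row."""
    credits: List[int] = []
    dropped: List[bool] = []
    for row in csv.DictReader(lines):
        credits.append(int(row["credits_earned"]))
        # Normalize TRUE/FALSE or 1/0 codings to Python booleans.
        value = row["droppedout"].strip().upper()
        if value in TRUE_VALUES:
            dropped.append(True)
        elif value in FALSE_VALUES:
            dropped.append(False)
        else:
            raise ValueError(f"unrecognized droppedout value: {row['droppedout']!r}")
    return credits, dropped


def dropout_rate_warning(dropped: Sequence[bool]) -> str:
    """Return a warning if the dropout share is outside (20%, 90%], else an empty string."""
    if not dropped:
        raise ValueError("empty cohort")
    rate = sum(dropped) / len(dropped)
    if rate <= 0.20:
        return f"dropout rate {rate:.0%} is probably too low for a reliable estimate"
    if rate > 0.90:
        return f"dropout rate {rate:.0%} is probably too high for a reliable estimate"
    return ""
```

To check a real file, open it with open("sample_data.csv", newline="") and pass the file object to load_cohort. Then pass the returned droppedout list to dropout_rate_warning.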

