DATE: May 13, 2020 TITLE: Code and sample data to accompany "The Shape of Educational Inequality" by Quarles, Budak & Resnick, published in Science Advances AUTHOR: Christopher L. Quarles, chrisquarles@gmail.com This repository contains four files: -- readme.txt: The file you're reading now. -- mlecens.R: This file contains the R code for estimating student capital in a population of students. It contains one function, mlecens, which performs right-censored maximum likelihood estimation to fit a distribution to a data set. -- code from QBR paper.R: This contains the R code used to make (most of) the images and tables in the paper. Because our data is unavailable, all of the code will run on the sample_data.csv. If you want to make an image from the paper with your own data, you can just format your data like in sample_data.csv and then run the code in this file. -- sample_data.csv: For privacy reasons, the data used in the paper is not available to the public. This dataset mimics the type of data used for the analysis. The dataset has 4 variables: - credits_earned = # of credits earned by a given student, rounded to the nearest positive integer - droppedout = FALSE if the student graduated or transferred, TRUE otherwise - transferred = TRUE iff the student transferred to a 4-year college - transnograd = TRUE iff the student transferred but didn't graduate IF YOU JUST WANT TO CALCULATE THE STUDENT CAPITAL OF A GROUP OF STUDENTS: - Make sure that your cohort is large enough. In simulations based on real data, the standard error of the estimated average student capital was roughly: SE = 150/sqrt(sample size). - Also, make sure that a middling number of your students dropped out. Otherwise, you won't observe enough students' capital to make an accurate inference. I don't have a good rule of thumb here, but 20% or fewer dropouts probably won't work. Nor will >90% dropouts. - Save your data in the same format as sample_data.csv, or you can copy and paste over the sample data. You only need two variables: credits_earned and droppedout. droppedout can be either TRUE/FALSE or 1/0. - Make sure all the files are in the same directory. - Make sure that you have the VGAM package installed. You can run install.packages("VGAM") or install it through Tools menu in RStudio. - Run the following lines in R. You'll have to change the file name to match your file. (The sample file should give q=.9917 and mu_s=120.2.) source("mlecens.R") coldat <- read.csv("sample_data.csv") q = mlecens(x=coldat$credits_earned, yc=coldat$droppedout) # This returns the "per-credit retention rate" mu_s = 1/(1-q) # This returns the "average student capital", measured in credits.