Scientists often try to reproduce observations with a model, helping them explain the observations by adjusting known and controllable features within the model. They then use a large variety of metrics for assessing the ability of a model to reproduce the observations. One such metric is called the relative operating characteristic (ROC) curve, a tool that assesses a model’s ability to predict events within the data. The ROC curve is made by sliding the event-definition threshold in the model output, calculating certain metrics at each threshold setting, and making a graph of the results. Here, a new model assessment tool is introduced, called the sliding threshold of observation for numeric evaluation (STONE) curve. The STONE curve is created by sliding the event-definition threshold not only for the model output but also simultaneously for the data values. This is applicable when the model output is trying to reproduce the exact values of a particular data set. While the ROC curve is still a highly valuable tool for optimizing the prediction of known and pre-classified events, it is argued here that the STONE curve is better for assessing model prediction of a continuous-valued data set.
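The difference between the two sliding-threshold procedures can be sketched in a few lines of code. The example below is an illustrative Python sketch, not the authors' IDL code. It assumes that an event is any value at or above the threshold, that the "certain metrics" are the standard ROC axes (probability of detection and probability of false detection), and that the function names (contingency, rates, roc_curve, stone_curve) and the six-point toy data set are hypothetical, invented here only to show the mechanics.

```python
def contingency(model, obs, model_thresh, obs_thresh):
    """Count hits, misses, false alarms, and correct negatives.

    Assumption: a value at or above its threshold counts as an event.
    """
    hits = misses = false_alarms = correct_negs = 0
    for m, o in zip(model, obs):
        predicted = m >= model_thresh
        observed = o >= obs_thresh
        if predicted and observed:
            hits += 1
        elif observed:
            misses += 1
        elif predicted:
            false_alarms += 1
        else:
            correct_negs += 1
    return hits, misses, false_alarms, correct_negs


def rates(hits, misses, false_alarms, correct_negs):
    """Return (POFD, POD); an entry is None if its denominator is zero."""
    n_events = hits + misses
    n_nonevents = false_alarms + correct_negs
    pod = hits / n_events if n_events else None
    pofd = false_alarms / n_nonevents if n_nonevents else None
    return pofd, pod


def roc_curve(model, obs, obs_thresh, thresholds):
    """ROC: the observed-event threshold is fixed; only the model threshold slides."""
    return [rates(*contingency(model, obs, t, obs_thresh)) for t in thresholds]


def stone_curve(model, obs, thresholds):
    """STONE: the same threshold slides for the model and the observations."""
    return [rates(*contingency(model, obs, t, t)) for t in thresholds]


# Hypothetical toy data, invented purely for illustration.
obs = [1, 2, 3, 4, 5, 6]
model = [2, 1, 4, 3, 6, 5]
thresholds = [2, 4, 6]

roc = roc_curve(model, obs, obs_thresh=4, thresholds=thresholds)
stone = stone_curve(model, obs, thresholds)

for t, r, s in zip(thresholds, roc, stone):
    print(f"threshold={t}  ROC (POFD, POD)={r}  STONE (POFD, POD)={s}")
```

In this sketch the two curves coincide only where the sliding threshold equals the fixed observed-event threshold (here, 4). Everywhere else the STONE point is computed against a different set of observed events, which is what lets it probe the model's match to the data values across their whole range.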
Data and code were created using IDL but can also be accessed with the open-source GNU Data Language (GDL; see https://github.com/gnudatalanguage/gdl).
Liemohn, M. W., Azari, A. R., Ganushkina, N. Yu., & Rastätter, L. (2020). The STONE curve: A ROC-derived model performance assessment tool. Earth and Space Science, 7, e2020EA001106. https://doi.org/10.1029/2020EA001106