The STONE Curve: A ROC‐Derived Model Performance Assessment Tool
Liemohn, Michael W.; Azari, Abigail R.; Ganushkina, Natalia Y.; Rastätter, Lutz
2020-08
Citation
Liemohn, Michael W.; Azari, Abigail R.; Ganushkina, Natalia Y.; Rastätter, Lutz (2020). "The STONE Curve: A ROC‐Derived Model Performance Assessment Tool." Earth and Space Science 7(8): n/a-n/a.
Abstract
A new model validation and performance assessment tool is introduced, the sliding threshold of observation for numeric evaluation (STONE) curve. It is based on the relative operating characteristic (ROC) curve technique, but instead of sorting all observations in a categorical classification, the STONE tool uses the continuous nature of the observations. Rather than defining events in the observations and then sliding the threshold only in the classifier/model data set, the threshold is changed simultaneously for both the observational and model values, with the same threshold value for both data and model. This is only possible if the observations are continuous and the model output is in the same units and scale as the observations, that is, the model is trying to exactly reproduce the data. The STONE curve has several similarities with the ROC curve: plotting probability of detection against probability of false detection, ranging from the (1,1) corner for low thresholds to the (0,0) corner for high thresholds, and values above the zero‐intercept unity‐slope line indicating better than random predictive ability. The main difference is that the STONE curve can be nonmonotonic, doubling back in both the x and y directions. These ripples reveal asymmetries in the data‐model value pairs. This new technique is applied to modeling output of a common geomagnetic activity index as well as energetic electron fluxes in the Earth’s inner magnetosphere. It is not limited to space physics applications but can be used for any scientific or engineering field where numerical models are used to reproduce observations.
Plain Language Summary
Scientists often try to reproduce observations with a model, helping them explain the observations by adjusting known and controllable features within the model. They then use a large variety of metrics for assessing the ability of a model to reproduce the observations. One such metric is called the relative operating characteristic (ROC) curve, a tool that assesses a model’s ability to predict events within the data. The ROC curve is made by sliding the event‐definition threshold in the model output, calculating certain metrics and making a graph of the results. Here, a new model assessment tool is introduced, called the sliding threshold of observation for numeric evaluation (STONE) curve. The STONE curve is created by sliding the event definition threshold not only for the model output but also simultaneously for the data values. This is applicable when the model output is trying to reproduce the exact values of a particular data set. While the ROC curve is still a highly valuable tool for optimizing the prediction of known and preclassified events, it is argued here that the STONE curve is better for assessing model prediction of a continuous‐valued data set.
Key Points
A new event‐detection‐based metric for model performance appraisal is given with sliding thresholds in both observational and model values
The new metric is like the relative operating characteristic curve but uses continuous observational values, not just categorical status
The new metric is used on real‐time model predictions of common geomagnetic activity parameters, demonstrating its features and strengths
Publisher
Wiley‐Blackwell
ISSN
2333-5084
Types
Article