Work Description

Title: Data for Defining an independent reference model for event detection skill scores Open Access Deposited

Methodology
  • Event detection skill scores were calculated in Excel spreadsheets, starting from an array of values for one of the contingency table quadrants (typically, hits) and then using the metric formula and one or more additional assumptions to obtain the other quadrant values (misses, false alarms, and correct negatives). One of the plots is a reconstruction of values from a table in a previous publication; those numbers were reproduced in a spreadsheet so that new skill scores could be calculated from the published contingency table entries.
Description
  • When we assess a model's ability to predict observed events, there are many equations to choose from, commonly called metrics, that quantify particular aspects of that data-model relationship. One set of such metrics is called skill scores, in which the value from a metric is compared against the value of the same metric from a different model, a reference model. For assessing event detection, there are several well-known skill scores, all of which are based on a particular reference model. It is shown here that this reference model is not ideal for assessing a new model's skill because it is, unfortunately, based in part on the new model's performance against the data. It is shown that these well-known skill scores have an ambiguous connection to the underlying metric score. Holding the metric value of the new model constant, there is a range of possible skill scores, and conversely, a given skill score value could result from a range of original metric values. It is recommended to stop using these famous skill scores and instead adopt one of several presented alternatives, all of which are fully independent of the new model. All of the plots for this study were created in Excel spreadsheets. The resulting plot files were then combined into the multi-panel figures for the paper using Adobe Illustrator. Specifically, the "xlsx" files were created using Excel Version 16.94 for the Mac and the "txt" files were generated with Save As -> Tab Delimited Text format.
Creator
Creator ORCID iD
Depositor
Depositor creator
  • true
Contact information
Discipline
Funding agency
  • National Science Foundation (NSF)
  • National Aeronautics and Space Administration (NASA)
  • Other Funding Agency
Other Funding agency
  • University of Michigan

  • International Space Science Institute
Keyword
Citations to related material
  • Liemohn, Michael W., Ganushkina, Natalia Yu., Welling, Daniel T., & Azari, Abigail R. (2025). Defining an independent reference model for event detection skill scores. Submitted to AGU Advances, 21 February 2025, manuscript # 2025AV001710.
Resource type
Last modified
  • 03/10/2025
Published
  • 03/10/2025
Language
DOI
  • https://doi.org/10.7302/jc0a-vt65
License
To Cite this Work:
Liemohn, M. W., Ganushkina, N. Y., Welling, D. T., Azari, A. R. (2025). Data for Defining an independent reference model for event detection skill scores [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/jc0a-vt65

Relationships

This work is not a member of any user collections.

Files (Count: 14; Size: 2.88 MB)

Date: 20 February 2025

Title:

Defining an independent reference model for event detection skill scores

Authors:

Michael W. Liemohn, Natalia Y. Ganushkina, Daniel T. Welling, and Abigail R. Azari

Contact:

Mike Liemohn ([email protected])

Acknowledgment and Supporting Grants:

The authors thank the University of Michigan for its financial support of this project, as well as financial support from NASA (specifically, grant numbers 80NSSC20K0353, 80NSSC19K0077, 80NSSC21K1127, and 80NSSC21K1405) and NSF (specifically, grant AGS-1414517). M. Liemohn and N. Ganushkina were partially supported by the International Space Science Institute (ISSI) in Bern, through ISSI International Team project #24-609, “1-100 keV electrons in the Earth’s magnetosphere: Unique and unpredictable?” MWL ideated the study concept, and all authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Key Points of the Study:

-- Four common event detection skill scores - Heidke, Peirce, Clayton, and Gilbert - use a reference model that is dependent on the new model
-- For a given metric score, there is a range of possible skill scores, making their interpretation and comparison fraught with ambiguity
-- New skill scores are derived based on two independent reference models, for both the proportion correct and critical success index metrics

Research Overview:

When we assess a model's ability to predict observed events, there are many equations to choose from, commonly called metrics, that quantify particular aspects of that data-model relationship. One set of such metrics is called skill scores, in which the value from a metric is compared against the value of the same metric from a different model, a reference model. For assessing event detection, there are several well-known skill scores, all of which are based on a particular reference model. It is shown here that this reference model is not ideal for assessing a new model's skill because it is, unfortunately, based in part on the new model's performance against the data. It is shown that these well-known skill scores have an ambiguous connection to the underlying metric score. Holding the metric value of the new model constant, there is a range of possible skill scores, and conversely, a given skill score value could result from a range of original metric values. It is recommended to stop using these famous skill scores and instead adopt one of several presented alternatives, all of which are fully independent of the new model.
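The skill-score construction described above can be sketched in code. This is not taken from the deposited files; it is a minimal illustration assuming the standard skill score form SS = (metric - metric_ref) / (metric_perfect - metric_ref), with a perfect score of 1.

```python
def skill_score(metric, metric_ref, metric_perfect=1.0):
    """Generic skill score: rescales a metric so that the reference
    model scores 0 and a perfect model scores 1."""
    return (metric - metric_ref) / (metric_perfect - metric_ref)

# A model with proportion correct 0.8 against a reference scoring 0.5
# lands 60% of the way from the reference toward a perfect model:
print(skill_score(0.8, 0.5))
```

In this framing, the study's point is that if metric_ref itself depends on the model being evaluated, the rescaling is no longer a fixed yardstick.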

Methodology:

Event detection skill scores were calculated in Excel spreadsheets, starting from an array of values for one of the contingency table quadrants (typically, hits) and then using the metric formula and one or more additional assumptions to obtain the other quadrant values (misses, false alarms, and correct negatives). One of the plots is a reconstruction of values from a table in a previous publication; those numbers were reproduced in a spreadsheet so that new skill scores could be calculated from the published contingency table entries.
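The spreadsheet procedure described above can be sketched as follows. This is an assumed reconstruction, not the deposited spreadsheet logic: it fixes the total count, the number of observed events, and a target proportion correct (PC), sweeps the hits quadrant, solves for the remaining quadrants, and evaluates the standard Heidke skill score formula for each row.

```python
N = 1000          # total observed-modeled pairs (held constant in the study)
obs_events = 300  # observed events (as in Figures 1, 3, and 4)
target_pc = 0.6   # PC = (hits + correct negatives) / N, held constant

rows = []
for hits in range(0, obs_events + 1):
    misses = obs_events - hits                 # hits + misses = observed events
    corr_negs = round(target_pc * N) - hits    # PC fixes hits + correct negatives
    false_alarms = N - hits - misses - corr_negs
    if corr_negs < 0 or false_alarms < 0:
        continue                               # combination not realizable
    # Heidke skill score from the contingency table (standard formula)
    numer = 2 * (hits * corr_negs - misses * false_alarms)
    denom = ((hits + misses) * (misses + corr_negs)
             + (hits + false_alarms) * (false_alarms + corr_negs))
    rows.append((hits, numer / denom))

# Even though PC is held constant, the Heidke skill score spans a range:
print(min(s for _, s in rows), max(s for _, s in rows))
```

The spread between the printed minimum and maximum, at a single fixed metric value, is the ambiguity the study documents.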

Instrument and/or Software specifications:

All of the plots for this study were created in Excel spreadsheets. The resulting plot files were then combined into the multi-panel figures for the paper using Adobe Illustrator. Specifically, the "xlsx" files were created using Excel Version 16.94 for the Mac and the "txt" files are were generated with Save As -> Tab Delimited Text format.

Files contained here:

Please note that the collection includes the original Excel files (*.xlsx) and a tab-delimited text format version (*.txt) of each of the files listed below.

-- Fig1_300_Skill_vs_Hits: tables used to generate the panels of Figure 1, curves of the skill score value versus the number of hits for a given/constant proportion correct metric score, assuming 300 observed events out of 1000. Calculations are for the Heidke, Peirce, and Clayton skill scores.

-- Fig2_700_Skill_vs_Hits: tables used to generate the panels of Figure 2, curves of the skill score value versus the number of hits for a given/constant proportion correct metric score, assuming 700 observed events out of 1000. Calculations are for the Heidke, Peirce, and Clayton skill scores.

-- Fig3_300_Gilbert_vs_Hits: tables used to generate Figure 3, curves of the skill score value versus the number of hits for a given/constant critical success index metric score, assuming 300 observed events out of 1000. Calculations are for the Gilbert skill score.

-- Fig4_300_NewPCSkill_vs_Hits: tables used to generate lines for two of the panels of Figure 4, curves of the skill score value versus the number of hits for a given/constant proportion correct metric score, assuming 300 observed events out of 1000. Calculations are for the LSS1 and LSS3 skill scores.

-- Fig4_300_NewCSISkill_vs_Hits: tables used to generate lines for two of the panels of Figure 4, curves of the skill score value versus the number of hits for a given/constant critical success index metric score, assuming 300 observed events out of 1000. Calculations are for the LSS2 and LSS4 skill scores.

-- Fig5_Summary_plots: tables of values, extracted from the other files, used to generate the panels of Figure 5: skill score versus either the proportion correct metric or the critical success index.

-- Fig6_Ganushkina_recalc: tables of values extracted from the tables in Ganushkina et al., Space Weather, 2019, and then subsequent calculations for the new skill scores of LSS1, LSS2, LSS3, and LSS4.

Acronyms in the Excel/text files:
-- CFR: coin flip reference, one of the new independent reference models presented in the study
-- CFR CSI_ref (or CSI-CFR): the new reference critical success index value for the new independent reference skill scores
-- CFR CSI SS (or CSI-SS-cfr): new skill score based on the coin-flip reference
-- CorrNegs: correct negatives (no-no for the observed and model event status)
-- Counter: the index incremented from low to high for that calculation set
-- CSI: critical success index metric
-- CSI_ref (or CSI-OER): the new reference critical success index value for the new independent reference skill scores
-- CSI-SS (or SS-CSI-OER): the new skill score based on the critical success index metric and the new independent reference skill score
-- CSS: Clayton skill score
-- CSS max: the maximum Clayton skill score for each proportion correct setting
-- CSS min: the minimum Clayton skill score for each proportion correct setting
-- FalseAlarms (or FA): incorrect positive predictions (no-yes for the observed and model event status, respectively)
-- GSS: Gilbert skill score
-- GSS max: the maximum Gilbert skill score for each proportion correct setting
-- GSS min: the minimum Gilbert skill score for each proportion correct setting
-- Hits: correct positives (yes-yes for the observed and model event status)
-- H_ref: the reference hits value for the Gilbert skill score
-- HSS: Heidke skill score
-- HSS denom (or Denom): the denominator of the Heidke skill score
-- HSS max: the maximum Heidke skill score for each proportion correct setting
-- HSS min: the minimum Heidke skill score for each proportion correct setting
-- HSS numer (or Numer): the numerator of the Heidke skill score
-- keV: kiloelectron volt, a unit of energy commonly used for fast charged particles in deep space
-- Misses: incorrect negative predictions (yes-no for the observed and model event status, respectively)
-- Obs events: the setting for the number of observed events, held constant in that calculation set
-- OER: observed event reference, the new observation-based independent reference model
-- PC: proportion correct metric
-- PC_ref (or PC-OER): proportion correct metric from the new independent reference model
-- PC-RER: proportion correct metric based on Gilbert's "randomized event reference" model
-- PC-SS (or SS-PC-OER): the new proportion-correct skill score using an independent reference model
-- PC-SS-cfr (or SS-PC-CFR): new proportion-correct skill score using the coin flip reference model
-- PC-SS check: a double check of the math on the new skill score calculations, using a different form of the equation
-- PSS: Peirce skill score
-- PSS max: the maximum Peirce skill score for each proportion correct setting
-- PSS min: the minimum Peirce skill score for each proportion correct setting
-- RER: randomized event reference, the original reference model developed by Gilbert
-- SS check: a double check of the math on the new skill score calculations, using a different form of the equation
-- SS numer: the numerator of the new independent reference skill score calculations
-- SS denom: the denominator of the new independent reference skill score calculations
-- TargetCSI: the setting for the critical success index metric, held constant in that calculation set
-- TargetPC: the setting for the proportion correct metric, held constant in that calculation set
-- Total counts: the number of observed-modeled value pairs in the contingency table (held constant at 1000)
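For readers working with these files, the classic quantities named above have standard textbook definitions. The sketch below is based on those well-known formulas, not on the deposited spreadsheets; the LSS1, LSS2, LSS3, and LSS4 scores are defined in the associated paper and are not reproduced here.

```python
def pc(hits, misses, false_alarms, corr_negs):
    """Proportion correct: fraction of all pairs classified correctly."""
    n = hits + misses + false_alarms + corr_negs
    return (hits + corr_negs) / n

def csi(hits, misses, false_alarms):
    """Critical success index (threat score): ignores correct negatives."""
    return hits / (hits + misses + false_alarms)

def gilbert_skill_score(hits, misses, false_alarms, corr_negs):
    """Gilbert skill score: CSI-style score adjusted by the
    random-reference hits value H_ref."""
    n = hits + misses + false_alarms + corr_negs
    h_ref = (hits + misses) * (hits + false_alarms) / n
    return (hits - h_ref) / (hits + misses + false_alarms - h_ref)

# Example contingency table (hypothetical values):
# 240 hits, 60 misses, 120 false alarms, 580 correct negatives
print(pc(240, 60, 120, 580))                    # PC = 0.82
print(csi(240, 60, 120))                        # CSI = 240/420, about 0.571
print(gilbert_skill_score(240, 60, 120, 580))   # GSS = 132/312, about 0.423
```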

Related publication:

Publication that includes the plots from these files:
Liemohn, Michael W., Ganushkina, Natalia Yu., Welling, Daniel T., & Azari, Abigail R. (2025). Defining an independent reference model for event detection skill scores. Submitted to AGU Advances, 21 February 2025, manuscript # 2025AV001710.

Publication cited with the original space weather data and model result analysis:
Ganushkina, N. Y., Sillanpää, I., Welling, D. T., Haiducek, J., Liemohn, M., Dubyagin, S., & Rodriguez, J. V. (2019). Validation of Inner Magnetosphere Particle Transport and Acceleration Model (IMPTAM) with long-term GOES MAGED measurements of keV electron fluxes at geostationary orbit. Space Weather, 17. https://doi.org/10.1029/2018SW002028

Use and Access:
This data set is made available under a Creative Commons Attribution 4.0 International (CC BY 4.0).

To Cite Data:
Liemohn, M. W., et al. (2025). Data for Defining an independent reference model for event detection skill scores. University of Michigan Deep Blue Data Repository. https://doi.org/10.7302/jc0a-vt65

