Work Description

Title: Data for Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve

Methodology
  • Idealized data-model scatterplot distributions with known features are used to connect these features with attributes of the STONE curve and other metrics as a function of event identification threshold. Two codes are provided, one in Python using Jupyter Notebooks version 6.3.0 and another in IDL, Interactive Data Language, version 8.7.2. The IDL code can also be run with the open-source GNU Data Language.
Description
  • Many statistical tools have been developed to aid in the assessment of a numerical model’s quality at reproducing observations. Some of these techniques focus on the identification of events within the data set, times when the observed value is beyond some threshold value that defines it as a value of keen interest. An example of this is whether it will rain, in which events are defined as any precipitation above some defined amount. A method called the sliding threshold of observation for numeric evaluation (STONE) curve sweeps the event definition threshold of both the model output and the observations, resulting in the identification of threshold intervals for which the model does well at sorting the observations into events and nonevents. An excellent data-model comparison will have a smooth STONE curve, but the STONE curve can have wiggles and ripples in it. These features reveal clusters where the model systematically overestimates or underestimates the observations. This study establishes the connection between features in the STONE curve and attributes of the data-model relationship. The method is applied to a space weather example.
Creator
  • Liemohn, Michael W.
  • Adam, Joshua G.
  • Ganushkina, Natalia Yu.
Depositor
  • liemohn@umich.edu
Contact information
  • liemohn@umich.edu
Discipline
Funding agency
  • National Aeronautics and Space Administration (NASA)
  • National Science Foundation (NSF)
  • Other Funding Agency
Other Funding agency
  • European Union
Keyword
Date coverage
  • 2013-09-20 to 2015-03-31
Citations to related material
  • Liemohn, M. W., Adam, J. G., & Ganushkina, N. Y. (2022). Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve. Space Weather, 20, e2022SW003102. https://doi.org/10.1029/2022SW003102
Resource type
  • Dataset
Last modified
  • 02/17/2023
Published
  • 02/17/2023
Language
  • English
DOI
  • https://doi.org/10.7302/2mcx-5749
License
  • Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
To Cite this Work:
Liemohn, M. W., Adam, J. G., & Ganushkina, N. Y. (2023). Data for Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve [Data set]. University of Michigan - Deep Blue Data. https://doi.org/10.7302/2mcx-5749

Relationships

This work is not a member of any user collections.

Files (Count: 15; Size: 30.1 MB)

Date: 2 May 2022

Title: Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve

Authors: Michael W. Liemohn, Joshua G. Adam, and Natalia Yu. Ganushkina

Contact: Mike Liemohn liemohn@umich.edu

Acknowledgment and Supporting Grants:
The authors would like to thank the US government for sponsoring this research, in particular research grants from NASA (NNX17AI48G, 80NSSC20K0353, 80NSSC17K0015, 80NSSC19K0077, 80NSSC21K1127, and NNX17AB87G) and NSF (AGS-1414517). The authors received funding from the European Union Horizon 2020 Research and Innovation programme under grant agreement 870452 (PAGER). The IMPTAM simulations are available through the Finnish Meteorological Institute (http://imptam.fmi.fi/) and at the University of Michigan (http://citrine.engin.umich.edu/imptam/).

Key Points:
- The STONE curve, an event detection sweeping-threshold data-model comparison metric, reveals thresholds where the model matches the data
- STONE curves can be nonmonotonic, revealing the location and size of clusters of model under- or over-estimations of the observations
- STONE curve features are analyzed, quantifying the shape of nonmonotonicities relative to distribution characteristics and other metrics

Research Overview:
Many statistical tools have been developed to aid in the assessment of a numerical model’s quality at reproducing observations. Some of these techniques focus on the identification of events within the data set, times when the observed value is beyond some threshold value that defines it as a value of keen interest. An example of this is whether it will rain, in which events are defined as any precipitation above some defined amount. A method called the sliding threshold of observation for numeric evaluation (STONE) curve sweeps the event definition threshold of both the model output and the observations, resulting in the identification of threshold intervals for which the model does well at sorting the observations into events and nonevents. An excellent data-model comparison will have a smooth STONE curve, but the STONE curve can have wiggles and ripples in it. These features reveal clusters where the model systematically overestimates or underestimates the observations. This study establishes the connection between features in the STONE curve and attributes of the data-model relationship.
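
As a rough illustration of the sweeping-threshold idea, the Python sketch below slides a single event threshold across both the model and observation arrays, builds the contingency counts at each step, and records two standard event-detection rates. This is a simplified reading of the method, not the authors' implementation (which is in the IDL routine and Jupyter notebook listed below); treating the STONE curve as a trace of probability of detection against probability of false detection, ROC-style, is an assumption here, as are the synthetic data.

    import numpy as np

    def stone_curve(model, obs, thresholds):
        """Slide one event threshold across both model and observations,
        returning probability of detection (POD) and probability of
        false detection (POFD) at each threshold."""
        pod, pofd = [], []
        for t in thresholds:
            m_event, o_event = model >= t, obs >= t  # event status at this threshold
            tp = np.sum(m_event & o_event)    # hits
            fn = np.sum(~m_event & o_event)   # misses
            fp = np.sum(m_event & ~o_event)   # false alarms
            tn = np.sum(~m_event & ~o_event)  # correct negatives
            pod.append(tp / (tp + fn) if (tp + fn) else np.nan)
            pofd.append(fp / (fp + tn) if (fp + tn) else np.nan)
        return np.array(pod), np.array(pofd)

    # Synthetic data-model pairs: the model tracks the observations with noise.
    rng = np.random.default_rng(0)
    obs = rng.normal(0.0, 1.0, 5000)
    model = obs + rng.normal(0.0, 0.3, 5000)
    pod, pofd = stone_curve(model, obs, np.linspace(-2.0, 2.0, 81))

Plotting pod against pofd then traces a smooth curve for this well-matched pair; clustered mismatches would instead show up as wiggles where the sliding threshold crosses the cluster.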

Methodology:
Idealized data-model scatterplot distributions with known features are used to connect these features with attributes of the STONE curve and other metrics as a function of event identification threshold.
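
As a toy example of what such an idealized distribution could look like (the actual distributions are generated or read in by STONE_Analysis.ipynb; the population sizes, spreads, and offset below are arbitrary assumptions, not values from the paper), the following Python sketch builds a well-matched background population plus one known feature, a cluster in which the model systematically overestimates the observations:

    import numpy as np

    rng = np.random.default_rng(42)

    # Background population: model and observations agree on average.
    obs_bg = rng.uniform(0.0, 10.0, 2000)
    model_bg = obs_bg + rng.normal(0.0, 0.5, 2000)

    # Known feature: a small cluster where the model overestimates the
    # observations by a fixed offset (all values here are illustrative).
    obs_cl = rng.normal(4.0, 0.3, 200)
    model_cl = obs_cl + 2.0 + rng.normal(0.0, 0.3, 200)

    obs = np.concatenate([obs_bg, obs_cl])
    model = np.concatenate([model_bg, model_cl])
    # Sweeping an event threshold through the cluster's value range
    # should then produce a localized ripple in the STONE curve.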

Instrument and/or Software specifications:
Two codes are provided, one in Python using Jupyter Notebooks version 6.3.0 and another in IDL, Interactive Data Language, version 8.7.2. The resulting plot files were then combined into the multi-panel figures for the paper using Adobe Illustrator.

Files contained here:

- Combined_IMPTAM_MAGED_OMNIDATA.dat: energetic electron fluxes measured by the Magnetospheric Electron Detector (MAGED) on the Geostationary Operational Environmental Satellites (GOES) in geostationary orbit at 6.62 Earth radii geocentric distance over the American sector, specifically from GOES-13, GOES-14, and GOES-15, along with corresponding output from the Inner Magnetosphere Particle Transport and Acceleration Model (IMPTAM) running in real time at the University of Michigan. Dates of the data and model values span September 20, 2013 through March 31, 2015.

- IMPTAM_GOES_STONEandMetrics.pro: Interactive Data Language (IDL) code that calculates the STONE curve values and several other metrics for the GOES-IMPTAM comparison

- IMPTAM_GOES_STONE__STONEmax06.30_STONEmin03.30_40keV: file created by the IDL routine of the GOES-IMPTAM comparison, listing the values for plotting the STONE curve and metrics. This output was created for the limited magnetic local time range of 03 to 09.

- STONE_Analysis.ipynb: Jupyter notebook file that creates (or reads in) the number sets for the known data-model distributions, then conducts the STONE curve and other metric calculations, and makes plots of all quantities.

- Fig1_2_3_files.zip: contains two csv files associated with Figures 1, 2, and 3; one is the input data-model pairs for the scatterplot in Figure 1 and the other is the contingency table counts displayed in Figure 2 and used to compute the metrics in Figure 3.

- Fig4_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 4.

- Fig5_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 5.

- Fig6_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 6.

- Fig7_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 7.

- Fig8_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 8.

- Fig9_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 9.

- Fig10_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 10.

- Fig11_files.zip: contains eight csv files in two types, data-model pairs and contingency table counts like the two in Fig1_2_3_files, a pair of files for each column of Figure 11.

- Fig12_files.zip: contains 16 csv files of contingency table counts used to calculate the STONE curves shown in each panel of Figure 12.

The column headers in the data-model pair input value csv files are "model" and "obs", for the x and y axis values, respectively.
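
For instance, one way to load such a pair file and form the contingency counts at a single threshold is sketched below (pandas and the file name are assumptions; any csv reader would do, and the quadrant assignments follow the header definitions listed next):

    import pandas as pd

    pairs = pd.read_csv("pairs.csv")  # hypothetical name; columns "model" and "obs"
    t = 5.0                           # example event identification threshold

    m_event = pairs["model"] >= t     # modeled event status
    o_event = pairs["obs"] >= t       # observed event status

    tp = int((m_event & o_event).sum())    # hits, upper-right quadrant
    tn = int((~m_event & ~o_event).sum())  # correct negatives, lower-left
    fp = int((m_event & ~o_event).sum())   # false alarms, lower-right
    fn = int((~m_event & o_event).sum())   # misses, upper-left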

The column headers in the contingency table count csv files are as follows:
- "Threshold" for the event identification value used to define the quadrants of the scatterplot
- "TP" for true positive, or hits, the counts in the upper-right quadrant, event status for both model and observation
- "TN" for true negative, or correct negative, the counts in the lower-left quadrant, non-event status for both model and observation
- "FP" for false positive, or false alarm, the counts in the lower-right quadrant, modeled events and observed non-events
- "FN" for false negative, or misses, the counts in the upper-left quadrant, modeled non-events and observed events

Related publication:
Liemohn, M. W., Adam, J. G., & Ganushkina, N. Y. (2022). Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve. Space Weather, 20, e2022SW003102. https://doi.org/10.1029/2022SW003102

Use and Access:
This data set is made available under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

To Cite Data:
Liemohn, M. W., Adam, J. G., & Ganushkina, N. Y. (2023). Data for Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve [Data set]. University of Michigan Deep Blue Data Repository. https://doi.org/10.7302/2mcx-5749
