Work Description

Title: Results for "New Findings from Explainable SYM-H Forecasting using Gradient Boosting Machines" Open Access Deposited

Attribute Value
Methodology
  • We used data from the Coordinated Data Analysis Web (CDAWeb) and OMNI-Web databases maintained by NASA Goddard Space Flight Center's Space Physics Data Facility (SPDF) (https://cdaweb.gsfc.nasa.gov/) to train gradient boosted trees to predict the SYM-H index. For more information about the data used to create the materials in this repository, please see the Data section in the main paper (https://doi.org/10.1002/essoar.10508063.3).
Description
  • In this work, we trained gradient boosted trees using XGBoost to forecast the SYM-H index using different combinations of solar wind and interplanetary magnetic field (IMF) parameters. Data are in CSV and Python pickle formats.
Creator
Depositor
  • daniong@umich.edu
Contact information
Discipline
Funding agency
  • National Aeronautics and Space Administration (NASA)
ORSP grant number
  • NASA DRIVE Science Center grant 80NSSC20K0600, NASA MMS grant 80NSSC19K0564, NSF PRE-EVENTS grant 1663800, NSF SWQU grant PHY-2027555
Keyword
Date coverage
  • 2021-09-22
Citations to related material
  • Iong, D., Y. Chen, G. Toth, S. Zou, T. I. Pulkkinen, J. Ren, E. Camporeale, and T. I. Gombosi, New Findings from Explainable SYM-H Forecasting using Gradient Boosting Machines, Space Weather, 11, accepted, 2022. https://doi.org/10.1002/essoar.10508063.3
Resource type
Last modified
  • 11/18/2022
Published
  • 07/01/2022
Language
DOI
  • https://doi.org/10.7302/v27p-z270
License
To Cite this Work:
Iong, D., Chen, Y., Toth, G., Zou, S., Pulkkinen, T. I., Ren, J., Camporeale, E., Gombosi, T. I. (2022). Results for "New Findings from Explainable SYM-H Forecasting using Gradient Boosting Machines" [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/v27p-z270


Files (Count: 2; Size: 316 MB)

################################################################################################
Materials for "New Findings from Explainable SYM-H Forecasting using Gradient Boosting Machines"
################################################################################################

Authors:

- Daniel Iong (University of Michigan, Ann Arbor)
- Yang Chen (University of Michigan, Ann Arbor)
- Gabor Toth (University of Michigan, Ann Arbor)
- Shasha Zou (University of Michigan, Ann Arbor)
- Tuija Pulkkinen (University of Michigan, Ann Arbor)
- Jiaen Ren (University of Michigan, Ann Arbor)
- Enrico Camporeale (University of Colorado, Boulder/NOAA Space Weather Prediction
Center)
- Tamas Gombosi (University of Michigan, Ann Arbor)

Description of files
--------------------

generate_results.ipynb: Main Jupyter notebook file used to generate plots/tables in the manuscript

generate_results.py: Python file containing helper functions to generate results in _generate_results.ipynb_

results/: Directory with subdirectories named xgboost_[LEAD_TIME]_[FEATURES]
containing results for the corresponding lead time and features. The directory
results/villaverde_et_al2021/ was downloaded from
https://zenodo.org/record/4562456#.YreaotLMJQ8. The files in the results/ directory are used in _generate_results.ipynb_.
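
As a rough illustration of the naming scheme above, the following sketch splits a subdirectory name into its lead-time and feature parts. It assumes the lead-time token contains no underscore (the feature label may, as in es_dyn_pressure_no_symh); the function name parse_run_dir and the example token "1h" are hypothetical and not taken from the repository.

```python
from typing import NamedTuple


class RunName(NamedTuple):
    lead_time: str
    features: str


def parse_run_dir(name: str) -> RunName:
    """Split 'xgboost_[LEAD_TIME]_[FEATURES]' into its two variable parts."""
    prefix, sep, rest = name.partition("_")
    if prefix != "xgboost" or not sep:
        raise ValueError(f"not an xgboost result directory: {name!r}")
    # Assumption: the lead-time token has no underscore; the feature label may.
    lead_time, sep, features = rest.partition("_")
    if not sep or not lead_time or not features:
        raise ValueError(f"cannot split lead time and features: {name!r}")
    return RunName(lead_time, features)


# "1h" is a made-up lead-time token used only for illustration.
print(parse_run_dir("xgboost_1h_es_dyn_pressure_no_symh"))
```

Directories that do not follow the scheme, such as results/villaverde_et_al2021/, raise ValueError and can be skipped by the caller.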

- contribs.pkl*: Pickle file containing feature contributions.
- features.pkl*: Pickle file containing the data used to plot the features used.
- model/: Directory containing the trained model.
- model_configs/processed_data_configs: Directories containing config files for internal use.
- shap_values.pkl: Pickle file containing SHAP values for predictions
- X_test.pkl/y_test.pkl: Pickle files containing test data
- ypred.pkl: Pickle file containing predictions

*Note: We only use the files contribs.pkl and features.pkl from the
xgboost_[LEAD_TIME]_es_dyn_pressure and xgboost_[LEAD_TIME]_es_dyn_pressure_no_symh
subdirectories so they do not exist in the other subdirectories.
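
The pickle files listed above can be read with Python's standard pickle module. The sketch below loads whichever of them exist in one result subdirectory and silently skips the optional ones. The helper name load_run is hypothetical, and the types of the stored objects are not documented here; if they are pandas or XGBoost objects, compatible versions of those packages must be installed before unpickling. Only unpickle files obtained from a source you trust.

```python
import pickle
from pathlib import Path

# File names taken from the list above; contribs.pkl and features.pkl are optional.
RESULT_FILES = ("contribs", "features", "shap_values", "X_test", "y_test", "ypred")


def load_run(run_dir):
    """Return {name: object} for every known pickle file present in run_dir."""
    run_dir = Path(run_dir)
    loaded = {}
    for name in RESULT_FILES:
        path = run_dir / f"{name}.pkl"
        if not path.is_file():
            continue  # e.g. contribs.pkl is absent from most subdirectories
        with path.open("rb") as fh:
            loaded[name] = pickle.load(fh)
    return loaded
```

For example, load_run("results/xgboost_[LEAD_TIME]_[FEATURES]"), with the placeholders replaced by an actual subdirectory name, returns a dictionary whose keys show which files were found.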

rmse_siciliano.csv/rmse_villaverde.csv/rmse_tbl_burton.pkl: CSV/Pickle files
containing tables of RMSE values from the methods that we compared against in our manuscript

stormtimes_siciliano.csv: CSV file containing the storm times used for training/testing
models in the manuscript
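
For reference, the RMSE reported in the tables above is the root-mean-square error between observed and predicted values. A minimal sketch of the standard formula follows, assuming one-dimensional sequences of equal length such as the contents of y_test.pkl and ypred.pkl. How the manuscript aggregates errors (for example, per storm) is not described in this README, so this is not necessarily the exact evaluation procedure used; the sample values are made up.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length numeric sequences."""
    observed = list(observed)
    predicted = list(predicted)
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("need at least one sample")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up SYM-H values in nT, for illustration only.
print(rmse([-10.0, -20.0, -30.0], [-12.0, -18.0, -33.0]))
```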

