################################################################################################
Materials for "New Findings from Explainable SYM-H Forecasting using Gradient Boosting Machines"
################################################################################################

Authors:
- Daniel Iong (University of Michigan, Ann Arbor)
- Yang Chen (University of Michigan, Ann Arbor)
- Gabor Toth (University of Michigan, Ann Arbor)
- Shasha Zou (University of Michigan, Ann Arbor)
- Tuija Pulkkinen (University of Michigan, Ann Arbor)
- Jiaen Ren (University of Michigan, Ann Arbor)
- Enrico Camporeale (University of Colorado, Boulder/NOAA Space Weather Prediction Center)
- Tamas Gombosi (University of Michigan, Ann Arbor)

Description of files
--------------------

generate_results.ipynb: Main Jupyter notebook file used to generate plots/tables in the manuscript

generate_results.py: Python file containing helper functions to generate results in _generate_results.ipynb_

results/: Directory with subdirectories named xgboost_[LEAD TIME]_[FEATURES] containing results for the corresponding lead time and features. The directory results/villaverde_et_al2021/ was downloaded from https://zenodo.org/record/4562456#.YreaotLMJQ8. The files in the results/ directory are used in _generate_results.ipynb_.
- contribs.pkl*: Pickle file containing feature contributions.
- features.pkl*: Pickle file containing data used in plotting features used.
- model/: Directory containing model used
- model_configs/processed_data_configs: Directories containing config files for internal use.
- shap_values.pkl: Pickle file containing SHAP values for predictions
- X_test.pkl/y_test.pkl: Pickle files containing test data
- ypred.pkl: Pickle file containing predictions

*Note: We only use the files contribs.pkl and features.pkl from the xgboost_[LEAD_TIME]_es_dyn_pressure and xgboost_[LEAD_TIME]_es_dyn_pressure_no_symh subdirectories so they do not exist in the other subdirectories.

rmse_siciliano.csv/rmse_villaverde.csv/rmse_tbl_burton.pkl: CSV/Pickle files containing tables of RMSE values from the methods that we compare against in our manuscript

stormtimes_siciliano.csv: CSV file containing storm times used for training/testing models in the manuscript
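
The pickle files in a results/ subdirectory can also be inspected outside the notebook. The following is a minimal sketch, not the code used in generate_results.py: the helper names load_results and rmse are hypothetical, and the types of the stored objects are not documented here, so unpickling may require the libraries that created them (presumably pandas/NumPy). Only unpickle files from a source you trust.

```python
import math
import pickle
from pathlib import Path

# File names taken from the file list in this README. contribs.pkl and
# features.pkl exist only in some subdirectories, so missing files are skipped.
RESULT_FILES = (
    "contribs.pkl",
    "features.pkl",
    "shap_values.pkl",
    "X_test.pkl",
    "y_test.pkl",
    "ypred.pkl",
)


def load_results(results_dir):
    """Return {file stem: unpickled object} for each listed file that exists.

    Hypothetical helper for illustration; not part of generate_results.py.
    """
    results_dir = Path(results_dir)
    loaded = {}
    for name in RESULT_FILES:
        path = results_dir / name
        if path.is_file():
            with path.open("rb") as fh:
                loaded[path.stem] = pickle.load(fh)
    return loaded


def rmse(y_true, y_pred):
    """Root-mean-square error of two equal-length 1-D sequences.

    Assumes both inputs iterate over scalar values.
    """
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))
```

To use it, call load_results("results/xgboost_[LEAD_TIME]_[FEATURES]") with the placeholders replaced by an actual subdirectory name, then pass the "y_test" and "ypred" entries to rmse, provided they hold one-dimensional data.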