Date: 14 June, 2023

Dataset Title: A Statistical Analysis of High-frequency Transient-Large-Amplitude Geomagnetic Disturbances: Data

Dataset Creators: McCuen, Brett A.

Dataset Contact: bmccuen@umich.edu

Funding: National Science Foundation (NSF), National Aeronautics and Space Administration (NASA)

Abstract: We present a comprehensive statistical analysis of high-frequency transient-large-amplitude (TLA) magnetic perturbation events that occurred at 12 high-latitude ground magnetometer stations throughout solar cycle 24, from 2009 to 2019. TLA signatures are defined as one or more second-timescale dB/dt intervals with magnitude ≥ 6 nT/s within a one-hour event window. This study characterizes high-frequency TLA events by their spatial and temporal behavior and by their relation to auroral substorms, geomagnetic storm phases and nighttime geomagnetic disturbance (GMD) events. We show that TLA events occur primarily at nighttime and solely in the high-latitude region above 60 degrees geomagnetic latitude. The largest TLA events occurred more often in the declining phase of the solar cycle, when solar wind velocity was higher and ring current activity was lower, suggesting an association with high-speed flows from coronal holes and the subsequent corotating interaction regions reaching Earth. TLA perturbations often occurred preceding or within the most extreme nighttime GMD events with 5-10 minute timescales, but the TLA intervals were often even more localized than the ~300 km effective radius of GMDs, occurring at only some of the stations at which GMDs occurred. We show that TLA-related GMD events can result from dipolarization fronts in the magnetotail and fast flows toward Earth, and are closely temporally associated with poleward boundary intensifications (PBIs) and auroral streamers.
The highly localized behavior of TLA intervals and their connection to the most extreme GMD events suggest that TLA intervals are a ground manifestation of features within rapid and complex ionospheric structures that can drive geomagnetically induced currents (GICs).

--------------------------------------------------------------------------------------------------

Methodology: The data in this repository are transient-large-amplitude dB/dt intervals with second timescales and amplitudes ≥ 6 nT/s. These dB/dt intervals are identified by the automated geomagnetic disturbance classifier described in detail in McCuen et al. (2023). This automated technique is run on yearly data files from individual magnetometer stations and saves data product files in the path 'prod_init/Products/"year"/' with the filename structure '"station code"/"year"/_algfinal_2-60s_6-1000nT_6-100nTs_xyz_filterf_svmvote_chk.csv'. The columns of these files are: ['start time', 'end time', 'start B value', 'end B value', 'dB (change in magnetic field strength)', 'dt (change in seconds)', 'dbdt (change in B divided by change in seconds)', 'component', 'station code']. These data product files were created for a set of stations from five magnetometer arrays, which are described in the Instrument and/or Software specifications section below. All available data from these stations for 2009 to 2019 were used, and the data products are saved in the yearly folders at the path given above.

*Note that the terms nighttime magnetic perturbation event (MPE) and geomagnetic disturbance event (GMD) are used interchangeably in this document and in the related Python scripts and data file names.
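To illustrate how the dbdt column relates to the other columns, the sketch below recomputes dB, dt and dB/dt from interval start/end times and B values, then keeps intervals whose |dB/dt| reaches the 6 nT/s threshold. It assumes pandas; the short column names ('st', 'et', 'b_st', 'b_et') and the values are hypothetical, and this is not the classifier of McCuen et al. (2023), which also separates noise from real high-frequency signals.

```python
import pandas as pd

TLA_THRESHOLD_NT_PER_S = 6.0  # TLA amplitude threshold from the dataset description

# Toy intervals (hypothetical column names and values).
intervals = pd.DataFrame({
    "st": pd.to_datetime(["2016-09-30 01:10:00.0", "2016-09-30 01:12:30.5"]),
    "et": pd.to_datetime(["2016-09-30 01:10:04.0", "2016-09-30 01:12:32.5"]),
    "b_st": [120.0, -35.0],  # B at interval start, nT
    "b_et": [96.0, -30.0],   # B at interval end, nT
})


def add_dbdt(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with dB (nT), dt (s) and dbdt (nT/s) columns added."""
    out = df.copy()
    out["dB"] = out["b_et"] - out["b_st"]
    out["dt"] = (out["et"] - out["st"]).dt.total_seconds()
    out["dbdt"] = out["dB"] / out["dt"]
    return out


def select_tla(df: pd.DataFrame, threshold: float = TLA_THRESHOLD_NT_PER_S) -> pd.DataFrame:
    """Keep only intervals whose |dB/dt| reaches the threshold (sign is ignored)."""
    return df[df["dbdt"].abs() >= threshold]


tla = select_tla(add_dbdt(intervals))
print(tla[["dB", "dt", "dbdt"]])
```

The first toy interval (-24 nT over 4 s, i.e. -6 nT/s) passes because the test uses the magnitude; the second (5 nT over 2 s) does not.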
--------------------------------------------------------------------------------------------------

Files contained:

/prod_init/ folder:
- - - - - - - - - - - - - - - - - - -
*Note that the '/init_tools/' folder contains an empty '__init__.py' script, which is required for the folder to be imported as a package by the analysis_init.py and mpe_init.py scripts.

analysis_init.py
For each station and each year of data product files in the paths 'prod_init/Products/"year"/', analysis_init.py loads each file with the load_products() function and creates 11 dataframes, one for each year from 2009 to 2019. The following functions are then applied to each yearly dataframe:
/prod_init/init_tools/mlts.py contains get_mlts(), which returns the magnetic local time (MLT) at the start and end of each dB/dt interval.
/prod_init/init_tools/get_onset_delays_withdiff.py contains get_onset_delays_withdiff(), which uses the substorm event lists in '/prod_init/Substorm Lists/' to return the time delay from the most recent substorm onset for each interval.
/prod_init/init_tools/get_smes.py contains get_smes(), which uses the SME lists in '/prod_init/SME Lists/' to return the SuperMAG Electrojet Index (SME) for the starting minute of each interval.
/prod_init/init_tools/get_smrs.py contains get_smrs(), which uses the SMR lists in '/prod_init/SMR Lists/' to return the SuperMAG Ring Current Index (SMR) for the starting minute of each interval.
These columns are then added to the main data product dataframe (whose columns are listed in the Methodology section above) and saved as 'Products/'+'prod_allyrs_allstns_onstdiff_sme_smr.csv'.
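The onset-delay and SME lookups described above can be sketched with pandas: pandas.merge_asof with direction="backward" attaches the most recent substorm onset at or before each interval start, and the index value is read at the interval's start minute by flooring the start time. This is a minimal sketch under stated assumptions, not the code of get_onset_delays_withdiff() or get_smes(): the column names and values are invented, and expressing onset_delay in minutes is an assumption.

```python
import pandas as pd

# Toy inputs (made-up values); real inputs come from the product files and the
# '/prod_init/Substorm Lists/' and '/prod_init/SME Lists/' folders.
intervals = pd.DataFrame({"st": pd.to_datetime([
    "2016-09-30 01:05:12", "2016-09-30 01:47:40", "2016-09-30 03:20:05"])})
onsets = pd.DataFrame({"onset": pd.to_datetime([
    "2016-09-30 00:58:00", "2016-09-30 01:40:00"])})
sme = pd.Series(
    [850.0, 910.0, 1200.0],
    index=pd.to_datetime(["2016-09-30 01:05", "2016-09-30 01:47", "2016-09-30 03:20"]),
)


def add_onset_delay(df: pd.DataFrame, onsets: pd.DataFrame) -> pd.DataFrame:
    """Attach the most recent onset at or before each start time and the delay in minutes."""
    out = pd.merge_asof(
        df.sort_values("st"), onsets.sort_values("onset"),
        left_on="st", right_on="onset", direction="backward",
    )
    # The delay is NaN when no onset precedes the interval.
    out["onset_delay"] = (out["st"] - out["onset"]).dt.total_seconds() / 60.0
    return out


def add_sme(df: pd.DataFrame, sme: pd.Series) -> pd.DataFrame:
    """Look up the index value for the minute in which each interval starts."""
    out = df.copy()
    out["sme"] = sme.reindex(out["st"].dt.floor("min")).to_numpy()
    return out


result = add_sme(add_onset_delay(intervals, onsets), sme)
print(result[["st", "onset", "onset_delay", "sme"]])
```

The same minute-floor lookup would apply to SMR. merge_asof requires both frames to be sorted on their keys, which is why each is sorted before the join.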
The columns added by analysis_init.py are:
'mlt_st' and 'mlt_et' = magnetic local time at the start and end of the interval
'onset_delay' = time delay from the most recent substorm onset
'sme' = SuperMAG Electrojet Index during the starting minute of the interval
'smr' = SuperMAG Ring Current Index during the starting minute of the interval

analysis_init_od.py
This script is similar to analysis_init.py, with a few distinctions. For each station and each year of data product files in the paths 'prod_init/Products/"year"/', analysis_init_od.py loads each file with the load_products() function and creates 11 dataframes, one for each year from 2009 to 2019. The following functions are then applied to each yearly dataframe:
/prod_init/init_tools/mlts.py contains get_mlts(), which returns the magnetic local time at the start and end of each dB/dt interval.
/prod_init/init_tools/get_onset_info.py contains get_onset_info(), which uses the substorm event lists in '/prod_init/Substorm Lists/' to return the time delay from the most recent substorm onset for each interval, as well as the magnetic local time, latitude and longitude of that substorm onset.
These columns are then added to the main data product dataframe (whose columns are listed in the Methodology section above) and saved as 'Products/'+'prod_allyrs_allstns_od.csv'. The added columns are:
'mlt_st' and 'mlt_et' = magnetic local time at the start and end of the interval
'onset_delay' = time delay from the most recent substorm onset
'od_mlt' = magnetic local time of the substorm onset
'od_lat' = geographic latitude of the substorm onset
'od_lon' = geographic longitude of the substorm onset

analysis_init_ascii.py
This script loads the data product .csv files and creates a .txt file of all of the TLA interval data, as well as a .txt file of the maximum dB/dt interval of each TLA event. These files are titled "prod_allyrs_allstns.txt" and "TLAEventList_allyrs_allstns.txt", respectively.
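Selecting the maximum dB/dt interval of each TLA event amounts to a per-event arg-max of |dB/dt|. The pandas sketch below assumes each interval already carries an event label; the 'event_id' column is hypothetical, since the product columns listed in the Methodology section do not include one, and the values are made up. The added 'abs_dbdt' column mirrors the one described for 'prod_allmlat_maxes.csv'.

```python
import pandas as pd

# Toy intervals; 'event_id' is a hypothetical event label.
prod = pd.DataFrame({
    "event_id": [1, 1, 1, 2, 2],
    "station": ["cdr", "cdr", "cdr", "rby", "rby"],
    "dbdt": [7.5, -11.2, 6.1, 6.4, -6.0],  # nT/s
})


def event_maxima(df: pd.DataFrame, event_col: str = "event_id") -> pd.DataFrame:
    """Return the single interval with the largest |dB/dt| in each event."""
    out = df.assign(abs_dbdt=df["dbdt"].abs())
    # idxmax returns the row label of the largest |dB/dt| within each group.
    idx = out.groupby(event_col)["abs_dbdt"].idxmax()
    return out.loc[idx].reset_index(drop=True)


print(event_maxima(prod))
```

Ranking by magnitude rather than by signed value keeps large negative excursions such as -11.2 nT/s.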
For each station and each year of data product files in the paths 'prod_init/Products/"year"/', analysis_init_ascii.py loads each file with the load_products() function and creates 11 dataframes, one for each year from 2009 to 2019. The following function is then applied to each yearly dataframe:
/prod_init/init_tools/mlts.py contains get_mlts(), which returns the magnetic local time at the start and end of each dB/dt interval.
The MLT information is then added to the main data product dataframe (whose columns are listed in the Methodology section above) and saved as 'prod_allyrs_allstns.txt'. The added column is:
'mlt_st' = magnetic local time at the start of the interval
The text file is also formatted to remove all string characters, so that every data column contains only numbers.
The component values in the 'comp' column are:
'1' = x
'2' = y
'3' = z
The stations are identified by number codes instead of letter codes:
'igl' = 1
'gjo' = 2
'rby' = 3
'cdr' = 4
'pgg' = 5
'rank' = 6
'ykc' = 7
'fcc' = 8
'gill' = 9
'whit' = 10
'atha' = 11
'mea' = 12
The start and end times in columns 'st' and 'et' are written as purely numerical values in the format 'YYYYMMDDHHmmSSfff'. The column 'mlt_st' is written numerically as 'HHMMSS'.
This script also creates the file 'prodmax_allyrs_allstns.txt', which has the same formatting but contains only the maximum dB/dt interval information for each event.
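A minimal sketch for decoding the numeric fields of the .txt files back into readable values, using only the Python standard library. The code tables are copied from the lists above; the function names are hypothetical, and treating 'fff' as milliseconds is an assumption based on the format string.

```python
from datetime import datetime

# Number codes used in the .txt products (from the lists above).
STATION_CODES = {1: "igl", 2: "gjo", 3: "rby", 4: "cdr", 5: "pgg", 6: "rank",
                 7: "ykc", 8: "fcc", 9: "gill", 10: "whit", 11: "atha", 12: "mea"}
COMPONENTS = {1: "x", 2: "y", 3: "z"}


def parse_time(value: int) -> datetime:
    """Convert a YYYYMMDDHHmmSSfff integer (fff assumed to be milliseconds) to a datetime."""
    s = str(int(value)).zfill(17)
    return datetime(int(s[0:4]), int(s[4:6]), int(s[6:8]),
                    int(s[8:10]), int(s[10:12]), int(s[12:14]),
                    int(s[14:17]) * 1000)  # milliseconds -> microseconds


def parse_mlt(value: int) -> float:
    """Convert an HHMMSS magnetic local time to decimal hours."""
    s = str(int(value)).zfill(6)
    return int(s[0:2]) + int(s[2:4]) / 60 + int(s[4:6]) / 3600


print(parse_time(20160930012345500), STATION_CODES[4], COMPONENTS[1], parse_mlt(231500))
```

Read 'st' and 'et' as 64-bit integers or strings rather than floats. The 17-digit values exceed 2**53, the largest range in which float64 represents every integer exactly, so a float read can corrupt the millisecond digits.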
mpe_init.py
This script loads MPE text files from the path '/MPE Derivs GE 6/' (data in Deep Blue: https://doi.org/10.7302/275e-da06) for the CDR, RBY and PGG stations for 2015-2019, then applies the following functions:
/prod_init/init_tools/mpe_mlts.py contains get_mlts(), which returns the magnetic local time at the start and end of each dB/dt interval of the MPEs.
get_onset_delays_withdiff() (see above)
get_smes() (see above)
get_smrs() (see above)
These columns are then added to the main data product dataframe and saved as 'Products/'+'mpes_onstdiff_sme_smr.csv'. The columns of this file are:
'st' = start time of the MPE
'station' = station where the MPE occurred
'mpe_maxdbdt' = maximum dB/dt value of the MPE
'mlt_st', 'onset_delay', 'sme', 'smr' = same as described above

These three files:
'prod_allyrs_allstns_onstdiff_sme_smr.csv'
'mpes_onstdiff_sme_smr.csv'
'prod_allyrs_allstns_od.csv'
are the general data files on which all data analysis is based. All statistics and subsets of data are created from the data in these three files. Two other TLA data files were created manually from the main TLA data file:
'prod_allmlat_unrelated.csv' = all of the 'unrelated' events from 2015-2019
'prod_allmlat_maxes.csv' = the maximum dB/dt of each event from 2015-2019. This file has two additional columns:
'abs_dbdt' = absolute value of the dB/dt of each interval
'storm_phase' = the geomagnetic storm phase in which the event occurred; if the event did not occur in relation to a geomagnetic storm, the value is '-'

tla_analysis.py script:
- - - - - - - - - - - - - - - - - - -
This Python script loads the full data product file 'prod_allyrs_allstns_onstdiff_sme_smr.csv', created by the analysis_init.py script, as well as the file 'prod_allmlat_unrelated.csv', which contains data products for 2015-2019 that are termed 'unrelated' because they occur more than 60 minutes after substorm onset and in the absence of a geomagnetic storm phase.
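The 'unrelated' definition above can be expressed as a two-part row filter, and the merge "with an indicator" used to build prod_mrge corresponds to pandas' merge(..., indicator=True). This is a minimal sketch, not the manual procedure actually used. It assumes that 'onset_delay' is in minutes and that 'storm_phase' uses '-' for no storm, as in 'prod_allmlat_maxes.csv'. The rows, the storm-phase label "main" and the choice of merge keys are illustrative.

```python
import pandas as pd

UNRELATED_MIN_DELAY = 60.0  # minutes after the most recent substorm onset

# Toy rows (made-up values); 'onset_delay' assumed to be in minutes.
events = pd.DataFrame({
    "st": ["2016-09-30 01:05", "2016-10-02 04:30", "2016-10-13 07:10", "2017-03-01 06:45"],
    "station": ["cdr", "rby", "pgg", "cdr"],
    "onset_delay": [12.0, 95.0, 240.0, 75.0],
    "storm_phase": ["-", "-", "main", "-"],
})


def unrelated_events(df: pd.DataFrame, min_delay: float = UNRELATED_MIN_DELAY) -> pd.DataFrame:
    """Rows more than min_delay minutes after onset and outside any storm phase.

    NaN onset delays compare as False and are therefore not selected.
    """
    mask = (df["onset_delay"] > min_delay) & (df["storm_phase"] == "-")
    return df[mask]


def flag_unrelated(df: pd.DataFrame, unrelated: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Left-merge with indicator=True and turn the '_merge' column into a boolean flag."""
    merged = df.merge(unrelated[keys].drop_duplicates(), on=keys, how="left", indicator=True)
    merged["unrelated"] = merged["_merge"] == "both"
    return merged.drop(columns="_merge")


flagged = flag_unrelated(events, unrelated_events(events), ["st", "station"])
print(flagged[["st", "station", "unrelated"]])
```

The rows flagged True here play the role of the events stored in 'prod_allmlat_unrelated.csv'.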
The 'prod_allmlat_unrelated.csv' file contains a subset of events from the file 'prod_allyrs_allstns_onstdiff_sme_smr.csv' and was created manually. The script then uses functions from the file '/tools/analysis_tools.py' to create figures for the manuscript. These are described below:
*Note that the '/tools/' folder contains an empty '__init__.py' script, which is required for the folder to be imported as a package by the analysis_init.py and mpe_init.py scripts.
sep_fullyrs(prod_allyrs): takes the full data set and separates out the stations that have available data for all years from 2009-2019 (see Supporting Information Table S1), returning the 'prod_fullcycle' dataframe.
plot_num_vs_year_vsw_smr(prod_fullcycle): takes the prod_fullcycle dataframe and plots the number of extreme (>12 nT/s) events per year as a function of the SMR value and the solar wind flow velocity.
plot_num_vs_year_multi(prod_fullcycle): takes the prod_fullcycle dataframe and plots the normalized number of TLA events per year, as well as the number of substorm onsets, the mean sunspot number and the number of extreme TLA events per year.
sep_fullmlat(prod_allyrs): takes the full data set and separates out the stations that have available data for all years from 2015-2019 (see Supporting Information Table S1), returning the 'prod_allmlat' dataframe.
sep_mlats(prod_mrge): takes the prod_mrge dataframe (the prod_allyrs dataframe merged with the prod_unrel dataframe, so that there is an indicator of which events are 'unrelated') and separates it into four regions of magnetic latitude.
plot_maxdbdt_nums_vs_mlt(prod_high, prod_midhigh, prod_midlow, prod_low): takes the four magnetic latitude region dataframes and plots the maximum dB/dt vs. MLT and the number of events vs. MLT.
plot_dbdt_vs_sme_smr(prod_high, prod_midhigh, prod_midlow, prod_low): takes the four magnetic latitude region dataframes and plots the maximum dB/dt vs. SME and the maximum dB/dt vs. SMR.

mpe_plot_v4.py script:
- - - - - - - - - - - - - - - - - - -
This script loads the MPE data files from 2015-2019 for CDR, RBY and PGG, similarly to the mpe_init.py file, but with a slightly different format that includes TLA dB/dt information in the dataframe. Then there are two functions for plotting:
plot_(prod): plots the max dB/dt of each MPE-related TLA event as a function of the time difference from the maximum dB/dt of the nearest MPE event.

mpe_tla_compare.py script:
- - - - - - - - - - - - - - - - - - -
This script loads the TLA max dB/dt file 'prod_allmlat_maxes.csv' and the MPE data file 'mpes_onstdiff_sme_smr.csv', then uses the following functions:
get_mpe_diff(prod, mpe): finds, for each MPE event, the minimum time difference to the nearest TLA event, the station at which that TLA event occurred and its max dB/dt value; these are then added to the MPE dataframe.
spatial_scales(prod): takes in the MPE dataframe and plots the number of stations at which an event occurred, based on how many stations the TLA event was identified at.
mpe_rel_histos(): plots the percentage of all MPE events that did and did not have TLA intervals, as a function of the maximum dB/dt range of the events.

plot_09302016.py script:
- - - - - - - - - - - - - - - - - - -
This script uses the following functions to plot magnetic field data and TLA intervals:
get_thm_data(station, year, month, day): downloads data from the THEMIS database for the station and date specified by the function arguments, then returns the data as a dataframe.
get_maccs_data(station, year, month, day): downloads data from the MACCS database for the station and date specified by the function arguments, then returns the data as a dataframe.
get_prod(prod, station, date): takes the prod_allyears dataframe loaded from the 'prod_allyrs_allstns_onstdiff_sme_smr.csv' file and identifies the TLA intervals that correspond to the given station and date.
avg_prod(prod, data, start, duration): calculates the average magnetic field B value over the interval specified by start and duration and subtracts it from the start and end B values of the TLA intervals in the product.
avg_data(data, start, duration): calculates the average magnetic field B value over the interval specified by start and duration and subtracts it from each data value.
avg_data2(data, start, duration): performs the same function as avg_data, but for data with a 1-second measurement cadence rather than a 0.5-second cadence.

plot_goesdata_09302016.py script:
- - - - - - - - - - - - - - - - - - -
This script takes magnetic field data and plots it.
The data were downloaded manually from the GOES-13 database (https://satdat.ngdc.noaa.gov/sem/goes/data/full/); the specific data file is included in the repository as 'g13_magneto_512ms_20160930_20160930.csv'.

tla_analysis_v3.py script:
- - - - - - - - - - - - - - - - - - -
This Python script loads the full data product file 'prod_allyrs_allstns_onstdiff_sme_smr.csv', created by the analysis_init.py script, and uses the following functions to create the plots included in the publication:
plot_daily_rates(prod): plots the number of TLA events per day from the 'prod' dataframe together with the number of substorms per day from the substorm event lists in the '/prod_init/Substorm Lists/' folder.
mlt_lat_od_pdf(prod): plots histograms of the distribution of events over the magnetic local time at which the max dB/dt of each event occurred, and of the distribution of events over the magnetic latitude at which each event occurred.
get_vsw(prod): returns a dataframe of the max dB/dt of each event, including a column with the solar wind flow speed, Vsw, for the minute of the max dB/dt of each event. The Vsw values are read from the lists in '/prod_init/Vsw Lists/'.
vsw_smr_pdf(): creates histograms of all SuperMAG Ring Current Index (SMR) values for 2009-2019 together with the SMR values at the minute of the max dB/dt of each event, as well as of all Vsw values for 2009-2019 together with the Vsw during the max dB/dt of each event.

plot_od_londiff.py script:
- - - - - - - - - - - - - - - - - - -
This script loads the TLA data file 'prod_allyrs_allstns_od.csv' and the MPE data file 'mpes_onstdiff_sme_smr.csv', then uses the following functions:
get_allmlat_od_maxes(): loads 'prod_allyrs_allstns_od.csv' and returns the subset of events from 2015-2019, keeping only the max dB/dt of each event.
od_londiff(prod_max, ms_df, ms_df_ext): creates distributions of the number of events over the time delay from substorm onset and over the difference in longitude from the substorm onset, with color separation for the GMD-related events.

/TLA Substorms/ folder:
- - - - - - - - - - - - - - - - - - -
This folder contains the script 'get_substorms.py', which identifies the substorms that have TLA intervals within 30 minutes of substorm onset and saves these lists of TLA-related substorm onsets in the same folder, with the filenames:
'TLA_substorms-newell-'+year+'0101_000000_to_'+year+'1231_000000.csv' for each year from 2009-2019

--------------------------------------------------------------------------------------------------

Instrument and/or Software specifications:
This dataset uses data products created from the magnetic field data of five magnetometer arrays. The details of these programs and their instrumentation are listed below.

1) The Magnetometer Array for Cusp and Cleft Studies (MACCS) is a system of magnetometers located in northeastern Nunavut, Canada, from about 65° to 80° geomagnetic latitude (Engebretson, 1995). MACCS is operated by Augsburg University and the University of Michigan and is funded by the National Science Foundation (NSF).
The MACCS stations contain fluxgate magnetometers with axes aligned with the Earth's magnetic field (H: magnetic north-south, D: east-west, Z: vertical, positive downward into the Earth). The MACCS magnetometers sample the magnetic field at 8 Hz, then average and record the measurements at 2 Hz (half-second cadence). The recorded data have a resolution of 0.025 nT, and the measurements are accurate to 0.01 nT.
MACCS stations used: IGL, GJO, RBY, PGG, CDR

2) The Canadian Array for Realtime InvestigationS of Magnetic Activity (CARISMA) is a system of ground-based magnetometers located across central Canada (Mann, 2008). CARISMA is operated by the University of Alberta as part of the Canadian Geospace Monitoring Program (CGSM) and is funded by the Canadian Space Agency (CSA). Like MACCS, CARISMA consists of fluxgate magnetometers that sample the magnetic field at 8 samples/s. The stations used in this study provide final data products that are averaged to 2 samples/s and rotated from the geographic coordinates in which they are measured into local geomagnetic coordinates. These magnetometers provide data with 0.025 nT resolution.
CARISMA stations used: GILL, ATHA

3) The CANadian Magnetic Observatory System (CANMOS) (Nikitina, 2016) is a ground magnetometer array operated by Natural Resources Canada (NRCan). CANMOS employs fluxgate magnetometers across Canada that sample the magnetic field at 8 Hz; the data are resampled to 1 Hz after despiking and applying a 9-point rectangular filter. The CANMOS data are in geographic coordinates: X (geographic north-south), Y (geographic east-west) and Z (vertical).
CANMOS stations used: YKC, FCC, MEA

4) The Athabasca University Time History of Events and Macroscale Interactions During Substorms (THEMIS) University of California, Los Angeles (UCLA) Magnetometer Network eXtension (AUTUMNX) (Connors, 2016) is located in eastern Canada.
The AUTUMNX instruments are UCLA-provided fluxgate magnetometers that measure the magnetic field with 0.01 nT resolution at 2 samples/s, in local geomagnetic coordinates.
AUTUMNX stations used: SALU, KJPK

5) The THEMIS Ground-Based Observatory (GBO) systems (Russell, 2008) are part of the larger collaboration of stations that contribute magnetic data to the THEMIS Ground Magnetometer (GMAG) cooperative. THEMIS GBO stations are operated by UCLA and contain UCLA instruments, as in (4), and thus have the same resolution, measurement frequency and coordinate system described above.
THEMIS GBO stations used: WHIT

--------------------------------------------------------------------------------------------------

References:
Connors, M., Schofield, I., Reiter, K., Chi, P. J., Rowe, K. M., & Russell, C. T. (2016). The AUTUMNX magnetometer meridian chain in Québec, Canada. Earth, Planets and Space, 68(1). https://doi.org/10.1186/s40623-015-0354-4
Engebretson, M. J., Hughes, W. J., Alford, J. L., Zesta, E., Cahill, L. J., Arnoldy, R. L., & Reeves, G. D. (1995). Magnetometer array for cusp and cleft studies observations of the spatial extent of broadband ULF magnetic pulsations at cusp/cleft latitudes. Journal of Geophysical Research, 100(A10), 19371. https://doi.org/10.1029/95ja00768
Mann, I. R., Milling, D. K., Rae, I. J., Ozeke, L. G., Kale, A., Kale, Z. C., Murphy, K. R., Parent, A., Usanova, M., Pahud, D. M., Lee, E.-A., Wallis, D. D., Angelopoulos, V., Glassmeier, K.-H., Russell, C. T., Auster, H.-U., & Singer, H. J. (2008). The upgraded CARISMA magnetometer array in the THEMIS era. Space Science Reviews, 141(1–4), 413–451. https://doi.org/10.1007/s11214-008-9457-6
McCuen, B. A., Moldwin, M. B., Steinmetz, E. S., & Engebretson, M. J. (2023). Automated High-Frequency Geomagnetic Disturbance Classifier: A Machine Learning Approach to Identifying Noise While Retaining High-Frequency Components of the Geomagnetic Field. Journal of Geophysical Research: Space Physics, 128(2). https://doi.org/10.1029/2022JA030842
Nikitina, L., Trichtchenko, L., & Boteler, D. H. (2016). Assessment of extreme values in geomagnetic and geoelectric field variations for Canada. Space Weather, 14, 481–494. https://doi.org/10.1002/2016SW001386
Russell, C. T., Chi, P. J., Dearborn, D. J., Ge, Y. S., Kuo-Tiong, B., Means, J. D., … Snare, R. C. (2008). THEMIS ground-based magnetometers. Space Science Reviews, 141(1–4), 389–412. https://doi.org/10.1007/s11214-008-9337-0

Curation Notes:
On 7 June 2023, the prod_initDepreciated.zip folder was deprecated and a new prod_init.zip folder was added. The deprecated folder has an error in the code of the script "get_onset_delays_withdiff.py" in the /prod_initDepreciated/init_tools/ folder. The new prod_init.zip folder includes the script "get_onset_info.py" in the /prod_init/init_tools/ folder, the folder /Vsw_lists/ containing lists of solar wind flow speed data, and two additional Python scripts, 'analysis_init_od.py' and 'analysis_init_ascii.py'. The folder 'TLA Substorms' was also added; it contains lists of substorms that have associated TLA intervals within 30 minutes. These corrections and additions were prompted by referee reviews of the original manuscript.
On 14 June 2023, the movie file "thg_asi_mosaic_201609300100kuuj.mpeg" was added. This is a mosaic of images from THEMIS all-sky imagers (ASI) at four stations for a one-hour interval on 30 September 2016.
On 24 June 2023, the file "TLAEventList_allyrs_allstns.txt" was added. The file "prod_allyrs_allstns.txt" was deprecated and replaced by a new version with the same name, because the MLT values in the previous version were in error.