Work Description

Title: Automated High-frequency Geomagnetic Disturbance Classifier: Data Open Access Deposited

Methodology
Description
  • The data were used to study high-frequency geomagnetic disturbances within magnetic field data. This repository includes the Python scripts that identify and classify high-frequency signals in magnetometer data downloaded from the databases listed in the Methodology section. All analysis and plots were created using Python libraries; the machine learning study used the scikit-learn library. The full methodology is documented in the readme file.
Creator
Depositor
  • bmccuen@umich.edu
Contact information
Discipline
Funding agency
  • National Aeronautics and Space Administration (NASA)
  • National Science Foundation (NSF)
ORSP grant number
  • 2013433, 1848724, 80NSSC20K1779
Keyword
Resource type
Curation notes
  • The script "HF_flagger.py" was added on July 7, 2022 and supersedes the masterflagfinal.py script. Its contents are the same as masterflagfinal.py except for one added line of code (line 186) that drops duplicates from the final returned list of hour windows containing high-frequency events. The "readme" file was also updated to include information about the "HF_flagger.py" script.

  • The readme file was updated on July 18, 2022 to better explain the organization of the data set.

  • The following changes were made on October 10, 2022. An updated script "HF_flagger_final.py" was added to correct previously unknown errors. The previous "HF_flagger.py" script was renamed "HF_flagger_SUPERSEDED.py". In addition, the zipped file "2017 Products.zip" was added as a means of sharing a sample of usable artifacts generated from the research (as requested by reviewers of the associated article). The readme file was updated accordingly and renamed "readme_final.txt".
Last modified
  • 11/29/2022
Published
  • 07/01/2022
Language
DOI
  • https://doi.org/10.7302/78zf-yw59
License
To Cite this Work:
McCuen, B. A. (2022). Automated High-frequency Geomagnetic Disturbance Classifier: Data [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/78zf-yw59

Files (Count: 9; Size: 62.4 MB)

Date: October 10, 2022

Dataset Title:
Automated High-frequency Geomagnetic Disturbance Classifier: Data

Dataset Creators:
McCuen, Brett A.

Dataset Contact:
bmccuen@umich.edu

Funding:
National Science Foundation (NSF)
National Aeronautics and Space Administration (NASA)

Abstract:
We present an automated method to identify high-frequency geomagnetic disturbances
in ground magnetometer data and classify the events by the source of the perturbations.
We developed an algorithm to search for and identify changes in the surface magnetic
field, dB/dt, with user-specified amplitude and timescale. We used this algorithm to
identify transient-large-amplitude dB/dt that have timescale less than 60 seconds and
amplitude > 6 nT/s. Because these magnetic variations have similar amplitude and time
characteristics to instrumental or man-made noise, the algorithm identified a large
number of noise-type signatures as well as geophysical signatures. We manually classified
these events by their sources (noise-type or geophysical) and statistically characterized
each type of event; the insights gained were used to more specifically define a
transient-large-amplitude geophysical event and greatly reduce the number of noise-type
dB/dts identified. Next, we implemented a support vector machine classification algorithm
to classify the remaining events in order to further reduce the number of noise-type dB/dt
in the final data set. We examine the performance of our complete dB/dt search algorithm
in widely-used magnetometer databases and the effect of a common data processing technique
on the results. The automated algorithm is a new technique to identify geomagnetic
disturbances and instrumental or man-made noise, enabling systematic identification
and analysis of space weather related dB/dt events.

Methodology and Files Contained:

---------------------------------------------------------------------------------------------------
DATA ACQUISITION-

***NOTE: The following .py files are used to obtain and format magnetic field data from the four databases
used in this project. The links for these data are current as of 07/01/2022. General information on
these databases is listed below for future reference, followed by explanations of the .py scripts
used to perform the data acquisition.

-MACCS: Magnetometer Array for Cusp and Cleft Studies (MACCS) - an array of magnetometers in Arctic
Canada started by Boston University and Augsburg College with assistance from the University of Alberta
and the Geological Survey of Canada, and supported by the National Science Foundation's Magnetospheric
Physics Program.

-AUTUMNX/THEMIS: The AUTUMNX magnetometer meridian chain is an array that consists of 10
THEMIS-class ground-based magnetometers. Data from AUTUMNX are provided to the THEMIS ground-based
observatory (GBO) system and distributed by the THEMIS GBO. The THEMIS ground-based observatories
provide observations of the aurora and corresponding magnetic field perturbations over northern
North America.

-CANMOS/Intermagnet: Data from an observatory of the CANadian Magnetic Observatory System (CANMOS) are
provided to the Intermagnet network to be processed and distributed. Intermagnet is comprised of a
global network of observatories, monitoring the Earth's magnetic field.

-SuperMAG: SuperMAG is a worldwide collaboration of organizations and national agencies that
currently operate more than 300 ground-based magnetometers. The data set provided by the ground
magnetometer community is truly unique in that it provides a nearly global and continuous measurement
of a fundamental parameter - the ground-level magnetic field perturbations.

get_data_iaga2002.py
Magnetic field data used for the original dB/dt search are downloaded from the MACCS server
via the get_data_iaga2002.py script. Running this file will download each daily data file from the
web address beginning with 'http://space.augsburg.edu/maccs/IAGA2002/' into the
'save_path' specified, for the year and station given by the 'year' and 'station' variables
within the script. The script also removes any rows of data that have B values greater than
80,000 nT, as these are sections of data marked as unusable. If there are missing days of data,
they will be downloaded as empty .csv files, which are then deleted as a final step.
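
A minimal, hedged sketch of this download-and-clean step is shown below. The filename pattern,
the position of the IAGA-2002 column header, and the field-column positions are assumptions made
for illustration; get_data_iaga2002.py is the authoritative version.

# Illustrative only: download one MACCS daily IAGA-2002 file and drop unusable rows.
import io
import requests
import pandas as pd

BASE_URL = "http://space.augsburg.edu/maccs/IAGA2002/"    # from the readme

def fetch_day(station, year, month, day):
    fname = f"{station}{year}{month:02d}{day:02d}.sec"    # assumed filename pattern
    resp = requests.get(BASE_URL + fname, timeout=60)
    if resp.status_code != 200 or not resp.text.strip():
        return None                                       # missing day; the script deletes such empty files
    lines = resp.text.splitlines()
    # IAGA-2002 files begin with metadata lines; the data column header starts with "DATE".
    hdr = next(i for i, l in enumerate(lines) if l.startswith("DATE"))
    names = lines[hdr].replace("|", "").split()
    df = pd.read_csv(io.StringIO("\n".join(lines[hdr + 1:])), sep=r"\s+", names=names)
    b_cols = names[3:6]                                   # assumed X/Y/Z field columns
    return df[(df[b_cols].abs() < 80000).all(axis=1)]     # B > 80,000 nT marks unusable data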

concat_iaga2002.py
The individual daily files downloaded via the get_data_iaga2002.py script into the save_path specified
will be concatenated into one yearly data file by this script. The station, year and
save_path must be specified, and the yearly file will be saved in the save_path directory as well.
Note that this script will append the data from all files with a .txt extension that are in the
'save_path' directory specified in the script, so this directory should contain only data files.
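
A short sketch of the concatenation step is shown below, assuming every .txt file in 'save_path' is
a daily data file with identical, whitespace-delimited columns (the delimiter and the output filename
are assumptions); concat_iaga2002.py is the authoritative version.

# Illustrative only: append all daily .txt files in save_path into one yearly .csv file.
import glob
import os
import pandas as pd

station, year = "cdr", 2015               # hypothetical station code and year
save_path = "/path/to/daily/files"        # same directory used by the download step

daily_files = sorted(glob.glob(os.path.join(save_path, "*.txt")))
yearly = pd.concat((pd.read_csv(f, sep=r"\s+") for f in daily_files), ignore_index=True)
yearly.to_csv(os.path.join(save_path, f"{station}{year}_yearly.csv"), index=False)  # hypothetical output name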

get_data_thm.py
This script will download daily data .cdf files from the THEMIS ground based observatory database
from the web address beginning with 'http://themis.ssl.berkeley.edu/data/themis/thg/l2/mag/' to
the 'save_path' specified and deletes empty .cdf files. The 'station', 'year' and 'save_path'
strings must be specified.

concat_thm.py
Like concat_iaga2002.py, running this script will create a file with all of the data from the
.cdf files in the 'save_path' directory specified and saves to the 'save_path' directory as a
.csv file.

intermag_concat_iaga2002.py
Data from https://www.intermagnet.org/data-donnee/download-eng.php must be downloaded manually
as monthly files in the 'save_path' directory. This script will then append the data in the
.gz files into one yearly .csv file and save to the 'save_path' string specified.

---------------------------------------------------------------------------------------------------
INITIAL DB/DT STUDY-

dbdt_search.py
The 'dbdts' function is loaded from the dbdt_search.py script and requires the arguments:
[date, time, B, comp, dt_min, dt_max, db_min, db_max, dbdt_min, dbdt_max, fs]
where the 'date' and 'time' arguments are single column dataframes of any length, B is a single
column dataframe of magnetic field strength values of the same length as 'date' and 'time'.
The 'dt_min', 'dt_max' values are the specified requirements for the timescale of the dB/dt event,
the 'db_min' and 'db_max' are the values for the minimum and maximum dB value of the event, and
'dbdt_min' and 'dbdt_max' are the minimum and maximum values for the dB/dt of each signature. The
fs is the measurement frequency of the data. This function performs a
search and identification for dB/dt intervals in the data where the required parameters (specified
by the arguments) are met. The returned data product is a dataframe with the columns:
['start time', 'end time', 'start B value', 'end B value', 'dB (change in magnetic
field strength)', 'dt (change in seconds)', 'dbdt (change in B divided by change in
seconds)', 'component']
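
The core thresholding idea can be illustrated with the simplified sketch below: for each candidate
timescale between dt_min and dt_max, it measures the field change over that lag and keeps intervals
whose dB, dt, and dB/dt all fall inside the specified bounds. The inputs here are assumed to be 1-D
sequences rather than single-column dataframes, and the real 'dbdts' function is more involved, so
treat this only as an illustration of the interface described above.

# Illustrative only: a simplified dB/dt interval search with threshold arguments.
import pandas as pd

def find_dbdt_intervals(date, time, B, comp, dt_min, dt_max,
                        db_min, db_max, dbdt_min, dbdt_max, fs):
    date, time, B = (pd.Series(list(s)) for s in (date, time, B))
    events = []
    for lag in range(max(1, int(dt_min * fs)), int(dt_max * fs) + 1):  # lag in samples
        dt = lag / fs                                                  # timescale in seconds
        dB = (B.shift(-lag) - B).abs()
        dbdt = dB / dt
        hits = dB.between(db_min, db_max) & dbdt.between(dbdt_min, dbdt_max)
        for i in hits[hits].index:
            events.append({"start time": f"{date[i]} {time[i]}",
                           "end time": f"{date[i + lag]} {time[i + lag]}",
                           "start B value": B[i], "end B value": B[i + lag],
                           "dB (nT)": dB[i], "dt (s)": dt,
                           "dbdt (nT/s)": dbdt[i], "component": comp})
    return pd.DataFrame(events)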

***NOTE: this dbdt_search.py contains the base function to search for and identify dB/dt
intervals in the data with the specified arguments. This file is repeated in other
folders throughout this repository as it is used in the master.py, mastersearch.py and
mastersearchfinal.py scripts (explanations for the latter two scripts are detailed later in this
readme.txt file).

master.py
The script requires that the dbdt_search folder with the dbdt_search function is in the same
directory as the master.py script. Running this will perform the dbdt_search on the year of data.

Data can be loaded from Intermag, THEMIS, or MACCS from the directory specified by 'data_path',
or for a single day of SuperMAG data downloaded manually from https://supermag.jhuapl.edu/
into the directory specified by 'daypath'.
The resulting files with the dB/dt data are saved in the 'product_path' directory specified
within the script. Filenames are:
station+'2015_raw_1-60s_6-10000nT_6-1000nTs_xyz.csv'
where station is a string specified in the script

manual separation of events-
The events were manually separated and placed in one of two files:
station+'2015REAL_algfinal_1-60s_6-10000nT_6-1000nTs_xyz.csv' for TLA events
station+'2015NOISE_algfinal_1-60s_6-10000nT_6-1000nTs_xyz_prefilter.csv' for noise events
Then noise types were assigned and added to the noise files manually, saved with '_wna' to
signify 'with noise assignments':
station+'2015NOISE_algfinal_1-60s_6-10000nT_6-1000nTs_xyz_prefilter_wna.csv'
where station is a string specified in the script

NOTE: the individual station files are not included in the data repository as they are
redundant to the .csv files that contain all of the dB/dt data from all six stations used in
the Initial dB/dt Study (see analysis_prefilter.py below):
'allst2015REAL_algfinal_1-60s_6-10000nT_6-1000nTs_xyz.csv'
'allst2015NOISE_algfinal_1-60s_6-10000nT_6-1000nTs_xyz.csv'
'allst2015NOISE_algfinal_1-60s_6-10000nT_6-1000nTs_xyz_wna.csv'

***
After running the master.py routine on the 2015 MACCS data, manually assigning the noise types,
and saving the data product files for 'real' (i.e., geophysical) events and noise-type events for
each station, the following analysis script creates data product files containing the real and
noise-type events from all six stations. These 'all station' files are included in the data
repository, while the individual station data product files are not included due to the
large number of files.
***

analysis_prefilter.py
The script analysis_prefilter.py uses functions from the toolsfinal.py file in
toolsfinal folder to:
- create and save files with dB/dts from all stations:
'allst2015REAL_algfinal_1-60s_6-10000nT_6-1000nTs_xyz.csv'
'allst2015NOISE_algfinal_1-60s_6-10000nT_6-1000nTs_xyz.csv'
'allst2015NOISE_algfinal_1-60s_6-10000nT_6-1000nTs_xyz_wna.csv'
- create and save files with dB/dts from all stations and ordered by start time:
'allst2015REAL_algfinal_1-60s_6-10000nT_6-1000nTs_xyz_prefilter_ordered.csv'
'allst2015NOISE_algfinal_1-60s_6-10000nT_6-1000nTs_xyz_prefilter_ordered.csv'
- print total number of dB/dts, number of hour event windows and average dB/dt per event
for each station. The total number of dB/dts and hour-windows pre-filter are
listed in the table in Pre_Post_Filter_Nums.csv
- create histograms of distribution of dB, dt and dB/dt for the 2015 event list

plotevent.py
The plots created for the research paper are created with the plotevent.py file. This file
uses a single day of MACCS data downloaded manually into the 'read_path' directory specified
in the script. The 'month', 'day', 'year', and 'station' variable strings must be specified
within the script. The 'start' and 'howlong' variables determine the starting point and how
many data points to use in the dbdt_search and the plotting.
These daily data files are manually downloaded from the MACCS website and have the filename
structure listed below. These plotting files also require that the dbdt_search function within the
dbdt_search.py script be placed in the same directory as the plotting script.
MACCS website data download service: http://space.augsburg.edu/maccs/requestdatafile.jsp
filename structure: cdr20151110v_l1_half_sec.sec.txt
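
A rough plotting sketch is shown below; the file parsing is a placeholder (plotevent.py handles the
actual MACCS daily file format), and the 'start'/'howlong' values and column positions are assumptions.

# Illustrative only: plot a window of one daily data file.
import pandas as pd
import matplotlib.pyplot as plt

read_path = "/path/to/daily/maccs/"                  # set as in the script
fname = "cdr20151110v_l1_half_sec.sec.txt"           # example filename from the readme
start, howlong = 100000, 7200                        # starting sample and number of samples (assumed)
fs = 2.0                                             # MACCS data are recorded at 2 samples per second

df = pd.read_csv(read_path + fname, sep=r"\s+", comment="#", header=None)  # placeholder parse
window = df.iloc[start:start + howlong, 3:6]         # assumed X/Y/Z field columns
fig, axes = plt.subplots(3, 1, sharex=True)
for ax, (label, col) in zip(axes, zip("XYZ", window.columns)):
    ax.plot((window.index - start) / fs, window[col])
    ax.set_ylabel(f"B{label} (nT)")
axes[-1].set_xlabel("seconds from window start")
plt.show()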

numevents_opt.py
This file loads the yearly data files specified in the 'data_path' variable and finds the
number of dB/dt signatures in each hour of data for the entire year using the main dbdt_search
function in the dbdt_search directory. The dbdt_search algorithm finds the number of dB/dt
signatures in each hour of data with varying characteristics ( [a] all dbdts with
second-timescale, [b] dbdts with second timescale and dbdt > 6 nT/s, and [c] dbdts with
second-timescale and dbdt > 6 nT/s AND dB > 60 nT ), as well as the ratios of [b]:[a]
and [c]:[a]. A list of these values for each hour is saved in the directory specified by
'product_path' with filenames formatted as follows:
- station+year+'_alleventnums_60minwindow_allcomp_final.csv'
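
The per-hour bookkeeping can be illustrated with the sketch below, which assumes an events dataframe
like the one returned by the dbdt search (with 'start time', 'dbdt (nT/s)' and 'dB (nT)' columns);
numevents_opt.py instead derives these counts directly from the raw yearly data.

# Illustrative only: per-hour counts [a], [b], [c] and the [b]:[a] and [c]:[a] ratios.
import pandas as pd

def hourly_event_counts(events):
    ev = events.copy()
    ev["hour"] = pd.to_datetime(ev["start time"]).dt.floor("h")
    ev["b_flag"] = ev["dbdt (nT/s)"] > 6                   # [b] dB/dt > 6 nT/s
    ev["c_flag"] = ev["b_flag"] & (ev["dB (nT)"] > 60)     # [c] dB/dt > 6 nT/s AND dB > 60 nT
    out = ev.groupby("hour").agg(a=("b_flag", "size"),     # [a] all dB/dts in the hour
                                 b=("b_flag", "sum"),
                                 c=("c_flag", "sum"))
    out["b_to_a"] = out["b"] / out["a"]
    out["c_to_a"] = out["c"] / out["a"]
    return out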

eventnum_info.py
This script loads the 'event nums' files saved by the numevents_opt.py script for each station
with the load_eventnums() function. This function takes a string with the station code as its
single argument 'st', loads the '_alleventnums_60minwindow_allcomp_final.csv' files, and
then separates the hours with TLA events from the hours with noise-type events (these indices
were identified from the files manually). The statistics of the hour windows for each of
the stations are then printed when running the script. These values are compiled in the table in
Ratios.csv.

---------------------------------------------------------------------------------------------------
FILTER STUDY

dbdt_search.py
The 'dbdts' function is loaded from the dbdt_search.py script as detailed in the INITIAL DBDT
STUDY section and used in the following mastersearch.py script.

mastersearch.py
This script loads the yearly data specified by 'data_path' for Intermag, MACCS and THEMIS data
and 'daypath' for a single day of data from SuperMAG. Each of these data types are loaded with
their own respective formatting so that the data are in a dataframe with columns 'date',
'time', 'bx', 'by', 'bz'. The get_product function has the arguments of the (date, time,
B1, B1_comp, B2, B2_comp, B3, B3_comp, fs) where 'date' and 'time' are single-column dataframes
of date and time values; B1, B2 and B3 are single-column dataframes of magnetic field strength
values; B1_comp, B2_comp and B3_comp are strings to signify the component names of B1, B2 and
B3 respectively; and fs is the measurement frequency of the magnetic field data. This function
performs the dbdt search with the filters discussed in the research paper and saves the data
product in the directory specified as 'product_path' and with the 'station' and 'year' strings
specified at the beginning of the script.
Running this script with the yearly data created data product files for each station with the
file ending format:
'_algfinal_2-60s_6-1000nT_6-100nTs_xyz_filterf.csv'

analysis_postfilter.py
This script will load the post-filtered data files using the 'loadpfprod_split' function from the
toolsfinal.py script for the 2015 dB/dt data for MACCS stations as well as the INUK station
(THEMIS) and the IQA (Intermagnet) station. This file will print statistics and characteristics
of the events with totals_and_hours function, also create histograms of the dB/dt distributions
with the 'multihisto_' function from the toolsfinal.py file, and create single-day plots with the
'plotsingleday' function which requires that the single day data files from MACCS be saved in the
'read_path' directory specified within the 'plotsingleday' function. Some of these features require
that the relevant section of code be un-commented in order to run. The totals and event-windows post-filter
are listed with the pre-filter stats in the table in Pre_Post_Filter_Nums.csv

---------------------------------------------------------------------------------------------------
MACHINE LEARNING STUDY

getfeatures_all.py
This script will load the postfilter dB/dt data files for each station for the year of 2015 with the
following filename formats:
'CDR2015_algfinal_2-60s_6-1000nT_6-100nTs_xyz_filterf.csv'
from the FILTER STUDY folder
then create a formatted training set of features for tuning each machine learning model. The indices
of the TLA dB/dt signatures were manually identified and are specified in this script for labels for
the machine learning models. The main 'getfeatures_all' function has two arguments: the
'include_lat' argument which, if True, includes the geomagnetic latitude of each station as a
fraction of 90 to the training set and the 'norm' argument which, if True, takes the dB, dt, and
dB/dt values and min-max scales them individually by each station and otherwise leaves the original
values in the training set. Each station dB/dt set is formatted individually and then
concatenated together and returned as the training set. The training set features are:
db, dt, dbdt, comp, lat, doy, and day frac, where comp is a value of 0, 0.5, or 1
depending on the component of the dB/dt signature, lat is the geomagnetic latitude as a fraction
of 90, doy is the day of the year as a fraction of 365 and day frac is the minute of the day as
a fraction of 86400.
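
An illustrative version of this feature construction is sketched below; the component-to-code
mapping, column names, and scaling details are assumptions based on the description above, and
getfeatures_all.py is the authoritative version.

# Illustrative only: build the db/dt/dbdt/comp/lat/doy/day-frac feature table for one station.
import pandas as pd

COMP_CODE = {"x": 0.0, "y": 0.5, "z": 1.0}     # assumed mapping behind the 0 / 0.5 / 1 description

def build_features(events, gm_lat_deg, norm=True):
    t = pd.to_datetime(events["start time"])
    feats = pd.DataFrame({
        "db": events["dB (nT)"],
        "dt": events["dt (s)"],
        "dbdt": events["dbdt (nT/s)"],
        "comp": events["component"].map(COMP_CODE),
        "lat": gm_lat_deg / 90.0,                                   # geomagnetic latitude as a fraction of 90
        "doy": t.dt.dayofyear / 365.0,                              # day of year as a fraction of 365
        "day frac": (t.dt.hour * 3600 + t.dt.minute * 60 + t.dt.second) / 86400.0,
    })
    if norm:                                                        # per-station min-max scaling of db, dt, dbdt
        for col in ("db", "dt", "dbdt"):
            lo, hi = feats[col].min(), feats[col].max()
            feats[col] = (feats[col] - lo) / (hi - lo)
    return feats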

gettestfeatures_all.py
This script does the same formatting as getfeatures_all.py but for the stations throughout 2016
with file formats:
'CDR2016_algfinal_2-60s_6-1000nT_6-100nTs_xyz_filterf.csv'
from the FILTER STUDY folder

gpc_tuning.py
svm_tuning.py
dectree_tuning.py
rf_tuning.py
These scripts use the getfeatures_all function from the getfeatures_all.py script to load the
training set and then use this training set to tune each of the respective machine learning models:
Gaussian process classifier (GPC), support vector machine (SVM), decision tree, and random forest.
All machine learning models used are from the scikit-learn library.
More on the cross-validation techniques and the specific number of folds/fits to each model can
be found in the Supplemental Information file.
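
For reference, hyperparameter tuning with scikit-learn's GridSearchCV looks roughly like the sketch
below; the parameter grid, scoring metric, and fold count are placeholders rather than the values
used in svm_tuning.py (see the Supplemental Information for those).

# Illustrative only: tune an SVM with cross-validated grid search.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_svm(X_train, y_train):
    param_grid = {"C": [0.1, 1, 10, 100],          # placeholder grid
                  "gamma": ["scale", 0.01, 0.1, 1],
                  "kernel": ["rbf"]}
    search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1", n_jobs=-1)
    search.fit(X_train, y_train)
    print("best parameters:", search.best_params_)
    print("best CV score:", search.best_score_)
    return search.best_estimator_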

gpc_fit_test.py
svm_fit_test.py
dectree_fit_test.py
rf_fit_test.py
These scripts use the getfeatures_all function from the getfeatures_all.py script to load the
training set. Then the respective models are fit to the training set with the optimal
hyper-parameters that were identified via the tuning process. The models can be saved as a .pkl
file (may need to uncomment the 'joblib.dump' line in the script). Then the 'gettestfeatures_all'
function from the gettestfeatures_all.py script loads the dB/dt data from the same stations that
were identified with the filtered dB/dt search (and the TLA dB/dts manually identified) but for
the year of 2016. The test set is formatted the same way as the training set. Then the trained
model is used to predict on the test set features, and the test scores and confusion matrix are printed.
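
A sketch of this fit-and-test step is shown below, under the assumption that the 2015 training and
2016 test feature matrices and labels have already been built (e.g., by getfeatures_all and
gettestfeatures_all); the hyperparameter values here are placeholders, not the tuned values.

# Illustrative only: fit on the training set, optionally save the model, score on the test set.
import joblib
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

def fit_and_test(X_train, y_train, X_test, y_test, save_path=None):
    model = SVC(C=10, gamma="scale", kernel="rbf")   # placeholder hyperparameters
    model.fit(X_train, y_train)
    if save_path:                                    # e.g. 'svm_model_final_minmaxscl.pkl'
        joblib.dump(model, save_path)
    y_pred = model.predict(X_test)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))
    return model, y_pred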

svm_model_final_minmaxscl.pkl
The SVM model was saved in this .pkl file upon running the svm_fit_test.py file after
determining that the SVM model had the highest test scores (also from the svm_fit_test.py
file).

convert_to_features.py
This script contains the 'convert_to_features' function, which takes the following arguments:
- df: a dataframe containing the dB/dt signature information output by the get_product function
- year: numerical value of the year of the corresponding product data
- norm: if True, normalizes the data via a min-max scaler; if False, does not normalize
- lat: a numerical value of the geomagnetic latitude of the station from which the data were
  recorded, as a fraction of 90
This function reads in the dB/dt data and creates a formatted dataframe of feature data that can
be used for SVM classification.

dbdt_search.py
The 'dbdts' function is loaded from the dbdt_search.py script as detailed in the INITIAL DBDT
STUDY section and used in the following mastersearchfinal.py script.

mastersearchfinal.py
This script has the same 'get_product' function as used previously in the mastersearch.py script,
however this script has a last step which is the 'svm_vote' function. This function takes the
dB/dt data returned from the 'get_product' function, converts these to a feature set with the
'convert_to_features' function from the convert_to_features.py script within the loadmldata
directory, then performs the SVM classification. These predictions are grouped if the
dB/dt signatures occur within one hour of one another, and the majority prediction of each group is
the final prediction result for the dB/dts within that hour. The final TLA dB/dt set and the
list of predictions are saved in the directory specified by 'product_path'.
Running this script will save the data products in .csv files with the file ending formatted:
'_algfinal_2-60s_6-1000nT_6-100nTs_xyz_filterf_svmvote.csv'
as well as .csv files with the list of SVM predictions:
'_filterf_ypred.csv'
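
The hour-window majority vote can be illustrated with the simplified sketch below, which groups
predictions by calendar hour rather than by the within-one-hour grouping used in
mastersearchfinal.py; the column names are assumptions carried over from the earlier sketches.

# Illustrative only: replace per-event SVM predictions with the majority vote of each hour.
import pandas as pd

def hourly_majority_vote(events, y_pred):
    ev = events.copy()
    ev["pred"] = y_pred
    ev["hour"] = pd.to_datetime(ev["start time"]).dt.floor("h")
    # Majority label (1 = TLA, 0 = noise-type) broadcast back to every event in the hour.
    ev["final_pred"] = ev.groupby("hour")["pred"].transform(lambda p: int(p.mean() >= 0.5))
    tla_events = ev[ev["final_pred"] == 1].drop(columns=["pred", "hour"])
    return tla_events, ev["final_pred"]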

svm_analysis.py
This script will load the 'station+'2016_filterf_ypred.csv'' files from the 'product_path'
specified and compare the results against the 'y_test' list that is created with the indices of
TLA events that were identified manually (the 'station' variable and the station specified in
line 39 must be adjusted to which station is being analyzed). The test scores and confusion matrix
will be printed in the command window. The mislabeled events for the particular station are also
separated out and saved in a separate .csv file in the path specified by product_path.
This script also uses the loadpfprod_split function to load 2016 dB/dt data from each station and
print the totals and hours via the totals_and_hours function, as is done in analysis_prefilter.py
and analysis_postfilter.py. The final results of the SVM classification are compiled in the
table in ML_Results.csv

/2016mislabeled/:
This folder contains the .csv files with the filename formats
'CDR2016_mislabeled_samples.csv'
that were created and saved from the svm_analysis.py script. The contents of these files are
the dB/dt product information (as described in the dbdt_search.py section above) that were
incorrectly labeled by the SVM model. It was important to save these files to analyze what
characteristics are observed in the mislabeled events and understand where the model fails
to accurately predict these signatures as noise-type or geophysical.

masterflagfinal.py
This script has the 'get_flags' function, which takes the same arguments as the 'get_product'
function but also has the 'yr' and 'gm_lat' arguments, which are numerical values for the
year of data and the geomagnetic latitude as a fraction of 90, respectively. This script will
load a yearly data file, and the get_flags function uses the same principles as
mastersearchfinal.py; however, rather than returning a list of the dB/dt signatures that meet
all of the criteria, this function creates a file with the dates and times of the hour windows
of data that contain high-frequency signals and the SVM majority-vote classification of each
hour window.
Running this script creates a .csv file with the hour window date and time and the algorithm
classification with the file ending formatted as follows:
'_HFflags.csv'

HF_flagger.py
This script is the final version of the previous masterflagfinal.py script. All of the elements
are the same as in masterflagfinal.py; however, one line of code (line 186) was added to drop
duplicates in the final returned list of hour windows that contain high-frequency events.
(Added to Deep Blue Repository 07/07/2022)
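
For illustration, writing out the hour-window flag file might look like the sketch below; the
'final_pred' column and the other column names are assumptions carried over from the majority-vote
sketch above, and the drop_duplicates call mirrors the fix described for HF_flagger.py.

# Illustrative only: write one row per hour window with its high-frequency flag
# (0 = noise-type, 1 = TLA type), dropping duplicate hour windows.
import pandas as pd

def write_hf_flags(events, product_path, station, year):
    ev = events.copy()
    ev["hour window"] = pd.to_datetime(ev["start time"]).dt.floor("h")
    flags = (ev[["hour window", "final_pred"]]
             .rename(columns={"final_pred": "HF flag"})
             .drop_duplicates(subset="hour window")
             .sort_values("hour window"))
    flags.to_csv(f"{product_path}/{station}{year}_HFflags.csv", index=False)
    return flags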

HF_flagger_final.py
This script is the *final* version of the previous HF_flagger.py script. There was a small issue
with the HF_flagger.py script in that the SVM classifications were performed within each hour
and thus the dB/dt interval statistics were not normalized based on the entire year of data prior
to making the classifications. This new script has corrected this issue so that all of the
classifications are consistent with those returned from the mastersearchfinal.py script.
(Added to Deep Blue Repository 10/10/2022)

---------------------------------------------------------------------------------------------------

DATA PROCESSING COMPARISON STUDY

***NOTE: this section uses repeated files from previous sections to analyze the effect of the
filtered db/dt search on raw unprocessed data and on processed data from the SuperMAG database.

dbdt_search.py
The 'dbdts' function is loaded from the dbdt_search.py script as detailed in the INITIAL DBDT
STUDY section and used in the following mastersearch.py script.

mastersearch.py-
The filtered dbdt_search was run on two days of SuperMAG data (11/10 and 06/20) to obtain data
product files with names:
'PGG2015_algfinal_2-60s_6-1000nT_6-100nTs_xyz_0620_supermag.csv'
'PGG2015_algfinal_2-60s_6-1000nT_6-100nTs_xyz_1110_supermag.csv'

mastersearch.py was also run on two days of MACCS data (11/10 and 06/20) to obtain data product files with
names:
'PGG2015_algfinal_2-60s_6-1000nT_6-100nTs_xyz_0620_maccs.csv'
'PGG2015_algfinal_2-60s_6-1000nT_6-100nTs_xyz_1110_maccs.csv'

plotevent_supermag.py
This script will create plots of an event with 1-second SuperMAG data downloaded manually into
the read_path specified in the script. The filename within line 14 must be specified. The
'start' and 'howlong' variables determine the starting point and how many data points to
use in the dbdt_search and the plotting.

processing_compare.py
This script will load the files above from 'read_path' specified and print statistics on the
comparison of the mastersearch.py results with unprocessed and processed data.

---------------------------------------------------------------------------------------------------

2017Products.zip

The 2017Products.zip folder contains data products for the MACCS stations for 2017 data. These data
products are the saved outputs from the HF_flagger_final.py script, which performs the filtered dB/dt
search and the SVM classification and saves both the dB/dt interval information as
station+year+'_algfinal_2-60s_6-1000nT_6-100nTs_xyz_filterf_svmvote.csv' as well as the list of
hour event windows and their high-frequency flag value (0 for noise-type, 1 for TLA type) as
station+year+'_HFflags.csv'. We include these as usable research artifacts.
(Added to Deep Blue Repository 10/10/2022)

--------------------------------------------------------------------------------------------------

Instrument and/or Software specifications:

The magnetometers used in this study at the MACCS and the IQA Intermagnet stations are
Narod ringcore fluxgate magnetometers designed and supplied by Dr. Barry Narod of
Narod Geophysics, Ltd., Vancouver, B.C., Canada. The THEMIS ground-based
observatories are equipped with fluxgate magnetometer systems provided by UCLA
and based on the design for the earlier Sino Magnetic Array at Low Latitudes (SMALL)
terrestrial vector fluxgate magnetometers.
The Narod magnetometers collect 8 samples per second in three axes, then average and record
the data at two samples per second for MACCS data and one sample per second for the Intermagnet
magnetometers (St-Louis, 2014). The THEMIS magnetometers record the magnetic field at 2 Hz.
All of these magnetometer systems have a data resolution of 0.01 nT and a timing accuracy of at
least 1 ms. The high resolution, sampling rate, and timing accuracy are sufficient to detect
short-timescale Pi 1-2 pulsations. The magnetometer data from all three arrays have been rotated
so that each measurement direction is in geomagnetic coordinates: X (north-south), Y (east-west)
and Z (vertical).
