Index Catalog // Deep Blue Data

Data and Analysis for Global Probabilistic Geomagnetic Perturbation Forecasting Using the Data-Driven Model GeoDGP

Creator:: Chen, Hongfan, Chen, Yang, Huang, Zhenguang, Zou, Shasha, Huan, Xun, and Toth, Gabor
Description:: Accurately predicting the horizontal component of the ground magnetic field perturbation (dBH), which can be used to calculate the Geomagnetically Induced Currents (GICs), is crucial for estimating the space weather impact of geomagnetic disturbances. In this work, we develop a new data-driven model GeoDGP using deep Gaussian process (DGP), which is a Bayesian non-parametric approach. The model provides global probabilistic forecasts of dBH at 1-minute time cadence and with arbitrary spatial resolutions. We evaluate the model comprehensively on a wide range of geomagnetic storms, including the 2024 Gannon extreme storm. The results show that GeoDGP significantly outperforms the state-of-the-art physics-based first-principles Space Weather Modeling Framework (SWMF) Michigan Geospace model and the data-driven DAGGER model.
Keyword:: Space Weather, Uncertainty Quantification, Machine Learning, and Bayesian Inference
Citation to related publication:: Chen, H., et al. (2024). GeoDGP: One-Hour Ahead Global Probabilistic Geomagnetic Perturbation Forecasting using Deep Gaussian Process.
Discipline:: Science and Engineering

Machine Learning Dataset to Support Paper "A comparison of machine learning classifiers in predicting safety for a multi-component dynamic system representation of an autonomous vessel"

Creator:: Sulkowski, Brendan and Collette, Matthew
Description:: This data set supports the published four-component integration problem using real-world weather forecasts from the European Centre for Medium-Range Weather Forecasts and a simulated linear spring-mass-damper system excited by wave elevation. Each component in the spring-mass-damper system is monitored with techniques of differing accuracies representative of marine-type health uncertainties. Weather forecast uncertainty is included using weather predictions of significant wave height and peak period up to 10 days out. In addition to their exact values, different test cases include the spring, mass, and damper being modeled as noisy sensors representative of sensors onboard a vessel, as well as the spring being modeled as a visually inspected system component reflective of human impact onboard a vessel. Complete details are given in the referenced paper; this data set represents the inputs to the machine learning classifiers discussed.
Keyword:: Machine Learning, Inspection, Marine system, and Weather forecast
Citation to related publication:: Sulkowski, B. and M. Collette (2025). A comparison of machine learning classifiers in predicting safety for a multi-component dynamic system representation of an autonomous vessel. Applied Ocean Research, 154 (104368), https://doi.org/10.1016/j.apor.2024.104368
Discipline:: Engineering

ASVSpoof Laundered Database

Creator:: Ali, Hashim, Subramani, Surya, Sudhir, Shefali, Varahamurthy, Raksha, and Malik, Hafiz
Description:: Voice-cloning (VC) systems have seen an exceptional increase in the realism of synthesized speech in recent years. The high quality of synthesized speech and the availability of low-cost VC services have given rise to many potential abuses of this technology, such as online smear campaigns and the dissemination of fabricated information. A number of detection methodologies have been proposed over the years that can detect voice spoofs with reasonably good accuracy. However, these methodologies are mostly evaluated on clean audio databases, such as ASVspoof 2019. This research aims to evaluate state-of-the-art (SOTA) audio spoof detection approaches in the presence of laundering attacks. To that end, a new laundering attack database, called ASVspoof Laundering Database, is created. This database is based on the ASVspoof 2019 LA eval database comprising a total of 1388.22 hours of audio recordings. Seven SOTA audio spoof detection approaches are evaluated on this laundered database. The results indicate that SOTA systems perform poorly in the presence of aggressive laundering attacks, especially reverberation and additive noise attacks. This suggests the need for robust audio spoof detection.
Keyword:: Audio Forensics, Audio Antispoofing, Audio Deepfakes, ASVSpoof, and Machine Learning
Discipline:: Engineering

Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation

Creator:: Lee, Shih Kuang, Tsai, Sun Ting, and Glotzer, Sharon C.
Description:: The trajectory data and code were generated for our work "Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation" (currently under peer review). The data sets contain trajectory data in GSD file format for 7 test systems, including cubic structures, two-dimensional and three-dimensional patchy particle shape systems, hexagonal bipyramids with two aspect ratios, and truncated shapes with two degrees of truncation. The corresponding Python code and Jupyter notebook used to perform data augmentation, MLP classifier training, and MLP classifier testing are also included.
Keyword:: Machine Learning, Colloids Self-Assembly, Crystallization, and Order Parameter
Citation to related publication:: https://doi.org/10.48550/arXiv.2312.11822
Discipline:: Other, Science, and Engineering

Atari Games Dataset

Creator:: Chen, Brian
Description:: The procedure followed while creating this data is summarized in Section II of Chen, Brian, et al. "Behavioral cloning in atari games using a combined variational autoencoder and predictor model." 2021 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2021. This data is not a research result but an intermediate product used in research. The dataset was generated to train a behavioral cloning framework from gameplay screen captures and keystrokes of an "expert" player. An RL agent trained using the "RL Baselines Zoo" package acts as the "expert" player, whose decision-making process we want to learn. In addition to the behavioral cloning experiments, this dataset is used to demonstrate the efficacy of a novel incremental tensor decomposition algorithm on image-based data streams.
Keyword:: Imitation Learning, Behavioral Cloning, Reinforcement Learning, Machine Learning, and Gameplay Data
Citation to related publication:: Chen, Brian, et al. "Behavioral cloning in atari games using a combined variational autoencoder and predictor model." 2021 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2021., Aksoy, Doruk, et al. "An Incremental Tensor Train Decomposition Algorithm." arXiv preprint arXiv:2211.12487 (2022)., and Chen, Brian, et al. "Low-Rank Tensor-Network Encodings for Video-to-Action Behavioral Cloning", forthcoming
Discipline:: Engineering and Science

Simple Physics CAM6 Codebase for Training Machine Learning Algorithms

Creator:: Limon, Garrett C.
Description:: The work guides the processing of CAM6 data for use in machine learning applications. We also provide workflow scripts for training both random forests and neural networks to emulate physics schemes from the data, as well as analysis scripts written in both Python and NCL to process our results.
Keyword:: Machine Learning, Climate Modeling, and Physics Emulation
Citation to related publication:: Limon, G. C., Jablonowski, C. (2022) Probing the Skill of Random Forest Emulators for Physical Parameterizations via a Hierarchy of Simple CAM6 Configurations [Preprint]. ESSOAr. https://doi.org/10.1002/essoar.10512353.1
Discipline:: Engineering and Science

Resources for Training Machine Learning Algorithms Using CAM6 Simple Physics Packages

User Collection

Creator:: Limon, Garrett
Description:: The collection contains the code and the data used to train machine learning algorithms to emulate simplified physical parameterizations within the Community Atmosphere Model (CAM6). CAM6 is the atmospheric general circulation model (GCM) within the Community Earth System Model (CESM) framework, developed by the National Center for Atmospheric Research (NCAR). GCMs are made up of a dynamical core, responsible for the geophysical fluid flow calculations, and physical parameterization schemes, which estimate various unresolved processes. Simple physics schemes were used to train both random forests and neural networks in the interest of exploring the feasibility of machine learning techniques being used in conjunction with the dynamical core for improved efficiency of future climate and weather models. The results of the research show that various physical forcing tendencies and precipitation rates can be effectively emulated by the machine learning models.
Keyword:: Machine Learning, Climate Modeling, and Physics Emulators
Discipline:: Science and Engineering

2 Works

Simple Physics CAM6 Dataset for Training Machine Learning Algorithms

Creator:: Limon, Garrett C.
Description:: The data represents weekly output from three 60-year CAM6 model runs. The output includes state (.h0. files) and tendency (.h1. files) fields for three different model configurations of increasing complexity. State fields include temperature, surface pressure, and specific humidity, among others, while tendencies include temperature tendencies, specific humidity tendencies, and precipitation rates. Using the state variables at a given time step, machine learning techniques can be trained to predict the following tendency field, which can then be applied to the state variables to provide the state at the next physics time step of the model.
Keyword:: Machine Learning, Climate Modeling, and Physics Emulation
Citation to related publication:: Limon, G. C., Jablonowski, C. (2022) Probing the Skill of Random Forest Emulators for Physical Parameterizations via a Hierarchy of Simple CAM6 Configurations [Preprint]. ESSOAr. https://doi.org/10.1002/essoar.10512353.1
Discipline:: Engineering and Science
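
The state-to-tendency workflow described in the record above can be illustrated with a minimal sketch. It assumes a simple forward-in-time update (state plus time step times tendency); the field names `T` and `Q`, the 1800-second time step, and all numeric values are hypothetical placeholders, not taken from the dataset, and the tendencies here stand in for the output of a trained emulator.

```python
from typing import Dict, List

Fields = Dict[str, List[float]]


def step_state(state: Fields, tendency: Fields, dt_seconds: float) -> Fields:
    """Apply predicted tendencies to the state: new = old + dt * tendency."""
    return {
        name: [v + dt_seconds * t for v, t in zip(values, tendency[name])]
        for name, values in state.items()
    }


# Hypothetical state at two grid points: temperature (K), specific humidity (kg/kg).
state = {"T": [280.0, 290.0], "Q": [0.010, 0.012]}

# Hypothetical tendencies (per second), standing in for a trained emulator's prediction.
tendency = {"T": [1.0e-4, -2.0e-4], "Q": [-1.0e-7, 2.0e-7]}

# Assumed physics time step of 1800 s, chosen only for illustration.
next_state = step_state(state, tendency, dt_seconds=1800.0)
print(next_state)
```

In practice the tendency dictionary would be produced by a random forest or neural network fed with the current state, and the loop would repeat once per physics time step.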

Data for Solar Flare Intensity Prediction with Machine Learning Models

Creator:: Jiao, Zhenbang, Chen, Yang, and Manchester, Ward
Description:: GOES_flare_list contains a list of more than 12,013 flare events. The list has 6 columns: flare classification, active region number, date, start time, end time, and emission peak time. SHARP_data.hdf5 files contain time series of 20 physical variables derived from the SDO/HMI SHARP data files. These data are saved at a 12-minute cadence and are used to train the LSTM model.
Keyword:: Solar Flare Prediction and Machine Learning
Citation to related publication:: Jiao, Z., Sun, H., Wang, X., Manchester, W., Gombosi, T., Hero, A., & Chen, Y. (2020). Solar Flare Intensity Prediction With Machine Learning Models. Space Weather, 18(7), e2020SW002440. https://doi.org/10.1029/2020SW002440 and Chen, Y., & Manchester, W. (2019). Data and Data products for machine learning applied to solar flares [Data set], University of Michigan - Deep Blue. https://doi.org/10.7302/qnsq-cs38
Discipline:: Engineering and Science

Large Lake Statistical Water Balance Model - 12 month time window - 1980 through 2015 monthly summary data and model output

Creator:: Smith, Joeseph P., Gronewold, Andrew D., Read, Laura, Crooks, James L., School for Environment and Sustainability, University of Michigan, Department of Civil and Environmental Engineering, University of Michigan, and Cooperative Institute for Great Lakes Research, University of Michigan
Description:: Using the statistical programming package R (https://cran.r-project.org/) and JAGS (Just Another Gibbs Sampler, http://mcmc-jags.sourceforge.net/), we processed multiple estimates of the Laurentian Great Lakes water balance components -- over-lake precipitation, evaporation, lateral tributary runoff, connecting channel flows, and diversions -- feeding them into prior distributions (using data from 1950 through 1979) and likelihood functions. The Bayesian Network is coded in the BUGS language. Water balance computations assume that the monthly change in storage for a given lake is the difference between the beginning-of-month water levels surrounding each month. For example, the change in storage for June 2015 is the difference between the beginning-of-month water level for July 2015 and that for June 2015. More details on the model can be found in the following summary report for the International Watersheds Initiative of the International Joint Commission, where the model was used to generate a new water balance historical record from 1950 through 2015: https://www.glerl.noaa.gov/pubs/fulltext/2018/20180021.pdf. Large Lake Statistical Water Balance Model (L2SWBM): https://www.glerl.noaa.gov/data/WaterBalanceModel/. This data set has a shorter timespan to accommodate a prior which uses data not used in the likelihood functions.
Keyword:: Water, Balance, Great Lakes, Laurentian, Machine Learning, Machine, Learning, Lakes, Bayesian, and Network
Citation to related publication:: Smith, J., Gronewold, A. et al. Summary Report: Development of the Large Lake Statistical Water Balance Model for Constructing a New Historical Record of the Great Lakes Water Balance. Submitted to: The International Watersheds Initiative of the International Joint Commission. Accessible at https://www.glerl.noaa.gov/pubs/fulltext/2018/20180021.pdf, Large Lake Statistical Water Balance Model (L2SWBM). https://www.glerl.noaa.gov/data/WaterBalanceModel/, and Gronewold, A.D., Smith, J.P., Read, L. and Crooks, J.L., 2020. Reconciling the water balance of large lake systems. Advances in Water Resources, p.103505.
Discipline:: Science and Engineering
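
The storage-change convention stated in the record above (next month's beginning-of-month level minus this month's) can be written as a short sketch. The function name, the dictionary layout, and the water levels below are hypothetical and chosen only for illustration; they are not values from the data set.

```python
from typing import Dict, Tuple

YearMonth = Tuple[int, int]


def monthly_storage_change(
    bom_levels: Dict[YearMonth, float], year: int, month: int
) -> float:
    """Change in storage for (year, month), expressed as a water-level difference.

    Uses beginning-of-month (BOM) levels: level at the start of the following
    month minus level at the start of the given month.
    """
    next_key = (year + 1, 1) if month == 12 else (year, month + 1)
    return bom_levels[next_key] - bom_levels[(year, month)]


# Hypothetical beginning-of-month water levels in metres.
levels = {(2015, 6): 183.50, (2015, 7): 183.56, (2015, 12): 183.40, (2016, 1): 183.35}

june_change = monthly_storage_change(levels, 2015, 6)
december_change = monthly_storage_change(levels, 2015, 12)
print(round(june_change, 2), round(december_change, 2))
```

The December case shows the year rollover: its change in storage depends on the beginning-of-month level for January of the following year.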
