Work Description

Title: Data and Data products for machine learning applied to solar flares (Open Access, Deposited)
Attribute Value
  • The data deposited here are both the input and the resulting output of our machine learning efforts for predicting solar flare events. The input data are produced by our pre-processing pipeline, which extracts data from two sources: the Geostationary Operational Environmental Satellites (GOES) and the Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI). The GOES files provide a list of flare events in Excel CSV format. The SHARP files provide time sequences of more than 20 physical parameters derived from SDO/HMI data, which are saved in HDF5 format. The remaining files (B_HARPs_CNNencoded_part_XXX.hdf5 and M_X_HARPs_CNNencoded_part_XXX.hdf5) are the output of the convolutional neural network (CNN), a deep learning model used here to extract/select features from raw HMI data. The CNN is trained to capture both the spatial and temporal information in HMI magnetogram data for strong/weak flare classification and for predictions of flare intensities. The input data for the CNN are HMI Active Region Patch (HARP) 3-component vector magnetograms available online from the Stanford Joint Science Operations Center (JSOC) at
  • GOES_flare_list: contains a list of more than 10,000 flare events. The list has 6 columns: flare classification, active region number, date, start time, end time, and emission peak time.

  • GOES_B_flare_list: contains time series data of SDO/HMI SHARP parameters for B-class solar flares.

  • GOES_MX_flare_list: contains time series data of SDO/HMI SHARP parameters for M- and X-class solar flares.

  • SHARP_B_flare_data_300.hdf5 and SHARP_MX_flare_data_300.hdf5 files contain time series of more than 20 physical variables derived from the SDO/HMI SHARP data files. These data are saved at a 12-minute cadence and are used to train the LSTM model.

  • B_HARPs_CNNencoded_part_XXX.hdf5 and M_X_HARPs_CNNencoded_part_XXX.hdf5 include neural network encoded features derived from vector magnetogram images from the Solar Dynamics Observatory (SDO) Helioseismic and Magnetic Imager (HMI). Each data file typically contains one or two sequences of magnetograms covering an active region for a period of 24 hours at a 1-hour cadence. We encode each magnetogram as a frame of fixed size 8x16 with 512 channels.
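
The cadences and frame size quoted in the file descriptions above imply some simple sizes that may help when planning how to load the data. The following is a minimal sketch of that arithmetic only; it does not read the deposited files. The axis order (height, width, channels), the 4-byte value size, and the convention that a 24-hour window at a 1-hour cadence yields 24 frames are assumptions, not statements from the deposit, and all names in the code are hypothetical.

```python
# Quantities taken from the file descriptions above.
SHARP_CADENCE_MIN = 12      # cadence of the SHARP parameter time series
HARP_CADENCE_MIN = 60       # cadence of the CNN-encoded magnetograms (1 hour)
SEQUENCE_HOURS = 24         # period covered by one encoded sequence

# Assumption: axis order is (height, width, channels); the source only
# states "8x16 with 512 channels".
FRAME_SHAPE = (8, 16, 512)

# Assumption (not stated in the source): values are stored as 4-byte floats.
ASSUMED_BYTES_PER_VALUE = 4


def samples_in_window(window_hours: int, cadence_minutes: int) -> int:
    """Count samples in a window, including the first sample but
    excluding the one that would fall exactly at the window's end."""
    return (window_hours * 60) // cadence_minutes


def values_per_frame(shape: tuple) -> int:
    """Multiply the dimensions of one encoded frame."""
    total = 1
    for dim in shape:
        total *= dim
    return total


frames_per_sequence = samples_in_window(SEQUENCE_HOURS, HARP_CADENCE_MIN)
sharp_samples_per_day = samples_in_window(SEQUENCE_HOURS, SHARP_CADENCE_MIN)
frame_values = values_per_frame(FRAME_SHAPE)
sequence_values = frames_per_sequence * frame_values
sequence_bytes = sequence_values * ASSUMED_BYTES_PER_VALUE

print("encoded frames per 24 h sequence:", frames_per_sequence)
print("SHARP samples per 24 h:", sharp_samples_per_day)
print("values per encoded frame:", frame_values)
print("values per encoded sequence:", sequence_values)
print("assumed size per sequence (MiB):", sequence_bytes / 2**20)
```

Under these assumptions a single encoded sequence is only a few MiB, so an individual part file holding one or two sequences should fit comfortably in memory; the actual dataset names, shapes, and data types should be checked by inspecting the HDF5 files directly.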
Contact information
Funding agency
  • National Aeronautics and Space Administration (NASA)
  • National Science Foundation (NSF)
ORSP grant number
  • F051148 F033166
Citations to related material
  • Chen, Y., Manchester, W., Hero, A., Toth, G., DuFumier, B., Zhou, T., Wang, X., Zhu, H., Sun, Z., Gombosi, T., Identifying Solar Flare Precursors Using Time Series of SDO/HMI Images and SHARP Parameters, Space Weather Journal, submitted
Resource type
Last modified
  • 11/14/2019
  • 08/13/2019
To Cite this Work:
Chen, Y., Manchester, W. (2019). Data and Data products for machine learning applied to solar flares [Data set]. University of Michigan - Deep Blue.


Files (Count: 408; Size: 5.36 GB)
