Work Description
Title: Dataset for BATSRUS on GPU (Open Access, Deposited)
Citation: An, Y. (2024). Dataset for BATSRUS on GPU [Data set]. University of Michigan - Deep Blue Data. https://doi.org/10.7302/9eab-kz39
Files (Count: 2; Size: 387 KB)
| Title | Original Upload | Last Modified | File Size | Access |
|---|---|---|---|---|
| readme.txt | 2024-09-25 | 2025-01-23 | 5.66 KB | Open Access |
| BATSRUS_GPU_dataset.tar.gz | 2025-01-24 | 2025-01-24 | 382 KB | Open Access |
Date: 14 January, 2024
Dataset Title: Dataset for BATSRUS on GPU
Dataset Creators: Y. An
Dataset Contact: Yifu An (anyifu@umich.edu)
Funding: National Science Foundation grant PHY-2027555
Key Points:
- We port BATSRUS to GPU and conduct scaling tests on the supercomputers Frontera and Pleiades.
- One A100 GPU is as fast as 270 AMD "Rome" CPU cores.
- Weak scaling is good for a small problem on up to 256 A100 GPUs, or for large problems on one GPU node.
- Parallel efficiency drops for large problems on more than one GPU node due to hardware congestion.
- We are able to run a production-level magnetospheric simulation 6.9 times faster than real time on one A100 GPU node.
Research Overview:
BATSRUS, our state-of-the-art extended magnetohydrodynamic code, is the most used and one of the most resource-consuming models in the Space Weather Modeling Framework. It has always been our objective to improve its efficiency and speed with emerging techniques, such as GPU acceleration. To utilize the GPU nodes on modern supercomputers, we port BATSRUS to GPUs with the OpenACC API. Porting the code to a single GPU requires rewriting and optimizing the most used functionalities of the original code into a new solver, which accounts for around 1% of the entire program in length. To port it to multiple GPUs, we implement a new message passing algorithm to support its unique block-adaptive grid feature. We conduct weak scaling tests on as many as 256 GPUs and find good performance. The program has 50-60% parallel efficiency on up to 256 GPUs, and up to 95% efficiency within a single node (4 GPUs). Running large problems on more than one node has reduced efficiency due to hardware bottlenecks. We also demonstrate our ability to run representative magnetospheric simulations on GPUs. The performance for a single A100 GPU is about the same as 270 AMD "Rome" CPU cores, and it runs 3.6 times faster than real time. The simulation can run 6.9 times faster than real time on four A100 GPUs.
Methodology:
This dataset documents the input parameters for the tests conducted in our paper and the raw timing results. Both can be found in the runlog files. We conducted two sets of simulations on supercomputers: combined weak and strong scaling tests using the fastwave test, and a production-level earth magnetosphere simulation. In the combined weak and strong scaling tests, two parameters control the grid size: (1) the number of processes np and (2) an exponent m. When m is constant, the number of grid blocks is proportional to np; when np is constant, the number of grid blocks is 2^m times the block count at m=0. In the earth magnetosphere simulation, the computational grid and settings are the same as a production-level run at the Space Weather Prediction Center (SWPC). We test 1st- and 2nd-order time stepping for 60s of simulated time. See the preprint (below) for more details about the test setups. Raw timing results are a standard form of output from BATSRUS: the code implements a custom timing subroutine, which we have used to obtain the run time of various components of the code.
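In shorthand (our notation, not something that appears in the runlog files): if N(1, 0) is the number of grid blocks for a single process with m = 0, the grid size scales approximately as

N(np, m) = N(1, 0) * np * 2^m

that is, the block count is proportional to np at fixed m, and doubles each time m increases by one at fixed np.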
Acknowledgement: Frontera is maintained by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Pleiades is housed at the NASA Advanced Supercomputing (NAS) facility.
Software specifications: load the module nvhpc-hpcx/24.5 to compile on Frontera, and the modules nvhpc-nompi/24.3 + mpi-hpe/mpt to compile on Pleiades. The runs can be reproduced with the following GitHub versions of BATSRUS:
BATSRUS original master 2024-11-12 07e4c3b
share original master 2024-11-12 5e19671
srcBATL original master 2024-11-09 5722baf
util original master 2024-10-01 01f5253
In most cases, newer versions of BATSRUS are backward compatible, but this is not guaranteed when the related functionalities are in active development. Also note that for large runs in this dataset (m=12, see below) we disabled some unused arrays to save memory. This is reflected in the runlog files where the GitHub versions are "modified".
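For reference, the corresponding module commands are expected to look like the following (module names and versions may drift over time; verify with "module avail" on each system):

On Frontera:
module load nvhpc-hpcx/24.5

On Pleiades:
module load nvhpc-nompi/24.3 mpi-hpe/mpt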
Files contained here:
The input (PARAM.in) and runlog files of each individual run presented in the paper can be found under the following hierarchy:
1. The computational device, GPU or CPU
2. The simulation, fastwave or earth
3. The supercomputer, pleiades or frontera
4. Parameters for the simulation. For the fastwave runs, run_m*_n* denotes the exponent (m) and number of processes (n). For the earth runs, run_n*_s* denotes the number of processes (n) and the order of accuracy in time (s).
5. For runs on the Pleiades supercomputer, some directories contain one runlog*.A100 and one runlog*.V100 file. This simply distinguishes between the runs on A100 and V100 GPUs with the same input.
6. For fastwave tests run on the Pleiades supercomputer, we also provide the job scripts. In each of the folders, job.pfe.A100 submits the run on A100 GPU nodes, while job.pfe.V100 submits that on V100 GPU nodes.
The directories should be straightforward to interpret. For example:
"data/GPU/earth/frontera/run_n004_s1/runlog.earth" is the runlog file for an earth simulation, using a first order time integration scheme, on 4 GPUs, on the Frontera supercomputer;
"data/GPU/fastwave/pleiades/run_m12_n016/runlog.np16.A100" is the runlog file for a fastwave simulation, using m=12 for the grid size, on 16 GPUs, on A100 GPU nodes on the Pleiades supercomputer.
Instructions for reproducing the results:
0. Clone the GitHub/SWMFsoftware/BATSRUS repository. The BATSRUS model will require three more repositories: srcBATL, util, share. Note that the srcUserExtra repository is not public (contains private user modules) and it is not needed.
The exact Git references that identify the versions of the repositories are contained in the runlog files, although we expect that newer versions of BATSRUS will work the same way.
Make sure that the SSH key of your machine is uploaded to GitHub. This is necessary for installation, as it involves downloading the share, util and srcBATL repositories.
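A minimal sketch of this step (assuming an SSH key registered with GitHub; checking out the exact commit listed above is optional, since newer versions are expected to work):

git clone git@github.com:SWMFsoftware/BATSRUS.git
cd BATSRUS
git checkout 07e4c3b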
1. Install BATSRUS
./Config.pl -install -compiler=nvfortran,nvc
Switch on the OpenACC compiler flag for running on the GPU:
./Config.pl -acc
You can test installation with a set of small tests on a single GPU:
make -j test_small_gpu NP=1
Without the NP=1 argument, the tests run on 2 GPUs by default. If all .diff files are empty (zero size) at the end, all tests passed.
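One way to check this (a convenience, not part of the official test harness) is to list any non-empty .diff files from the BATSRUS directory; empty output means all tests passed:

find . -name "*.diff" -size +0c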
To reproduce the results in the paper, first set the block sizes according to the paper:
./Config.pl -g=10,10,10
for the fastwave test and
./Config.pl -g=8,8,8
for the earth magnetosphere simulation. The equation set should be Mhd:
./Config.pl -e=Mhd
2. Compile BATSRUS:
make -j BATSRUS
3. Make a run directory:
make rundir
The default directory name is "run".
4. Copy one of the PARAM.in files contained in this dataset into the run directory. The name of the copied file should not be changed (PARAM.in).
5. Run BATSRUS in the run directory on 2 GPUs (if you have only 1 GPU, use -n 1):
cd run
mpiexec -n 2 BATSRUS.exe > runlog
This reproduces the run corresponding to the chosen PARAM.in file.
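Putting steps 1-5 together, a condensed sketch for the 4-GPU earth run used as an example above (the PARAM.in path assumes the archive was extracted into a top-level data directory):

./Config.pl -install -compiler=nvfortran,nvc
./Config.pl -acc
./Config.pl -g=8,8,8
./Config.pl -e=Mhd
make -j BATSRUS
make rundir
cp data/GPU/earth/frontera/run_n004_s1/PARAM.in run/PARAM.in
cd run
mpiexec -n 4 BATSRUS.exe > runlog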
6*: If you have access to the A100 or V100 GPU queues on the Pleiades supercomputer, you can use the job scripts to reproduce the fastwave tests; a generic job-script skeleton is sketched after this list for orientation. Make sure
- You compile BATSRUS on a GPU node
- You set the environment variables as specified in the job script
- You load the correct modules when compiling BATSRUS (also as contained in the job script)
- You have created a rundir. Put job.pfe.[AV]100 in the rundir, cd there and type "qsub job.pfe.[AV]100". As of January 2025, submission of the jobscript should happen on pbspl4 instead of the Pleiades frontends ("ssh pbspl4" after login).
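The provided job.pfe.[AV]100 scripts are the authoritative reference. Purely for orientation, a generic PBS job script has roughly this shape (the queue name, resource selection and walltime below are placeholders, not the actual Pleiades settings; the module names are the compile-time modules listed above):

#PBS -N fastwave
#PBS -q <gpu_queue>
#PBS -l select=1:ncpus=16
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR
module load nvhpc-nompi/24.3 mpi-hpe/mpt
mpiexec -n 4 BATSRUS.exe > runlog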
Related publication(s):
An, Y., Chen, Y., Zhou, H., Gaenko, A. and Toth, G. (2024). BATSRUS GPU: Faster than Real Time Magnetospheric Simulations with a Block Adaptive Grid Code. Manuscript under revision.
A preprint is available at http://arxiv.org/abs/2501.06717.
Use and Access:
This data set is made available under the CC0 1.0 Universal Public Domain Dedication (CC0 1.0).
To Cite Data:
An, Y. (2024). Dataset for BATSRUS on GPU [Data set]. University of Michigan - Deep Blue Data. https://doi.org/10.7302/9eab-kz39