Skip ACC data xfers around halo exchanges for single-GPU runs #1471

Draft
abishekg7 wants to merge 1 commit into MPAS-Dev:develop from abishekg7:atmosphere/fix_acc_halo_serial

@abishekg7 abishekg7 commented Jun 12, 2026

Collaborator

Issue

For GPU atmosphere model runs on a single GPU, the OpenACC host <-> device data transfers around halo exchanges are triggered unnecessarily, degrading performance. This issue was first noticed during the 2026 OpenHackathon and follows the introduction of PR #1355 into the develop branch. The config_gpu_aware_mpi namelist option is meant as a fallback for performing halo exchanges with MPI distributions that do not support GPU-aware communication. For multi-GPU runs with a supported MPI distribution, config_gpu_aware_mpi must be set to true for better performance. A side effect of that change is that config_gpu_aware_mpi defaults to false in the serial GPU case, which triggers an extraneous set of host <-> device data transfers around halo exchanges and degrades performance.

Solution

This PR adds another logical flag, do_halo_exchange, to the if clauses of the OpenACC data transfer statements so that the transfers are only triggered when running on multiple GPUs.
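
The guard logic can be sketched as a small truth table. This is a minimal illustration in Python, not the Fortran/OpenACC code in the PR. It assumes that the existing if clauses depend only on config_gpu_aware_mpi being false, and that do_halo_exchange is true exactly when the run uses more than one MPI rank (one rank per GPU). The function names are hypothetical.

```python
def needs_halo_exchange(n_mpi_ranks: int) -> bool:
    """Assumption: halo exchanges are only needed with more than one MPI rank."""
    return n_mpi_ranks > 1


def transfers_before(config_gpu_aware_mpi: bool) -> bool:
    """Assumed guard before this PR: stage data through the host
    whenever GPU-aware MPI is disabled, regardless of rank count."""
    return not config_gpu_aware_mpi


def transfers_after(config_gpu_aware_mpi: bool, do_halo_exchange: bool) -> bool:
    """Assumed guard after this PR: stage data through the host only
    when a halo exchange actually happens and MPI is not GPU-aware."""
    return do_halo_exchange and not config_gpu_aware_mpi


if __name__ == "__main__":
    # Print the host <-> device transfer decision for each combination.
    for n_ranks in (1, 4):
        for gpu_aware in (False, True):
            halo = needs_halo_exchange(n_ranks)
            print(
                f"ranks={n_ranks} gpu_aware_mpi={gpu_aware} "
                f"before={transfers_before(gpu_aware)} "
                f"after={transfers_after(gpu_aware, halo)}"
            )
```

Under these assumptions, the only case whose behavior changes is the single-rank run with config_gpu_aware_mpi left at its default of false, which is the case described in the issue above.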

The Nsys profiles of single-GPU runs (global, real, 30 km case) show the performance improvement after the introduction of the do_halo_exchange flag. The host <-> device data transfers that occur during one dynamics step have been minimized.

Before:
[image: Nsys profile of a single-GPU run]
After:
[image: Nsys profile of a single-GPU run]

This work was completed in part at the NCAR/NLR/NOAA Open Hackathon, part of the Open Hackathons program. The authors would like to acknowledge OpenACC-Standard.org for their support.
