Skip ACC data xfers around halo exchanges for single-GPU runs #1471
Draft
abishekg7 wants to merge 1 commit into
Issue:
For GPU atmosphere model runs on a single GPU, the OpenACC host <-> device data transfers around halo exchanges are triggered unnecessarily, degrading performance. This issue was first noticed during the 2026 OpenHackathon, and follows the introduction of PR #1355 into the develop branch. The `config_gpu_aware_mpi` namelist option is meant as a fallback for performing halo exchanges with MPI distributions that do not support GPU-aware communications. When performing multi-GPU runs with a supported MPI distribution, `config_gpu_aware_mpi` must be set to true for better performance. A side effect of that PR is that `config_gpu_aware_mpi` defaults to false for the serial GPU case, triggering an extraneous set of halo exchanges and degrading performance.
Solution
This PR adds another logical flag, `do_halo_exchange`, to the `if` clauses of the OpenACC data transfer statements so that they only trigger when running on multiple GPUs.
The Nsys profiles of single-GPU runs (Global, real, 30 km case) show the performance improvement after the introduction of the `do_halo_exchange` flag. The host <-> device data transfers that occur during the course of one dynamics step have been minimized.
Before:


After:
This work was completed in part at the NCAR/NLR/NOAA Open Hackathon, part of the Open Hackathons program. The authors would like to acknowledge OpenACC-Standard.org for their support.
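For reviewers who want a concrete picture of the `if`-clause gating described under Solution, here is a minimal, self-contained sketch. It is not the code in this PR: the program name, the `n_ranks` variable, the `field` array, the way `do_halo_exchange` is derived, and the exact way the two flags are combined are all assumptions made for illustration. Only the names `do_halo_exchange` and `config_gpu_aware_mpi` come from the description above.

```fortran
program halo_transfer_sketch
   implicit none

   ! Hypothetical stand-ins. The two flag names mirror the PR text, but how
   ! the model sets and combines them is an assumption of this sketch.
   logical :: config_gpu_aware_mpi = .false.   ! default in the serial GPU case
   logical :: do_halo_exchange
   integer :: n_ranks = 1                      ! assumed: one rank means a single-GPU run
   real :: field(8)

   field = 1.0

   ! Assumption: a halo exchange is only needed with more than one rank.
   do_halo_exchange = (n_ranks > 1)

   !$acc enter data copyin(field)

   ! Stage data on the host only when an exchange will actually happen AND
   ! the MPI library cannot read device memory directly.
   !$acc update host(field) if(do_halo_exchange .and. .not. config_gpu_aware_mpi)

   ! ... the halo exchange itself would go here (omitted) ...

   ! Copy the exchanged data back to the device under the same condition.
   !$acc update device(field) if(do_halo_exchange .and. .not. config_gpu_aware_mpi)

   !$acc exit data delete(field)

   print *, 'host staging performed: ', do_halo_exchange .and. .not. config_gpu_aware_mpi
end program halo_transfer_sketch
```

With `n_ranks = 1` the condition on both `update` directives is false, so no host <-> device copies are issued around the (omitted) exchange. Without an OpenACC-enabled compiler the directives are treated as comments and the program still builds.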