CLOUDSC HIP (SCC, SCC-HOIST, SCC-K-CACHING) #59

Merged 4 commits on Dec 5, 2023

MichaelSt98
Contributor

On LUMI, build with:

./cloudsc-bundle build --clean --cmake="OpenMP_C_LIB_NAMES=craymp OpenMP_CXX_LIB_NAMES=craymp OpenMP_Fortran_LIB_NAMES=craymp OpenMP_craymp_LIBRARY=/opt/cray/pe/cce/15.0.1/cce/x86_64/lib/libcraymp.so OpenMP_C_FLAGS=-fopenmp OpenMP_CXX_FLAGS=-fopenmp OpenMP_Fortran_FLAGS=-homp" --arch arch/eurohpc/lumi/cray-gpu/15.0.1 --build-dir=build --with-hip --with-serialbox

  • wrong results for NPROMA >= 128
  • best performance for NPROMA = 64
  • performance relies heavily on __launch_bounds__(128, 1) (with similar performance for __launch_bounds__(64, 1) and __launch_bounds__(256, 1)) instead of the default __launch_bounds__(1024, 1)
  • performance improves with increasing NGPTOT, e.g. for the k-caching HIP version:
    • 262144: 450 GFlops/s
    • 524288: 490 GFlops/s
    • 786432: 510 GFlops/s
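
To make the relationship between NPROMA, NGPTOT and `__launch_bounds__` more concrete, here is a minimal host-side sketch. It assumes (this is not confirmed anywhere in this thread) that each GPU block processes one NPROMA-sized chunk of columns, so that the block size equals NPROMA and the grid size is ceil(NGPTOT / NPROMA). The names `LaunchGeometry`, `make_geometry` and `fits_launch_bounds` are hypothetical and do not come from the CLOUDSC sources. The first argument of `__launch_bounds__` is a promise to the compiler about the maximum number of threads per block the kernel will ever be launched with, so under this assumption the check below shows which NPROMA values stay within a given bound.

```cpp
#include <cassert>

// Hypothetical helper type, not taken from the CLOUDSC sources.
struct LaunchGeometry {
    int block_size;  // threads per block (assumed equal to NPROMA)
    int grid_size;   // number of blocks (assumed ceil(NGPTOT / NPROMA))
};

// Derive the launch geometry for NGPTOT columns split into NPROMA-sized chunks.
LaunchGeometry make_geometry(int ngptot, int nproma) {
    assert(ngptot > 0 && nproma > 0);
    // Integer ceiling division so that a partial last chunk still gets a block.
    return LaunchGeometry{nproma, (ngptot + nproma - 1) / nproma};
}

// True if the block size respects the first argument of
// __launch_bounds__(max_threads_per_block, ...).
bool fits_launch_bounds(const LaunchGeometry& g, int max_threads_per_block) {
    return g.block_size <= max_threads_per_block;
}
```

With NGPTOT = 262144, for example, `make_geometry(262144, 64)` gives 4096 blocks of 64 threads, and `fits_launch_bounds(make_geometry(262144, 256), 128)` is false. Whether this explains the wrong results or launch errors seen for larger NPROMA values would need to be checked against the actual kernel launch code.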

@MichaelSt98 changed the title from "CLOUDSC HIP (SCC, SCC-HOIST, SCC-K-" to "CLOUDSC HIP (SCC, SCC-HOIST, SCC-K-CACHING)" on Aug 31, 2023
@MichaelSt98 changed the base branch from main to develop on August 31, 2023 08:28
Collaborator

@reuterbal reuterbal left a comment

First of all, big apology for the huge delay in reviewing this. This is a fantastic contribution and provides an important reference for achievable performance on MI250X.

I can reproduce the quoted performance numbers, for reference:

bin/dwarf-cloudsc-hip 1 262144 128               312 GF/s
bin/dwarf-cloudsc-hip 1 262144 64                355 GF/s
bin/dwarf-cloudsc-hip 1 524288 64                399 GF/s
bin/dwarf-cloudsc-hip-hoist 1 262144 128         270 GF/s
bin/dwarf-cloudsc-hip-hoist 1 524288 64          383 GF/s
bin/dwarf-cloudsc-hip-k-caching 1 262144 128     260 GF/s
bin/dwarf-cloudsc-hip-k-caching 1 262144 64      440 GF/s
bin/dwarf-cloudsc-hip-k-caching 1 768432 64      508 GF/s

There is a noticeable dependency on the NPROMA value and it requires a really large data set to converge to maximum performance.
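
As a rough illustration of how strong this dependency is, the sketch below computes relative throughput from the k-caching figures quoted above. The helper names are hypothetical; the numbers are copied from the table, and the last run is labelled only as the largest quoted data set. From these figures, moving from NPROMA=128 to NPROMA=64 at NGPTOT=262144 gives about 1.7x, and moving to the largest data set at NPROMA=64 adds about 1.15x on top.

```cpp
#include <cassert>

// Relative throughput of one run over a baseline run (both in GF/s).
double relative_throughput(double gflops, double baseline_gflops) {
    assert(baseline_gflops > 0.0);
    return gflops / baseline_gflops;
}

// Figures quoted in the review for bin/dwarf-cloudsc-hip-k-caching.
const double kKCachingNproma128 = 260.0;   // NGPTOT 262144, NPROMA 128
const double kKCachingNproma64 = 440.0;    // NGPTOT 262144, NPROMA 64
const double kKCachingLargestRun = 508.0;  // largest quoted data set, NPROMA 64

// Gain from halving NPROMA at fixed NGPTOT.
double gain_from_smaller_nproma() {
    return relative_throughput(kKCachingNproma64, kKCachingNproma128);
}

// Further gain from the larger data set at NPROMA 64.
double gain_from_larger_data_set() {
    return relative_throughput(kKCachingLargestRun, kKCachingNproma64);
}
```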

For NPROMA=256 I also get an error:

GPUassert: hipErrorLaunchFailure

While it will be interesting to explore some of this further, I don't think it is required for this PR. However, I have left a few comments and remarks on how the build system integration could be improved. In particular, making the suggested changes to the toolchain file allowed me to build with the expected minimal build command:

./cloudsc-bundle build --arch arch/eurohpc/lumi/cray-gpu/15.0.1/ --with-hip --with-serialbox

Review comments (resolved) on:

  • CMakeLists.txt
  • src/cloudsc_hip/CMakeLists.txt
  • bundle.yml
  • src/cloudsc_hip/cloudsc/load_state.cpp
  • arch/eurohpc/lumi/cray-gpu/15.0.1/toolchain.cmake

Collaborator

@reuterbal reuterbal left a comment

Awesome, looks great now. If you can, add the relevant ctest hooks for the other HIP variants, otherwise GTG.

Review comment (resolved) on: src/cloudsc_hip/CMakeLists.txt

@reuterbal reuterbal merged commit 22fc989 into develop Dec 5, 2023
16 checks passed