CLOUDSC HIP (SCC, SCC-HOIST, SCC-K-CACHING) #59
…sc including arch files for Lumi (db9a182 to 3ef7b5a)
First of all, apologies for the huge delay in reviewing this. This is a fantastic contribution and provides an important reference for achievable performance on MI250X.
I can reproduce the quoted performance numbers. For reference:
Binary                           NUMOMP  NGPTOT  NPROMA  Throughput
bin/dwarf-cloudsc-hip                 1  262144     128    312 GF/s
bin/dwarf-cloudsc-hip                 1  262144      64    355 GF/s
bin/dwarf-cloudsc-hip                 1  524288      64    399 GF/s
bin/dwarf-cloudsc-hip-hoist           1  262144     128    270 GF/s
bin/dwarf-cloudsc-hip-hoist           1  524288      64    383 GF/s
bin/dwarf-cloudsc-hip-k-caching       1  262144     128    260 GF/s
bin/dwarf-cloudsc-hip-k-caching       1  262144      64    440 GF/s
bin/dwarf-cloudsc-hip-k-caching       1  768432      64    508 GF/s
Performance depends noticeably on the NPROMA value, and a very large data set is needed before it converges to its maximum.
With NPROMA=256, I also get an error:
GPUassert: hipErrorLaunchFailure
While it will be interesting to explore some of this further, I don't think it is required for this PR. However, I have left a few comments and remarks on how the build system integration could be improved. In particular, making the suggested changes to the toolchain file allowed me to build with the expected minimal build command:
./cloudsc-bundle build --arch arch/eurohpc/lumi/cray-gpu/15.0.1/ --with-hip --with-serialbox
Awesome, looks great now. If you can, add the relevant ctest hooks for the other HIP variants. Otherwise, GTG.
On Lumi, build with:
./cloudsc-bundle build --clean --cmake="OpenMP_C_LIB_NAMES=craymp OpenMP_CXX_LIB_NAMES=craymp OpenMP_Fortran_LIB_NAMES=craymp OpenMP_craymp_LIBRARY=/opt/cray/pe/cce/15.0.1/cce/x86_64/lib/libcraymp.so OpenMP_C_FLAGS=-fopenmp OpenMP_CXX_FLAGS=-fopenmp OpenMP_Fortran_FLAGS=-homp" --arch arch/eurohpc/lumi/cray-gpu/15.0.1 --build-dir=build --with-hip --with-serialbox
Uses __launch_bounds__(128, 1) (similar performance with __launch_bounds__(64, 1) and __launch_bounds__(256, 1)) instead of the default __launch_bounds__(1024, 1).
Performance increases with NGPTOT, e.g. for the k-caching HIP version:
NGPTOT=262144: 450 GFlop/s
NGPTOT=524288: 490 GFlop/s
NGPTOT=786432: 510 GFlop/s