Introducing 'mix' versions #65

MichaelSt98 · 2023-12-12T16:06:41Z

Fortran driver + OpenMP/OpenACC offload + CUDA/HIP/SYCL kernel implementation

For now, only the SCC-K-CACHING variant is implemented.

./cloudsc-bundle build --with-mix --cloudsc-gpu-offload ACC|OMP --cloudsc-gpu-lang CUDA|HIP|SYCL

e.g.:

./cloudsc-bundle build --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-mix --cloudsc-gpu-offload ACC --cloudsc-gpu-lang CUDA

Tested on NVIDIA:
- OpenACC + CUDA
- OpenMP + CUDA
- OpenACC + SYCL
- OpenMP + SYCL

Tested on AMD:
- OpenACC + HIP
- OpenMP + HIP

reuterbal

Thanks and apologies for the extremely long review time.

This is very cool and conceptually ready to merge, but I left a few suggestions on how to improve it, particularly the build system integration.

In addition, I would like to ask that

- this is added to the GitHub build config for NVHPC, at least with the CUDA flavour, so that we test the build;
- these variants are described in the README.

reuterbal · 2024-11-06T10:43:22Z

bundle.yml

+            ENABLE_CLOUDSC_MIX=ON
+
+    - cloudsc-gpu-offload :
+        help  : [OMP|ACC]


This will break ecbundle because the pipe character has a meaning in YAML; we need to make this valid:

Suggested change

-        help  : [OMP|ACC]
+        help  : "Data offload model for GPU variants. Available options: OMP, ACC"

reuterbal · 2024-11-06T10:44:09Z

bundle.yml

+        cmake : CLOUDSC_GPU_OFFLOAD={{value}}
+
+    - cloudsc-gpu-lang :
+        help  : [CUDA|HIP|SYCL]


Suggested change

-        help  : [CUDA|HIP|SYCL]
+        help  : "Kernel language for low-level GPU kernel implementations. Available options: CUDA, HIP, SYCL"

reuterbal · 2024-11-06T10:46:43Z

src/common/CMakeLists.txt

+if ( ENABLE_CLOUDSC_MIX )
+  # if (CLOUDSC_GPU_OFFLOAD STREQUAL "ACC")
+  # HACK: seems like nordc only necessary for NVIDIA machines but not AMD machines
+  if (CLOUDSC_GPU_LANG STREQUAL "CUDA" OR CLOUDSC_GPU_LANG STREQUAL "SYCL")


Suggestion for a shorter check

Suggested change

-  if (CLOUDSC_GPU_LANG STREQUAL "CUDA" OR CLOUDSC_GPU_LANG STREQUAL "SYCL")
+  if (CLOUDSC_GPU_LANG MATCHES "CUDA|SYCL")

Also, that compiler option is NVHPC-specific, no? If so, we might want to add a check such as CMAKE_Fortran_COMPILER_ID MATCHES "PGI|NVHPC".

reuterbal · 2024-11-06T11:04:24Z

src/cloudsc_mix/CMakeLists.txt

+    # GPU data offload (default: ACC = 1)
+    ## ACC: 1
+    ## OMP: 2    
+    if (CLOUDSC_GPU_OFFLOAD STREQUAL "ACC")


The use of named constants in the source is nice; it really helps readability. It isn't as clean here in the CMake layer, though. I think we can improve that with a little trick, which also means you only have to change a single place if you ever want to change the numbering.

Conceptually it should look like this:

# Define integer IDs corresponding to every language choice
set(CUDA_LANG "1")
set(HIP_LANG "2")
set(SYCL_LANG "3")
set(ACC_OFFLOAD "4")
set(OMP_OFFLOAD "5")

# Select offload model
if( NOT DEFINED CLOUDSC_GPU_OFFLOAD )
  set(CLOUDSC_GPU_OFFLOAD "ACC")
endif()
set(GPU_OFFLOAD ${${CLOUDSC_GPU_OFFLOAD}_OFFLOAD})

# Optional: check for valid options
# (test the option name, because GPU_OFFLOAD holds the integer ID at this point)
if( NOT CLOUDSC_GPU_OFFLOAD MATCHES "^(ACC|OMP)$" )
  message(FATAL_ERROR ...)
endif()

# Select kernel language
if( NOT DEFINED CLOUDSC_GPU_LANG )
  set(CLOUDSC_GPU_LANG "CUDA")
endif()
set(GPU_LANG ${${CLOUDSC_GPU_LANG}_LANG})

# Language and offload-specific overrides
if( CLOUDSC_GPU_LANG STREQUAL "CUDA" )
  ...
elseif(...)
  ...
endif()

[... Define library etc...]

# Provide definitions to target
foreach(_def CUDA_LANG HIP_LANG SYCL_LANG ACC_OFFLOAD OMP_OFFLOAD)
  target_compile_definitions(dwarf-cloudsc-gpu-lib PUBLIC ${_def}=${${_def}})
endforeach()

This is untested, but it simplifies the logic in my opinion.
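One detail worth keeping in mind on the consuming side: inside `#if`, the C and C++ preprocessors replace any undefined identifier with 0, so a comparison like `GPU_LANG == CUDA_LANG` silently evaluates as `0 == 0` if the definitions never reach the compiler. The following is a minimal sketch of a defensive guard, assuming that `GPU_LANG` and the three language IDs are passed as compile definitions. The fallback values and the `kernel_language()` helper are hypothetical and exist only so the sketch compiles on its own.

```cpp
#include <string>

// Stand-ins for the -D flags the build system would pass (assumption for
// this sketch only). In real code these fallbacks would be removed so that
// the guard below can actually fire.
#ifndef CUDA_LANG
#define CUDA_LANG 1
#endif
#ifndef HIP_LANG
#define HIP_LANG 2
#endif
#ifndef SYCL_LANG
#define SYCL_LANG 3
#endif
#ifndef GPU_LANG
#define GPU_LANG SYCL_LANG
#endif

// Guard: undefined identifiers evaluate to 0 inside #if, so fail loudly
// instead of letting the first branch below win by accident.
#if !defined(GPU_LANG) || !defined(CUDA_LANG) || !defined(HIP_LANG) || !defined(SYCL_LANG)
#error "GPU_LANG and the *_LANG IDs must be provided as compile definitions"
#endif

// The IDs must be pairwise distinct for the comparisons to be meaningful.
static_assert(CUDA_LANG != HIP_LANG && HIP_LANG != SYCL_LANG && CUDA_LANG != SYCL_LANG,
              "kernel language IDs must be distinct");

#if GPU_LANG == CUDA_LANG
constexpr const char* kKernelLanguage = "CUDA";
#elif GPU_LANG == HIP_LANG
constexpr const char* kKernelLanguage = "HIP";
#elif GPU_LANG == SYCL_LANG
constexpr const char* kKernelLanguage = "SYCL";
#else
#error "GPU_LANG does not match any known kernel language ID"
#endif

// Hypothetical helper: reports which branch was selected at compile time.
inline std::string kernel_language() { return kKernelLanguage; }
```

With the fallbacks above, `kernel_language()` reports the SYCL branch; compiling with `-DGPU_LANG=HIP_LANG` would select the HIP branch instead.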

reuterbal · 2024-11-06T12:23:21Z

src/cloudsc_mix/CMakeLists.txt

+        ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}
+      )
+      if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
+        target_compile_options(dwarf-cloudsc-gpu-lib PRIVATE $<$<COMPILE_LANGUAGE:CUDA>>)


What's the purpose of the double-nested generator expression? $<$<...>>

reuterbal · 2024-11-06T12:27:36Z

src/cloudsc_mix/cloudsc_c_k_caching.cpp

+#if GPU_LANG == CUDA_LANG
+#include "cuda.h"
+#elif GPU_LANG == HIP_LANG
+#include "hip/hip_runtime.h"
+#elif GPU_LANG == SYCL_LANG
+#include <CL/sycl.hpp>
+#endif


This is not strictly required, but I want to point out that we could use the same pattern here that ectrans uses to switch between HIP and CUDA implementations: header files that rewrite the relevant API calls on demand:
https://github.com/ecmwf-ifs/ectrans/blob/main/src/trans/gpu/algor/hicblas.h
With specific overloads for HIP and CUDA.
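To illustrate the general idea of such a shim header (this is not a reproduction of the ectrans file), here is a minimal sketch: callers use one neutral name per runtime call, and a single preprocessor switch maps each name to a backend. Every name below (`hicMalloc`, `hicFree`, `hicBackendName`, the `fake_cuda` and `fake_hip` namespaces, `allocate_and_report`) is hypothetical. The backends are host-side stand-ins so the sketch compiles without a GPU toolchain; a real shim would map to the actual CUDA and HIP runtime calls instead.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Integer IDs for the kernel languages, as the build system would provide
// them. The fallbacks are assumptions made only for this sketch.
#ifndef CUDA_LANG
#define CUDA_LANG 1
#endif
#ifndef HIP_LANG
#define HIP_LANG 2
#endif
#ifndef GPU_LANG
#define GPU_LANG CUDA_LANG
#endif

// Host-side stand-ins for two vendor runtimes (hypothetical).
namespace fake_cuda {
inline const char* name() { return "cuda"; }
inline void* alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void release(void* ptr) { std::free(ptr); }
}  // namespace fake_cuda

namespace fake_hip {
inline const char* name() { return "hip"; }
inline void* alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void release(void* ptr) { std::free(ptr); }
}  // namespace fake_hip

// The shim: the only place that knows which backend is active.
#if GPU_LANG == CUDA_LANG
#define hicBackendName fake_cuda::name
#define hicMalloc fake_cuda::alloc
#define hicFree fake_cuda::release
#elif GPU_LANG == HIP_LANG
#define hicBackendName fake_hip::name
#define hicMalloc fake_hip::alloc
#define hicFree fake_hip::release
#else
#error "unsupported GPU_LANG for this sketch"
#endif

// Backend-neutral caller: no #if needed at the point of use.
inline std::string allocate_and_report(std::size_t bytes) {
  void* buffer = hicMalloc(bytes);
  std::string result = buffer ? hicBackendName() : "alloc-failed";
  hicFree(buffer);
  return result;
}
```

The benefit of this layout is that the `#if GPU_LANG == ...` switch lives in one header, while kernel and driver sources stay free of per-backend conditionals.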

reuterbal · 2024-11-06T15:22:17Z

src/common/CMakeLists.txt

@@ -98,6 +108,8 @@ ecbuild_add_library( TARGET cloudsc-common-lib
        $<${HAVE_HDF5}:hdf5::hdf5_fortran>
        $<${HAVE_SERIALBOX}:Serialbox::Serialbox_Fortran>
        $<${HAVE_FIELD_API}:field_api_${prec}>
+    DEFINITIONS
+        ENABLE_CLOUDSC_MIX


Is this intentionally always active, or should this actually be something like $<${ENABLE_CLOUDSC_MIX}:ENABLE_CLOUDSC_MIX>? (Just curious, because as it stands it renders the CPP guards in yoethf.F90 and yomcst.F90 redundant.)

MichaelSt98 marked this pull request as ready for review April 18, 2024 12:09

MichaelSt98 requested a review from reuterbal April 18, 2024 12:09

reuterbal requested changes Nov 6, 2024

MichaelSt98 added 4 commits November 6, 2024 15:11

Introducing 'mix' versions, Fortran driver + OpenMP/OpenACC offload + CUDA/HIP/SYCL kernel implementation

22a8d16

CLOUDSC (GPU) mix variants: minor improvements, refactoring and simplifications

4158302

CLOUDSC (GPU) mix variants: add entry in README

a6b5832

Add CLOUDSC (GPU) mix variant to CI (build the CUDA ACC variant)

434f1c3

MichaelSt98 force-pushed the nams_mix branch from 84928d4 to 434f1c3 Compare November 6, 2024 15:14

MichaelSt98 requested a review from reuterbal November 6, 2024 15:15

reuterbal reviewed Nov 6, 2024

MichaelSt98 added 2 commits November 6, 2024 16:05

Do no always provide definition 'ENABLE_CLOUDSC_MIX'

ebc9601

CLOUDSC (GPU) mix variant as as single version build (in the CI)

de38016
