Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing 'mix' versions #65

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from
Open

Introducing 'mix' versions #65

wants to merge 6 commits into from

Conversation

MichaelSt98
Copy link
Contributor

Fortran driver + OpenMP/OpenACC offload + CUDA/HIP/SYCL kernel implementation

For now only SCC-K-CACHING.

./cloudsc-bundle build --with-mix --cloudsc-gpu-offload ACC|OMP --cloudsc-gpu-lang CUDA|HIP|SYCL

e.g.:

./cloudsc-bundle build --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-mix --cloudsc-gpu-offload ACC --cloudsc-gpu-lang CUDA

  • Tested on NVIDIA:
    • OpenACC + CUDA
    • OpenMP + CUDA
    • OpenACC + SYCL
    • OpenMP + SYCL
  • Tested on AMD:
    • OpenACC + HIP
    • OpenMP + HIP

@MichaelSt98 MichaelSt98 marked this pull request as ready for review April 18, 2024 12:09
Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and apologies for the extremely long review time.

This is very cool and conceptually ready to merge, but I left a few suggestions how to improve particularly the build system integration.

In addition, I would like to ask that

  • this is added to the Github build config for NVHPC, at least with the CUDA flavour, to make sure we test the build at least;
  • these variants are described in the README

bundle.yml Outdated
ENABLE_CLOUDSC_MIX=ON

- cloudsc-gpu-offload :
help : [OMP|ACC]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break ecbundle because pipe has a meaning in yaml, we need to make this valid:

Suggested change
help : [OMP|ACC]
help : "Data offload model for GPU variants. Available options: OMP, ACC"

bundle.yml Outdated
cmake : CLOUDSC_GPU_OFFLOAD={{value}}

- cloudsc-gpu-lang :
help : [CUDA|HIP|SYCL]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help : [CUDA|HIP|SYCL]
help : "Kernel language for low-level GPU kernel implementations. Available options: CUDA, HIP, SYCL"

if ( ENABLE_CLOUDSC_MIX )
# if (CLOUDSC_GPU_OFFLOAD STREQUAL "ACC")
# HACK: seems like nordc only necessary for NVIDIA machines but not AMD machines
if (CLOUDSC_GPU_LANG STREQUAL "CUDA" OR CLOUDSC_GPU_LANG STREQUAL "SYCL")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for a shorter check

Suggested change
if (CLOUDSC_GPU_LANG STREQUAL "CUDA" OR CLOUDSC_GPU_LANG STREQUAL "SYCL")
if (CLOUDSC_GPU_LANG MATCHES "CUDA|SYCL")

Also, that compiler option is NVHPC-specific, no? So, we might want to add CMAKE_Fortran_COMPILER_ID MATCHES "PGI|NVHPC"

# GPU data offload (default: ACC = 1)
## ACC: 1
## OMP: 2
if (CLOUDSC_GPU_OFFLOAD STREQUAL "ACC")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of named constants in the source is nice, it really helps the readability. But it isn't as clean here in the CMake layer, which I think we can improve with a little trick that requires you to also change only a single place when you ever want to change the numbering.

Conceptually it should look like this:

# Define integer IDs corresponding to every language choice
set(CUDA_LANG "1")
set(HIP_LANG "2")
set(SYCL_LANG "3")
set(ACC_OFFLOAD "4")
set(OMP_OFFLOAD "5")

# Select offload model
if( NOT DEFINED CLOUDSC_GPU_OFFLOAD )
  set(CLOUDSC_GPU_OFFLOAD "ACC")
endif()

set(GPU_OFFLOAD ${${CLOUDSC_GPU_OFFLOAD}_OFFLOAD})

# Optional: check for valid options
if( NOT ${GPU_OFFLOAD} MATCHES "ACC|OMP" )
  error(...)
endif()

# Select kernel language
if( NOT DEFINED CLOUDSC_GPU_LANG )
  set(CLOUDSC_GPU_LANG "CUDA")
endif()

set(GPU_LANG ${${CLOUDSC_GPU_LANG}_LANG})

# Language and offload-specific overrides
if( CLOUDSC_GPU_LANG STREQUAL "CUDA" )
 ...
elseif(...)
  ...
endif()

[... Define library etc...]

# Provide definitions to target
foreach(_def CUDA_LANG HIP_LANG SYCL_LANG ACC_OFFLOAD OMP_OFFLOAD)
  target_compile_definitions(dwarf-cloudsc-gpu-lib PUBLIC ${_def}=${${_def}})
endforeach()

This is untested but simplifies the logic in my opinion

${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}
)
if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
target_compile_options(dwarf-cloudsc-gpu-lib PRIVATE $<$<COMPILE_LANGUAGE:CUDA>>)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the double-nested generator expression? $<$<...>>

Comment on lines +11 to +17
#if GPU_LANG == CUDA_LANG
#include "cuda.h"
#elif GPU_LANG == HIP_LANG
#include "hip/hip_runtime.h"
#elif GPU_LANG == SYCL_LANG
#include <CL/sycl.hpp>
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily required but just to point out that we could use here the same pattern that ectrans is using to switch between HIP and CUDA implementations via header files that rewrite relevant API calls on demand:
https://github.com/ecmwf-ifs/ectrans/blob/main/src/trans/gpu/algor/hicblas.h
With specific overloads for HIP and CUDA

@@ -98,6 +108,8 @@ ecbuild_add_library( TARGET cloudsc-common-lib
$<${HAVE_HDF5}:hdf5::hdf5_fortran>
$<${HAVE_SERIALBOX}:Serialbox::Serialbox_Fortran>
$<${HAVE_FIELD_API}:field_api_${prec}>
DEFINITIONS
ENABLE_CLOUDSC_MIX
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally always active, or should this actually be something like $<${ENABLE_CLOUDSC_MIX}:ENABLE_CLOUDSC_MIX>? (Just curious, because it would render the CPP guards in yoethf.F90, yomcst.F90 redundant)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants