v1.11.0 OpenMP target on some loops

@sumseq sumseq released this 09 Sep 15:58
· 57 commits to main since this release

This release uses OpenMP target offload for some nested loops, where nested `do concurrent` loops may be hard for compilers to parallelize optimally, as well as for some reduction loops that use `min` or `max`.
This is a temporary release, intended to be optimal across NVIDIA, Intel, and AMD GPUs with current compilers.
Once the relevant compilers are updated, this release will no longer be needed.
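
For readers unfamiliar with the pattern, the sketch below illustrates the kind of change described above: a nested loop and a `max` reduction written with OpenMP target directives rather than nested `do concurrent`. It is not taken from this release's source. The program name, array names, sizes, and data are hypothetical, and the exact directives used in the code base may differ.

```fortran
program omp_target_sketch
  ! Hypothetical example; names and sizes are illustrative only.
  implicit none
  integer, parameter :: n = 64, m = 32
  real :: a(n, m), b(n, m)
  real :: amax
  integer :: i, j

  ! Fill the input array with known values on the host.
  do j = 1, m
    do i = 1, n
      a(i, j) = real(i + j)
    end do
  end do

  ! Nested loop offloaded with OpenMP target.
  ! collapse(2) merges both loops into a single iteration space,
  ! instead of leaving the compiler to map nested do concurrent loops.
  !$omp target teams distribute parallel do collapse(2) map(to: a) map(from: b)
  do j = 1, m
    do i = 1, n
      b(i, j) = 2.0 * a(i, j)
    end do
  end do
  !$omp end target teams distribute parallel do

  ! max reduction expressed with an explicit OpenMP reduction clause.
  amax = -huge(amax)
  !$omp target teams distribute parallel do collapse(2) &
  !$omp& reduction(max: amax) map(to: b) map(tofrom: amax)
  do j = 1, m
    do i = 1, n
      amax = max(amax, b(i, j))
    end do
  end do
  !$omp end target teams distribute parallel do

  print *, 'max of b =', amax

  ! Self-check: the largest entry of b is 2*(n + m).
  if (amax /= 2.0 * real(n + m)) error stop 'unexpected maximum'
end program omp_target_sketch
```

Built without OpenMP enabled, the directives are treated as comments and the loops run serially on the host, so the sketch can be checked on any Fortran compiler before trying an offload build.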