Starting with version 4.0 (released 2013), OpenMP introduced support for offloading code to accelerators, co-processors, or many-core processors through the target directive. It has continued to add and update offloading features in versions 4.5 (released 2015) and 5.0 (released 2018).
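As a minimal sketch of what offloading with the target directive looks like, the function below runs a SAXPY loop inside a target region. The function name `saxpy_offload` and the choice of kernel are illustrative assumptions and do not come from the paper. If the code is built without OpenMP support, the pragma is ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Illustrative SAXPY kernel (hypothetical name, not from the paper):
 * y[i] = a * x[i] + y[i].
 * The combined construct offloads the loop to the default device.
 * map(to: ...) copies x to the device; map(tofrom: ...) copies y
 * to the device before the region and back to the host afterwards. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy_offload(n, a, x, y)` from host code is all that is needed. Whether the loop actually runs on a GPU depends on the compiler, its offloading flags, and the devices available at run time.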
Related work includes demonstrations of GPU support for OpenMP offloading features in the Clang and Flang compilers [3,25]:
- Integrating GPU support for OpenMP offloading directives into Clang.
- OpenMP GPU offload in Flang and LLVM.
Mishra et al. [19] used the Rodinia benchmark suite to evaluate Unified Memory performance with OpenMP offloading.
- Benchmarking and evaluating unified memory for OpenMP GPU offloading.
dl.acm.org/doi/10.1145/3148173.3148184
The paper compares mini-app performance across seven compilers in total: five OpenMP offload compilers, one OpenACC compiler, and one CUDA compiler.
Because PGI support for OpenMP offloading is still under development, PGI was tested using an OpenACC equivalent implementation of each code.
The Cray Classic compiler uses Cray's own compiler technology, whereas in Cray CCE 10.0.0 it has been replaced with Clang/LLVM.
In Table 3:
- NI : The mini-app is not implemented in that programming model.
- CE : Compiler Errors for OpenMP offloading features
- RE : Runtime Errors for OpenMP offloading features
The performance metric depends on the characteristics of each mini-app.
- su3 uses GFLOPS, ToyPush and laplace use execution time, and babelStream uses memory bandwidth.
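The rate-based metrics above are derived from a measured execution time, which the following sketch makes explicit. All function names are hypothetical. The triad byte count assumes the usual STREAM-style accounting of two arrays read and one array written, with double-precision elements. The paper's mini-apps may count operations and bytes differently.

```c
#include <stddef.h>

/* GFLOP/s from a floating-point operation count and elapsed seconds. */
double gflops(double flop_count, double seconds)
{
    return flop_count / seconds / 1.0e9;
}

/* Bytes moved by a triad kernel a[i] = b[i] + s * c[i] over n doubles,
 * assuming two reads and one write per element (STREAM-style counting). */
double triad_bytes(size_t n)
{
    return 3.0 * (double)n * (double)sizeof(double);
}

/* Memory bandwidth in GB/s (1 GB = 1e9 bytes) from bytes moved
 * and elapsed seconds. */
double bandwidth_gbs(double bytes_moved, double seconds)
{
    return bytes_moved / seconds / 1.0e9;
}
```

For example, a triad over one million doubles moves 24,000,000 bytes under this accounting, so finishing in 0.5 seconds would correspond to roughly 0.048 GB/s. Higher is better for GFLOPS and bandwidth, while lower is better for execution time, so results for different mini-apps are not directly comparable.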