배움의 과정 : 실천

Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs 2021.02.24

Miss_Baker 2021. 2. 24. 23:13

The OpenMP model introduced support for offloading code (with the target directive) to accelerators, co-processors, or many-core processors from version 4.0 (released 2013), and has continued to add and update features through versions 4.5 (released 2015) and 5.0 (released 2018).
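As a rough illustration of what offloading with the target directive looks like, here is a minimal sketch in C. The function name `saxpy_offload` and the choice of kernel are assumptions made for this example and do not come from the paper; the combined construct and `map` clauses are standard OpenMP 4.x syntax. When built without an offload-capable compiler, the pragma is ignored or the loop falls back to the host, so the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
 * The target construct asks the compiler to run the loop on a device;
 * map(to:) copies x to the device, map(tofrom:) copies y in and back out. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy_offload(n, 2.0f, x, y)` on two host arrays of length `n` updates `y` in place; whether the loop actually runs on a GPU depends on the compiler and its offload flags.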

Related work includes demonstrations of GPU support for OpenMP offloading features in the Clang and Flang compilers [3,25]:

Integrating GPU support for OpenMP offloading directives into Clang.
OpenMP GPU offload in Flang and LLVM.

The Rodinia benchmark suite was used to evaluate OpenMP offloading Unified Memory performance by Mishra et al. [19].

Benchmarking and evaluating unified memory for OpenMP GPU offloading.
dl.acm.org/doi/10.1145/3148173.3148184

The paper compares mini-app performance across seven compilers in total: five OpenMP offload compilers, one OpenACC compiler, and one CUDA compiler.

Because PGI support for OpenMP offloading is still under development, PGI was tested using an OpenACC equivalent implementation of each code.
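To make "OpenACC equivalent implementation" concrete, the sketch below shows how a simple offloaded loop might be expressed with OpenACC directives instead of OpenMP ones. The function name `saxpy_acc` and the kernel are assumptions for illustration only, not code from the paper's mini-apps. A compiler without OpenACC support simply ignores the pragma and runs the loop on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
 * copyin() plays the role of OpenMP's map(to:), copy() the role of map(tofrom:). */
void saxpy_acc(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged from what an OpenMP version would contain; only the directive and the data clauses differ, which is why a one-to-one port of each code is feasible.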

The Cray Classic compiler uses Cray's own compiler technology, whereas Cray CCE 10.0.0 has switched to Clang/LLVM.

In Table 3:

NI: the mini-app is not implemented in that programming model.
CE: compiler errors for OpenMP offloading features.
RE: runtime errors for OpenMP offloading features.

The performance metric used depends on the characteristics of each mini-app.

su3 uses GFLOPS, ToyPush and laplace use execution time, and babelStream uses memory bandwidth.
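The two derived metrics can be computed from a measured execution time as sketched below. All function names here are hypothetical, and the sketch assumes decimal units (1 GFLOP = 1e9 floating-point operations, 1 GB = 1e9 bytes). The `triad_bytes` helper further assumes a STREAM-style triad over three arrays of doubles (two read, one written); the paper's own counting conventions may differ.

```c
#include <assert.h>
#include <stddef.h>

/* GFLOPS: floating-point operations per second, in units of 1e9. */
double gflops(double flop_count, double seconds)
{
    assert(seconds > 0.0);
    return flop_count / seconds / 1e9;
}

/* Memory bandwidth in GB/s (decimal GB), from bytes moved and elapsed time. */
double bandwidth_gbs(double bytes_moved, double seconds)
{
    assert(seconds > 0.0);
    return bytes_moved / seconds / 1e9;
}

/* Assumed traffic for a triad a[i] = b[i] + s * c[i] on n doubles:
 * two arrays read and one written, so three arrays in total. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)n * (double)sizeof(double);
}
```

For example, a kernel that performs 4e9 floating-point operations in 2 seconds achieves `gflops(4e9, 2.0)`, which is 2 GFLOPS. Execution time, the metric used for ToyPush and laplace, needs no conversion.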

License: Attribution-NonCommercial-NoDerivatives
