Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Optimizing GPU programs by partial evaluation. / Tyurin, Aleksey; Berezun, Daniil; Grigorev, Semyon.
PPoPP 2020 - Proceedings of the 2020 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Association for Computing Machinery, 2020. p. 431-432 (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).
TY - GEN
T1 - Optimizing GPU programs by partial evaluation
AU - Tyurin, Aleksey
AU - Berezun, Daniil
AU - Grigorev, Semyon
PY - 2020/2/19
Y1 - 2020/2/19
N2 - While GPU utilization allows one to speed up computations by orders of magnitude, memory management remains the bottleneck, often making it a challenge to achieve the desired performance. Hence, various memory optimizations are applied to use memory more effectively. We propose an approach that automates memory management using partial evaluation, a program transformation technique that enables data accesses to be pre-computed, optimized, and embedded into the code, saving memory transactions. An empirical evaluation of our approach shows that the transformed program could be up to 8 times as efficient as the original one in the case of a naïve string pattern matching algorithm implemented in CUDA C.
AB - While GPU utilization allows one to speed up computations by orders of magnitude, memory management remains the bottleneck, often making it a challenge to achieve the desired performance. Hence, various memory optimizations are applied to use memory more effectively. We propose an approach that automates memory management using partial evaluation, a program transformation technique that enables data accesses to be pre-computed, optimized, and embedded into the code, saving memory transactions. An empirical evaluation of our approach shows that the transformed program could be up to 8 times as efficient as the original one in the case of a naïve string pattern matching algorithm implemented in CUDA C.
KW - CUDA
KW - GPU
KW - Partial Evaluation
UR - http://www.scopus.com/inward/record.url?scp=85082389104&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/68d45b0a-5f2b-3108-95ff-e38a69e5eba0/
U2 - 10.1145/3332466.3374507
DO - 10.1145/3332466.3374507
M3 - Conference contribution
AN - SCOPUS:85082389104
SN - 9781450368186
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 431
EP - 432
BT - PPoPP 2020 - Proceedings of the 2020 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
T2 - 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2020
Y2 - 22 February 2020 through 26 February 2020
ER -
ID: 53079211