Design and GPGPU performance of Futhark's redomap construct
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Design and GPGPU performance of Futhark's redomap construct. / Henriksen, Troels; Larsen, Ken Friis; Oancea, Cosmin Eugen.
Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. Association for Computing Machinery, 2016. p. 17-24.
RIS
TY - GEN
T1 - Design and GPGPU performance of Futhark's redomap construct
AU - Henriksen, Troels
AU - Larsen, Ken Friis
AU - Oancea, Cosmin Eugen
N1 - Conference code: 3
PY - 2016
Y1 - 2016
AB - This paper presents and evaluates a novel second-order operator, named 'redomap', that stems from 'map'-'reduce' compositions in the context of the purely-functional array language Futhark, which is aimed at efficient GPGPU execution. Main contributions are: First, we demonstrate an aggressive fusion technique that is centered on the 'redomap' operator. Second, we present a compilation technique for 'redomap' that efficiently sequentializes the excess parallelism and ensures coalesced access to global memory, even for non-commutative 'reduce' operators. Third, a detailed performance evaluation shows that Futhark's automatically generated code matches or exceeds performance of hand-tuned Thrust code. Our evaluation infrastructure is publicly available and we encourage replication and verification of our results.
U2 - 10.1145/2935323.2935326
DO - 10.1145/2935323.2935326
M3 - Article in proceedings
SP - 17
EP - 24
BT - Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
PB - Association for Computing Machinery
T2 - 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
Y2 - 14 June 2016 through 14 June 2016
ER -
ID: 164443159