Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines
Abstract
Programming Graphics Processing Units (GPUs) for general-purpose computation remains difficult, often requiring specialized knowledge of low-level APIs such as CUDA or OpenCL. Although Rust has emerged as a modern, safe, and performant systems programming language, its adoption in GPU computing is still nascent. Existing approaches often rely on intricate compiler modifications or complex static analysis to adapt CPU-centric Rust code for GPU execution. This paper presents a high-level abstraction in Rust that uses procedural macros to automatically generate GPU-executable code from constrained Rust functions. By restricting how these functions can be written, our approach simplifies code generation and avoids the need for complex static analysis.
We demonstrate the feasibility and effectiveness of our abstraction through a case study on the linear pipeline parallel pattern, a common structure in data-parallel applications. By transforming Rust functions annotated as the source, stage, or sink of a pipeline, we enable straightforward execution on the GPU. We evaluate the abstraction's performance and programmability using two benchmark applications, sobel (image filtering) and latbol (fluid simulation), comparing it against manual OpenCL implementations. Our results indicate that, while it incurs a small performance overhead in some cases, our approach significantly reduces development effort and, in certain scenarios, achieves throughput comparable or even superior to that of CPU-based parallelism.
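To make the source, stage, and sink roles concrete, the following is a minimal CPU-only sketch of a linear pipeline using standard-library threads and channels. It is not the abstraction proposed in the paper: it uses no procedural macros and generates no GPU code. The function names (`source`, `stage`, `sink`, `run_pipeline`) and the squaring workload are hypothetical, chosen only to illustrate the shape of the pattern.

```rust
use std::sync::mpsc;
use std::thread;

/// Source: produces the stream items (hypothetical integer work items).
fn source(n: u32) -> impl Iterator<Item = u32> {
    0..n
}

/// Stage: a pure per-item transformation. A constrained function of this
/// kind is the sort of candidate one could imagine offloading to a GPU;
/// here it simply runs on a CPU thread.
fn stage(x: u32) -> u32 {
    x * x
}

/// Sink: consumes the transformed items and reduces them to one value.
fn sink(results: impl Iterator<Item = u32>) -> u32 {
    results.sum()
}

/// Wires source -> stage -> sink, with the source and stage each on their
/// own thread and the sink on the calling thread, connected by channels.
fn run_pipeline(n: u32) -> u32 {
    let (to_stage, from_source) = mpsc::channel::<u32>();
    let (to_sink, from_stage) = mpsc::channel::<u32>();

    let producer = thread::spawn(move || {
        for item in source(n) {
            to_stage.send(item).expect("stage receiver dropped");
        }
        // `to_stage` is dropped here, which closes the first channel.
    });

    let worker = thread::spawn(move || {
        // Iteration ends once the source side has closed its channel.
        for item in from_source {
            to_sink.send(stage(item)).expect("sink receiver dropped");
        }
        // `to_sink` is dropped here, which closes the second channel.
    });

    let total = sink(from_stage.into_iter());
    producer.join().expect("source thread panicked");
    worker.join().expect("stage thread panicked");
    total
}
```

Calling `run_pipeline(4)` pushes the items 0, 1, 2, 3 through the stage and sums their squares in the sink. Because each step only communicates through a channel, the three roles stay independent, which is the property that makes a stage a natural unit to replace with generated device code.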
