OFG-STM: Transactional Memory for GPUs based on Obstruction-Free STM algorithms
Resumo
Transactional memory is a high-level programming abstraction for synchronizing concurrent tasks investigated in different architectures, such as multicores, distributed systems and GPUs. Software transactional memory systems (STMs), including those for GPUs, are usually implemented using a lock-based system, where the locks that protect memory locations being accessed by a transaction are acquired, either at commit time or during transaction execution, to guarantee the atomicity of the transaction’s changes to memory. Transactional memory systems can also be lock-free or obstruction-free. Still, such STM algorithms are complex to implement in languages that do not support garbage collection, as they rely on a structure called locator, which must be used to acquire ownership of transactional objects. Every time a transaction needs to access an object in write mode, a new locator is created to substitute the current locator of the object. As transactions can be aborted at any time, and concurrent transactions must read the locators of objects to access their current versions, it is difficult to know when exactly such a locator is not being used and can be freed, a common problem in designing lock-free data structures. This paper presents OFG-STM, an STM system for GPUs inspired by obstruction-free STM algorithms. OFG-STM takes advantage of the fact that GPU threads executing the same kernel execute the same code, and threads in a warp execute a kernel in lockstep, to introduce a garbage-collection phase every time a number of transactions has committed. This paper presents the design and implementation of the OFG-STM system, as well as experiments demonstrating its usability.
Referências
Trevor Alexander Brown. 2015. Reclaiming Memory for Lock-Free Data Structures: There has to be a BetterWay. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing (Donostia-San Sebastián, Spain) (PODC ’15). Association for Computing Machinery, NewYork, NY, USA, 261–270. DOI: 10.1145/2767386.2767436
Daniel Cederman, Philippas Tsigas, and Muhammad Tayyab Chaudhry. 2010. Towards a Software Transactional Memory for Graphics Processors . In Eurographics Symposium on Parallel Graphics and Visualization, James Ahrens, Kurt Debattista, and Renato Pajarola (Eds.). The Eurographics Association. DOI: 10.2312/EGPGV/EGPGV10/121-129
Sui Chen and Lu Peng. 2016. Efficient GPU hardware transactional memory through early conflict resolution. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). 274–284. DOI: 10.1109/HPCA.2016.7446071
Sui Chen, Lu Peng, and Samuel Irving. 2017. Accelerating GPU hardware transactional memory with snapshot isolation. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). 282–294. DOI: 10.1145/3079856.3080204
Dave Dice, Ori Shalev, and Nir Shavit. 2006. Transactional Locking II. In Distributed Computing, Shlomi Dolev (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 194–208.
Aleksandar Dragojević, Rachid Guerraoui, and Michal Kapalka. 2009. Stretching transactional memory. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland) (PLDI ’09). Association for Computing Machinery, NewYork, NY, USA, 155–165. DOI: 10.1145/1542476.1542494
Pascal Felber, Christof Fetzer, and Torvald Riegel. 2008. Dynamic Performance Tuning of Word-Based Software Transactional Memory. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (02 2008). DOI: 10.1145/1345206.1345241
Pascal Felber, Christof Fetzer, and Torvald Riegel. 2008. Dynamic Performance Tuning of Word-Based Software Transactional Memory. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (02 2008). DOI: 10.1145/1345206.1345241
Sérgio Miguel Fernandes and João Cachopo. 2011. Lock-free and scalable multi-version software transactional memory. SIGPLAN Not. 46, 8 (feb 2011), 179–188. DOI: 10.1145/2038037.1941579
Wilson W. L. Fung and Tor M. Aamodt. 2013. Energy efficient GPU transactional memory via space-time optimizations. In 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 408–420.
WilsonW. L. Fung, Inderpreet Singh, Andrew Brownsword, and Tor M. Aamodt. 2011. Hardware transactional memory for GPU architectures. In 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 296–307.
Rachid Guerraoui and Paolo Romano (Eds.). 2015. Transactional Memory. Foundations, Algorithms, Tools, and Applications. LNCS, Vol. 8913. Springer.
Tim Harris, James Larus, and Ravi Rajwar. 2010. Transactional Memory, 2nd edition. Morgan and Claypool Publishers.
Maurice Herlihy, Victor Luchangco, and Mark Moir. 2006. A Flexible Framework for Implementing Software Transactional Memory. In 21st OOPSLA. ACM, 253–262.
Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer. 2003. Software Transactional Memory for Dynamic-Sized Data Structures (PODC ’03). ACM, 92–101.
Maurice Herlihy and J. Eliot B. Moss. 1993. Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21, 2 (may 1993), 289–300. DOI: 10.1145/173682.165164
Maurice Herlihy, Nir Shavit, Victor Luchangco, and Michael Spear. 2020. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Anup Holey and Antonia Zhai. 2014. Lightweight Software Transactions on GPUs. In 2014 43rd International Conference on Parallel Processing. 461–470. DOI: 10.1109/ICPP.2014.55
Richard Jones, Antony Hosking, and Eliot Moss. 2023. The garbage collection handbook: the art of automatic memory management. CRC Press.
Virendra J. Marathe, William N. Scherer, and Michael L. Scott. 2005. Adaptive Software Transactional Memory. In DISC’05.
Maged M Michael. 2004. Hazard pointers: Safe memory reclamation for lock-free objects. IEEE Transactions on Parallel and Distributed Systems 15, 6 (2004), 491–504.
Diogo Nunes, Daniel Castro, and Paolo Romano. 2022. CSMV: A Highly Scalable Multi-Versioned Software Transactional Memory for GPUs. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 526–536. DOI: 10.1109/IPDPS53621.2022.00057
Jerônimo Ramos, Andre Rauber Du Bois, and Gerson Cavalheiro. 2023. Obstruction-Free Distributed Transactional Memory. In Proceedings of the XXVII Brazilian Symposium on Programming Languages (, Campo Grande, MS, Brazil,) (SBLP ’23). Association for Computing Machinery, New York, NY, USA, 33–40. DOI: 10.1145/3624309.3624316
Torvald Riegel, Christof Fetzer, and Pascal Felber. 2007. Time-based transactional memory with scalable time bases. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (San Diego, California, USA) (SPAA ’07). Association for Computing Machinery, New York, NY, USA, 221–228. DOI: 10.1145/1248377.1248415
Mohamed M. Saad and Binoy Ravindran. 2012. Transactional Forwarding: Supporting Highly-Concurrent STM in Asynchronous Distributed Systems. In IEEE 24th SBAC-PAD. 219–226. DOI: 10.1109/SBAC-PAD.2012.36
William Scherer and Michael Scott. 2005. Contention management in dynamic software transactional memory. In Proceedings of PODC’05.
William N. Scherer and Michael L. Scott. 2005. Advanced contention management for dynamic software transactional memory. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of Distributed Computing (Las Vegas, NV, USA) (PODC ’05). Association for Computing Machinery, New York, NY, USA, 240–248. DOI: 10.1145/1073814.1073861
Qi Shen, Craig Sharp, William Blewitt, Gary Ushaw, and Graham Morgan. 2015. PR-STM: Priority Rule Based Software Transactions for the GPU, Vol. 9233. 361–372. DOI: 10.1007/978-3-662-48096-0_28
Qi Shen, Craig Sharp, Richard Davison, Gary Ushaw, Rajiv Ranjan, Albert Y. Zomaya, and Graham Morgan. 2020. A general purpose contention manager for software transactions on the GPU. J. Parallel Distrib. Comput. 139, C (may 2020), 1–17. DOI: 10.1016/j.jpdc.2019.12.018
Konrad Siek and Paweł T. Wojciechowski. 2016. Atomic RMI: A Distributed Transactional Memory Framework. Int. Journal of Parallel Programming 44, 3 (01 Jun 2016), 598–619.
Michael F. Spear, Virendra J. Marathe, William N. Scherer, and Michael L. Scott. 2006. Conflict Detection and Validation Strategies for Software Transactional Memory. In DISC.
KaiWang, Don Fussell, and Calvin Lin. 2019. Fast Fine-Grained Global Synchronization on GPUs. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS ’19). Association for Computing Machinery, New York, NY, USA, 793–806. DOI: 10.1145/3297858.3304055
Yunlong Xu, RuiWang, Nilanjan Goswami, Tao Li, Lan Gao, and Depei Qian. 2014. Software Transactional Memory for GPU Architectures. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (Orlando, FL, USA) (CGO ’14). Association for Computing Machinery, New York, NY, USA, 1–10. DOI: 10.1145/2581122.2544139