W. Jradi, H. do Nascimento, and W. Martins. "A Fast and Generic GPU-Based Parallel Reduction Implementation", in Proceedings of the 19th Symposium on High-Performance Computing Systems, São Paulo, 2018, pp. 16-22.