Identification and Characterization of Memory Allocation Anomalies in High-Performance Computing Applications
A memory allocation anomaly occurs when the allocation of a set of heap blocks imposes an unnecessary overhead on the execution of an application. In this paper, we propose a method for identifying, locating, characterizing, and fixing allocation anomalies, together with a tool that lets developers apply the method. We evaluate the method and tool on a numerical simulator that approximates the solutions of partial differential equations using a finite element method. We show that taming allocation anomalies in this simulator reduces the memory footprint of its processes by 37.27% and the execution time by 16.52%. We conclude that developers of high-performance computing applications can benefit from the method and tool during the software development cycle.