Reassessing the STAMP Application Suite on New Transactional Hardware
Abstract
Over the last four years, IBM® and Intel® have released processors with support for transactional memory. Most studies have evaluated these processors using the STAMP applications and have considered only the abort causes of each application as a whole. This work presents a per-transaction analysis of STAMP and contrasts different performance metrics to determine the fundamental reasons for the poor performance of Haswell™ on some applications. In summary, the results show that a single transaction dominates the execution time and commits rarely, either because it exceeds the processor's limited capacity or because it generates many conflicts when executed in hardware.