Energy Efficiency Evaluation of Multi-level Parallelism on Low Power Processors
Abstract
Energy efficiency and energy consumption are becoming major concerns in high-performance computing (HPC). One alternative being considered to improve energy efficiency is the use of unconventional architectures in HPC, e.g., embedded and mobile processors. In this paper, we evaluate the use of multi-level parallelism on two low-power architectures: Intel Atom and ARM Cortex-A9. Our results show that, in all tested cases, the Intel Atom outperforms the ARM Cortex-A9 in terms of both execution time and Energy-Delay Product.
Published
28/07/2014
How to Cite
PINTO, Vinícius; LORENZON, Arthur; BECK, Antonio Carlos; MAILLARD, Nicolas; NAVAUX, Philippe. Energy Efficiency Evaluation of Multi-level Parallelism on Low Power Processors. In: WORKSHOP EM DESEMPENHO DE SISTEMAS COMPUTACIONAIS E DE COMUNICAÇÃO (WPERFORMANCE), 13., 2014, Brasília. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014. p. 14-25. ISSN 2595-6167.