Big Data Clusters Using Raspberry Pi and Apache Hadoop - A Quasi-Systematic Literature Review
Abstract
This work aims to identify how low-cost big data clusters are being built with Raspberry Pi and Apache Hadoop, and how they are being validated and monitored. To that end, a Quasi-Systematic Literature Review (QSLR) was conducted, yielding 9 relevant articles able to answer 3 research questions. The QSLR found that the Raspberry Pi models most often used to build the clusters are the Raspberry Pi 4B and the Raspberry Pi 2B, and that, for validation, the Terasort and Wordcount benchmarks are the most cited in the literature, followed by the original MapReduce approach and TestDFSIO. The only 3 tools found for monitoring cluster resources were Ganglia, Grafana, and Prometheus.