A Hierarchical Strategy for Distributed Multiple Classifier Systems
Abstract
The natural distribution of data, together with privacy and security concerns, calls for efficient systems that can handle distributed data. At the same time, data scarcity is a challenge for machine learning, and a distributed approach can help mitigate it. This work employs a hierarchical virtual topology based on hypercubes to organize the exchange and aggregation of distributed training results, increasing the accuracy of the final models and providing a fault-tolerant solution. Experimental results confirm the effectiveness of the technique, with improved results in all simulated scenarios.
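To illustrate the idea summarized above, the sketch below combines classifiers trained on separate data partitions by exchanging them dimension by dimension over a virtual hypercube, so that every node ends up holding the same ensemble and reaches the same majority-vote decision. The node numbering, the round-by-round neighbor exchange, and the majority-vote merge are simplifying assumptions made for this example only; the paper's actual VCube-based protocol, including its fault-tolerance mechanisms, is not reproduced here.

```python
from collections import Counter

DIMENSION = 3                     # virtual hypercube with 2**DIMENSION nodes
NUM_NODES = 2 ** DIMENSION

def local_classifier(node_id):
    """Stand-in for a model trained on node_id's local data partition."""
    def predict(sample):
        # Hypothetical toy rule so that different nodes disagree.
        return 1 if sample >= node_id else 0
    return predict

# Each node starts with an ensemble containing only its own classifier.
ensembles = {i: [local_classifier(i)] for i in range(NUM_NODES)}

# Round s: every node merges its ensemble with that of the neighbor whose
# identifier differs only in bit s (its hypercube neighbor in dimension s).
for s in range(DIMENSION):
    merged = {}
    for node in range(NUM_NODES):
        neighbor = node ^ (1 << s)
        merged[node] = ensembles[node] + ensembles[neighbor]
    ensembles = merged

def majority_vote(ensemble, sample):
    """Combine an ensemble's predictions by simple majority voting."""
    votes = Counter(clf(sample) for clf in ensemble)
    return votes.most_common(1)[0][0]

# After DIMENSION rounds every node holds all NUM_NODES classifiers and
# therefore reaches the same combined decision.
sample = 5
print([majority_vote(ensembles[i], sample) for i in range(NUM_NODES)])
```

The dimension-by-dimension exchange is what gives the logarithmic number of aggregation rounds typical of hypercube-based hierarchies; in a real deployment the merged ensembles would presumably be evaluated, weighted, or pruned rather than simply concatenated as in this toy example.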
Published
24/05/2024
How to Cite
SALLES, Charles Giovane de; BRUN, André Luiz; RODRIGUES, Luiz Antonio. Uma Estratégia Hierárquica para Sistemas de Múltiplos Classificadores Distribuídos. In: WORKSHOP DE TESTES E TOLERÂNCIA A FALHAS (WTF), 25., 2024, Niterói/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 15-28. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2024.2517.