Text Mining to Support Severity Prediction of Incident Reports: A Feasibility Study
Abstract
Because a large number of incident reports are stored in Incident Tracking Systems (ITS) and must be prioritized according to their severity, tools that support severity prediction for incident reports need to be investigated. Objective: To apply Text Mining (TM) techniques and learning methods to support severity prediction for incident reports based on their textual descriptions. Method: A feasibility study was conducted to evaluate the application of preprocessing techniques and classification methods. Results: The semi-supervised learning method TCBHN performed well relative to the other approaches evaluated. Conclusion: Bipartite heterogeneous networks and semi-supervised classification methods are promising for predicting the severity of incident reports.
References
Chapelle, O., Schölkopf, B., and Zien, A., editors (2006). Semi-Supervised Learning. MIT Press.
Ji, M., Sun, Y., Danilevsky, M., Han, J., and Gao, J. (2010). Graph regularized transductive classification on heterogeneous information networks. In Proc. of the European Conf. on Machine Learning and Knowledge Discovery in Databases, pages 570–586. Springer-Verlag.
Jung, W., Lee, E., and Wu, C. (2012). A survey on mining software repositories. IEICE Transactions on Information and Systems, E95.D(5):1384–1406.
Koch, K.-R. (1990). Bayes’ theorem. In Bayesian Inference with Geodetic Applications, pages 4–8. Springer.
Lamkanfi, A., Demeyer, S., Giger, E., and Goethals, B. (2010). Predicting the severity of a reported bug. In 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), pages 1–10.
Lamkanfi, A., Demeyer, S., Soetens, Q. D., and Verdonck, T. (2011). Comparing mining algorithms for predicting the severity of a reported bug. In 2011 15th European Conference on Software Maintenance and Reengineering, pages 249–258.
Lamkanfi, A., Pérez, J., and Demeyer, S. (2013). The Eclipse and Mozilla defect tracking dataset: a genuine dataset for mining bug information. In 2013 10th Working Conference on Mining Software Repositories (MSR), pages 203–206.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning, volume 1. Morgan Kaufmann.
Rish, I. (2001). An empirical study of the naive Bayes classifier. In IJCAI-Workshop Empirical Methods in Artificial Intelligence, volume 3, pages 41–46. IBM New York.
Rossi, R. G., Lopes, A. A., and Rezende, S. O. (2016). Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts. Information Processing & Management, 52(2):217–257.
Saha, R. K., Lawall, J., Khurshid, S., and Perry, D. E. (2015). Are these bugs really normal? In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pages 258–268.
Shull, F., Carver, J., and Travassos, G. H. (2001). An empirical methodology for introducing software processes. In Proceedings of the 8th European Software Engineering Conference Held Jointly with 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-9, pages 288–296, New York, NY, USA. ACM.
Sokolova, M. and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4):427–437.
Tan, P.-N., Steinbach, M., and Kumar, V. (2005). Introduction to Data Mining. Addison-Wesley.
Tian, Y., Lo, D., and Sun, C. (2012). Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In 19th Working Conference on Reverse Engineering, pages 215–224.
Vapnik, V. N. (1995). The nature of statistical learning theory.
Xia, X., Lo, D., Shihab, E., Wang, X., and Yang, X. (2015). ELBlocker: Predicting blocking bugs with ensemble imbalance learning. Information and Software Technology, 61:93–106.
Yin, Z., Li, R., Mei, Q., and Han, J. (2009). Exploring social tagging graph for web object classification. In Proc. of the Int. Conf. on Knowledge Discovery and Data Mining, pages 957–966.
Zhang, T., Chen, J., Yang, G., Lee, B., and Luo, X. (2016). Towards more accurate severity prediction and fixer recommendation of software bugs. J. Syst. Softw., 117(C):166–184.
Zhou, D., Bousquet, O., Lal, T. N., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Advances in Neural Information Processing Systems, volume 16, pages 321–328.
Zhou, Y., Tong, Y., Gu, R., and Gall, H. (2014). Combining text mining and data mining for bug report classification. In 2014 IEEE International Conference on Software Maintenance and Evolution, pages 311–320.
Zhu, X. and Goldberg, A. B. (2009). Introduction to semi-supervised learning. Morgan and Claypool Publishers.
Published
28 August 2017
How to Cite
BARBOSA, Jacson Rodrigues; MATSUNO, Ivone Penque; GUIMARÃES, Eduardo R.; REZENDE, Solange Oliveira; VINCENZI, Auri M. R.; DELAMARO, Márcio E. Mineração de Textos para Apoiar a Predição de Severidade de Relatórios de Incidentes: um Estudo de Viabilidade. In: SIMPÓSIO BRASILEIRO DE QUALIDADE DE SOFTWARE (SBQS), 16., 2017, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 89-103. DOI: https://doi.org/10.5753/sbqs.2017.15094.