Text Mining to Support Predicting Severity of Incident Reporting: A Feasibility Study
Abstract
Because of the large number of incident reports stored in Bug Tracking System repositories, and the need to prioritize them by severity level, it is necessary to investigate tools that support predicting the severity of incident reports. Objective: To apply Text Mining (TM) techniques and learning methods to help predict the severity of incident reports from their descriptions. Method: A feasibility study was conducted to evaluate the application of preprocessing techniques and classification methods. Results: The semi-supervised learning method TCBHN performed well relative to the other approaches evaluated. Conclusion: The use of bipartite heterogeneous networks and semi-supervised classification methods for predicting the severity of incident reports is promising.
References
Chapelle, O., Schölkopf, B., and Zien, A., editors (2006). Semi-Supervised Learning. MIT Press.
Ji, M., Sun, Y., Danilevsky, M., Han, J., and Gao, J. (2010). Graph regularized transductive classification on heterogeneous information networks. In Proc. of the European Conf. on Machine Learning and Knowledge Discovery in Databases, pages 570–586. Springer-Verlag.
Jung, W., Lee, E., and Wu, C. (2012). A survey on mining software repositories. IEICE Transactions on Information and Systems, E95.D(5):1384–1406.
Koch, K.-R. (1990). Bayes’ theorem. In Bayesian Inference with Geodetic Applications, pages 4–8. Springer.
Lamkanfi, A., Demeyer, S., Giger, E., and Goethals, B. (2010). Predicting the severity of a reported bug. In 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), pages 1–10.
Lamkanfi, A., Demeyer, S., Soetens, Q. D., and Verdonck, T. (2011). Comparing mining algorithms for predicting the severity of a reported bug. In 2011 15th European Conference on Software Maintenance and Reengineering, pages 249–258.
Lamkanfi, A., Pérez, J., and Demeyer, S. (2013). The Eclipse and Mozilla defect tracking dataset: a genuine dataset for mining bug information. In 2013 10th Working Conference on Mining Software Repositories (MSR), pages 203–206.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning, volume 1. Morgan Kaufmann.
Rish, I. (2001). An empirical study of the naive Bayes classifier. In IJCAI-Workshop Empirical Methods in Artificial Intelligence, volume 3, pages 41–46. IBM New York.
Rossi, R. G., Lopes, A. A., and Rezende, S. O. (2016). Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts. Information Processing & Management, 52(2):217–257.
Saha, R. K., Lawall, J., Khurshid, S., and Perry, D. E. (2015). Are these bugs really normal? In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pages 258–268.
Shull, F., Carver, J., and Travassos, G. H. (2001). An empirical methodology for introducing software processes. In Proceedings of the 8th European Software Engineering Conference Held Jointly with 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-9, pages 288–296, New York, NY, USA. ACM.
Sokolova, M. and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4):427–437.
Tan, P.-N., Steinbach, M., and Kumar, V. (2005). Introduction to Data Mining. Addison-Wesley.
Tian, Y., Lo, D., and Sun, C. (2012). Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In 19th Working Conference on Reverse Engineering, pages 215–224.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer.
Xia, X., Lo, D., Shihab, E., Wang, X., and Yang, X. (2015). ELBlocker: Predicting blocking bugs with ensemble imbalance learning. Information and Software Technology, 61:93–106.
Yin, Z., Li, R., Mei, Q., and Han, J. (2009). Exploring social tagging graph for web object classification. In Proc. of the Int. Conf. on Knowledge Discovery and Data Mining, pages 957–966.
Zhang, T., Chen, J., Yang, G., Lee, B., and Luo, X. (2016). Towards more accurate severity prediction and fixer recommendation of software bugs. J. Syst. Softw., 117(C):166–184.
Zhou, D., Bousquet, O., Lal, T. N., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Advances in Neural Information Processing Systems, volume 16, pages 321–328.
Zhou, Y., Tong, Y., Gu, R., and Gall, H. (2014). Combining text mining and data mining for bug report classification. In 2014 IEEE International Conference on Software Maintenance and Evolution, pages 311–320.
Zhu, X. and Goldberg, A. B. (2009). Introduction to Semi-Supervised Learning. Morgan and Claypool Publishers.
Published: 2017-08-28
How to Cite
BARBOSA, Jacson Rodrigues; MATSUNO, Ivone Penque; GUIMARÃES, Eduardo R.; REZENDE, Solange Oliveira; VINCENZI, Auri M. R.; DELAMARO, Márcio E. Text Mining to Support Predicting Severity of Incident Reporting: A Feasibility Study. In: BRAZILIAN SOFTWARE QUALITY SYMPOSIUM (SBQS), 16., 2017, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 89-103. DOI: https://doi.org/10.5753/sbqs.2017.15094.
