Dealing with Imbalanceness in Hierarchical Classification Problems Through Data Resampling
Many important classification problems are imbalanced. Although resampling approaches are a common solution for different types of classification problems, they have not yet been defined for hierarchical classification problems. This work proposes novel resampling approaches to handle class imbalance in hierarchical classification problems. Four directions were investigated: (i) the use of classic resampling methods; (ii) a label path conversion strategy; (iii) the design of schemas for using resampling algorithms with local approaches; and (iv) the proposal of global resampling algorithms. To demonstrate the impact of these contributions, we investigated the imbalance issue in COVID-19 identification in chest X-ray images.
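To make direction (ii) more concrete, the following is a minimal sketch of one way a label path conversion could work; it is an illustration under stated assumptions, not the method defined in this work. It assumes each instance carries a single label path, that every distinct path is treated as one flat class, and that plain random oversampling stands in for the "classic" resampler. The function names, the "/" separator, and the toy class hierarchy are all hypothetical.

```python
import random
from collections import Counter, defaultdict

SEP = "/"  # assumption: no label name contains this separator


def path_to_flat(path):
    """Collapse a label path such as ("A", "B") into one flat label "A/B"."""
    return SEP.join(path)


def flat_to_path(label):
    """Recover the original label path from a flat label."""
    return tuple(label.split(SEP))


def random_oversample(X, paths, seed=0):
    """Duplicate random instances until every label path is equally frequent.

    Random oversampling is used here only as a stand-in for any classic
    flat resampling algorithm applied after the conversion.
    """
    rng = random.Random(seed)
    flat = [path_to_flat(p) for p in paths]

    # Group instance indices by their flat (converted) class.
    by_class = defaultdict(list)
    for i, label in enumerate(flat):
        by_class[label].append(i)

    target = max(len(idx) for idx in by_class.values())
    chosen = list(range(len(X)))
    for idx in by_class.values():
        chosen.extend(rng.choice(idx) for _ in range(target - len(idx)))

    # Convert the flat labels back into label paths for the hierarchical learner.
    return [X[i] for i in chosen], [flat_to_path(flat[i]) for i in chosen]


# Toy data with a hypothetical hierarchy (not the dataset used in this work).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
paths = [
    ("Normal",),
    ("Normal",),
    ("Normal",),
    ("Pneumonia", "Viral", "COVID-19"),
    ("Pneumonia", "Bacterial"),
]
X_res, paths_res = random_oversample(X, paths)
counts = Counter(paths_res)
```

After the call, `counts` holds one entry per label path, each with the same frequency as the original majority path. Treating whole paths as classes keeps the sketch short, but it ignores the imbalance at internal nodes of the hierarchy, which is the kind of limitation that local and global strategies such as directions (iii) and (iv) are meant to address.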