Pairwise Difference Filter (PDF): An Interpretable Preprocessing Method for Medical and Beyond

Daniel Pordeus; Weslley Lioba Caldas; João Paulo do Vale Madeiro

doi:10.5753/sbcas_estendido.2025.7024

Daniel Pordeus UFC
Weslley Lioba Caldas UFC
João Paulo do Vale Madeiro UFC

Abstract

Interpretability is a critical requirement for machine learning models in healthcare: clinicians need to understand how input features are processed in order to make informed decisions about patient care. Traditional preprocessing methods often fail to capture subtle differences between patient groups, particularly in datasets with overlapping or highly similar classes. To address these challenges, we propose the Pairwise Difference Filter (PDF), a novel preprocessing method that leverages pairwise differences between samples of opposite classes to identify the most influential features. PDF focuses on pairs of patients whose overall difference is smallest but who differ significantly in specific features, enabling the identification of clinically meaningful biomarkers. By enhancing the interpretability of machine learning models, PDF supports medical decision-making and improves the transparency of predictive models in healthcare. Experimental results on three datasets, namely a COVID-19 severity classification dataset, MUSIC (a dataset for predicting outcomes in patients with different degrees of heart failure), and the Wine toy dataset, demonstrate that PDF achieves competitive performance while providing interpretable feature rankings that align with clinical knowledge.
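The abstract describes the idea only at a high level, so the following is a minimal sketch of one possible reading, not the authors' algorithm. It assumes a binary problem, standardized features, an L1 distance as the "overall difference", and a score defined as each feature's average share of that distance over the closest opposite-class pairs. The function name `pairwise_difference_scores` and the parameter `n_pairs` are hypothetical and do not come from the paper.

```python
import numpy as np


def pairwise_difference_scores(X, y, n_pairs=10):
    """Hypothetical sketch: score features by their share of the total
    difference within the closest opposite-class pairs.
    Assumption-laden illustration, not the published PDF procedure."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch assumes exactly two classes")

    # Standardize so no feature dominates purely because of its scale.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    A = Z[y == classes[0]]
    B = Z[y == classes[1]]

    # diffs[i, j, f] = |A[i, f] - B[j, f]| for every opposite-class pair.
    diffs = np.abs(A[:, None, :] - B[None, :, :])
    total = diffs.sum(axis=2)  # overall (L1) difference per pair

    # Keep the n_pairs pairs with the smallest overall difference.
    k = min(n_pairs, total.size)
    closest = np.argsort(total.ravel())[:k]
    rows, cols = np.unravel_index(closest, total.shape)
    selected = diffs[rows, cols]  # shape (k, n_features)

    # Share of each pair's total difference explained by each feature.
    sums = selected.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # identical pairs contribute zero shares
    return (selected / sums).mean(axis=0)


# Toy data (made up): only feature 0 separates the two classes.
X_demo = [[0.0, 1.0, 5.0],
          [0.1, 2.0, 6.0],
          [1.0, 1.0, 5.0],
          [1.1, 2.0, 6.0]]
y_demo = [0, 0, 1, 1]
scores = pairwise_difference_scores(X_demo, y_demo, n_pairs=2)
print(scores)  # feature 0 receives the highest score
```

In this toy case the two closest opposite-class pairs differ only in feature 0, so that feature carries the whole difference. On real clinical data the choice of distance, the number of pairs, and the scoring rule would all need to follow the paper rather than this sketch.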
