Early prediction of hypothyroidism based on feature selection and explainable artificial intelligence

  • Caio M. V. Cavalcante UFERSA
  • Rosana C. B. Rego UFERSA


Early and accurate diagnosis is required for adequate treatment of hypothyroidism. However, subjectivity in the interpretation of test results presents a significant challenge. In this study, we evaluate the potential of machine learning (ML) algorithms to address this issue: decision trees, random forest, XGBoost, LightGBM, extra trees, gradient boosting, and a stacking ensemble model. The goal is to predict hypothyroidism, a medical condition that affects the thyroid gland, from attributes derived from blood test results, namely thyroxine, thyroid-stimulating hormone, free thyroxine index, total thyroxine, and triiodothyronine. The results show that these algorithms can classify hypothyroidism accurately and offer diagnostic assistance, reaching an accuracy of 99.16%.
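
As a rough illustration of the stacking ensemble mentioned in the abstract, the following minimal sketch combines several tree-based classifiers with scikit-learn's StackingClassifier. It is not the authors' pipeline. The data are synthetic, and the feature names, labelling rule, hyperparameters, and logistic-regression meta-learner are all assumptions made for the example. XGBoost and LightGBM are left out so that the sketch depends on scikit-learn alone.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names standing in for the five blood-test attributes.
FEATURES = ["T4", "TSH", "FTI", "TT4", "T3"]


def make_synthetic_data(n_samples=600, seed=0):
    """Generate toy standardized features and labels (not real patient data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, len(FEATURES)))
    # Assumed toy rule: high TSH together with low FTI marks the positive class.
    score = X[:, 1] - X[:, 2] + 0.3 * rng.normal(size=n_samples)
    y = (score > 1.0).astype(int)
    return X, y


def build_stacking_model(seed=0):
    """Stack four tree-based learners under a logistic-regression meta-learner."""
    base_learners = [
        ("dt", DecisionTreeClassifier(max_depth=4, random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=seed)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    # cv=5: the meta-learner is trained on out-of-fold base predictions.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(),
        cv=5,
    )


X, y = make_synthetic_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacking_model()
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the toy labelling rule and is not comparable to the 99.16% reported for the real thyroid data.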

Keywords: Classification, machine learning, hypothyroidism, thyroid


CAVALCANTE, Caio M. V.; REGO, Rosana C. B. Early prediction of hypothyroidism based on feature selection and explainable artificial intelligence. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 24., 2024, Goiânia/GO. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 49-60. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2024.1870.