An Approach to HLA Allele Imputation in Bone Marrow Donor Registries

  • Felipe S. C. Eduardo UERJ
  • Nathalia de Azevedo UERJ
  • Luís Cristóvão M. S. Pôrto UERJ
  • Karla Figueiredo UERJ
  • Alexandre C. Sena UERJ

Abstract


The most important information in bone marrow donor records is the set of alleles of the HLA genes. Because of the cost and the types of tests required to obtain this information, many of these alleles are missing from the databases. This study therefore evaluates, for the first time, the feasibility of imputing the alleles of genes not reported in these databases. For this purpose, a Long Short-Term Memory (LSTM) recurrent neural network was used. The 76% accuracy achieved shows that imputing the missing alleles is feasible, despite the strong class imbalance and the fact that HLA is one of the most polymorphic regions of human DNA (i.e., it has many distinct alleles).
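The abstract frames imputation as a classification problem with strongly imbalanced classes. As a minimal sketch of that framing, the Python below encodes donor records as integer sequences (the usual input to an embedding layer followed by an LSTM) and computes a majority-class baseline against which an accuracy figure can be read. Everything here is an assumption made for illustration: the toy records, the choice of HLA-DQB1 as the missing locus, and the names `build_vocab`, `encode`, and `majority_baseline_accuracy` are hypothetical and do not come from the paper, whose actual preprocessing is not described in the abstract.

```python
from collections import Counter

# Hypothetical toy records: three typed loci plus the locus to impute.
# Allele names are illustrative placeholders, not data from the study.
records = [
    {"A": "A*01:01", "B": "B*08:01", "DRB1": "DRB1*03:01", "DQB1": "DQB1*02:01"},
    {"A": "A*02:01", "B": "B*07:02", "DRB1": "DRB1*15:01", "DQB1": "DQB1*06:02"},
    {"A": "A*01:01", "B": "B*08:01", "DRB1": "DRB1*03:01", "DQB1": "DQB1*02:01"},
    {"A": "A*03:01", "B": "B*07:02", "DRB1": "DRB1*15:01", "DQB1": "DQB1*06:02"},
    {"A": "A*02:01", "B": "B*44:02", "DRB1": "DRB1*04:01", "DQB1": "DQB1*03:01"},
    {"A": "A*01:01", "B": "B*08:01", "DRB1": "DRB1*03:01", "DQB1": "DQB1*02:01"},
]
INPUT_LOCI = ("A", "B", "DRB1")  # assumed to be the reported loci
TARGET_LOCUS = "DQB1"            # assumed to be the locus to impute


def build_vocab(tokens):
    """Map each distinct allele to an integer index; 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


input_vocab = build_vocab(r[locus] for r in records for locus in INPUT_LOCI)
target_vocab = build_vocab(r[TARGET_LOCUS] for r in records)


def encode(record):
    """Turn one record into (input index sequence, target class index)."""
    x = [input_vocab.get(record[locus], 0) for locus in INPUT_LOCI]
    y = target_vocab.get(record[TARGET_LOCUS], 0)
    return x, y


encoded = [encode(r) for r in records]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


baseline = majority_baseline_accuracy([y for _, y in encoded])
print(encoded[0], baseline)
```

With imbalanced classes, a model's accuracy is most informative when compared against a baseline of this kind, since always predicting the commonest allele can already score well.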

Published
2025-09-29
EDUARDO, Felipe S. C.; AZEVEDO, Nathalia de; PÔRTO, Luís Cristóvão M. S.; FIGUEIREDO, Karla; SENA, Alexandre C. An Approach to HLA Allele Imputation in Bone Marrow Donor Registries. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 831-842. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14236.
