Language Models are not a Panacea: Combining them with Domain Knowledge and Efficient Indexes for Entity Linking

  • Daniel Lucas Albuquerque Universidade Federal de Santa Catarina (UFSC)
  • Vitória S. Santos Universidade Federal de Santa Catarina (UFSC)
  • Pedro Nack Universidade Federal de Santa Catarina (UFSC)
  • Renato Fileto Universidade Federal de Santa Catarina (UFSC)
  • Carina F. Dorneles Universidade Federal de Santa Catarina (UFSC)

Abstract

Language models enable cutting-edge solutions for many problems. However, they may not always be the best choice—at least not on their own—for certain tasks in specific contexts. In this paper, we propose a hybrid approach to entity linking (EL) that employs domain knowledge and efficient indexes for named entity recognition (NER), delegating only the disambiguation step (NED) to language models. We evaluated this hybrid approach on textual descriptions of invoice items from public medication purchases. The experiments showed that domain knowledge and indexes enabled efficient recognition of medications (NER), with accuracy superior to most state-of-the-art language models investigated and comparable to the GPT-4o reasoning language model. In addition, candidate medications recognized by our computationally efficient approach were disambiguated (NED) by GPT-4o with 90.55% precision.
Keywords: language models, entity linking, domain knowledge, indexing
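The hybrid pipeline described in the abstract can be sketched in miniature: domain knowledge (a medication dictionary) feeds an inverted index used for fast candidate recognition (NER), and only the final disambiguation (NED) is delegated to a language model. The dictionary entries, identifiers, and the trivial overlap score standing in for the GPT-4o call below are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Toy domain knowledge base: medication descriptions keyed by entity id.
# In practice this would come from a curated source such as a national
# medication ontology; these entries are purely illustrative.
MEDICATIONS = {
    "M01": "dipirona sodica 500 mg",
    "M02": "paracetamol 750 mg",
    "M03": "amoxicilina 500 mg",
}

def build_index(kb):
    """Build an inverted index: token -> set of candidate entity ids."""
    index = defaultdict(set)
    for eid, name in kb.items():
        for tok in name.split():
            index[tok].add(eid)
    return index

def recognize(text, index):
    """NER step: look up each token in the index and rank candidate
    medications by the number of matching tokens (no LLM involved)."""
    hits = defaultdict(int)
    for tok in text.lower().split():
        for eid in index.get(tok, ()):
            hits[eid] += 1
    return sorted(hits, key=hits.get, reverse=True)

def disambiguate(text, candidate_ids, kb):
    """NED step: in the paper this is delegated to a language model
    (e.g. GPT-4o) given the candidates; a simple token-overlap score
    stands in for that call here."""
    def overlap(eid):
        return len(set(text.lower().split()) & set(kb[eid].split()))
    return max(candidate_ids, key=overlap) if candidate_ids else None

index = build_index(MEDICATIONS)
item = "COMPRIMIDO PARACETAMOL 750 MG CX 20"  # invoice item description
cands = recognize(item, index)
best = disambiguate(item, cands, MEDICATIONS)
```

The key design point mirrored here is that the index lookup is cheap and runs over every invoice item, while the (expensive) language-model call only sees the short candidate list it produces.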

Published
29/09/2025
ALBUQUERQUE, Daniel Lucas; SANTOS, Vitória S.; NACK, Pedro; FILETO, Renato; DORNELES, Carina F. Language Models are not a Panacea: Combining them with Domain Knowledge and Efficient Indexes for Entity Linking. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 479-492. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247273.