Entity Matching with Large Language Models: comparative study with Entity Blocking approach
Abstract
Entity Matching identifies records from different sources that refer to the same entity, and is therefore essential for data integration. Although pre-trained models that adopt Entity Blocking techniques are widely used for this task, the advancement of Large Language Models (LLMs) suggests new possibilities. This work compares Ditto, which applies optimization techniques to traditional models, with Orca2, an LLM based on Llama2 and focused on reasoning. We evaluate the feasibility of LLMs for Entity Matching by analyzing accuracy and computational cost. Although Orca2 initially performs worse than Ditto, it shows competitive potential, especially given future computational improvements.
Keywords:
Entity Matching, Entity Blocking, Large Language Models, Orca2, Ditto
References
Arvanitis-Kasinikos, I. and Papadakis, G. (2025). Entity matching with 7B LLMs: A study on prompting strategies and hardware limitations. CEUR Workshop Proceedings.
Barlaug, N. and Gulla, J. A. (2021). Neural networks for entity matching: A survey. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(3):1–37.
Brasileiro Araújo, T., Efthymiou, V., Christophides, V., Pitoura, E., and Stefanidis, K. (2025). TREATS: Fairness-aware entity resolution over streaming data. Information Systems, 129:102506.
Christen, P. (2012). Data matching systems. Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection, pages 229–242.
Kuang, W., Qian, B., Li, Z., Chen, D., Gao, D., Pan, X., Xie, Y., Li, Y., Ding, B., and Zhou, J. (2024). FederatedScope-LLM: A comprehensive package for fine-tuning large language models in federated learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5260–5271.
Li, Y., Li, J., Suhara, Y., Doan, A., and Tan, W.-C. (2020). Deep entity matching with pre-trained language models. Proceedings of the VLDB Endowment, 14(1):50–60.
Mitra, A., Del Corro, L., Mahajan, S., Codas, A., Simoes, C., Agarwal, S., Chen, X., Razdaibiedina, A., Jones, E., Aggarwal, K., et al. (2023). Orca 2: Teaching small language models how to reason. arXiv preprint arXiv:2311.11045.
Niven, T. and Kao, H.-Y. (2019). Probing neural network comprehension of natural language arguments. arXiv preprint arXiv:1907.07355.
Peeters, R., Der, R. C., and Bizer, C. (2023a). WDC Products: A multi-dimensional entity matching benchmark. arXiv preprint arXiv:2301.09521.
Peeters, R., Steiner, A., and Bizer, C. (2023b). Entity matching using large language models. arXiv preprint arXiv:2310.11244.
Wang, Y. and Yan, M. (2024). Unsupervised domain adaptation for entity blocking leveraging large language models. In 2024 IEEE International Conference on Big Data (BigData), pages 159–164. IEEE.
Zhang, J., Sun, H., and Ho, J. C. (2024). EMBA: Entity matching using multi-task learning of BERT with attention-over-attention. In EDBT, pages 281–293.
Published
2025-09-29
How to Cite
BOLCONTE DONATO, Rodolfo; BRASILEIRO ARAÚJO, Tiago. Entity Matching with Large Language Models: comparative study with Entity Blocking approach. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 956-962. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247828.
