Engineering Inferential Composition Control for Federated RAG in Data Spaces
Abstract
Federated Data Spaces increasingly rely on Retrieval-Augmented Generation (RAG) to provide intelligent access to distributed and sovereign data sources. However, current RAG systems typically assume that fragments retrieved from different providers can be merged into a single generation context without semantic conflict. In regulated and heterogeneous settings, this assumption can trigger inferential errors, such as treating conditional or alternative requirements as jointly mandatory, which poses risks to trustworthiness and governance. This paper reframes semantic interoperability in federated RAG systems as a problem of inferential composition rather than conceptual alignment. We introduce a lightweight engineering framework that adds an explicit reasoning control layer to the RAG pipeline. The framework uses semantic roles and pragmatic signals to guide selective context separation and conservative fragment composition, preventing unsafe inferences without requiring ontologies or shared vocabularies. We evaluate the approach through a reproducible case study in a regulated data-sharing scenario, showing that inferential composition control prevents reasoning failures under adversarial queries while preserving benign integration. These findings highlight the need for explicit reasoning control in RAG-based systems operating in federated and regulated environments.
References
Ashley, K. D. (2017). Artificial intelligence and legal analytics: new tools for law practice in the digital age. Cambridge University Press.
Austin, J. L. (1975). How to do things with words. Harvard University Press.
Braga, C. M., Serrano, M. A., and Fernández-Medina, E. (2026). Guided and federated RAG: Architectural models for trustworthy AI in data spaces. In Intelligent Data Engineering and Automated Learning – IDEAL 2025, pages 363–374, Cham. Springer Nature Switzerland.
Brandom, R. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press.
Diao, S., Wang, P., Lin, Y., Pan, R., Liu, X., and Zhang, T. (2024). Active prompting with chain-of-thought for large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1330–1350.
Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D., Ness, R. O., and Larson, J. (2024). From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130.
Governatori, G. (2005). Representing business contracts in RuleML. International Journal of Cooperative Information Systems, 14(02n03):181–216.
Hao, Z., Mayer, W., Xia, J., Li, G., Qin, L., and Feng, Z. (2023). Ontology alignment with semantic and structural embeddings. Journal of Web Semantics, 78:100798.
J. Carlos, M., Ana Alice, B., Rui M., S., and Paulo J., M. (2025). Semantic mediation: a literature review on semantic interoperability through ontologies. Procedia Computer Science, 263:734–743. International Conference on Industry Sciences and Computer Science Innovation (iSCSi’24).
Kamp, H. and Reyle, U. (2013). From discourse to logic: Introduction to model-theoretic semantics of natural language, formal logic and discourse representation theory, volume 42. Springer Science & Business Media.
Lewis, P. et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in neural information processing systems, 33:9459–9474.
Möller, F. et al. (2024). Industrial data ecosystems and data spaces. Electronic Markets, 34(1):41.
Nissenbaum, H. (2004). Privacy as contextual integrity. Wash. L. Rev., 79:119.
Penedo, A. C. (2024). The Regulation of Data Spaces under the EU Data Strategy: Towards the ‘Act-ification’ of the Fifth European Freedom for Data? European Journal of Law and Technology, 15(1).
Schneider, J. (2024). Explainable generative AI (GenXAI): A survey, conceptualization, and research agenda. Artificial Intelligence Review, 57(11):289.
Yang, J., Shu, L., Duan, H., and Li, H. (2025). RDguru: A conversational intelligent agent for rare diseases. IEEE Journal of Biomedical and Health Informatics, 29(9):6366–6378.
Published
11/05/2026
How to Cite
BRAGA, Carlos Mario; SERRANO, Manuel A.; FERNÁNDEZ-MEDINA, Eduardo. Engineering Inferential Composition Control for Federated RAG in Data Spaces. In: CONGRESSO IBERO-AMERICANO EM ENGENHARIA DE SOFTWARE (CIBSE), 29., 2026, Recife/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 76-90.
