Smart OCR: A Vertical AI Agent for Financial Document Validation — An Exploratory Case Study

Daniel Mitsuaki da Silva Utyiama; Oziel Sênior Coelho Moraes; Felipe Augusto Souza Guimarães; Paolo Bruno Silva Ramos; Leonardo Carneiro Marques

doi:10.5753/sbsi.2026.248680

Daniel Mitsuaki da Silva Utyiama (SiDi)
Oziel Sênior Coelho Moraes (SiDi)
Felipe Augusto Souza Guimarães (SiDi)
Paolo Bruno Silva Ramos (SiDi)
Leonardo Carneiro Marques (SiDi)

Abstract

Research Context: The growing complexity of fiscal and financial documents increases the need for intelligent solutions that automate their validation. Vertical AI agents are emerging as a way to improve accuracy and efficiency in financial workflows.

Scientific and/or Practical Problem: Despite their potential, many AI systems face adoption barriers caused by usability issues, lack of training, and misalignment with established practices. Understanding user perceptions in the early stages of adoption is key to sustainable integration.

Proposed Solution and/or Analysis: This study examines Smart OCR, a vertical AI agent that automates the manual validation of fiscal and financial documents and provides a web interface for reviewing discrepancies. We assess perceptions of usefulness, usability, value, satisfaction, and recommendation intent.

Related IS Theory: The analysis is grounded in the Technology Acceptance Model (TAM) and the human–computer interaction literature, both of which emphasize perceived usefulness and ease of use as determinants of adoption.

Research Method: The study took place in the financial sector of a research and development institute. Over two weeks, analysts and coordinators used Smart OCR integrated into their workflow. Data were collected through a structured questionnaire combining Likert items and open-ended questions.

Summary of Results: Respondents reported high perceived usefulness, expected impact, and value, along with strong recommendation intent. Usability and initial satisfaction were moderate, which points to a need for clearer guidance and interface improvements.

Contributions and Impact to the IS Area: The study offers empirical evidence on the acceptance of vertical AI agents in financial contexts and provides insights for IS design, onboarding, and organizational support to foster AI adoption.
