Unveiling the Segmentation Power of LLMs: Zero-Shot Invoice Item Description Analysis

Vitória S. Santos; Carina F. Dorneles

doi:10.5753/sbbd.2024.240820

Vitória S. Santos, Universidade Federal de Santa Catarina (UFSC)
Carina F. Dorneles, Universidade Federal de Santa Catarina (UFSC)

Abstract

Segmenting invoice item descriptions into attributes that describe their features may be a worthwhile preparatory step for subsequent entity resolution. This paper presents a set of experiments that measure the performance of seven LLMs (Llama-3, Sabiá-2-Medium, Command R+, Claude 3 Opus, GPT-3.5, GPT-4, and Mixtral 8x22B) in segmenting the text of invoice item descriptions using zero-shot learning techniques. We use accuracy, precision, recall, and F₁-score as evaluation metrics to assess the effectiveness of the LLMs. The experiment involved segmentation preparation, model training, prompt optimization, attribute extraction, and output generation. The objective is to determine how accurately each model identifies the segments within invoice item descriptions.

Keywords: Large Language Models, Attribute Segmentation, Zero-shot prompting
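
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how precision, recall, and F₁-score could be computed for a single segmented item description. It assumes that a segmentation is represented as a mapping from attribute names to values and that a predicted pair counts as correct only on an exact match; the abstract does not specify the matching protocol, so this is an illustrative assumption. The function name `segmentation_prf1` and the sample item are hypothetical.

```python
from typing import Dict, Tuple


def segmentation_prf1(gold: Dict[str, str], pred: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over (attribute, value) pairs.

    Assumption: a predicted pair is a true positive only if both the
    attribute name and its value match the gold annotation exactly.
    """
    gold_pairs = set(gold.items())
    pred_pairs = set(pred.items())
    true_positives = len(gold_pairs & pred_pairs)

    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical invoice item, for illustration only.
gold = {"product": "soap", "brand": "Acme", "size": "90g"}
pred = {"product": "soap", "brand": "Acme", "size": "90 g", "color": "white"}

p, r, f = segmentation_prf1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the four predicted pairs match the three gold pairs, so precision is 2/4, recall is 2/3, and F₁ is 4/7. A more lenient matcher (for example, one that normalizes whitespace so that "90 g" equals "90g") would raise all three scores, which is why the matching rule needs to be fixed before comparing models.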
