An Intelligent Report Generator For Chemical Documents


Context: Scientific articles and patents contain academic, industrial, and scientific information. Automatically retrieving information from these documents is necessary to support future scientific research. Problem: Chemical information in documents is difficult to identify and analyze manually, which makes it nearly impossible to access the specific contents of chemical investigations and to generate reports that support ongoing research. Solution: In this article, we present a system that recognizes chemical entities (elements, classes, compounds, methods, and equipment) and generates intelligent reports from free text. IS Theory: We developed this work under Soft Systems Theory. Method: This research was evaluated through a proof of concept. We used 30 chemical patents from the Brazilian National Institute of Industrial Property and 20 scientific articles from Revista Virtual de Química (RVq). For validation, we extracted the texts and recognized the named entities using methods such as the hybrid Conditional Random Field (CRF) + Local Grammar (LG) approach. We then applied rules to generate intelligent reports. Summary of Results: The system can generate seven types of intelligent reports, two of which are customized by the user. For datasetPat, our model obtained mean values of 98.96% for Precision, 91.12% for Recall, and 94.17% for F-Score. For datasetArt, it reached mean values of 97.31%, 86.94%, and 91.29% for Precision, Recall, and F-Score, respectively. Contributions and Impact in the IS Area: The main contribution of this research is an Information System that generates intelligent reports from documents based on the recognition of named entities in the chemical domain. In addition, the hybrid CRF+LG method can contribute to the evolution of Information Systems, helping people and organizations. The model is described throughout the paper and can be replicated in other contexts.
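
The Precision, Recall, and F-Score figures above are reported as mean values. The abstract does not state how the averaging was done, so the following is only a minimal sketch of one possible reading, in which each metric is computed per document and then averaged. The function names and the entity counts are hypothetical and are not taken from the paper. Under this reading, the mean F-Score does not have to equal the harmonic mean of the mean Precision and mean Recall; for datasetPat, for example, the harmonic mean of 98.96% and 91.12% is about 94.88%, while the reported mean F-Score is 94.17%.

```python
from statistics import mean


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from entity-level counts.

    tp: entities correctly recognized
    fp: spurious entities reported by the recognizer
    fn: gold entities the recognizer missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(per_doc_counts):
    """Average precision, recall, and F1 over documents.

    per_doc_counts: list of (tp, fp, fn) tuples, one per document.
    Each metric is computed per document first, then averaged.
    """
    scores = [precision_recall_f1(tp, fp, fn) for tp, fp, fn in per_doc_counts]
    return (
        mean(s[0] for s in scores),
        mean(s[1] for s in scores),
        mean(s[2] for s in scores),
    )


if __name__ == "__main__":
    # Hypothetical counts for two documents (illustration only).
    counts = [(8, 2, 2), (5, 0, 5)]
    p, r, f = macro_average(counts)
    print(f"mean precision={p:.4f} mean recall={r:.4f} mean F1={f:.4f}")
    # Harmonic mean of the averaged precision and recall, for comparison.
    print(f"F1 of the means={2 * p * r / (p + r):.4f}")
```

With these made-up counts the mean of the per-document F-Scores differs from the F-Score computed from the averaged Precision and Recall, which is why the two should not be treated as interchangeable when comparing reported results.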

Keywords: Artificial Intelligence, Natural Language Processing, Named Entity Recognition, Intelligent Report


IZO, Flavio; VEREAU, Luis Enrique Santos Prado; PIROVANI, Juliana Pinheiro Campos; OLIVEIRA, Elias; BADUE, Claudine. An Intelligent Report Generator For Chemical Documents. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 19., 2023, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023.

