An Intelligent Report Generator For Chemical Documents
Abstract
Context: Scientific articles and patents contain academic, industrial, and scientific information. Automatically retrieving information from these documents is necessary to support future scientific research. Problem: Chemical information in documents is difficult to identify and analyze manually, which makes it nearly impossible to access specific contents of chemical investigations and to generate reports that support ongoing research. Solution: In this article, we present a system that recognizes chemical entities (elements, classes, compounds, methods, and equipment) and generates intelligent reports from free text. IS Theory: We grounded this work in Soft Systems Theory. Method: This research was evaluated through a proof of concept. We used 30 chemical patents from the Brazilian National Institute of Industrial Property and 20 scientific articles from Revista Virtual de Química (RVq). For validation, we extracted the texts and recognized the named entities with methods such as the hybrid Conditional Random Field (CRF) + Local Grammar (LG) approach. We then applied rules to generate intelligent reports. Summary of Results: The system can generate seven types of intelligent reports, two of which are customized by the user. For datasetPat, our model obtained mean values of 98.96% for Precision, 91.12% for Recall, and 94.17% for F-Score. For datasetArt, it reached mean values of 97.31%, 86.94%, and 91.29% for Precision, Recall, and F-Score, respectively. Contributions and Impact in the IS Area: The main contribution of this research is an Information System that generates intelligent reports from documents based on the recognition of named entities in the chemical domain. In addition, the hybrid CRF+LG method can contribute to the evolution of Information Systems, helping people and organizations. The model is described throughout the paper and can be replicated in other contexts.
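The pipeline summarized in the abstract (recognize entities, apply rules to build a report, score against annotations) can be illustrated with a minimal sketch. It is not the authors' implementation: a small hypothetical lexicon matched with regular expressions stands in for the CRF+LG recognizer, the per-category count is only one conceivable report type, and the names `LEXICON`, `recognize`, `report`, and `precision_recall_f` are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (category -> terms); NOT taken from the paper.
# It stands in for the CRF+LG recognizer, which the abstract does not detail.
LEXICON = {
    "ELEMENT": ["sodium", "chlorine", "copper"],
    "COMPOUND": ["sodium chloride", "ethanol"],
    "METHOD": ["distillation", "chromatography"],
    "EQUIPMENT": ["spectrometer", "centrifuge"],
}


def recognize(text):
    """Return a set of (start, end, category) spans, preferring longer matches."""
    candidates = []
    for category, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                candidates.append((m.start(), m.end(), category))
    # Longest spans first, so "sodium chloride" wins over the nested "sodium".
    candidates.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    chosen = []
    for start, end, category in candidates:
        # Keep a span only if it does not overlap one already chosen.
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, category))
    return set(chosen)


def report(spans):
    """One simple rule-based report: number of entities per category."""
    return Counter(category for _, _, category in spans)


def precision_recall_f(predicted, gold):
    """Exact-match precision, recall, and F-score over sets of spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


text = "Sodium chloride was purified by distillation and analysed in a spectrometer."
spans = recognize(text)
print(report(spans))
```

Under this exact-match scoring, a recognizer that finds three of four annotated entities with no false positives has precision 1.0 and recall 0.75. Note also that a mean F-Score reported over documents need not equal the harmonic mean of the mean Precision and mean Recall, which would be expected if the scores are averaged per document (an assumption about the evaluation, not something the abstract states).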