Semantic Search Mechanism Based on Word Embeddings in Lattes Curriculum Data, Graduate Programs, and Research Groups
Abstract
The search for researchers and scientific publications is essential for access to academic knowledge. However, keyword-based search mechanisms can fail to capture the semantics of queries, which leads to less relevant results. This research proposes the implementation and analysis of a semantic search mechanism that uses Word Embeddings to provide more relevant answers in the academic context. The study presents an architecture and an implementation that allow efficient semantic searches in scientific databases through the transformation and indexing of Word Embeddings.
References
Deepak, G. and Santhanavijayan, A. (2022). UQSCM-RFD: A query–knowledge interfacing approach for diversified query recommendation in semantic search based on river flow dynamics and dynamic user interaction. Neural Computing and Applications, 34(1):651–675.
dos Santos, M. S., de Jesus Oliveira, V. H., de Freitas Jorge, E. M., and de Meireles Costa, G. (2024). Solução para mapeamento e consulta das competências dos pesquisadores: uma arquitetura para extração, integração e consultas de informações acadêmicas. Cadernos de Prospecção, 17(2):671–688.
Dresch, A., Lacerda, D. P., and Junior, J. A. V. A. (2020). Design science research: método de pesquisa para avanço da ciência e tecnologia. Bookman Editora.
Farmanbar, M., Van Ommeren, N., and Zhao, B. (2020). Semantic search with domain-specific word-embedding and production monitoring in fintech. In Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations, pages 28–33.
Forgues, G., Pineau, J., Larchevêque, J.-M., and Tremblay, R. (2014). Bootstrapping dialog systems with word embeddings. In NIPS Modern Machine Learning and Natural Language Processing Workshop, volume 2, page 168.
Gundyreva, E., Pivovarova, L., and Zosa, E. (2022). Unsupervised linking of scientific articles to food systems taxonomies.
Gupta, S. (2017). A survey on search engines. Journal for Research, 2(11).
Jbene, M., Tigani, S., Saadane, R., and Chehri, A. (2021). Deep neural network and boosting based hybrid quality ranking for e-commerce product search. Big Data and Cognitive Computing, 5(3):35.
Rastogi, N., Verma, P., and Kumar, P. (2021). Query expansion based on word embeddings and ontologies for efficient information retrieval. International Journal of Advanced Computer Science and Applications, 12(11).
Sharma, A. and Kumar, S. (2022). Shallow neural network and ontology-based novel semantic document indexing for information retrieval. Intelligent Automation & Soft Computing, 34(3):1989–2005.
Sharma, D. K., Pamula, R., and Chauhan, D. (2021). Semantic approaches for query expansion. Evolutionary Intelligence, 14(2):1101–1116.
Sheela, A. S. and Jayakumar, C. (2019). Comparative study of syntactic search engine and semantic search engine: A survey. In 2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), volume 1, pages 1–4.
Ta, C. V., Reiner, F., von Detten, I., and Stöhr, F. (2022). Touché-task 1-team korg: Finding pairs of argumentative sentences using embeddings. In CLEF (Working Notes), pages 3131–3148.
Tuncer, I., Kara, K. C., and Karakaş, A. (2021). Improving search relevance with word embedding based clusters. In Trends in Data Engineering Methods for Intelligent Systems: Proceedings of the International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2020), pages 15–24. Springer.
Published
2024-11-05
How to Cite
BATISTA, João Vítor Café dos R.; COSTA, Gleidson de Meireles; JORGE, Eduardo Manuel de Freitas. Semantic Search Mechanism Based on Word Embeddings in Lattes Curriculum Data, Graduate Programs, and Research Groups. In: REGIONAL SCHOOL ON COMPUTING OF BAHIA, ALAGOAS, AND SERGIPE (ERBASE), 24., 2024, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 109-118. DOI: https://doi.org/10.5753/erbase.2024.4430.
