Disambiguation of terms from the Linguistic Atlas of Brazil through OpenWordnet-PT-ALiB
Abstract
This work describes the disambiguation of terms from the Linguistic Atlas of Brazil (ALiB) in a Twitter corpus, using OpenWN-PT-ALiB. The study makes two main contributions: the incorporation of some ALiB terms into OpenWordNet-PT (OpenWN-PT), and the development of a disambiguation method based on Word Embeddings and the Soft Cosine Measure (SCM). The proposed method uses Word Embeddings to represent words in a vector space and computes the SCM between the context of each tweet and the candidate synsets of OpenWN-PT-ALiB, which is the basis for choosing among them. The results indicate that the method is effective, with higher disambiguation rates even in the context of Twitter.
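As a rough illustration of the idea summarized above, the following sketch scores a tweet context against candidate senses with the Soft Cosine Measure. It is a minimal sketch under stated assumptions, not the authors' implementation: the two-dimensional embeddings, the English tokens, the sense identifiers, and the bag-of-words representation of each synset are all hypothetical, and the rule of picking the highest-scoring sense is an assumption about how the SCM scores would be used.

```python
import numpy as np

# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "wind": np.array([0.9, 0.1]),
    "string": np.array([0.8, 0.2]),
    "toy": np.array([1.0, 0.0]),
    "bird": np.array([0.0, 1.0]),
    "feather": np.array([0.1, 0.9]),
    "beak": np.array([0.2, 0.8]),
}


def similarity_matrix(vocab, emb):
    """Term-term similarity: cosine between embeddings, negatives clipped to 0."""
    vecs = np.stack([emb[w] for w in vocab])
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, None)


def bag_of_words(tokens, vocab):
    """Count vector over the vocabulary; out-of-vocabulary tokens are ignored."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1.0
    return vec


def soft_cosine(a, b, sim):
    """SCM(a, b) = (a^T S b) / (sqrt(a^T S a) * sqrt(b^T S b))."""
    den = np.sqrt(a @ sim @ a) * np.sqrt(b @ sim @ b)
    return float(a @ sim @ b / den) if den > 0 else 0.0


def disambiguate(context_tokens, senses, emb):
    """Return the sense whose word bag has the highest SCM with the context."""
    vocab = sorted(emb)
    sim = similarity_matrix(vocab, emb)
    ctx = bag_of_words(context_tokens, vocab)
    scores = {
        sense_id: soft_cosine(ctx, bag_of_words(words, vocab), sim)
        for sense_id, words in senses.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical candidate senses for an ambiguous term, each given as a bag of words.
SENSES = {
    "kite_sense": ["toy", "string", "wind"],
    "bird_sense": ["bird", "feather", "beak"],
}

best, scores = disambiguate(["wind", "string"], SENSES, EMBEDDINGS)
print(best, scores)
```

Unlike the plain cosine, the SCM gives credit to related but non-identical words through the similarity matrix, which is useful for short texts where context and sense descriptions share few exact tokens.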
Keywords:
Disambiguation, Vitality, Twitter, Word Embeddings
References
Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. (2003). A neural probabilistic language model. J. Mach. Learn. Res., 3:1137–1155.
Cardoso, S. and Mota, J. (2014). Atlas Linguístico do Brasil. Addison-Wesley Longman Publishing Co., Inc.
de Paiva, V., Rademaker, A., and de Melo, G. (2012). Openwordnet-pt: An open Brazilian Wordnet for reasoning. In Proceedings of COLING 2012: Demonstration Papers, pages 353–360, Mumbai, India. The COLING 2012 Organizing Committee. Published also as Techreport http://hdl.handle.net/10438/10274.
Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Bradford Books. https://doi.org/10.2307/417141
Hartmann, N. S., Fonseca, E. R., Shulby, C. D., Treviso, M. V., Rodrigues, J. S., and Aluísio, S. M. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. In Anais do XI Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana, pages 122–131, Porto Alegre, RS, Brasil. SBC.
Ide, N. and Véronis, J. (1998). Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 24(1):1–40.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. In Bengio, Y. and LeCun, Y., editors, 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings. https://doi.org/10.48550/arXiv.1301.3781
Published
2023-09-25
How to Cite
BARRETO, Augusto Sampaio; CLARO, Daniela Barreiro. Disambiguation of terms from the Linguistic Atlas of Brazil through OpenWordnet-PT-ALiB. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 14., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 377-381. DOI: https://doi.org/10.5753/stil.2023.234580.
