Evaluation of Dimensionality Reduction and Truncation Techniques for Word Embeddings

  • Paulo Henrique Calado Aoun (UFRPE)
  • Andre C. A. Nascimento
  • Adenilton J. da Silva

Abstract


Word embeddings are now common in many Natural Language Processing tasks. They often require computational resources that most current mobile devices lack. In this work, we evaluate a combination of numeric truncation and dimensionality reduction strategies to obtain smaller vector representations without substantial losses in performance.
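
As a rough illustration of how the two strategies might be combined, the sketch below first projects a matrix of vectors onto its leading principal components and then truncates each remaining value to a fixed number of decimal places. The abstract does not say which reduction method or truncation scheme the authors use, so PCA via SVD, decimal truncation, the function names, and the random stand-in data are all assumptions made for this example.

```python
import numpy as np


def reduce_dimensions(vectors, n_components):
    """Project row vectors onto their top principal components (PCA via SVD).

    Assumption: PCA is used here only as one possible reduction method.
    """
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def truncate_precision(vectors, decimals):
    """Drop every digit beyond `decimals` decimal places (truncate toward zero).

    Assumption: decimal truncation stands in for whatever numeric
    truncation scheme is actually evaluated.
    """
    scale = 10.0 ** decimals
    return np.trunc(vectors * scale) / scale


# Hypothetical stand-in for real embeddings: 1000 "words", 50 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 50))

reduced = reduce_dimensions(embeddings, n_components=10)
compact = truncate_precision(reduced, decimals=2)
print(embeddings.shape, "->", compact.shape)
```

Truncation on its own does not shrink an in-memory floating-point array; any saving would appear only once the shortened values are stored in a narrower encoding. Whether accuracy on a downstream task survives either step has to be measured separately.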

Published
22/10/2018
AOUN, Paulo Henrique Calado; NASCIMENTO, Andre C. A.; DA SILVA, Adenilton J. Evaluation of Dimensionality Reduction and Truncation Techniques for Word Embeddings. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 15., 2018, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 903-911. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2018.4477.