Studying the Dependence of Embedding Representations on the Target of NLP Tasks

Abstract


In many human languages, linguistic units form the building blocks of text. In NLP, these units are represented through vector semantics as dense vectors known as embeddings. Evaluating the learned representations is crucial for identifying meaningful differences among the many existing embedding models when selecting one for a specific task. However, the evaluation process is complex and follows two main approaches, intrinsic and extrinsic; while useful, their aggregated results often lack consistency because they do not align. This work investigates the dependencies and correlations between embeddings and NLP tasks. The goal is first to verify whether the embeddings' dimensions (i.e., their features) depend on the final task. The study then explores two research questions and presents findings from experiments.

Keywords: Embeddings, NLP task suitability, Evaluation process, Heuristics, Numerical measures
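One way to probe whether individual embedding dimensions depend on a task is to estimate the mutual information between each dimension and the task labels. The sketch below is a minimal illustration under stated assumptions, not the method used in the paper: the function name `dimension_task_mi`, the equal-width binning scheme, and the number of bins are all hypothetical choices for the example.

```python
import numpy as np

def dimension_task_mi(embeddings, labels, n_bins=8):
    """Estimate mutual information (in nats) between each embedding
    dimension and the task labels via histogram discretization."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n, d = embeddings.shape
    classes = np.unique(labels)
    mi = np.zeros(d)
    for j in range(d):
        # Discretize the j-th dimension into equal-width bins.
        edges = np.histogram_bin_edges(embeddings[:, j], bins=n_bins)
        digitized = np.digitize(embeddings[:, j], edges[1:-1])
        # Empirical joint distribution over (bin, class) pairs.
        joint = np.zeros((n_bins, len(classes)))
        for k, c in enumerate(classes):
            joint[:, k] = np.bincount(digitized[labels == c],
                                      minlength=n_bins)
        joint /= n
        # Marginals and the MI sum over non-empty cells only.
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        mi[j] = np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz]))
    return mi
```

On synthetic data where one dimension tracks the label and another is pure noise, the informative dimension receives a clearly higher score, which is the kind of dimension-task dependence the abstract refers to.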

Published
25/09/2023
OLIVEIRA, Bárbara Stéphanie Neves; DA SILVA, Ticiana L. Coelho; DE MACÊDO, José A. F. Studying the Dependence of Embedding Representations on the Target of NLP Tasks. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 14., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 156-166. DOI: https://doi.org/10.5753/stil.2023.234166.