Systematic Review of Guiding Theories for Visual Transformers
Abstract
Transformers have emerged as a powerful architecture in Artificial Intelligence, revolutionizing a wide range of Natural Language Processing and image processing tasks. This paper presents a comprehensive analysis of the historical evolution of Transformers, emphasizing their self-attention mechanism and culminating in the introduction of the Visual Transformers model. We examine the main contributions of the key works that led up to the development of Visual Transformers. This systematic analysis provides a deeper understanding of how the model works and identifies the central topics addressed by each approach.
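For context, the self-attention mechanism emphasized in this review is the scaled dot-product attention of Vaswani et al. (2017). As a minimal statement of it, for query, key, and value matrices Q, K, and V with key dimension d_k:

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

In Visual Transformers, Q, K, and V are obtained as linear projections of embedded image patches rather than of word tokens.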
