Systematic Review of Guiding Theories for Visual Transformers
Abstract
Transformers have emerged as a powerful architecture in Artificial Intelligence, revolutionizing a wide range of Natural Language Processing and image processing tasks. This paper presents a comprehensive analysis of the historical evolution of Transformers, emphasizing their self-attention mechanism and culminating in the introduction of the Visual Transformers model. We examine the main contributions of the key works that led up to the development of Visual Transformers. This systematic analysis provides a deeper understanding of how the model works and identifies the central topics addressed by each approach.
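For context, the self-attention mechanism emphasized in this review is the scaled dot-product attention of Vaswani et al. (2017). As a minimal statement of it, for query, key, and value matrices Q, K, and V with key dimension d_k:

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

In Visual Transformers, Q, K, and V are obtained as linear projections of embedded image patches rather than of word tokens.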
