Handwritten Text Entry in Virtual Reality Using Gesture Recognition and Word Prediction
Abstract
This work presents an approach to text entry in virtual reality (VR) environments that uses handwritten letters drawn in the air as a form of natural interaction. The system was developed for the Meta Quest 2 device and comprises several integrated modules: real-time capture of the gestures performed by the user, character recognition with a convolutional neural network (CNN) trained on the EMNIST dataset, and word construction through a Trie structure, which enables efficient term lookup from the recognized letters. The final word selection is then ranked by usage frequency, prioritizing the most probable terms within a common linguistic context. The method supports complete word input in VR, with consistent performance in identifying individual letters and generating automatic suggestions, demonstrating that it can provide a fluid, intuitive experience consistent with immersive interaction in three-dimensional environments.
References
Blanco Junior, M. (2022). Reconhecimento de placas de veículos utilizando redes neurais artificiais. Available at: [link].
Boletsis, C. and Kongsvik, S. (2019a). Controller-based text-input techniques for virtual reality: An empirical comparison. International Journal of Virtual Reality, 19(3):2–15. DOI: 10.20870/IJVR.2019.19.3.2917.
Boletsis, C. and Kongsvik, S. (2019b). Text input in virtual reality: A preliminary evaluation of the drum-like VR keyboard. Technologies, 7(2). DOI: 10.3390/technologies7020031.
Carvalho, J. V. d. (2000). Reconhecimento de caracteres manuscritos utilizando regras de associação. Technical report, Universidade Federal de Campina Grande. Available at: [link].
Chen, Z., Yang, D., Liang, J., Liu, X., Wang, Y., Peng, Z., and Huang, S. (2022). Complex handwriting trajectory recovery: Evaluation metrics and algorithm. In Asian Conference on Computer Vision (ACCV) 2022, Lecture Notes in Computer Science, vol. 13517, pages 58–74. DOI: 10.1007/978-3-031-26284-5_4.
Cohen, G., Afshar, S., Tapson, J., and van Schaik, A. (2017). EMNIST: Extending MNIST to handwritten letters. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE. DOI: 10.1109/IJCNN.2017.7966217.
Dudley, J. J., Karlson, A., Todi, K., Benko, H., Longest, M., Wang, R., and Kristensson, P. O. (2024). Efficient mid-air text input correction in virtual reality. In 2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE. DOI: 10.1109/ISMAR62088.2024.00105.
Dudley, J. J., Zheng, J., Gupta, A., Benko, H., Longest, M., Wang, R., and Kristensson, P. O. (2023). Evaluating the performance of hand-based probabilistic text input methods on a mid-air virtual qwerty keyboard. IEEE Transactions on Visualization and Computer Graphics. DOI: 10.1109/TVCG.2023.3320238.
Elmgren, R. (2017). Handwriting in VR as a text input method. Master’s thesis, KTH Royal Institute of Technology. Available at: [link].
Gugenheimer, J., Dobbelstein, D., Winkler, C., Haas, G., and Rukzio, E. (2016). FaceTouch: Enabling touch interaction in display fixed UIs for mobile virtual reality. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, UIST ’16, pages 49–60, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/2984511.2984576.
Kaipainen, M., Ravaja, N., Tikka, P., Vuori, R., Pugliese, R., Rapino, M., and Takala, T. (2011). Enactive systems and enactive media: Embodied human-machine coupling beyond interfaces. Leonardo, 44(5):433–438. DOI: 10.1162/LEON_a_00244.
Katona, J. (2021). A review of human–computer interaction and virtual reality research fields in cognitive infocommunications. Applied Sciences, 11(6). DOI: 10.3390/app11062646.
Kumar, P., Chaudhary, A., and Sharma, A. (2022). A CNN-based air-writing recognition framework for multilinguistic characters and digits. SN Computer Science, 3:453. DOI: 10.1007/s42979-022-01362-z.
Lu, D., Huang, D., and Rai, A. (2019). FMHash: Deep hashing of in-air-handwriting for user identification. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pages 1–7. DOI: 10.1109/ICC.2019.8761508.
Mimura, H., Ito, M., Ito, S.-i., and Fukumi, M. (2021). Personal authentication and recognition of aerial input hiragana using deep neural network. In Komuro, T. and Shimizu, T., editors, Fifteenth International Conference on Quality Control by Artificial Vision, volume 11794, page 1179411. International Society for Optics and Photonics, SPIE. DOI: 10.1117/12.2585333.
Monobe, K. and Ohishi, M. (2025). Research for Japanese input method using flick in virtual reality. In 2025 Asia Conference on Algorithms, Computing and Machine Learning (CACML). IEEE. DOI: 10.1109/CACML64929.2025.11010978.
Nascimento, T. H., Felix, J. P., Santos Silva, J. L., and Soares, F. (2023). Text entry on smartwatches using continuous gesture recognition and word dictionary. In International Conference on Human-Computer Interaction, pages 550–562. Springer. DOI: 10.1007/978-3-031-35596-7_35.
Nascimento, T. H., Nunes Soares, F. A. A. M., Vieira Oliveira, D., Lopes Salvini, R., Martins da Costa, R., and Gonçalves, C. (2017a). Method for text input with Google Cardboard: An approach using smartwatches and continuous gesture recognition. In 2017 19th Symposium on Virtual and Augmented Reality (SVR), pages 223–226. DOI: 10.1109/SVR.2017.36.
Nascimento, T. H., Soares, F. A. A. M. N., Irani, P. P., Galdino de Oliveira, L. L., and Da Silva Soares, A. (2017b). Method for text entry in smartwatches using continuous gesture recognition. In 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), volume 2, pages 549–554. DOI: 10.1109/COMPSAC.2017.168.
Shen, J., Boldu, R., Kalla, A., Glueck, M., Surale, H. B., and Karlson, A. (2024). RingGesture: A ring-based mid-air gesture typing system powered by a deep-learning word prediction framework. IEEE Transactions on Visualization and Computer Graphics, 37(4):Article 111. DOI: 10.1109/TVCG.2024.3456179.
Tsuchida, K., Miyao, H., and Maruyama, M. (2015). Handwritten character recognition in the air by using leap motion controller. In HCI International 2015 – Posters’ Extended Abstracts, pages 534–538. Springer. DOI: 10.1007/978-3-319-21380-4_91.
Varela, F. J., Thompson, E., and Rosch, E. (1992). The Embodied Mind: Cognitive Science and Human Experience. MIT Press. DOI: 10.7551/mitpress/6730.001.0001.
Vertanen, K. and Kristensson, P. O. (2011). A versatile dataset for text entry evaluations based on genuine mobile emails. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, MobileHCI ’11, pages 295–298, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/2037373.2037418.
Wang, Y., Wang, Y., Chen, J., Wang, Y., Yang, J., Jiang, T., and He, J. (2021). Investigating the performance of gesture-based input for mid-air text entry in a virtual environment: A comparison of hand-up versus hand-down postures. Sensors, 21(5). DOI: 10.3390/s21051582.
Weiser, M. (1991). The computer for the 21st century. Scientific American, 265(3):94–105. Available at: [link].
Published
30/09/2025
How to Cite
SILVA, João; CARVALHO, Kaique; CARDOSO, Luciana; FÉLIX, Juliana; SOARES, Fabrizzio; NASCIMENTO, Thamer Horbylon. Handwritten Text Entry in Virtual Reality Using Gesture Recognition and Word Prediction. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - SIMPÓSIO DE REALIDADE VIRTUAL E AUMENTADA (SVR), 27., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 254-261. DOI: https://doi.org/10.5753/svr_estendido.2025.15771.
