Real Time Detection of Mobile Graphical User Interface Elements Using Convolutional Neural Networks

  • Richard Hada Degaki UFAM
  • Juan Gabriel Colonna UFAM
  • Yadini Lopez SIDIA
  • José Reginaldo Carvalho UFAM
  • Edson Silva UFAM


In this work, we model the Graphical User Interface (GUI) element detection challenge as an object detection problem from the Computer Vision (CV) domain. Our literature review identified works with similar proposals, but they suffer from reproducibility and comparability problems. We propose to mitigate these problems by creating a standardized dataset for training and evaluating CV algorithms on mobile GUIs. For this purpose, we used Rico's collection of Android application screens and its semantic annotations of GUI elements, and labelled them in the standard Microsoft COCO format for object detection. We then split the dataset into three main challenges: 1) detection of clickable and non-clickable elements; 2) detection of interface components; and 3) detection of icons. As a baseline, we trained an algorithm from the YOLO family, considered state-of-the-art in real-time object detection. Finally, we present quantitative results for the three proposed challenges.
Keywords: Computer Vision Dataset, Deep Neural Networks, GUI, Object Detection, Android.
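The Rico-to-COCO conversion summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' rico2coco tool: the Rico-side keys (`bounds` as `[x1, y1, x2, y2]`, `componentLabel`) follow the public Rico semantic annotations, while the output fields are the standard COCO object-detection annotation fields.

```python
# Hypothetical sketch: turning one Rico-style semantic annotation into a
# COCO object-detection record. Not the paper's rico2coco implementation.
import json

def rico_box_to_coco(bounds, category_id, image_id, ann_id):
    """Convert Rico [x1, y1, x2, y2] bounds to a COCO annotation dict."""
    x1, y1, x2, y2 = bounds
    w, h = x2 - x1, y2 - y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x1, y1, w, h],  # COCO stores [x, y, width, height]
        "area": w * h,
        "iscrowd": 0,
    }

# Example element as it might appear in a Rico semantic annotation file.
element = {"componentLabel": "Icon", "bounds": [24, 56, 120, 152]}
categories = {"Icon": 1}  # category mapping is challenge-specific

ann = rico_box_to_coco(element["bounds"],
                       categories[element["componentLabel"]],
                       image_id=0, ann_id=0)
print(json.dumps(ann))
```

A full converter would additionally emit the COCO `images` and `categories` sections and one such record per GUI element per screen.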


Lingfeng Bao, Jing Li, Zhenchang Xing, Xinyu Wang, and Bo Zhou. 2015. ScvRipper: Video Scraping Tool for Modeling Developers’ Behavior Using Interaction Data. Proceedings - International Conference on Software Engineering 2, 673–676.

Chunyang Chen, Sidong Feng, Zhenchang Xing, Linda Liu, Shengdong Zhao, and Jinshui Wang. 2019. Gallery D.C.: Design search and knowledge discovery through auto-created GUI component gallery. Proceedings of the ACM on Human-Computer Interaction 3 (11 2019). Issue CSCW.

Jieshan Chen, Mulong Xie, Zhenchang Xing, Chunyang Chen, Xiwei Xu, Liming Zhu, and Guoqiang Li. 2020. Object detection for graphical user interface: Old fashioned or deep learning or a combination? ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1202–1214.

Richard Degaki. 2022. Experiment Tracking with Weights and Biases on clickable and non-clickable GUIs dataset. Software available from

Richard Degaki. 2022. Experiment Tracking with Weights and Biases on GUI icons dataset. Software available from

Richard Degaki. 2022. Experiment Tracking with Weights and Biases on semantic annotated GUIs dataset. Software available from

Richard Degaki. 2022. Rico2coco.

Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A mobile app dataset for building data-driven design applications. UIST 2017 - Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, 845–854.

Morgan Dixon and James Fogarty. 2010. Prefab: Implementing Advanced Behaviors Using Pixel-Based Reverse Engineering of Interface Structure. (2010).

Glenn Jocher, Alex Stoken, Ayush Chaurasia, Jirka Borovec, NanoCode012, TaoXie, Yonghye Kwon, Kalen Michael, Liu Changyu, Jiacong Fang, Abhiram V, Laughing, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Jebastin Nadar, imyhxy, Lorenzo Mammana, AlexWang1900, Cristi Fati, Diego Montes, Jan Hajek, Laurentiu Diaconu, Mai Thanh Minh, Marc, albinxavi, fatih, oleg, and wanghaoyang0106. 2021. ultralytics/yolov5: v6.0 - YOLOv5n ’Nano’ models, Roboflow integration, TensorFlow export, OpenCV DNN support. (10 2021).

Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. 2020. Mapping Natural Language Instructions to Mobile UI Action Sequences. Annual Conference of the Association for Computational Linguistics (ACL 2020).

Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. 2014. Microsoft COCO: Common Objects in Context. (5 2014).

Kevin Moran, Carlos Bernal-Cárdenas, Michael Curcio, Richard Bonett, and Denys Poshyvanyk. 2018. Machine Learning-Based Prototyping of Graphical User Interfaces for Mobile Apps. IEEE Transactions on Software Engineering (2018), 1.

Tuan Anh Nguyen and Christoph Csallner. 2015. Reverse Engineering Mobile Application User Interfaces With REMAUI.

Ju Qian, Zhengyu Shang, Shuoyan Yan, Yan Wang, and Lin Chen. 2020. RoScript: A visual script driven truly non-intrusive robotic testing system for touch screen applications. Proceedings - International Conference on Software Engineering, 297–308.

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. (2016).

Alibaba Tech. 2019. UI2code: How to Fine-tune Background and Foreground Analysis | LaptrinhX. [link].

Daniel Toyama, Philippe Hamel, Anita Gergely, Gheorghe Comanici, Amelia Glaese, Zafarali Ahmed, Tyler Jackson, Shibl Mourad, and Doina Precup. 2021. AndroidEnv: A Reinforcement Learning Platform for Android. (5 2021).

Thomas D White, Gordon Fraser, and Guy J Brown. 2019. Improving Random GUI Testing with Image-Based Widget Detection. (2019), 11–19.

Tom Yeh, Tsung-Hsiang Chang, and Robert C Miller. 2009. Sikuli: Using GUI Screenshots for Search and Automation. (2009).
DEGAKI, Richard Hada; COLONNA, Juan Gabriel; LOPEZ, Yadini; CARVALHO, José Reginaldo; SILVA, Edson. Real Time Detection of Mobile Graphical User Interface Elements Using Convolutional Neural Networks. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 169-177.