Real Time Detection of Mobile Graphical User Interface Elements Using Convolutional Neural Networks

  • Richard Hada Degaki UFAM
  • Juan Gabriel Colonna UFAM
  • Yadini Lopez SIDIA
  • José Reginaldo Carvalho UFAM
  • Edson Silva UFAM


In this work, we model the Graphical User Interface (GUI) element detection challenge as an object detection problem from the Computer Vision (CV) domain. Our literature review identified works with similar proposals, but they suffer from reproducibility and comparability problems. We propose to mitigate these problems by creating a standardized dataset for training and evaluating CV algorithms on mobile GUIs. For this purpose, we used Rico's collection of Android application screens and its semantic annotations of GUI elements, and labelled them in the standard Microsoft COCO format for object detection. We then split the dataset into three main challenges: 1) detection of clickable and non-clickable elements; 2) detection of interface components; and 3) detection of icons. As a baseline, we trained an algorithm from the YOLO family, considered state-of-the-art in real-time object detection. Finally, we present quantitative results for the three proposed challenges.
Keywords: Computer Vision Dataset, Deep Neural Networks, GUI, Object Detection, Android.
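The Rico-to-COCO conversion summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' rico2coco tool: the Rico-side keys (`bounds` as `[x1, y1, x2, y2]`, `componentLabel`) follow the public Rico semantic annotations, while the output fields are the standard COCO object-detection annotation fields.

```python
# Hypothetical sketch: turning one Rico-style semantic annotation into a
# COCO object-detection record. Not the paper's rico2coco implementation.
import json

def rico_box_to_coco(bounds, category_id, image_id, ann_id):
    """Convert Rico [x1, y1, x2, y2] bounds to a COCO annotation dict."""
    x1, y1, x2, y2 = bounds
    w, h = x2 - x1, y2 - y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x1, y1, w, h],  # COCO stores [x, y, width, height]
        "area": w * h,
        "iscrowd": 0,
    }

# Example element as it might appear in a Rico semantic annotation file.
element = {"componentLabel": "Icon", "bounds": [24, 56, 120, 152]}
categories = {"Icon": 1}  # category mapping is challenge-specific

ann = rico_box_to_coco(element["bounds"],
                       categories[element["componentLabel"]],
                       image_id=0, ann_id=0)
print(json.dumps(ann))
```

A full converter would additionally emit the COCO `images` and `categories` sections and one such record per GUI element per screen.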


Lingfeng Bao, Jing Li, Zhenchang Xing, Xinyu Wang, and Bo Zhou. 2015. ScvRipper: Video Scraping Tool for Modeling Developers’ Behavior Using Interaction Data. Proceedings - International Conference on Software Engineering 2, 673–676.

Chunyang Chen, Sidong Feng, Zhenchang Xing, Linda Liu, Shengdong Zhao, and Jinshui Wang. 2019. Gallery D.C.: Design search and knowledge discovery through auto-created GUI component gallery. Proceedings of the ACM on Human-Computer Interaction 3 (11 2019). Issue CSCW.

Jieshan Chen, Mulong Xie, Zhenchang Xing, Chunyang Chen, Xiwei Xu, Liming Zhu, and Guoqiang Li. 2020. Object detection for graphical user interface: Old fashioned or deep learning or a combination? ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1202–1214.

Richard Degaki. 2022. Experiment Tracking with Weights and Biases on clickable and non-clickable GUIs dataset. Software available from

Richard Degaki. 2022. Experiment Tracking with Weights and Biases on GUI icons dataset. Software available from

Richard Degaki. 2022. Experiment Tracking with Weights and Biases on semantic annotated GUIs dataset. Software available from

Richard Degaki. 2022. Rico2coco.

Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A mobile app dataset for building data-driven design applications. UIST 2017 - Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, 845–854.

Morgan Dixon and James Fogarty. 2010. Prefab: Implementing Advanced Behaviors Using Pixel-Based Reverse Engineering of Interface Structure. (2010).

Glenn Jocher, Alex Stoken, Ayush Chaurasia, Jirka Borovec, NanoCode012, TaoXie, Yonghye Kwon, Kalen Michael, Liu Changyu, Jiacong Fang, Abhiram V, Laughing, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Jebastin Nadar, imyhxy, Lorenzo Mammana, AlexWang1900, Cristi Fati, Diego Montes, Jan Hajek, Laurentiu Diaconu, Mai Thanh Minh, Marc, albinxavi, fatih, oleg, and wanghaoyang0106. 2021. ultralytics/yolov5: v6.0 - YOLOv5n ’Nano’ models, Roboflow integration, TensorFlow export, OpenCV DNN support. (10 2021).

Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. 2020. Mapping Natural Language Instructions to Mobile UI Action Sequences. Annual Conference of the Association for Computational Linguistics (ACL 2020).

Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. 2014. Microsoft COCO: Common Objects in Context. (5 2014).

Kevin Moran, Carlos Bernal-Cárdenas, Michael Curcio, Richard Bonett, and Denys Poshyvanyk. 2018. Machine Learning-Based Prototyping of Graphical User Interfaces for Mobile Apps. IEEE Transactions on Software Engineering (2018), 1.

Tuan Anh Nguyen and Christoph Csallner. 2015. Reverse Engineering Mobile Application User Interfaces With REMAUI.

Ju Qian, Zhengyu Shang, Shuoyan Yan, Yan Wang, and Lin Chen. 2020. RoScript: A visual script driven truly non-intrusive robotic testing system for touch screen applications. Proceedings - International Conference on Software Engineering, 297–308.

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. (2016).

Alibaba Tech. 2019. UI2code: How to Fine-tune Background and Foreground Analysis | LaptrinhX. [link].

Daniel Toyama, Philippe Hamel, Anita Gergely, Gheorghe Comanici, Amelia Glaese, Zafarali Ahmed, Tyler Jackson, Shibl Mourad, and Doina Precup. 2021. AndroidEnv: A Reinforcement Learning Platform for Android. (5 2021).

Thomas D White, Gordon Fraser, and Guy J Brown. 2019. Improving Random GUI Testing with Image-Based Widget Detection. (2019), 11–19.

Tom Yeh, Tsung-Hsiang Chang, and Robert C Miller. 2009. Sikuli: Using GUI Screenshots for Search and Automation. (2009).
DEGAKI, Richard Hada; COLONNA, Juan Gabriel; LOPEZ, Yadini; CARVALHO, José Reginaldo; SILVA, Edson. Real Time Detection of Mobile Graphical User Interface Elements Using Convolutional Neural Networks. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 169-177.