Real Time Detection of Mobile Graphical User Interface Elements Using Convolutional Neural Networks
Abstract
In this work, we model Graphical User Interface (GUI) element detection as an object detection problem from the Computer Vision (CV) domain. Our literature review identified several works with similar proposals, but they suffer from reproducibility and comparability problems. We propose to mitigate these problems by creating a standardized dataset for training and evaluating CV algorithms on mobile GUIs. To this end, we take Rico's collection of Android application screens with semantic annotations of GUI elements and relabel them in the standard Microsoft COCO object detection format. We then split the dataset into three challenges: 1) detection of clickable and non-clickable elements; 2) detection of interface components; and 3) detection of icons. As a baseline, we trained a model from the YOLO family, considered state-of-the-art in real-time object detection, and we report quantitative results for all three challenges.
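To make the relabelling step concrete, the sketch below shows how a single Rico-style semantic element could be turned into a COCO-format annotation entry. This is an illustrative sketch, not the authors' rico2coco implementation; the fields componentLabel and bounds follow Rico's published semantic annotation JSON, and the category mapping is a hypothetical example.

import json

# Hypothetical mapping from Rico component labels to COCO category ids;
# the actual dataset defines its own category list per challenge.
CATEGORIES = {"Text Button": 1, "Icon": 2, "Image": 3}

def rico_element_to_coco(element, image_id, annotation_id):
    # Rico stores element bounds as absolute corner coordinates [x1, y1, x2, y2].
    x1, y1, x2, y2 = element["bounds"]
    width, height = x2 - x1, y2 - y1
    # COCO bounding boxes are [x, y, width, height] in absolute pixels.
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": CATEGORIES[element["componentLabel"]],
        "bbox": [x1, y1, width, height],
        "area": width * height,
        "iscrowd": 0,
    }

if __name__ == "__main__":
    sample = {"componentLabel": "Text Button", "bounds": [24, 100, 300, 172]}
    print(json.dumps(rico_element_to_coco(sample, image_id=1, annotation_id=1), indent=2))

A YOLO-family baseline such as Ultralytics YOLOv5 can then be trained on the resulting splits; since YOLOv5 expects its own plain-text label format rather than COCO JSON, an additional COCO-to-YOLO label conversion is typically required before training.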
Keywords:
Computer Vision Dataset, Deep Neural Networks, GUI, Object Detection, Android.
References
Lingfeng Bao, Jing Li, Zhenchang Xing, Xinyu Wang, and Bo Zhou. 2015. ScvRipper: Video Scraping Tool for Modeling Developers’ Behavior Using Interaction Data. Proceedings - International Conference on Software Engineering 2, 673–676. https://doi.org/10.1109/ICSE.2015.220
Chunyang Chen, Sidong Feng, Zhenchang Xing, Linda Liu, Shengdong Zhao, and Jinshui Wang. 2019. Gallery D.C.: Design search and knowledge discovery through auto-created GUI component gallery. Proceedings of the ACM on Human-Computer Interaction 3 (11 2019). Issue CSCW. https://doi.org/10.1145/3359282
Jieshan Chen, Mulong Xie, Zhenchang Xing, Chunyang Chen, Xiwei Xu, Liming Zhu, and Guoqiang Li. 2020. Object detection for graphical user interface: Old fashioned or deep learning or a combination? ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1202–1214. https://doi.org/10.1145/3368089.3409691
Richard Degaki. 2022. Experiment Tracking with Weights and Biases on clickable and non-clickable GUIs dataset. Software available from wandb.com. https://wandb.ai/rhd/ricoco
Richard Degaki. 2022. Experiment Tracking with Weights and Biases on GUI icons dataset. Software available from wandb.com. https://wandb.ai/rhd/ricoco_icon_legend
Richard Degaki. 2022. Experiment Tracking with Weights and Biases on semantic annotated GUIs dataset. Software available from wandb.com. https://wandb.ai/rhd/rico2coco_clickable
Richard Degaki. 2022. Rico2coco. https://github.com/xrhd/rico2coco
Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A mobile app dataset for building data-driven design applications. UIST 2017 - Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, 845–854. https://doi.org/10.1145/3126594.3126651
Morgan Dixon and James Fogarty. 2010. Prefab: Implementing Advanced Behaviors Using Pixel-Based Reverse Engineering of Interface Structure. (2010).
Glenn Jocher, Alex Stoken, Ayush Chaurasia, Jirka Borovec, NanoCode012, TaoXie, Yonghye Kwon, Kalen Michael, Liu Changyu, Jiacong Fang, Abhiram V, Laughing, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Jebastin Nadar, imyhxy, Lorenzo Mammana, AlexWang1900, Cristi Fati, Diego Montes, Jan Hajek, Laurentiu Diaconu, Mai Thanh Minh, Marc, albinxavi, fatih, oleg, and wanghaoyang0106. 2021. ultralytics/yolov5: v6.0 - YOLOv5n ’Nano’ models, Roboflow integration, TensorFlow export, OpenCV DNN support. (10 2021). https://doi.org/10.5281/ZENODO.5563715
Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. 2020. Mapping Natural Language Instructions to Mobile UI Action Sequences. Annual Conference of the Association for Computational Linguistics (ACL 2020). https://www.aclweb.org/anthology/2020.acl-main.729.pdf
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. 2014. Microsoft COCO: Common Objects in Context. (5 2014). http://arxiv.org/abs/1405.0312
Kevin Moran, Carlos Bernal-Cárdenas, Michael Curcio, Richard Bonett, and Denys Poshyvanyk. 2018. Machine Learning-Based Prototyping of Graphical User Interfaces for Mobile Apps. IEEE Transactions on Software Engineering (2018), 1.
Tuan Anh Nguyen and Christoph Csallner. 2015. Reverse Engineering Mobile Application User Interfaces With REMAUI. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7372013
Ju Qian, Zhengyu Shang, Shuoyan Yan, Yan Wang, and Lin Chen. 2020. RoScript: A visual script driven truly non-intrusive robotic testing system for touch screen applications. Proceedings - International Conference on Software Engineering, 297–308. https://doi.org/10.1145/3377811.3380431
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. (2016). http://pjreddie.com/yolo/
Alibaba Tech. 2019. UI2code: How to Fine-tune Background and Foreground Analysis | LaptrinhX. [link].
Daniel Toyama, Philippe Hamel, Anita Gergely, Gheorghe Comanici, Amelia Glaese, Zafarali Ahmed, Tyler Jackson, Shibl Mourad, and Doina Precup. 2021. AndroidEnv: A Reinforcement Learning Platform for Android. (5 2021). http://arxiv.org/abs/2105.13231
Thomas D White, Gordon Fraser, and Guy J Brown. 2019. Improving Random GUI Testing with Image-Based Widget Detection. (2019), 11–19. https://doi.org/10.1145/3293882.3330551
Tom Yeh, Tsung-Hsiang Chang, and Robert C Miller. 2009. Sikuli: Using GUI Screenshots for Search and Automation. (2009).
Published
November 7, 2022
How to Cite
DEGAKI, Richard Hada; COLONNA, Juan Gabriel; LOPEZ, Yadini; CARVALHO, José Reginaldo; SILVA, Edson. Real Time Detection of Mobile Graphical User Interface Elements Using Convolutional Neural Networks. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 169-177.