Once Learning for Looking and Identifying Based on YOLO-v5 Object Detection

  • Lucas Althoff UnB / University of Poitiers
  • Mylène C. Q. Farias UnB
  • Li Weigang UnB


Object detection is an essential capability of computer vision systems. It has gained attention in recent years as a core component of “Once learning” and “Few-shot learning” mechanisms. This research analyzes the ability of a machine learning framework named “You Only Look Once” (YOLO) to perform the object localization task in a “Heuristic once learning” context. It also studies the advantages and practical limitations of YOLO by experimenting with two types of implementation: 1) the simplest one (a.k.a. tiny YOLO), and 2) the first version of YOLO. The case studies cover various visual data types and object contexts, such as object deformation caused by fast-forwarded frames, spatial distortion caused by isometric projection, and gaming images with abnormal objects. Finally, we build a dataset for a new task called “Heuristic once learning”. Results obtained with YOLO-v5 under these conditions show that YOLO had difficulty generalizing simple abstractions of the characters, pointing to the need for new approaches to solve such challenges.

Keywords: Few-shot learning, Object Detection, Once learning, YOLO


How to Cite

ALTHOFF, Lucas; FARIAS, Mylène C. Q.; WEIGANG, Li. Once Learning for Looking and Identifying Based on YOLO-v5 Object Detection. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 319-325.
