Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites
Abstract
Computer vision has seen several advances, such as training optimizations and new architectures (pure attention, efficient blocks, vision-language models, and generative models, among others). These have improved performance on several tasks, such as classification. However, most of these models focus on modifications that move away from realistic, neuroscience-grounded accounts of the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts the visual manifold; its initial blocks are designed to separate the analysis of colors and forms, simulating the operations of the occipital lobe. Our results show that our architecture provides state-of-the-art efficiency among low-parameter architectures on the CIFAR-10 dataset. Our first model reached 93.32% test accuracy, 0.8% above the previous SOTA in this category, while using 150k fewer parameters (726k in total). Our second model uses 52k parameters, losing only 3.86% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang_CNN.
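To make the central idea of the abstract concrete, below is a minimal PyTorch sketch of a stem that splits color analysis from form analysis in the initial layers. This is an illustrative assumption, not the authors' implementation (see the linked repository for that); the branch designs and names such as YinYangStem are hypothetical.

```python
# Hypothetical sketch of the color/form split described in the abstract.
# Branch designs are illustrative assumptions, not the paper's actual blocks.
import torch
import torch.nn as nn


class YinYangStem(nn.Module):
    """Toy stem: one branch analyzes color, the other analyzes form."""

    def __init__(self, out_channels: int = 32):
        super().__init__()
        half = out_channels // 2
        # Color branch: 1x1 convolutions mix channels per pixel, so they see
        # full color information but no spatial neighborhood.
        self.color = nn.Sequential(
            nn.Conv2d(3, half, kernel_size=1),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
        )
        # Form branch: 3x3 convolutions over a single luminance channel, so
        # they see spatial structure (edges, shapes) but no color.
        self.form = nn.Sequential(
            nn.Conv2d(1, half, kernel_size=3, padding=1),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Collapse RGB to one luminance channel for the form branch.
        gray = x.mean(dim=1, keepdim=True)
        return torch.cat([self.color(x), self.form(gray)], dim=1)


stem = YinYangStem(out_channels=32)
features = stem(torch.randn(8, 3, 32, 32))  # CIFAR-10-sized input
print(features.shape)  # torch.Size([8, 32, 32, 32])
```

The point of such a split is that each branch is forced to specialize before the features are merged: the 1x1 branch cannot encode shape and the luminance branch cannot encode color, loosely mirroring the separate color and form pathways the paper attributes to the occipital lobe.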
Keywords:
Convolutional Neural Networks, Bio-inspired Neural Networks
Published
27/11/2024
How to Cite
DA ROSA, Augusto Seben; DE OLIVEIRA, Frederico Santos; SOARES, Anderson da Silva; CANDIDO JUNIOR, Arnaldo. Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites. In: CONGRESSO LATINO-AMERICANO DE SOFTWARE LIVRE E TECNOLOGIAS ABERTAS (LATINOWARE), 21., 2024, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 377-383. DOI: https://doi.org/10.5753/latinoware.2024.245312.