Efficient Deep Learning for Image Classification: Lighter Preprocessing and Fewer Parameters
Abstract
Convolutional neural networks have recently achieved state-of-the-art performance on several computer vision tasks by learning high-level representations directly from RGB images. However, the use of ever-deeper architectures has led to high computational costs, hindering deployment on devices with limited resources. Additionally, models are usually specialized in a single domain or task, while a growing number of real-world applications must handle multiple domains simultaneously; the cost of storing and running multiple instances of such heavy models limits their adoption even further. This Ph.D. thesis aims to reduce the computational burden of deep learning for image classification, focusing on two main aspects: lowering the data preprocessing cost, by learning directly from the frequency-domain (DCT) coefficients available in JPEG-compressed images rather than from fully decoded RGB pixels, and sharing parameters across multiple domains and tasks. These contributions led to efficient models that retain high classification performance at a reduced cost, allowing them to be deployed on a wider range of devices.
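To make the first aspect concrete, below is a minimal, hypothetical sketch (not the thesis implementation) of what "lighter preprocessing" can mean: a JPEG decoder already computes 8x8 block DCT coefficients on the way to RGB pixels, so a network that consumes those coefficients directly can skip the inverse DCT and color conversion. The `block_dct` helper and the `freq_cnn` architecture below are illustrative assumptions; the helper merely emulates, with `scipy`, the coefficients a real decoder would expose.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.fft import dctn

def block_dct(channel: np.ndarray, block: int = 8) -> np.ndarray:
    """Blockwise 2-D DCT-II of one image channel, emulating the
    coefficients a JPEG decoder holds before its inverse DCT step.
    Returns an array of shape (H//block, W//block, block*block)."""
    h, w = channel.shape
    h, w = h - h % block, w - w % block              # crop to a block multiple
    blocks = channel[:h, :w].reshape(h // block, block, w // block, block)
    blocks = blocks.transpose(0, 2, 1, 3)            # (H/8, W/8, 8, 8)
    coeffs = dctn(blocks, axes=(2, 3), norm="ortho")
    return coeffs.reshape(h // block, w // block, block * block)

# The 64 coefficients per block act as input channels, so the spatial
# resolution seen by the CNN is already 8x smaller than the RGB image.
freq_cnn = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 10),                              # e.g., 10 classes
)

luma = np.random.rand(224, 224).astype(np.float32)   # stand-in Y channel
x = torch.from_numpy(block_dct(luma)).float().permute(2, 0, 1).unsqueeze(0)
logits = freq_cnn(x)                                 # shape: (1, 10)
```

For the second aspect, the sketch below illustrates one way to share parameters across domains, loosely inspired by the budget-aware pruning idea: a single backbone is shared by all domains, and each domain adds only a binary channel mask and a small classifier head instead of a full model copy. `MultiDomainNet` and the domain names used here are hypothetical, shown only to convey the structure.

```python
import torch
import torch.nn as nn

class MultiDomainNet(nn.Module):
    def __init__(self, num_channels: int = 64, domain_classes: dict = None):
        super().__init__()
        domain_classes = domain_classes or {"flowers": 102, "aircraft": 100}
        # Shared parameters: one backbone serves every domain.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, num_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Domain-specific parameters: a channel mask and a linear head only.
        self.masks = nn.ParameterDict({
            d: nn.Parameter(torch.ones(num_channels)) for d in domain_classes
        })
        self.heads = nn.ModuleDict({
            d: nn.Linear(num_channels, c) for d, c in domain_classes.items()
        })

    def forward(self, x: torch.Tensor, domain: str) -> torch.Tensor:
        feats = self.backbone(x)
        # Hard 0/1 gate per channel; training would relax or learn the mask.
        gate = (self.masks[domain] > 0.5).float()
        return self.heads[domain](feats * gate)

model = MultiDomainNet()
x = torch.randn(2, 3, 32, 32)
print(model(x, "flowers").shape)    # torch.Size([2, 102])
print(model(x, "aircraft").shape)   # torch.Size([2, 100])
```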