An Experimental Analysis of Model Compression Techniques for Object Detection
Abstract
Recent research on Convolutional Neural Networks (CNNs) focuses on creating models with fewer parameters and a smaller storage footprint while preserving their ability to perform the task at hand, so that state-of-the-art CNNs can automate tasks on resource-constrained devices with limited processing power, memory, or energy. The literature offers many different approaches: pruning parameters, reducing floating-point precision, training smaller models that mimic larger ones, neural architecture search (NAS), and others. With all these possibilities, it is hard to say which approach provides the best trade-off between model reduction and performance, because the approaches differ in their base models, benchmark datasets, and training details. This article therefore contributes to the literature by comparing three state-of-the-art model compression approaches applied to a well-known convolutional object detector, namely YOLOv3. Our experimental analysis shows that parameter pruning can produce a version of YOLOv3 with 90% fewer parameters that still outperforms the original model. We also obtain models that require only 0.43% of the original model's inference effort.
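To make the idea of parameter pruning concrete, the sketch below shows one common way to zero out low-magnitude convolutional weights. It is an illustrative example only, assuming a PyTorch model and the generic torch.nn.utils.prune utilities; the helper prune_conv_layers and the toy backbone are hypothetical stand-ins and do not reproduce the exact pruning procedure evaluated in this article.

# Illustrative sketch only: global magnitude pruning of convolutional weights,
# assuming a PyTorch model. It is not the exact procedure evaluated in the
# article; it only shows the general idea of removing low-magnitude parameters.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_layers(model: nn.Module, amount: float = 0.9) -> nn.Module:
    """Zero out the `amount` fraction of smallest-magnitude conv weights globally."""
    conv_params = [
        (module, "weight")
        for module in model.modules()
        if isinstance(module, nn.Conv2d)
    ]
    prune.global_unstructured(
        conv_params,
        pruning_method=prune.L1Unstructured,
        amount=amount,
    )
    # Fold the pruning masks into the weights so the zeros become permanent.
    for module, name in conv_params:
        prune.remove(module, name)
    return model

# Usage with a toy backbone (a real experiment would load YOLOv3 weights instead):
if __name__ == "__main__":
    toy = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
    prune_conv_layers(toy, amount=0.9)
    zeros = sum((p == 0).sum().item() for p in toy.parameters())
    total = sum(p.numel() for p in toy.parameters())
    print(f"zeroed {zeros}/{total} parameters")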