Pig aggression classification using CNN, Transformers and Recurrent Networks

  • Junior Silva Souza (UFMS)
  • Eduardo Bedin (UCDB)
  • Gabriel Toshio Hirokawa Higa (UCDB)
  • Newton Loebens (UNIPAMPA)
  • Hemerson Pistori (UCDB)

Abstract

Recognizing aggression-related behavior in pigs is an important task in the livestock industry, traditionally performed through visual observation. This manual process, however, is laborious and error-prone, and both problems can be reduced by automatically classifying videos captured in a controlled environment. Such video classification can be automated with computer vision and artificial intelligence techniques based on neural networks. The main techniques examined in this study are transformer variants, namely STAM, TimeSformer, and ViViT, alongside convolution-based methods, namely ResNet3D2, ResNet(2+1)D, and CnnLstm. These techniques were compared to analyze their individual contributions, with performance evaluated using accuracy, precision, and recall. TimeSformer achieved the most promising results in video classification, with a median accuracy of 0.729.
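The paper itself includes no code, but as a rough illustration of the kind of pipeline being compared, the sketch below sets up two of the named architecture families for a binary aggressive/non-aggressive label scheme and computes the reported metrics. It is a minimal sketch in Python, assuming torchvision, transformers, and scikit-learn are available; the pretrained checkpoints, clip shapes, and label scheme are illustrative assumptions, not the authors' configuration.

# Illustrative sketch only -- not the authors' code. Labels, checkpoints,
# and clip sizes are assumptions for demonstration purposes.
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18, R2Plus1D_18_Weights
from transformers import TimesformerForVideoClassification
from sklearn.metrics import accuracy_score, precision_score, recall_score

NUM_CLASSES = 2  # aggressive vs. non-aggressive (assumed label scheme)

# Convolutional branch: ResNet(2+1)D pretrained on Kinetics-400, with the
# classification head replaced for the two-class task.
cnn_model = r2plus1d_18(weights=R2Plus1D_18_Weights.KINETICS400_V1)
cnn_model.fc = nn.Linear(cnn_model.fc.in_features, NUM_CLASSES)
cnn_model.eval()

# A clip enters the CNN as (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 112, 112)  # dummy 16-frame clip
with torch.no_grad():
    cnn_logits = cnn_model(clip)        # shape: (1, NUM_CLASSES)

# Transformer branch: TimeSformer (the best performer in the paper),
# loaded from a hypothetical Kinetics-400 checkpoint and given a new
# two-class head via ignore_mismatched_sizes.
tsf_model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",
    num_labels=NUM_CLASSES,
    ignore_mismatched_sizes=True,
)
tsf_model.eval()

# TimeSformer expects (batch, frames, channels, height, width).
frames = torch.randn(1, 8, 3, 224, 224)
with torch.no_grad():
    tsf_logits = tsf_model(pixel_values=frames).logits

# Evaluation with the metrics reported in the paper.
y_true = [0, 1, 1, 0, 1]  # ground-truth labels (dummy)
y_pred = [0, 1, 0, 0, 1]  # model predictions (dummy)
print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))

A real pipeline would replace the dummy tensors with clips sampled from the annotated videos and fine-tune the new classification heads before evaluating.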

Keywords: Aggressiveness, Video Classification, Transformers, Convolutional

Published
06/11/2024
SOUZA, Junior Silva; BEDIN, Eduardo; HIGA, Gabriel Toshio Hirokawa; LOEBENS, Newton; PISTORI, Hemerson. Pig aggression classification using CNN, Transformers and Recurrent Networks. In: WORKSHOP DE VISÃO COMPUTACIONAL (WVC), 19., 2024, Rio Paranaíba/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 1-6. DOI: https://doi.org/10.5753/wvc.2024.34004.
