Video cropping using salience maps: a case study on a sidewalk dataset

  • Suayder M. Costa USP
  • Rafael J. P. Damaceno USP
  • Roberto M. Cesar Jr. USP

Abstract


Video cropping aims to trim video frames to highlight a subject area. This paper introduces a new framework for automated video cropping tailored to sidewalk footage, which is particularly useful in applications such as sidewalk navigability and urban planning. By developing a method for video salience annotation using simple mouse input, the introduced framework provides a straightforward and flexible approach to video cropping. This application is crucial in scenarios where accurately focusing on pedestrian areas is necessary to enhance analysis and decision-making processes. The experimental results, obtained from real data in the wild, show that the method is robust to a large variety of sidewalk conditions in different Brazilian cities.
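To make the idea of salience-driven cropping concrete, the following is a minimal sketch and not the framework described in the paper. It assumes a per-frame salience map is already available as a non-negative 2D array, and it simply centres a fixed-size window on the salience centroid, clamped to the frame borders. The function names `crop_window_from_saliency` and `crop_frame`, and the centroid rule itself, are illustrative assumptions.

```python
import numpy as np


def crop_window_from_saliency(saliency, crop_h, crop_w):
    """Return (top, left) of a crop_h x crop_w window centred on the
    salience-weighted centroid, clamped so the window stays in the frame.

    Hypothetical helper for illustration; the centroid rule is an assumption.
    """
    h, w = saliency.shape
    if crop_h > h or crop_w > w:
        raise ValueError("crop window is larger than the frame")

    total = float(saliency.sum())
    if total <= 0.0:
        # No salience information: fall back to the frame centre.
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    else:
        ys, xs = np.mgrid[0:h, 0:w]
        cy = float((ys * saliency).sum() / total)
        cx = float((xs * saliency).sum() / total)

    top = int(round(cy - crop_h / 2.0))
    left = int(round(cx - crop_w / 2.0))

    # Clamp the window to the frame borders.
    top = min(max(top, 0), h - crop_h)
    left = min(max(left, 0), w - crop_w)
    return top, left


def crop_frame(frame, saliency, crop_h, crop_w):
    """Crop a frame (H x W or H x W x C) around its salience centroid."""
    top, left = crop_window_from_saliency(saliency, crop_h, crop_w)
    return frame[top:top + crop_h, left:left + crop_w]
```

Applied frame by frame, a rule like this would produce a jittery window, so a practical pipeline would likely smooth the window position over time. That step is omitted here.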

Published
30/09/2024
COSTA, Suayder M.; DAMACENO, Rafael J. P.; CESAR JR., Roberto M. Video cropping using salience maps: a case study on a sidewalk dataset. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 111-116. DOI: https://doi.org/10.5753/sibgrapi.est.2024.31654.