Exploring multi-camera views from user-generated sports videos

  • Larissa Pessoa UFAM
  • Elton Alencar UFAM
  • Fernanda Costa UFAM
  • Guilherme Souza UFAM
  • Rosiane de Freitas UFAM

Abstract

The proliferation of mobile devices with video recording capabilities has changed how audiovisual content is created, shared, and consumed, turning user-generated video (UGV) platforms into major data sources. Despite this growth, publicly available datasets of multi-angle sports recordings captured with different mobile cameras remain scarce. This paper introduces the MUVY Dataset, a diverse collection of sports videos recorded from multiple perspectives, with no restriction on video size. The dataset reflects challenges common in user-generated videos, such as camera shake, occlusions, blur, and abrupt movements. Each video is accompanied by metadata that includes camera identification, YouTube URLs, extracted frames, and object annotations. The dataset covers sports such as soccer, American football, artistic gymnastics, athletics, basketball, tennis, and cricket, and is intended to support work on video understanding and viewpoint selection. Initial experiments in camera pose estimation indicate the dataset’s potential for training models in this domain. The dataset also supports selecting the closest viewpoint based on object detection and the relative area occupied by detected objects. Overall, the MUVY Dataset aims to advance multi-camera video analysis and related research areas.
Keywords: multicam, object detection, sports events, video dataset, user-generated video, YouTube
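
The abstract mentions selecting the closest viewpoint from object detections and the relative area the detected objects occupy. The sketch below illustrates one plausible reading of that idea; it is not the authors' implementation. It assumes that a larger covered fraction of the frame means a closer view, and the `Box` type, the function names, and the camera ids are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box in pixels (hypothetical format, not the dataset's schema)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp to zero so malformed boxes do not contribute negative area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def relative_area(boxes, frame_width, frame_height):
    """Summed box area divided by frame area (overlapping boxes are counted twice)."""
    frame_area = frame_width * frame_height
    if frame_area <= 0:
        raise ValueError("frame dimensions must be positive")
    return sum(b.area() for b in boxes) / frame_area


def closest_viewpoint(detections, frame_sizes):
    """Return (camera id with the largest relative area, per-camera scores).

    Assumption: a larger relative area indicates a closer viewpoint.
    """
    scores = {
        cam: relative_area(boxes, *frame_sizes[cam])
        for cam, boxes in detections.items()
    }
    return max(scores, key=scores.get), scores


# Two hypothetical cameras observing the same moment.
detections = {
    "cam_a": [Box(100, 100, 300, 500)],
    "cam_b": [Box(0, 0, 50, 100), Box(200, 200, 260, 320)],
}
frame_sizes = {"cam_a": (1280, 720), "cam_b": (1280, 720)}
best, scores = closest_viewpoint(detections, frame_sizes)
print(best, scores)
```

Normalizing by frame area keeps cameras with different resolutions comparable. Summing box areas is the simplest choice; a union of boxes would avoid double counting when detections overlap.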

Published
17/11/2024
PESSOA, Larissa; ALENCAR, Elton; COSTA, Fernanda; SOUZA, Guilherme; FREITAS, Rosiane de. Exploring multi-camera views from user-generated sports videos. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 12., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 105-112. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2024.244721.