Adding Crowd Noise to Sports Commentary using Generative Models

Neil Shah; Dharmeshkumar M. Agrawal; Niranajan Pedanekar

doi:10.5753/lique.2021.15715

Neil Shah Tata Consultancy Services Pvt. Ltd.
Dharmeshkumar M. Agrawal Tata Consultancy Services Pvt. Ltd.
Niranajan Pedanekar Tata Consultancy Services Pvt. Ltd.

Abstract

Crowd noise forms an integral part of the live sports experience. In the COVID-19 era, when live audiences are absent from stadiums, crowd noise needs to be added to the live commentary. This paper exploits the correlation between the commentary and the crowd noise of a live sports event and presents a method for audio-stylizing sports commentary by generating live stadium-like sound with neural generative models. We use Generative Adversarial Network (GAN)-based architectures such as Cycle-consistent GANs (CycleGANs) and MelGANs to generate live stadium-like sound samples from the live commentary. Because raw commentary sound samples are unavailable, we use end-to-end time-domain source separation models (SEGAN and Wave-U-Net) to extract the commentary from combined live-sound recordings taken from YouTube soccer highlight videos. We present a qualitative evaluation and a subjective user evaluation of the similarity between the generated live sound and the reference live sound.
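The abstract does not spell out the training objective. As a hedged illustration of what a CycleGAN mapping involves, the sketch below computes the generator-side CycleGAN objective as defined by Zhu et al. (2017): least-squares adversarial terms for the two mappings G (commentary to live sound) and F (live sound to commentary), plus a cycle-consistency term weighted by λ, for which Zhu et al. use 10. The frame representation, the toy generators and discriminators, and all function names are assumptions made for illustration; this is not the authors' model or training code.

```python
from typing import Callable, List

# Hypothetical representation: one magnitude-spectrogram frame as a list of floats.
Frame = List[float]
Generator = Callable[[Frame], Frame]      # maps a frame from one domain to the other
Discriminator = Callable[[Frame], float]  # scores how "real" a frame looks


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equal-length frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Generator, F: Generator,
                           commentary: List[Frame], live: List[Frame]) -> float:
    """E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1, averaged over each batch."""
    forward = sum(l1_distance(F(G(x)), x) for x in commentary) / len(commentary)
    backward = sum(l1_distance(G(F(y)), y) for y in live) / len(live)
    return forward + backward


def lsgan_generator_loss(D: Discriminator, fakes: List[Frame]) -> float:
    """Least-squares adversarial term: the generator wants D(fake) to be 1."""
    return sum((D(f) - 1.0) ** 2 for f in fakes) / len(fakes)


def generator_objective(G: Generator, F: Generator,
                        D_x: Discriminator, D_y: Discriminator,
                        commentary: List[Frame], live: List[Frame],
                        lam: float = 10.0) -> float:
    """Adversarial terms for both mappings plus lam times the cycle term."""
    adv_g = lsgan_generator_loss(D_y, [G(x) for x in commentary])  # commentary -> live
    adv_f = lsgan_generator_loss(D_x, [F(y) for y in live])        # live -> commentary
    return adv_g + adv_f + lam * cycle_consistency_loss(G, F, commentary, live)


if __name__ == "__main__":
    commentary = [[1.0, 2.0]]  # toy "commentary" frame (hypothetical data)
    live = [[3.0, 4.0]]        # toy "live sound" frame (hypothetical data)
    G = lambda f: [2.0 * v for v in f]  # toy generator: commentary -> live
    F = lambda f: [0.5 * v for v in f]  # its exact inverse: live -> commentary
    identity = lambda f: list(f)        # a poor inverse that ignores G
    fooled = lambda f: 1.0              # discriminator that accepts everything

    print(cycle_consistency_loss(G, F, commentary, live))         # 0.0
    print(cycle_consistency_loss(G, identity, commentary, live))  # 5.0
    print(generator_objective(G, identity, fooled, fooled, commentary, live))  # 50.0
```

A mapping pair whose round trip returns the input incurs zero cycle loss, while a poor inverse is still penalized even when both discriminators are fooled. In a real model, G, F, D_x and D_y would be neural networks operating on spectrograms or waveforms.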

References

[n.d.]. Canceled Events Due to the Coronavirus: A Complete List. https://www.vulture.com/2020/05/events-cancelled-coronavirus.html. (Accessed on 05/11/2020).

[n.d.]. FIFA 20 crowd noise to be used for real Premier League games - here’s how it’ll work | GamesRadar+. https://www.gamesradar.com/fifa-20-crowd-noise-to-be-used-for-real-premier-league-games-heres-how-itll-work/. (Accessed on 02/04/2021).

[n.d.]. German Bundesliga broadcasts: Where the ’crowd noise’ feed comes from and how they made it. https://www.espn.in/football/germanbundesliga/story/4102971/german-bundesliga-broadcasts-where-the-crowd-noise-feed-comes-from-and-how-they-made-it. (Accessed on 02/04/2021).

[n.d.]. MyApplause: An app allowing users to cheer from their homes for enhanced sporting experience. https://myapplause.app/en/myapplause-en/. (Accessed on 03/26/2021).

Gino Brunner, Yuyi Wang, Roger Wattenhofer, and Sumu Zhao. 2018. Symbolic music genre transfer with CycleGAN. In 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 786–793.

Sheranne Fairley and B David Tyler. 2012. Bringing baseball to the big screen: Building sense of community outside of the ballpark. Journal of Sport Management 26, 3 (2012), 258–270.

Mack Hagood and Travis Vogan. 2016. The 12th man: Fan noise in the contemporary NFL. Popular Communication 14, 1 (2016), 30–38.

John Hall, Barry O’Mahony, and Julian Vieceli. 2010. An empirical model of attendance factors at major sporting events. International Journal of Hospitality Management 29, 2 (2010), 328–334.

Alan M Nevill, Nigel J Balmer, and A Mark Williams. 2002. The influence of crowd noise and experience upon refereeing decisions in football. Psychology of Sport and Exercise 3, 4 (2002), 261–272.

Santiago Pascual, Antonio Bonafonte, and Joan Serrà. 2017. SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452 (2017).

Marco Pasini. 2019. MelGAN-VC: Voice conversion and audio style transfer on arbitrarily long samples using spectrograms. arXiv preprint arXiv:1910.03713 (2019).

ITU-T Rec. P.85. 1994. A method for subjective performance assessment of the quality of speech voice output devices. International Telecommunication Union, Geneva (1994).

Daniel Stoller, Sebastian Ewert, and Simon Dixon. 2018. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation. arXiv preprint arXiv:1806.03185 (2018).

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2223–2232.