SimpleHRTF-3D: From Head Mesh to Immersive Spatial Audio

Augusto M. P. de Mendonça; Abraão de Santana; Jan M. Teixeira; Kedson A. Silva; Mariza Ferro; Igor M. Coelho

doi:10.5753/webmedia.2025.16045

Augusto M. P. de Mendonça UFF
Abraão de Santana UFF
Jan M. Teixeira UFF
Kedson A. Silva UFF
Mariza Ferro UFF
Igor M. Coelho UFF

Abstract

Head-Related Transfer Functions (HRTFs) are essential for immersive spatial audio in multimedia applications such as virtual reality and gaming, yet personalizing them remains challenging because it requires precise anthropometric measurements. This paper introduces an open-source pipeline for HRTF customization, organized around three objectives. First, we validate a published Random Forest model on the HUTUBS database using manual measurements, achieving comparable performance (R²=89.8%, SD=4.45 dB). Second, SimpleHRTF-3D automates measurement extraction from 3D head meshes using a two-step Particle Swarm Optimization (PSO) procedure, achieving a 10.96% mean extraction error (an absolute reduction of 2.31 percentage points from the 13.27% single-step error) and yielding R²=89.3% and SD=5.04 dB. Third, as a proof of concept, we extend the method to photogrammetry, enabling photo-to-HRTF personalization. Validated on 58 HUTUBS subjects, the pipeline integrates manual, mesh-based, and image-based methods, providing reproducible tools for HRTF adaptation in multimedia applications. The results indicate that the pipeline can support high-fidelity spatial audio across diverse immersive applications.

Keywords: HRTF personalization, spatial audio, machine learning, particle swarm optimization, metaheuristics, 3D mesh extraction, anthropometric measurements
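For readers unfamiliar with particle swarm optimization, the following is a minimal sketch of a coarse-then-fine, two-stage search. It is not the authors' implementation: the abstract does not say what the two PSO steps are, so the split used here (a wide search followed by a second search in a narrowed box around the first result) is an assumption made purely for illustration. The function names `pso`, `two_step_pso`, and `toy_objective`, the `shrink` parameter, and the swarm settings are all hypothetical, and the toy objective stands in for whatever mesh-fitting cost the real pipeline minimizes.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100, seed=0,
        w=0.7, c1=1.5, c2=1.5):
    """Minimize `objective` inside the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the new position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def two_step_pso(objective, bounds, shrink=0.1):
    """Assumed two-stage scheme: coarse search, then search in a narrowed box."""
    coarse, coarse_val = pso(objective, bounds, seed=0)
    narrowed = []
    for (lo, hi), c in zip(bounds, coarse):
        half = shrink * (hi - lo) / 2
        narrowed.append((max(lo, c - half), min(hi, c + half)))
    fine, fine_val = pso(objective, narrowed, seed=1)
    # Keep whichever stage found the lower cost.
    return (fine, fine_val) if fine_val <= coarse_val else (coarse, coarse_val)


def toy_objective(p):
    """Hypothetical stand-in cost with its minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


best, best_val = two_step_pso(toy_objective, [(-10.0, 10.0), (-10.0, 10.0)])
```

In a measurement-extraction setting, the position vector would presumably encode landmark or alignment parameters on the head mesh and the objective would score how well they fit; both are left abstract here because the abstract does not describe them.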

