A Procedural Framework for Synthetic Data Generation Based on Statistical Simulation: An Approach to Human Behavior Analysis

  • Jáder Louis de S. Gonçalves UFRO
  • Wyllgner F. Amorim UFRO
  • Nicolas F. C. Sales UFRO
  • Lucas Marques da Cunha UFRO

Abstract


Datasets for training convolutional neural networks (CNNs) are difficult to obtain and often contain sensitive information, which leads to high costs, scarcity, and potential biases. Synthetic data generation offers a viable way around these limitations, enabling controlled and ethical research on human behavior. The main objective of this work was to develop a framework capable of simulating environments and procedurally generating data for AI training, with a focus on engagement monitoring and the identification of behavioral deviations in educational and corporate contexts. Among the specific objectives, the creation of an application persona, Visage Track, stands out; it was designed to calculate engagement and behavioral deviations and thereby validate the developed techniques. The proposed approach integrates procedural generation methods into a complex statistical simulation, allowing customizable and controlled scenarios without direct interaction with real subjects. The framework and Visage Track were built to facilitate experiments in simulated environments. In the evaluation, the prototype achieved high accuracy in presence detection (up to 98%), produced consistent behavioral simulations across multiple scenarios, and generated scalable, multimodal datasets. These results indicate the framework’s potential to support research in multimedia content generation, multimodal interaction, and affective computing, offering an adaptable and ethical infrastructure for studies of human behavior in controlled environments.

Published
10 November 2025
GONÇALVES, Jáder Louis de S.; AMORIM, Wyllgner F.; SALES, Nicolas F. C.; CUNHA, Lucas Marques da. Framework Procedural para Geração de Dados Sintéticos Baseados em Simulação Estatística: Uma Abordagem para Análise de Comportamento Humano. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 275-283. DOI: https://doi.org/10.5753/webmedia.2025.16041.