An Ensemble Approach to Facial Deepfake Detection Using Self-Supervised Features

Yan Martins B. Gurevitz Cunha; José Matheus C. Boaro; Daniel de Sousa Moraes; Pedro Cutrim dos Santos; Polyana Bezerra da Costa; Antonio José Grandson Busson; Julio Cesar Duarte; Sérgio Colcher

doi:10.5753/webmedia.2024.243194

Yan Martins B. Gurevitz Cunha, PUC-Rio
José Matheus C. Boaro, PUC-Rio
Daniel de Sousa Moraes, PUC-Rio
Pedro Cutrim dos Santos, PUC-Rio
Polyana Bezerra da Costa, PUC-Rio
Antonio José Grandson Busson, BTG Pactual
Julio Cesar Duarte, IME
Sérgio Colcher, PUC-Rio

Abstract

Substantial effort has been dedicated to developing methods for detecting deepfake content, particularly since the creation of large, diverse datasets that offer both higher image quality and demographic features. In this scenario, CNN-based approaches achieved good initial results, which were later improved by combining them with Vision Transformers. More recently, Foundation Models (FMs) have emerged and improved performance across many visual tasks, including deepfake detection; combining the self-supervised features generated by FMs with CNN-based classifiers has yielded significant performance gains. However, taking advantage of multiple maps of self-supervised features is not as straightforward as adding more channels to the classifier. This work therefore explores ensemble techniques that make effective use of these diverse self-supervised feature maps for realistic facial deepfake detection. Our experiments indicate that combining the outputs of different classifiers, each using a single map of self-supervised features, leads to significant performance improvements. Several committee approaches consistently outperform the individual classifiers, demonstrating the potential of these methods to enhance deepfake detection accuracy.
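
The idea of combining the outputs of several classifiers, each trained on a single self-supervised feature map, can be illustrated with a minimal sketch. The abstract does not say which committee rules the authors use, so the two rules below (probability averaging and majority voting) are assumptions chosen for illustration, and the function names and scores are hypothetical.

```python
import numpy as np


def soft_vote(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Average the per-classifier fake probabilities, then threshold.

    probs has shape (n_classifiers, n_images); returns 1 for "fake".
    """
    return (probs.mean(axis=0) >= threshold).astype(int)


def majority_vote(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold each classifier first, then take the majority decision."""
    votes = (probs >= threshold).astype(int)
    return (votes.sum(axis=0) * 2 > votes.shape[0]).astype(int)


# Hypothetical scores: one row per classifier (each assumed to use a
# different self-supervised feature map), one column per face image.
probs = np.array([
    [0.90, 0.40, 0.20, 0.95],
    [0.80, 0.60, 0.10, 0.45],
    [0.70, 0.30, 0.30, 0.45],
])

soft = soft_vote(probs)      # decision from the averaged probabilities
hard = majority_vote(probs)  # decision from the per-classifier votes
print(soft.tolist(), hard.tolist())
```

The two rules can disagree: for the last image, one very confident classifier pulls the average above the threshold while the majority still votes "real". Which rule works better is an empirical question that this sketch does not answer.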

Keywords: deep fake detection, self-supervised, vision transformers, deep learning, foundation models

