Proxy Tasks Ensemble for Explainable Inference in Sensitive Data

  • Clara Ernesto USP
  • Sandra Avila UNICAMP
  • Carlos Caetano UNICAMP
  • Leo S. F. Ribeiro USP

Abstract


Child sexual abuse imagery (CSAI) classification is inherently challenging. Because access to training data is limited and the images are highly sensitive, most solutions are not reproducible, distributable, or explainable, and they are vulnerable to attacks. In this context, we propose the Ensemble for Inference in Sensitive data using Proxy Tasks (EISP) framework, which performs explainable inference on sensitive data without training directly on target images. Instead, it relies on an ensemble of Proxy Tasks related to CSAI detection, such as nudity detection, age estimation, and perceived gender classification. When provided with Proxy Tasks that correlate with the target task, EISP can be optimized through feature combination, make predictions for that task, and provide explainability both through data visualizations and through SHAP feature importance extracted for each prediction. In this work-in-progress paper, we test the framework and explain its inner workings using a public non-CSAI dataset.
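
The general idea of combining proxy-task outputs and attributing each prediction to them can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the proxy names (`proxy_a`, `proxy_b`, `proxy_c`), the synthetic data, and the use of a plain logistic regression as the combiner. For a linear model with features treated as independent, the contribution `w_j * (x_j - mean_j)` in log-odds space coincides with the SHAP value of feature `j`, which is why it is used below as a stand-in for a SHAP explainer.

```python
import math
import random

# Hypothetical proxy-task names; real proxy models would produce these scores.
FEATURES = ["proxy_a", "proxy_b", "proxy_c"]


def make_synthetic_data(n, seed=0):
    """Simulate proxy-model scores in [0, 1] and a synthetic target label.

    Assumption: the target depends on proxy_a and proxy_b only; proxy_c is
    an uninformative proxy included for contrast.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b, c = rng.random(), rng.random(), rng.random()
        X.append([a, b, c])
        y.append(1 if 0.7 * a + 0.3 * b > 0.5 else 0)
    return X, y


def logit(w, b, x):
    """Log-odds of the positive class for one feature vector."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def train_logistic(X, y, epochs=500, lr=1.0):
    """Fit logistic regression by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-logit(w, b, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def column_means(X):
    """Mean of each proxy score over the dataset (the attribution baseline)."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def explain(w, x, means):
    """Per-feature contribution to the log-odds, relative to the mean input."""
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]


X, y = make_synthetic_data(200)
w, b = train_logistic(X, y)
means = column_means(X)
for name, phi in zip(FEATURES, explain(w, X[0], means)):
    print(f"{name}: {phi:+.3f}")
```

The contributions for one sample sum to the difference between that sample's log-odds and the log-odds at the mean input, so each prediction can be read as a baseline plus one signed term per proxy task. A tree-based combiner with a dedicated SHAP explainer would follow the same pattern, with the attribution step replaced accordingly.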

Published
30/09/2025
ERNESTO, Clara; AVILA, Sandra; CAETANO, Carlos; RIBEIRO, Leo S. F. Proxy Tasks Ensemble for Explainable Inference in Sensitive Data. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 194-199.
