A Deep Learning Approach to Detect Pornography Videos in Educational Repositories
Abstract
A large number of videos are uploaded to educational platforms every minute, and those platforms are responsible for any sensitive media their users upload. An automated system for detecting pornographic content could assist human workers by pre-selecting suspicious videos. In this paper, we propose a multimodal approach to adult content detection. We use two deep convolutional neural networks to extract high-level features from the image and audio streams of a video. We then concatenate those features and evaluate the performance of classifiers on a set of mixed educational and pornographic videos. We achieve an F1-score of 95.67% on the educational and adult videos set and an F1-score of 94% on our test subset for the pornographic class.
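The fusion-and-evaluation idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the four-dimensional toy vectors stand in for the CNN embeddings, the 1-nearest-neighbor rule is a placeholder classifier chosen for brevity, and the names `fuse`, `predict_1nn`, and `f1_score` are hypothetical helpers introduced here.

```python
from math import dist
from typing import List, Sequence, Tuple

Vector = List[float]


def fuse(image_feat: Sequence[float], audio_feat: Sequence[float]) -> Vector:
    """Concatenate the image and audio feature vectors of one video."""
    return [*image_feat, *audio_feat]


def predict_1nn(train: Sequence[Tuple[Vector, int]], query: Vector) -> int:
    """Placeholder classifier (assumption): label of the closest training vector."""
    nearest = min(train, key=lambda item: dist(item[0], query))
    return nearest[1]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): label 1 = pornographic, label 0 = educational.
train = [
    (fuse([0.9, 0.1], [0.8, 0.2]), 1),
    (fuse([0.8, 0.2], [0.7, 0.1]), 1),
    (fuse([0.1, 0.9], [0.2, 0.8]), 0),
    (fuse([0.2, 0.7], [0.1, 0.9]), 0),
]
test = [
    (fuse([0.85, 0.15], [0.75, 0.2]), 1),
    (fuse([0.15, 0.8], [0.2, 0.85]), 0),
    (fuse([0.5, 0.5], [0.1, 0.9]), 1),  # ambiguous image features, educational-like audio
]

y_true = [label for _, label in test]
y_pred = [predict_1nn(train, vec) for vec, _ in test]
score = f1_score(y_true, y_pred)
```

In this toy run the third test video is misclassified because its fused vector lies closer to the educational examples, so recall for the positive class drops while precision stays perfect. The F1-score reflects both effects in a single number, which is why it is a common choice when missed detections and false alarms both matter.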
Published
24/11/2020
How to Cite
FREITAS, Pedro V. A. de; BUSSON, Antonio J. G.; GUEDES, Álan L. V.; COLCHER, Sérgio. A Deep Learning Approach to Detect Pornography Videos in Educational Repositories. In: SIMPÓSIO BRASILEIRO DE INFORMÁTICA NA EDUCAÇÃO (SBIE), 31., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 1253-1262. DOI: https://doi.org/10.5753/cbie.sbie.2020.1253.