Evaluation of Recurrent Neural Network Models for Anonymization of Texts in Portuguese
Abstract
Currently, there are several approaches to providing anonymity on the Internet. However, anonymous users can still be identified through their writing style. With advances in neural network and natural language processing research, classifiers are becoming increasingly successful at accurately identifying the author of a text. In response, new approaches that use recurrent neural networks to automatically generate obfuscated texts have emerged to counter adversaries who seek to break anonymity. In this work, we evaluate two approaches that use neural networks to generate obfuscated texts. In our experiments, we compare how effectively both techniques remove the stylistic attributes of a text while preserving its original semantics. Our results show a trade-off between the obfuscation level and the preservation of the text's semantics.
References
Bagnall, D. (2015). Author identification using multi-headed recurrent neural networks. arXiv preprint arXiv:1506.04891.
Banerjee, S. and Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments.
Emmery, C., Manjavacas, E., and Chrupała, G. (2018). Style Obfuscation by Invariance. In COLING 2018, pages 984–996.
Ganin, Y. and Lempitsky, V. (2014). Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning. MIT press.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8):1735–1780.
Mihaylova, T., Karadjov, G., Kiprov, Y., Georgiev, G., Koychev, I., and Nakov, P. (2016). SU@PAN'2016: Author Obfuscation. In CLEF (Working Notes), pages 956–969.
Narayanan, A., Paskov, H., Gong, N. Z., Bethencourt, J., Stefanov, E., Shin, E. C. R., and Song, D. (2012). On the feasibility of internet-scale author identification. In 2012 IEEE Symposium on Security and Privacy, pages 300–314. IEEE.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318.
Potthast, M., Hagen, M., and Stein, B. (2016). Author obfuscation: Attacking the state of the art in authorship verification. In CLEF (Working Notes), pages 716–749.
Shetty, R., Schiele, B., and Fritz, M. (2018). A4NT: Author Attribute Anonymity by Adversarial Training of Neural Machine Translation. In 27th USENIX Security Symposium (USENIX Security 18), pages 1633–1650, Baltimore, MD. USENIX Association.
Stamatatos, E., Rangel-Pardo, F. M., Tschuggnall, M., Stein, B., Kestemont, M., Rosso, P., and Potthast, M. (2018). Overview of PAN 2018. Author identification, author profiling, and author obfuscation. Lecture Notes in Computer Science, 11018:267–285.
Varela, P., Justino, E., and Oliveira, L. S. (2011). Selecting syntactic attributes for authorship attribution. In IJCNN, pages 167–172. IEEE.
Published
02/09/2019
How to Cite
FRANCO, Antônio; OLIVEIRA, Leonardo. Avaliação de Modelos de Redes Neurais Recorrentes para Anonimização de Textos em Português. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 19., 2019, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 421-426. DOI: https://doi.org/10.5753/sbseg.2019.13992.