Learning When to Simplify Sentences for Natural Text Simplification

  • Caroline Gasperin USP
  • Lucia Specia USP
  • Tiago F. Pereira USP
  • Sandra M. Aluisio USP

Abstract


This paper introduces a corpus-based approach for selecting sentences that require simplification in the context of a Brazilian Portuguese text simplification system. Based on a parallel corpus of original and simplified text versions, we apply a binary classifier to decide in which circumstances a sentence should or should not be split (splitting being the most important syntactic simplification operation), so that the resulting simplified text is natural and not oversimplified. Our classifier reaches 73.5% precision and 73.4% recall when selecting the sentences to be split or kept together.
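
As a reading aid for the precision and recall figures above, the following is a minimal sketch of how these two measures can be computed for a binary split/keep decision. The label names `"split"` and `"keep"`, the function `precision_recall`, and the toy label lists are assumptions made for illustration; they are not the paper's data, features, or evaluation code, and the paper may average its scores over both classes rather than report a single class.

```python
from typing import Sequence, Tuple


def precision_recall(
    gold: Sequence[str], predicted: Sequence[str], positive: str = "split"
) -> Tuple[float, float]:
    """Return (precision, recall) for one class of a binary decision."""
    pairs = list(zip(gold, predicted))
    # True positives: sentences correctly predicted as the positive class.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted positive, but the gold label differs.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: gold label is positive, but the prediction differs.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical gold and predicted labels for five sentences (illustrative only).
gold = ["split", "keep", "split", "keep", "split"]
pred = ["split", "split", "keep", "keep", "split"]
precision, recall = precision_recall(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Calling the same function with `positive="keep"` gives the scores for the other class; a class-weighted average of the two is one common way to arrive at a single overall figure.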

Published
20/07/2009
GASPERIN, Caroline; SPECIA, Lucia; PEREIRA, Tiago F.; ALUISIO, Sandra M. Learning When to Simplify Sentences for Natural Text Simplification. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 7., 2009, Bento Gonçalves/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2009. p. 182-191. ISSN 2763-9061.