Natural language processing for social inclusion: a text simplification architecture for different literacy levels
Abstract
Text Simplification is a research area within Natural Language Processing whose goal is to maximize the comprehension of written texts by simplifying their linguistic structure. This paper presents our approach to simplifying texts in Brazilian Portuguese. Since people have different literacy levels, we take this into account when generating simplified texts. We propose a two-level text simplification architecture. The first level is a machine learning-based system that learns, from manually simplified texts, the appropriate level of simplification for a given literacy level. The second level is a rule-based system that performs the actual simplification of sentences, following the recommendations of the first level.
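As a rough illustration of the two-level idea, the sketch below pairs a tiny learned component with a rule-based one. It is not the authors' system: the literacy-level labels (`"rudimentary"`, `"basic"`), the single sentence-length feature, the decision-stump learner in `learn_thresholds`, the one semicolon-splitting rule in `split_semicolons`, and the toy annotations in `TOY_DATA` are all hypothetical. The first level learns, per literacy level, when annotators chose to simplify; the second level applies its rule only when the first level recommends it.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One manually annotated sentence (hypothetical training format)."""
    n_words: int      # sentence length in words (the only feature in this sketch)
    level: str        # target literacy level; label names are assumptions
    simplified: bool  # True if the human annotator simplified the sentence


def learn_thresholds(examples):
    """Level 1 (sketch): for each literacy level, learn a length threshold t
    so that the rule 'simplify iff n_words > t' makes the fewest errors on the
    annotated data (a decision stump; ties go to the smallest t)."""
    thresholds = {}
    for level in {e.level for e in examples}:
        data = [e for e in examples if e.level == level]
        candidates = sorted({0} | {e.n_words for e in data})
        thresholds[level] = min(
            candidates,
            key=lambda t: sum((e.n_words > t) != e.simplified for e in data),
        )
    return thresholds


def split_semicolons(sentence):
    """Level 2 (sketch): a single illustrative rewrite rule that splits a
    sentence at semicolons into separate, capitalized sentences."""
    pieces = (p.strip().rstrip(".").strip() for p in sentence.split(";"))
    return [p[0].upper() + p[1:] + "." for p in pieces if p]


def simplify(sentence, level, thresholds):
    """Run level 2 only when level 1 recommends simplifying for this reader.
    Unknown literacy levels are never simplified."""
    n_words = len(sentence.split())
    if n_words > thresholds.get(level, float("inf")):
        return split_semicolons(sentence)
    return [sentence]


# Toy annotations invented for illustration; not data from the paper.
TOY_DATA = [
    Example(8, "rudimentary", False),
    Example(14, "rudimentary", True),
    Example(20, "rudimentary", True),
    Example(14, "basic", False),
    Example(20, "basic", False),
    Example(30, "basic", True),
]

if __name__ == "__main__":
    thresholds = learn_thresholds(TOY_DATA)
    sentence = "O menino leu o livro; depois ele foi dormir cedo."
    for level in ("rudimentary", "basic"):
        print(level, simplify(sentence, level, thresholds))
```

With the toy data, the learned length threshold is lower for the "rudimentary" level (8 words) than for "basic" (20 words), so the same 10-word sentence is split into two sentences for one audience and left unchanged for the other. A realistic first level would likely need richer linguistic features than sentence length.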
Published
20/07/2009
How to Cite
GASPERIN, Caroline; MAZIERO, Erick; SPECIA, Lucia; PARDO, Thiago; ALUISIO, Sandra M. Natural language processing for social inclusion: a text simplification architecture for different literacy levels. In: SEMINÁRIO INTEGRADO DE SOFTWARE E HARDWARE (SEMISH), 36., 2009, Bento Gonçalves/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2009. p. 387-401. ISSN 2595-6205.
