Experimenting Sentence Split-and-Rephrase Using Part-of-Speech Labels

  • P. Berlanga Neto, Universidade de São Paulo
  • E. Y. Okano, Universidade de São Paulo
  • E. E. S. Ruiz, Universidade de São Paulo


Text simplification (TS) is a natural language transformation process that reduces linguistic complexity while preserving the original meaning. This work presents a research proposal for automatic text simplification, specifically a split-and-rephrase approach based on an encoder-decoder neural network model. The proposed method was trained on the English WikiSplit corpus with the help of a part-of-speech tagger and obtained a validation BLEU score of 74.72%. We also used the trained model to split and rephrase sentences written in Portuguese, with relative success, which suggests the method’s potential.

Keywords: natural language processing, neural networks, sentence simplification


