HASCH: A High-Performance Automatic Spelling Corrector for Web Texts

  • Guilherme Andrade UFSJ
  • Felipe Teixeira UFSJ
  • Carolina Xavier UFSJ / UFRJ
  • Leonardo Rocha UFSJ

Abstract


Recently, we have observed a real democratization of data generation, driven by the rise of Web 2.0. These data are mostly provided as text, ranging from reports published by news portals, which use formal language, to comments in blogs and micro-blogging applications, which make heavy use of informal language (“Internetês”). Addressing this heterogeneity is an essential preprocessing step so that these data can be used by tools that aim to infer accurate information from them. This work therefore presents HASCH (High Performance Automatic Spell CHecker), a spelling corrector that is completely parallelized in shared memory. In our evaluation, HASCH was highly effective at correcting texts from different sources and achieved linear speedup when processing large texts.

Published
2012-07-16
ANDRADE, Guilherme; TEIXEIRA, Felipe; XAVIER, Carolina; ROCHA, Leonardo. HASCH: A High-Performance Automatic Spelling Corrector for Web Texts. In: SBC UNDERGRADUATE RESEARCH CONTEST (CTIC-SBC), 31., 2012, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012. p. 41-50.