Multi-Lingual Text Localization via Language-Specific Convolutional Neural Networks

  • Jhonatas Santos de Jesus Conceição Unicamp
  • Allan Pinto Unicamp
  • Luis Decker Unicamp
  • Jose Luis Flores Campana Unicamp
  • Manuel Cordova Neira Unicamp
  • Andrezza A. dos Santos Unicamp
  • Helio Pedrini Unicamp
  • Ricardo Torres Unicamp

Abstract

Scene text localization and recognition is a topic in computer vision that aims to delimit candidate regions in an input image containing scene text. The challenge of this research is to develop detectors able to deal with several sources of variability, such as font size and color, complex backgrounds, and text in different languages, among others. This work presents a comparison between strategies for building classification models based on Convolutional Neural Networks to detect textual elements in multiple languages in images: (i) a classification model built in a multi-lingual scenario; and (ii) a classification model built in a language-specific scenario. The experiments conducted in this work indicate that language-specific models outperform models trained in a multi-lingual scenario, with improvements of 14.79%, 8.94%, and 11.43% in terms of precision, recall, and F-measure, respectively.

Keywords: Scene text localization, Multi-lingual text localization, Convolutional Neural Networks
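
The three metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard definitions of precision, recall, and F-measure over counts of matched detections. The function name `detection_scores` and the example counts are hypothetical, and the criterion for deciding when a predicted region counts as a true positive (for example, an overlap threshold against the ground truth) is an assumption left outside the sketch, since the abstract does not state it.

```python
from typing import Tuple


def detection_scores(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) from detection counts.

    The counts are assumed to come from some matching step between
    predicted and ground-truth text regions; that step is not shown.
    """
    detected = true_positives + false_positives   # all predicted regions
    actual = true_positives + false_negatives     # all ground-truth regions
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts for illustration only (not results from the paper).
precision, recall, f_measure = detection_scores(80, 20, 40)
print(f"precision={precision:.2f} recall={recall:.2f} f-measure={f_measure:.2f}")
```

Comparing a multi-lingual model with a language-specific one would then amount to computing these three scores for each model on the same test images.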

Author Biography

Luis Gustavo Lorgus Decker has been a master's student at the University of Campinas (UNICAMP) since 2017 and holds a bachelor's degree in computer science from Universidade Federal de Santa Catarina (UFSC). He currently works on deep learning for scene text detection and also has experience with computer vision and digital holography.

Jose Flores received the B.Sc. degree in Informatics Engineering from UNSAAC, Peru. He is currently pursuing the M.Sc. degree in Computer Science at the University of Campinas (UNICAMP). His research focuses on machine learning, deep learning, and image processing, especially text detection and recognition in images.

MSc. Manuel Alberto Córdova Neira is a Ph.D. candidate at the University of Campinas (UNICAMP), Brazil. He received his B.Sc. (Systems Engineer) degree from the National University of Loja (UNL), Ecuador, in 2010, and his M.Sc. (Computer Science) from Unicamp in 2015. He worked as a professor at the Department of Systems Engineering, National University of Loja (UNL), Ecuador, from 2016 to 2018.

Andreza Santos is pursuing the B.Sc. degree in Computer Science at the University of Campinas (UNICAMP). Her research focuses on machine learning and deep learning, specifically on detecting objects in images and videos.

Prof. Dr. Ricardo da Silva Torres is a Full Professor of computer science at the University of Campinas (UNICAMP). Dr. Torres was director of the Institute of Computing at the University of Campinas from 2013 to 2017. Dr. Torres is a Brazilian CNPq research scholar (PQ 1C). Dr. Torres received a B.Sc. in Computer Engineering from the University of Campinas, Brazil, in 2000 and his Ph.D. degree in Computer Science from the same university in 2004. Dr. Torres is co-founder and a member of the RECOD lab, where he has been developing multidisciplinary e-Science research projects involving Multimedia Analysis, Multimedia Image Retrieval, Machine Learning, Databases, Digital Libraries, and Geographic Information Systems. Dr. Torres is author or co-author of more than 100 articles in refereed journals and conferences and serves as a PC member for several international and national conferences. His research activities have also been associated with the deposit and licensing of several patents.

Published
28/10/2019
CONCEIÇÃO, Jhonatas Santos de Jesus; PINTO, Allan; DECKER, Luis; CAMPANA, Jose Luis Flores; NEIRA, Manuel Cordova; DOS SANTOS, Andrezza A.; PEDRINI, Helio; TORRES, Ricardo. Multi-Lingual Text Localization via Language-Specific Convolutional Neural Networks. In: WORKSHOP DE TRABALHOS DA GRADUAÇÃO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 32., 2019, Rio de Janeiro. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 215-218. DOI: https://doi.org/10.5753/sibgrapi.est.2019.8333.