Transcriber-AI: Using Artificial Intelligence for the Transcription of Historical Manuscripts

  • Rodrigo C. IFCE
  • Alysson H. IFCE
  • Alexsandro C. IFCE
  • Eduardo M. UNIFOR
  • Elizabeth K. IFCE
  • George Ney IFCE
  • Raimundo Valter IFCE

Abstract


This paper presents Transcritor-IA, a tool for the automated transcription of historical manuscripts that uses Handwritten Text Recognition (HTR) models trained with PyLaia. The platform performs hierarchical segmentation and supports iterative training on user-verified transcriptions. We also compare Transcritor-IA with Transkribus©, analyzing how closely the two tools perform and which factors affect transcription quality.

Published
2025-07-20
C., Rodrigo; H., Alysson; C., Alexsandro; M., Eduardo; K., Elizabeth; NEY, George; VALTER, Raimundo. Transcriber-AI: Using Artificial Intelligence for the Transcription of Historical Manuscripts. In: INTEGRATED SOFTWARE AND HARDWARE SEMINAR (SEMISH), 52., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 133-144. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2025.7698.