Unified Knowledge-Graph for Brazilian Indigenous Languages: An Educational Applications Perspective

Abstract


In this paper we present a unified knowledge graph for Brazilian indigenous languages (BIL) from the perspective of potential applications, with a particular focus on the educational domain. We present BILGraph, a prototype we built for Bororo and for Tupian languages such as Guajajara, Munduruku and Akuntsu. We then describe the knowledge extraction and entity linking process used to build the graph from a dependency treebank and a lexical database for Tupian and Bororo languages. Finally, we discuss the limitations of BILGraph, highlighting ethical and practical implementation concerns.

Keywords: Knowledge Graph, Low-Resource Languages, Natural Language Processing, Indigenous Languages

Published: 2024-11-17
POLLETI, Gustavo; COZMAN, Fabio; GERARDI, Fabrício. Unified Knowledge-Graph for Brazilian Indigenous Languages: An Educational Applications Perspective. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 159-164. DOI: https://doi.org/10.5753/stil.2024.245403.