A Novel Solution for the Aggregation Problem in Natural Language Interface to Databases (NLIDB)


Natural Language Interface to Databases (NLIDB) systems usually do not handle aggregations, which can be of two types: aggregation functions (such as count, sum, average, minimum, and maximum) and grouping functions (GROUP BY). This paper addresses the creation of a generic module, to be used in NLIDB systems, that allows such systems to perform queries with aggregations, provided that the query results returned by the NLIDB are, or can be transformed into, tables. The paper covers aggregations with specific difficulties, such as ambiguities, timescale differences, aggregations over multiple attributes, the use of superlative adjectives, basic recognition of units of measure, and aggregations over attributes with compound names.
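
To make the two aggregation types concrete, the following is a minimal sketch of how an aggregation function, optionally combined with grouping, could be applied to a tabular query result. It is an illustration only and not the module described in the paper: the function name `aggregate`, the `AGGREGATIONS` mapping, and the sample rows and column names are all assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from aggregation names to Python callables.
AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "average": mean,
    "minimum": min,
    "maximum": max,
}


def aggregate(rows, attribute, function, group_by=None):
    """Apply an aggregation function to one attribute of a tabular result.

    rows is a list of dicts, one dict per table row. The result is a single
    value, or a dict keyed by group value when group_by is given.
    """
    func = AGGREGATIONS[function]
    if group_by is None:
        return func([row[attribute] for row in rows])
    # Emulate GROUP BY: collect the attribute values of each group first.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[attribute])
    return {key: func(values) for key, values in groups.items()}


# Invented sample table, standing in for the result an NLIDB might return.
rows = [
    {"field": "A", "well": "W1", "depth": 1200},
    {"field": "A", "well": "W2", "depth": 1500},
    {"field": "B", "well": "W3", "depth": 900},
]

total_wells = aggregate(rows, "well", "count")
max_depth_by_field = aggregate(rows, "depth", "maximum", group_by="field")
```

Because the sketch operates on rows rather than on the database, it only requires that the results be tabular, which mirrors the condition stated in the abstract. The specific difficulties the paper lists (ambiguity, timescales, units, compound attribute names) are not handled here.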

Keywords: Natural Language Interface to Database (NLIDB), Question Answering (QA), Databases, Natural Language Processing (NLP), Aggregation, SQL


NOVELLO, Alexandre Ferreira; CASANOVA, Marco Antonio. A Novel Solution for the Aggregation Problem in Natural Language Interface to Databases (NLIDB). In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 35., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 217-222. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2020.13644.