Empowering Natural Language Interfaces to Databases with Aggregations

Authors

DOI:

https://doi.org/10.5753/jidm.2021.1908

Keywords:

Natural Language Interface to Database (NLIDB), Question Answering (QA), Databases, Natural Language Processing (NLP), Aggregation, SQL

Abstract

A Natural Language Interface to Database (NLIDB) is a database interface that translates a question asked in natural language into a structured query. Aggregation questions require aggregation functions, such as count, sum, average, minimum, and maximum, and optionally a GROUP BY clause and a HAVING clause. NLIDBs deliver good results for standard questions but usually do not handle aggregation questions. The main contribution of this article is a generic module, called GLAMORISE (GeneraL Aggregation MOdule using a RelatIonal databaSE), that extends NLIDBs to cope with aggregation questions. GLAMORISE covers aggregations with ambiguities, timescale differences, aggregations over multiple attributes, the use of superlative adjectives, basic recognition of measurement units, and aggregations over attributes with compound names.
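As a rough illustration of what handling aggregation questions involves, the following Python sketch detects aggregation cue words in a question, wraps an attribute in the corresponding SQL aggregation function, and adds a GROUP BY clause when the question contains "per", "by", or "for each". This is not the GLAMORISE algorithm: the cue-word table AGG_CUES and the functions detect_aggregation and to_sql are hypothetical, and the attribute and table names are assumed to be supplied by the underlying NLIDB. Superlatives such as "highest" are naively mapped to MAX.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cue-word table: natural-language cue -> SQL aggregation function.
# Superlatives ("highest", "lowest") are naively mapped to MAX/MIN.
AGG_CUES = {
    "how many": "COUNT",
    "number of": "COUNT",
    "total": "SUM",
    "sum of": "SUM",
    "average": "AVG",
    "mean": "AVG",
    "minimum": "MIN",
    "lowest": "MIN",
    "maximum": "MAX",
    "highest": "MAX",
}


@dataclass
class AggregationSpec:
    functions: List[str] = field(default_factory=list)  # e.g. ["AVG"]
    group_by: Optional[str] = None                       # e.g. "field"


def detect_aggregation(question: str) -> AggregationSpec:
    """Collect aggregation functions and a GROUP BY term from cue words."""
    q = question.lower()
    spec = AggregationSpec()
    for cue, func in AGG_CUES.items():
        # Whole-word match so that, e.g., "mean" does not fire inside "meaning".
        if re.search(r"\b" + re.escape(cue) + r"\b", q) and func not in spec.functions:
            spec.functions.append(func)
    # "per X", "by X", or "for each X" is read as GROUP BY X.
    match = re.search(r"\b(?:per|by|for each)\s+(\w+)", q)
    if match:
        spec.group_by = match.group(1)
    return spec


def to_sql(spec: AggregationSpec, attribute: str, table: str) -> str:
    """Build a single-table aggregate query; attribute/table come from the host NLIDB."""
    if not spec.functions:
        return f"SELECT {attribute} FROM {table}"  # not an aggregation question
    items = [spec.group_by] if spec.group_by else []
    for func in spec.functions:
        items.append("COUNT(*)" if func == "COUNT" else f"{func}({attribute})")
    sql = f"SELECT {', '.join(items)} FROM {table}"
    if spec.group_by:
        sql += f" GROUP BY {spec.group_by}"
    return sql


if __name__ == "__main__":
    spec = detect_aggregation("What is the average production per field?")
    print(to_sql(spec, "production", "wells"))
    # SELECT field, AVG(production) FROM wells GROUP BY field
```

A keyword lookup like this ignores exactly the harder cases the abstract lists (ambiguous cues, HAVING conditions, timescale differences, measurement units, compound attribute names), which is why a dedicated module is needed on top of a plain NLIDB.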

Author Biography

Marco A. Casanova, PUC-Rio

Marco A. Casanova is Full Professor at the Department of Informatics and Coordinator of the Central Planning and Evaluation Office of the Pontifical Catholic University of Rio de Janeiro – PUC-Rio. He received a degree in Electronic Engineering from the Military Institute of Engineering (1974), an M.Sc. in Informatics from PUC-Rio (1976), and an M.Sc. (1977) and a Ph.D. (1979) in Applied Mathematics from Harvard University. He was Graduate Program Coordinator (2005-2007) and Director (2007-2011) of the Department of Informatics of PUC-Rio. His research interests concentrate on database conceptual modeling and the construction of database management systems. In July 2012, he received the Scientific Merit Award from the Brazilian Computer Society.

References

AFFOLTER, K., STOCKINGER, K. AND BERNSTEIN, A. A comparative survey of recent natural language interfaces for databases. VLDB Journal, v. 28, n. 5, p. 793–819, 2019.

BHARATI, A., BHATIA, M., CHAITANYA, V. AND SANGAL, R. Paninian Grammar Framework Applied to English. South Asian Language Review, Creative Books, New Delhi, 1998.

GARCÍA, G. M. A Keyword-based Query Processing Method for Datasets with Schemas. D.Sc. Thesis, Department of Informatics, PUC-Rio, 2020.

GUPTA, A. Complex Aggregates In Natural Language Interface To Databases. International Institute of Information Technology, Hyderabad, 2013.

GUPTA, A., AKULA, A., MALLADI, D., ET AL. A novel approach towards building a portable NLIDB system using the computational Paninian grammar framework. Proceedings of the 2012 International Conference on Asian Language Processing - IALP 2012, p. 93–96, 2012.

GUPTA, A. AND SANGAL, R. A Novel Approach to Aggregation Processing in Natural Language Interfaces to Databases. Proceedings of the 10th International Conference on Natural Language Processing - ICON-2013, 2013.

HONNIBAL, M. AND JOHNSON, M. An improved non-monotonic transition system for dependency parsing. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, p. 1373–1378, 2015.

HONNIBAL, M. AND MONTANI, I. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. Unpublished software application. https://spacy.io, 2017.

IZQUIERDO, Y. T., GARCÍA, G. M., MENENDEZ, E. S., ET AL. QUIOW: A keyword-based query processing tool for RDF datasets and relational databases. Lecture Notes in Computer Science - LNCS, v. 11030, p. 259–269, 2018.

IZQUIERDO, Y. T., GARCÍA, G. M., NOVELLI, B. A., ET AL. Integrating a geomechanical collaborative research portal with a data & knowledge retrieval platform. Rio Oil and Gas Expo and Conference, v. 20, p. 421–422, 2020.

LI, F. AND JAGADISH, H. V. NaLIR: An interactive natural language interface for querying relational databases. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, p. 709–712, 2014.

LI, F. AND JAGADISH, H. V. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment, v. 8, n. 1, p. 73–84, 2014.

LI, F. AND JAGADISH, H. V. Understanding Natural Language Queries over Relational Databases. ACM SIGMOD Record, v. 45, n. 1, p. 6–13, 2016.

NOVELLO, A. F. A Novel Solution to Empower Natural Language Interfaces to Databases (NLIDB) to Handle Aggregations. M.Sc. Dissertation, Department of Informatics, PUC-Rio, 2021.

NOVELLO, A. F. AND CASANOVA, M. A. A Novel Solution for the Aggregation Problem in Natural Language Interface to Databases (NLIDB). Proceedings of the XXXV Brazilian Symposium on Databases - SBBD 2020, Sociedade Brasileira de Computação - SBC, 2020.

PAZOS R, R. A., AGUIRRE L, M. A., GONZÁLEZ B, J. J., ET AL. Comparative study on the customization of natural language interfaces to databases. SpringerPlus, v. 5, n. 1, p. 553, 2016.

PAZOS R, R. A., VERASTEGUI, A. A., MARTÍNEZ F, J. A., CARPIO, M. AND GASPAR H, J. Translation of natural language queries to SQL that involve aggregate functions, grouping and subqueries for a natural language interface to databases. Studies in Computational Intelligence, v. 749, p. 431–448, 2018.

PINHEIRO, J. P. V., CASANOVA, M. A. AND MENENDEZ, E. S. Improving the Quality of the User Experience by Query Answer Modification. Proceedings of the XXXV Brazilian Symposium on Databases - SBBD 2020, Sociedade Brasileira de Computação - SBC, 2020.

PRUSKI, P., LOHAR, S., GOSS, W., RASIN, A. AND CLELAND-HUANG, J. TiQi: Answering unstructured natural language trace queries. Requirements Engineering, v. 20, n. 3, p. 215–232, 2015.

SHAH, V., LI, S., KUMAR, A. AND SAUL, L. SpeakQL: towards speech-driven multimodal querying of structured data. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, p. 2363–2374, 2020.

TATA, S. AND LOHMAN, G. M. SQAK: Doing more with keywords. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, p. 889–902, 2008.

IZQUIERDO, Y. T., GARCÍA, G. M., LEMOS, M., ET AL. Keyword Search over the COVID-19 Data. Proceedings of the XXXV Brazilian Symposium on Databases - SBBD 2020, Sociedade Brasileira de Computação - SBC, 2020.

Published

2021-11-19

How to Cite

Novello, A. F., & Casanova, M. A. (2021). Empowering Natural Language Interfaces to Databases with Aggregations. Journal of Information and Data Management, 12(5). https://doi.org/10.5753/jidm.2021.1908

Issue

Vol. 12 No. 5 (2021)

Section

SBBD 2020 Short papers - Extended Papers