Research on Document Reader Models for Question Answering in the Sports Domain
Abstract
This paper investigates document reader models in question-answering systems. These models analyze pre-selected documents using natural language processing techniques to capture the context and semantics of the text and extract relevant answers. We compare the BERT, DistilBERT, MiniLM, RoBERTa, and ELECTRA models on their ability to answer questions in the sports domain. The results show that RoBERTa achieved the best Exact Match and F-score, while DistilBERT had the shortest execution time.
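For illustration, the sketch below shows how such a comparison can be set up with the Hugging Face Transformers question-answering pipeline, SQuAD-style Exact Match and F-score metrics, and a simple timing measurement. The checkpoint name, the sample passage, and the gold answer are assumptions made for this example only; they are not the exact data or configuration used in the paper.

import re
import string
import time
from collections import Counter

from transformers import pipeline

# --- SQuAD-style answer normalization and metrics ---------------------------

def normalize(text):
    # Lowercase, drop punctuation and articles, collapse whitespace
    # (the usual SQuAD evaluation convention for Exact Match and F-score).
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def f_score(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# --- Illustrative reader call ------------------------------------------------

# Hypothetical sports passage, question, and gold answer; a real evaluation
# would iterate over a sports QA dataset such as QASports.
context = ("Michael Jordan won six NBA championships with the Chicago Bulls, "
           "earning Finals MVP honors in each of those series.")
question = "How many NBA championships did Michael Jordan win?"
gold_answer = "six"

# Any extractive QA checkpoint can be swapped in here (e.g., BERT, DistilBERT,
# MiniLM, RoBERTa, or ELECTRA fine-tuned on SQuAD-style data).
reader = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

start = time.perf_counter()
prediction = reader(question=question, context=context)
elapsed = time.perf_counter() - start

print("answer:", prediction["answer"])
print("EM:", exact_match(prediction["answer"], gold_answer))
print("F-score:", f_score(prediction["answer"], gold_answer))
print("time (s):", round(elapsed, 3))

In a full comparison, the same loop would be repeated for each model checkpoint over the entire test set, averaging Exact Match, F-score, and per-question execution time.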
References
Chebbi, I., Boulila, W., and Farah, I. R. (2015). Big data: Concepts, challenges and applications. In Computational Collective Intelligence. Lecture Notes in Computer Science, volume 9330, pages 638–647.
Chen, D., Fisch, A., Weston, J., and Bordes, A. (2017). Reading Wikipedia to answer open-domain questions. CoRR, abs/1704.00051.
Clark, K., Luong, M., Le, Q. V., and Manning, C. D. (2020). ELECTRA: pre-training text encoders as discriminators rather than generators. CoRR, abs/2003.10555.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018). BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
Hirschman, L. and Gaizauskas, R. (2001). Natural language question answering: the view from here. Natural Language Engineering, 7(4):275–300.
Jardim, P. C., Moraes, L. M. P., and Aguiar, C. D. (2023). QASports: A question answering dataset about sports. In Proceedings of the Brazilian Symposium on Databases: Dataset Showcase Workshop, pages 1–12.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
Mishra, A. and Jain, S. K. (2016). A survey on question answering systems with classification. Journal of King Saud University - Computer and Information Sciences, 28(3):345–361.
Moraes, L. M. P., Jardim, P., and Aguiar, C. D. (2023). Design principles and a software reference architecture for big data question answering systems. In Proceedings of the 25th International Conference on Enterprise Information Systems, pages 57–67.
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. CoRR, abs/1606.05250.
Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs/1910.01108.
Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., and Zhou, M. (2020). MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. In Advances in Neural Information Processing Systems, volume 33, pages 5776–5788.
