Luppar: An Information Retrieval System for Closed Document Collections

  • Fabiano Tavares da Silva UECE
  • José Everardo Bessa Maia UECE

Abstract


This article presents Luppar, an Information Retrieval tool for closed document collections that uses a local distributional semantic model associated with each corpus. The system performs automatic query expansion by combining the distributional semantic model with local context analysis, and it supports relevance feedback. The system was evaluated on databases from different domains and achieved results equal to or better than those published in the literature.
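
To make the idea of query expansion with a corpus-local distributional model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple windowed co-occurrence count model and cosine similarity; the function names (`cooccurrence_vectors`, `cosine`, `expand_query`), the window size, and the toy corpus are all hypothetical choices made for illustration. Local context analysis and relevance feedback are not covered here.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(docs, window=2):
    """Build one sparse co-occurrence count vector per term.

    docs is a list of token lists; window is an assumed context size.
    """
    vectors = defaultdict(Counter)
    for tokens in docs:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query_terms, vectors, k=2):
    """Return the query terms followed by up to k distributionally similar terms."""
    scores = Counter()
    for q in query_terms:
        if q not in vectors:
            continue  # term unseen in this corpus: nothing to expand from
        for term, vec in vectors.items():
            if term not in query_terms:
                scores[term] += cosine(vectors[q], vec)
    expansions = [t for t, s in scores.most_common(k) if s > 0]
    return list(query_terms) + expansions


# Toy closed collection (hypothetical data, for illustration only).
docs = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
    "stocks fell on the market".split(),
]
vectors = cooccurrence_vectors(docs)
expanded = expand_query(["cat"], vectors, k=2)
print(expanded)
```

Because the vectors are built only from the collection being searched, the expansion terms reflect how words are used in that corpus rather than in general language, which is the motivation for a local model in a closed collection.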


Published
22/10/2018
DA SILVA, Fabiano Tavares; MAIA, José Everardo Bessa. Luppar: An Information Retrieval System for Closed Document Collections. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 15., 2018, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 912-923. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2018.4478.