Cross-Language Information Retrieval using Algorithms for Mining Association Rules

  • André Pinto Geraldo UFRGS
  • Viviane Pereira Moreira UFRGS

Abstract


This work proposes the use of algorithms for mining association rules as an approach to Cross-Language Information Retrieval (CLIR). These algorithms have been widely used to analyze market basket data. The idea is to map the problem of finding associations between items that are sold together onto the problem of finding term translations in a parallel corpus. The proposal was validated through experiments with different languages, queries, and corpora. The results show that the retrieval performance of the proposed approach is comparable to that of the monolingual baseline and of query translation via machine translation, even though machine translation systems employ more complex Natural Language Processing techniques. A prototype for cross-language web querying was implemented to test the proposed method. The system accepts keywords in Portuguese, translates them into English, and submits the translated query to several websites that provide search functionality.
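The mapping from market baskets to term translation can be illustrated with a minimal sketch in Python. It assumes, for illustration only, that each aligned sentence pair of the parallel corpus is one transaction, and it mines only rules with a single Portuguese term on the left and a single English term on the right. It uses the standard association rule measures: support is the fraction of transactions containing both terms, and confidence is that count divided by the number of transactions containing the source term. The function names `mine_translation_rules` and `translate_query`, the thresholds, and the toy corpus are hypothetical; this is a simplified two-level Apriori-style pass, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def mine_translation_rules(pairs, min_support=0.2, min_confidence=0.5):
    """Mine rules "source term -> target term" from aligned sentence pairs.

    Assumption: each aligned pair is one transaction (one market basket) whose
    items are its source-language terms plus its target-language terms.
    Returns (source, target, support, confidence) tuples.
    """
    if not pairs:
        return []
    n = len(pairs)
    # Level 1: number of transactions containing each term.
    src_count, tgt_count = Counter(), Counter()
    for src, tgt in pairs:
        src_count.update(set(src.lower().split()))
        tgt_count.update(set(tgt.lower().split()))

    # Apriori pruning: a pair can only be frequent if both of its terms are.
    freq_src = {t for t, c in src_count.items() if c / n >= min_support}
    freq_tgt = {t for t, c in tgt_count.items() if c / n >= min_support}

    # Level 2: count cross-language candidate pairs built from frequent terms only.
    pair_count = Counter()
    for src, tgt in pairs:
        s_terms = set(src.lower().split()) & freq_src
        t_terms = set(tgt.lower().split()) & freq_tgt
        pair_count.update(product(s_terms, t_terms))

    rules = []
    for (s, t), c in pair_count.items():
        support = c / n                # fraction of transactions with both terms
        confidence = c / src_count[s]  # estimate of P(target term | source term)
        if support >= min_support and confidence >= min_confidence:
            rules.append((s, t, support, confidence))
    # Highest confidence first, then support, then alphabetical for stable ties.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


def translate_query(query, rules):
    """Replace each source term by the target term of its best-ranked rule."""
    best = {}
    for s, t, _support, _confidence in rules:
        best.setdefault(s, t)  # rules are sorted, so the first hit is the best
    # Terms with no surviving rule are dropped from the translated query.
    return [best[w] for w in query.lower().split() if w in best]


# Hypothetical toy Portuguese-English parallel corpus.
corpus = [
    ("casa azul", "blue house"),
    ("casa grande", "big house"),
    ("carro azul", "blue car"),
    ("carro grande", "big car"),
]
rules = mine_translation_rules(corpus, min_support=0.5, min_confidence=0.5)
print(translate_query("carro azul", rules))  # ['car', 'blue']
```

On this toy corpus, unrelated pairs such as casa/blue co-occur in only one of the four transactions, so they fall below the 0.5 support threshold and only the four correct translations survive. In a realistic corpus, very frequent words such as articles co-occur with almost everything and reach high confidence, so some form of stopword removal or term weighting would likely be needed.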

Published: 20/07/2010
GERALDO, André Pinto; MOREIRA, Viviane Pereira. Cross-Language Information Retrieval using Algorithms for Mining Association Rules. In: CONCURSO DE TESES E DISSERTAÇÕES (CTD), 23., 2010, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2010. p. 57-64. ISSN 2763-8820.