Automatic Construction of Web Directories Using Incremental Term Clustering
Abstract
Methods based on hierarchical document clustering are useful for supporting the unsupervised construction of web directories. However, traditional methods are inefficient in dynamic scenarios, where knowledge is constantly updated. Moreover, these methods produce a hierarchical structure that is difficult for users to interpret. This work proposes an incremental term clustering approach that makes it possible to (1) organize document collections in dynamic scenarios and (2) obtain descriptors for the clusters to support the interpretation of the results. An experimental evaluation was carried out on real data from a web directory, with good results.
References
Bradley, P. S., Fayyad, U. M., and Reina, C. (1998). Scaling Clustering Algorithms to Large Databases. In Knowledge Discovery and Data Mining, pages 9–15.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7:1–30.
Farnstrom, F., Lewis, J., and Elkan, C. (2000). Scalability for clustering algorithms revisited. ACM SIGKDD Explorations Newsletter, 2:51–57.
Feldman, R. and Sanger, J. (2006). The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.
Fung, B. C. M., Wang, K., and Ester, M. (2008). The Encyclopedia of Data Warehousing and Mining, chapter Hierarchical Document Clustering, pages 970–975. Idea Group.
Jain, A. K., Murty, M. N., and Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys, 31(3):264–323.
Kim, H. J. (2006). On text mining algorithms for automated maintenance of hierarchical knowledge directory. In Knowledge Science, Engineering and Management, Lecture Notes in Computer Science, pages 202–214.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Marcacini, R. M. and Rezende, S. O. (2010a). Incremental construction of topic hierarchies using hierarchical term clustering. In SEKE’2010: Proceedings of the 22nd International Conference on Software Engineering and Knowledge Engineering, pages 553–558. KSI - Knowledge Systems Institute.
Marcacini, R. M. and Rezende, S. O. (2010b). Torch: a tool for building topic hierarchies from growing text collection. In WFA’2010: IX Workshop de Ferramentas e Aplicações - XVI Webmedia, pages 1–3.
Marchionini, G. (2006). Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41–46.
Metwally, A., Agrawal, D., and Abbadi, A. E. (2005). Efficient computation of frequent and top-k elements in data streams. In ICDT’05: Proceedings of 10th International Conference on Database Theory, pages 398–412.
Moura, M. F. and Rezende, S. O. (2010). A simple method for labeling hierarchical document clusters. In IAI’10: Proceedings of the 10th International Conference on Artificial Intelligence and Applications, pages 363–371. Acta Press.
Nassar, S., Sander, J., and Cheng, C. (2004). Incremental and effective data summarization for dynamic hierarchical clustering. In SIGMOD’04: Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 467–478.
Premalatha, K. and Natarajan, A. (2010). A Literature Review on Document Clustering. Information Technology Journal, 9(5):993–1002.
Xu, R. and Wunsch, D. (2008). Clustering. Wiley-IEEE Press, IEEE Press Series on Computational Intelligence.
Yang, H. C. and Lee, C. H. (2004). A text mining approach on automatic generation of web directories and hierarchies. Expert Systems with Applications, 27(4):645–663.
Zhao, Y. and Karypis, G. (2002). Evaluation of hierarchical clustering algorithms for document datasets. In CIKM ’02: Proceedings of the 11th International Conference on Information and Knowledge Management, pages 515–524.
Zhao, Y., Karypis, G., and Fayyad, U. (2005). Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery, 10(2):141–168.
Published
19/07/2011
How to Cite
MARCACINI, Ricardo M.; REZENDE, Solange O. Construção Automática de Diretórios Web usando Agrupamento Incremental de Termos. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 8., 2011, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 323-334. ISSN 2763-9061.