Demand-Driven Associative Classification
Abstract
The ultimate goal of machines is to help humans solve problems. Such problems range between two extremes: structured problems, for which the solution is fully defined (and which are thus easily programmed by humans), and random problems, for which the solution is completely undefined (and which thus cannot be programmed). Problems in the vast middle ground have solutions that cannot be well defined and are thus inherently hard to program. Machine learning is the way to handle this middle ground, replacing many tedious and difficult hand-coding tasks with automatic learning methods. There are several machine learning tasks, and this work focuses on a major one, known as classification. Some classification problems are hard to solve, but we show that they can be decomposed into much simpler sub-problems. We also show that solving these sub-problems independently, taking into account their particular demands, often leads to improved classification performance. This is shown empirically by solving real-world problems with the computationally efficient algorithms presented in this work. Significant improvements in classification performance are reported for all these problems in a comparative study involving a broad repertoire of representative algorithms. Theoretical evidence supporting our algorithms is also provided.
References
Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Conf. on Computational Learning Theory, 144–152. Springer.
Breiman, L. (1984). Classification and regression trees. Wadsworth Intl.
Cover, T. and Hart, P. (1967). NN pattern classification. Trans. on Inf. Theory, 13(1):21–27.
Cucker, F. and Smale, S. (2001). On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39(1):1–49.
Guyon, I., Boser, B., and Vapnik, V. (1992). Automatic capacity tuning of very large VC-dimension classifiers. In Conf. on Neural Inf. Proc. Systems (NIPS), 147–155. MIT.
Joachims, T. (2006). Training linear SVMs in linear time. In Conf. on Knowledge Discovery and Data Mining (SIGKDD), 217–226. ACM.
Kolmogorov, A. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1:4–7.
Poggio, T. and Girosi, F. (1998). A sparse representation for function approximation. Neural Computation, 10(6):1445–1454.
Quinlan, J. (1986). Induction of decision trees. Machine Learning, 1:81–106.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408.
Valiant, L. (1984). A theory of the learnable. Commun. ACM, 27(11):1134–1142.
Veloso, A. (2009). Demand-Driven Associative Classification. PhD Thesis, UFMG.
Veloso, A., Zaki, M., Meira, W., and Gonçalves, M. (2009a). Competence-conscious classification (best paper runner up). In Data Mining Conf. (SDM), 918–929. SIAM.
Veloso, A., Zaki, M., Meira, W., Gonçalves, M., and Mossri, H. (2009b). Competence-conscious associative classification. Stat. Analysis and Data Mining, 2(5-6):361–377.
Veloso, A., Meira, W., Zaki, M., Gonçalves, M., and Mossri, H. (2009c). Calibrated lazy associative classification. (accepted, to appear). Information Sciences.
Veloso, A., Ferreira, A., Gonçalves, M., and Laender, A. (2009d). Cost-effective on-demand associative name disambiguation. Inf. Proc. and Management. (submitted).
Veloso, A., Gonçalves, M., and Meira, W. (2008a). Learning to rank at query-time using association rules. In Conf. on Res. and Dev. in Inf. Ret. (SIGIR), 267–274. ACM.
Veloso, A., Meira, W., and Zaki, M. (2008b). Calibrated lazy associative classification (best paper runner up). In Braz. Symp. on Databases (SBBD), 135–149. SBC.
Veloso, A., Meira, W., Gonçalves, M., and Zaki, M. (2007a). Multi-label lazy associative classification. In Euro. Conf. on Data Mining and Knowl. Disc. (PKDD), 605–612.
Veloso, A. and Meira, W. (2007b). Automatic moderation of comments in a large on-line journalistic environment. In Int. Conf. on Weblogs and Social Media (ICWSM). AAAI.
Veloso, A. and Meira, W. (2007). Efficient on-demand opinion mining. In Braz. Symp. on Databases (SBBD), 332–346. SBC.
Veloso, A., Meira, W., Gonçalves, M., and Zaki, M. (2006a). Multi-evidence, lazy associative classification. In Conf. on Inf. and Know. Managem. (CIKM), 218–227. ACM.
Veloso, A., Meira, W., and Zaki, M. J. (2006b). Lazy associative classification. In Int. Conf. on Data Mining (ICDM), 645–654. IEEE.
Veloso, A. and Meira, W. (2005). Rule generation and rule selection techniques for cost-sensitive classification. In Braz. Symp. on Databases (SBBD), 295–309. SBC.
Veloso, A. and Meira, W. (2004). Efficient Data Mining for Frequent Itemsets in Evolving and Distributed Data (best Master Thesis). In Braz. Comp. Soc. Conf. (CTD). SBC.
Veloso, A., Meira, W., and Parthasarathy, S. (2003). Efficient, Accurate and Privacy-Preserving Data Mining for Frequent Itemsets in Distributed Databases (best paper runner up). In Braz. Symp. on Databases (SBBD), 281–292. SBC.
Veloso, A., Meira, W., and Bunte, M. (2002). Mining Reliable Models of Associations in Dynamic Databases (best paper). In Braz. Symp. on Databases (SBBD), 263–277. SBC.
Published
20 July 2010
How to Cite
VELOSO, Adriano; MEIRA JR., Wagner. Demand-Driven Associative Classification. In: CONCURSO DE TESES E DISSERTAÇÕES (CTD), 23., 2010, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2010. p. 89-96. ISSN 2763-8820.