Pareto Front Elite

  • Celso Yoshikazu Ishida UFPR
  • Aurora T. R. Pozo UFPR

Abstract

Over the past decade, ROC analysis has been used to compare machine learning algorithms. The area under the ROC curve (AUC) is considered a relevant criterion for dealing with imbalanced data, misclassification costs, and noise. The classifier with the highest AUC is the one with the best average performance. Based on these preferences, we introduce a rule learning algorithm. Starting from a large set of rules, the algorithm builds a Pareto front using sensitivity and specificity as criteria. We compare the results with those of other rule induction algorithms and show that the new algorithm obtains a rule set with high AUC values.
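
As an illustration of the selection step described in the abstract, the following is a minimal sketch of how a Pareto front over sensitivity and specificity could be extracted from a set of candidate rules. It is not the authors' implementation: the `Rule` tuple, the function name `pareto_front`, and the example values are assumptions made only for this sketch, and the abstract does not say how the rules are generated or how the final AUC is computed.

```python
from typing import List, Tuple

# Hypothetical representation of a rule: (label, sensitivity, specificity).
Rule = Tuple[str, float, float]


def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Return the rules that no other rule dominates.

    A rule is dominated when another rule is at least as good on both
    sensitivity and specificity and strictly better on at least one.
    """
    front = []
    for name, se, sp in rules:
        dominated = any(
            (se2 >= se and sp2 >= sp) and (se2 > se or sp2 > sp)
            for _, se2, sp2 in rules
        )
        if not dominated:
            front.append((name, se, sp))
    # Order by decreasing specificity (increasing false-positive rate),
    # which is the left-to-right order of the points in ROC space.
    return sorted(front, key=lambda r: -r[2])


# Made-up example values, for illustration only.
rules = [
    ("r1", 0.90, 0.40),
    ("r2", 0.70, 0.80),
    ("r3", 0.60, 0.70),  # dominated by r2
    ("r4", 0.30, 0.95),
    ("r5", 0.90, 0.30),  # dominated by r1
]

for name, se, sp in pareto_front(rules):
    print(name, se, sp)
```

In this toy example the rules `r3` and `r5` are discarded because another rule matches or beats them on both criteria, so only `r4`, `r2`, and `r1` remain on the front. The quadratic pairwise comparison keeps the sketch short; a sort-based sweep would be preferable for a very large rule set.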

Published
30/06/2007
ISHIDA, Celso Yoshikazu; POZO, Aurora T. R. Pareto Front Elite. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 6., 2007, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2007. p. 1062-1071. ISSN 2763-9061.