
Evaluating Clustering Meta-features for Classifier Recommendation

  • Conference paper
  • Published in: Intelligent Systems (BRACIS 2021)

Abstract

Data availability in a wide variety of domains has boosted the use of Machine Learning techniques for knowledge discovery and classification. The performance of a technique in a given classification task is significantly impacted by specific characteristics of the dataset, which makes the problem of choosing the most adequate approach a challenging one. Meta-Learning approaches, which learn from meta-features calculated from the dataset, have been successfully used to suggest the most suitable classification algorithms for specific datasets. This work proposes the adaptation of clustering measures based on internal indices for supervised problems as additional meta-features in the process of learning a recommendation system for classification tasks. The gains in performance due to Meta-Learning and the additional meta-features are investigated with experiments based on 400 datasets, representing diverse application contexts and domains. Results suggest that (i) meta-learning is a viable solution for recommending a classifier, (ii) the use of clustering features can contribute to the performance of the recommendation system, and (iii) the computational cost of Meta-Learning is substantially smaller than that of running all candidate classifiers in order to select the best.
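The clustering measures mentioned above are internal validity indices, which can be adapted to supervised problems by treating the class labels as a cluster partition and evaluating how well-separated the classes are in feature space. A minimal sketch of this idea using scikit-learn follows; the specific indices shown (silhouette, Davies-Bouldin, Calinski-Harabasz) and the use of scikit-learn are illustrative assumptions, not the authors' exact implementation (the paper's footnote points to https://github.com/rivolli/mfe for their tooling).

```python
# Sketch: clustering-based meta-features for a labeled dataset, obtained
# by evaluating internal cluster validity indices on the true class
# partition. Index choice and library are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.metrics import (
    silhouette_score,
    davies_bouldin_score,
    calinski_harabasz_score,
)

X, y = load_iris(return_X_y=True)

# Each index, computed with the class labels playing the role of cluster
# assignments, becomes one meta-feature describing the dataset.
meta_features = {
    "silhouette": silhouette_score(X, y),                # in [-1, 1], higher = better separated
    "davies_bouldin": davies_bouldin_score(X, y),        # >= 0, lower = better separated
    "calinski_harabasz": calinski_harabasz_score(X, y),  # > 0, higher = better separated
}

for name, value in meta_features.items():
    print(f"{name}: {value:.3f}")
```

A meta-learning recommender would compute such a vector for every dataset in the meta-base and use it, alongside conventional meta-features, as input to a model that predicts the best-performing classifier.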


Notes

  1. https://github.com/rivolli/mfe.


Acknowledgment

Research carried out using the computational resources of the Center for Mathematical Sciences Applied to Industry (CeMEAI) funded by FAPESP (grant 2013/07375-0).

Author information

Correspondence to Luís P. F. Garcia.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Garcia, L.P.F., Campelo, F., Ramos, G.N., Rivolli, A., de Carvalho, A.C.P.d.L.F. (2021). Evaluating Clustering Meta-features for Classifier Recommendation. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol. 13073. Springer, Cham. https://doi.org/10.1007/978-3-030-91702-9_30

  • DOI: https://doi.org/10.1007/978-3-030-91702-9_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91701-2

  • Online ISBN: 978-3-030-91702-9

  • eBook Packages: Computer Science (R0)
