Analyzing datasets under Instance Space Analysis framework: extension and exploration of hardness embeddings

  • Diogo B. Rodrigues ITA
  • Lucas R. R. Barros ITA
  • Alfredo A. A. E. de Queiroz ITA
  • Ana C. Lorena ITA

Abstract


In Machine Learning (ML), no single algorithm consistently outperforms others across all datasets. Meta-Learning aims to address the challenge of choosing suitable ML algorithms for new tasks by leveraging relationships between dataset characteristics and algorithm performance. One framework that supports this process is Instance Space Analysis (ISA), which enables the visualization of algorithm performance and instance hardness in a two-dimensional space. This study expands the ISA framework for analyzing classification datasets and ML algorithms, making it more complete and enabling meaningful insights into the relationships between dataset characteristics and ML classification performance.

Published
29/09/2025
RODRIGUES, Diogo B.; BARROS, Lucas R. R.; QUEIROZ, Alfredo A. A. E. de; LORENA, Ana C. Analyzing datasets under Instance Space Analysis framework: extension and exploration of hardness embeddings. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1658-1669. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.13857.