Towards Efficient Training through Critical Periods

  • Vinicius Yuiti Fukase (USP)
  • Heitor Gama (USP)
  • Barbara Bueno (USP)
  • Lucas Libanio (USP)

Abstract


Critical learning periods are an important phenomenon in deep learning: the early epochs of training play a decisive role in the success of many training recipes, such as data augmentation. Existing works confirm the existence of this phenomenon and provide useful insights; however, the literature lacks efforts to precisely identify when critical periods occur. In this work, we fill this gap by introducing a systematic approach for identifying critical periods during the training of deep neural networks, focusing on eliminating computationally intensive regularization techniques and effectively applying mechanisms that reduce computational cost, such as data pruning. Our method leverages generalization prediction mechanisms to pinpoint the critical phases in which training recipes yield maximum benefit to the predictive ability of models. By halting resource-intensive recipes beyond these periods, we significantly accelerate the learning phase and reduce training time, energy consumption, and CO2 emissions. Experiments on standard architectures and benchmarks confirm the effectiveness of our method. Specifically, we reduce the training time of popular architectures by up to 59.67%, leading to a 59.47% decrease in CO2 emissions and a 60% reduction in financial cost, without compromising performance. Our work enhances understanding of training dynamics and paves the way for more sustainable and efficient deep learning practices, particularly in resource-constrained environments. In the era of the race for foundation models, we believe our method provides a valuable framework. Code and supplementary material are available at https://github.com/baunilhamarga/critical-periods.
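To make the idea above concrete, the following is a minimal sketch (not the authors' implementation) of training with an expensive augmentation recipe only while a simple generalization proxy suggests the network is still inside its critical period. The ResNet-18 backbone, CIFAR-10 data, RandAugment as the costly recipe, and the plateau rule with its 3-epoch window and 0.5-point threshold are all illustrative assumptions; the paper's actual generalization prediction mechanism and stopping criterion may differ.

```python
# Minimal sketch, assuming: CIFAR-10, ResNet-18, RandAugment as the costly recipe,
# and a validation-accuracy plateau as a stand-in generalization proxy.
# This is NOT the authors' implementation; the thresholds below are illustrative.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

plain_tf = T.Compose([T.ToTensor()])
heavy_tf = T.Compose([T.RandAugment(), T.ToTensor()])  # the "expensive" recipe

train_plain = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=plain_tf)
train_heavy = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=heavy_tf)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=plain_tf)
test_loader = DataLoader(test_set, batch_size=512)

model = torchvision.models.resnet18(num_classes=10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()


def evaluate(loader):
    """Top-1 accuracy of the current model on a loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total


def train_one_epoch(loader):
    model.train()
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        opt.step()


acc_history = []
critical = True  # assume training starts inside the critical period
for epoch in range(90):
    dataset = train_heavy if critical else train_plain
    train_one_epoch(DataLoader(dataset, batch_size=128, shuffle=True))

    acc_history.append(evaluate(test_loader))
    # Hypothetical stopping rule: if validation accuracy improved by less than
    # 0.5 points over the last 3 epochs, declare the critical period over and
    # switch to the cheap pipeline for all remaining epochs.
    if critical and len(acc_history) >= 4 and acc_history[-1] - acc_history[-4] < 0.005:
        critical = False
        print(f"Critical period assumed to end at epoch {epoch}")
```

Once `critical` flips to False, every remaining epoch runs on the plain pipeline, which is where savings in training time, energy, and emissions would come from under this setup.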

Published
30/09/2025
FUKASE, Vinicius Yuiti; GAMA, Heitor; BUENO, Barbara; LIBANIO, Lucas. Towards Efficient Training through Critical Periods. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 218-223.
