Towards Efficient Training through Critical Periods

  • Vinicius Yuiti Fukase (USP)
  • Heitor Gama (USP)
  • Barbara Bueno (USP)
  • Lucas Libanio (USP)

Abstract


Critical learning periods are an important phenomenon in deep learning: the early epochs of training play a decisive role in the success of many training recipes, such as data augmentation. Existing works confirm the existence of this phenomenon and provide useful insights; however, the literature lacks efforts to precisely identify when critical periods occur. In this work, we fill this gap by introducing a systematic approach for identifying critical periods during the training of deep neural networks, focusing on eliminating computationally intensive regularization techniques and effectively applying mechanisms that reduce computational cost, such as data pruning. Our method leverages generalization prediction mechanisms to pinpoint the critical phases in which training recipes yield maximum benefit to the predictive ability of models. By halting resource-intensive recipes beyond these periods, we significantly accelerate the learning phase and reduce training time, energy consumption, and CO2 emissions. Experiments on standard architectures and benchmarks confirm the effectiveness of our method. Specifically, we reduce the training time of popular architectures by up to 59.67%, leading to a 59.47% decrease in CO2 emissions and a 60% reduction in financial cost, without compromising performance. Our work enhances understanding of training dynamics and paves the way for more sustainable and efficient deep learning practices, particularly in resource-constrained environments. In the era of the race for foundation models, we believe our method provides a valuable framework. Code and supplementary material are available at https://github.com/baunilhamarga/critical-periods.
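To make the idea above concrete, the following is a minimal sketch (not the authors' implementation) of training with an expensive augmentation recipe only while a simple generalization proxy suggests the network is still inside its critical period. The ResNet-18 backbone, CIFAR-10 data, RandAugment as the costly recipe, and the plateau rule with its 3-epoch window and 0.5-point threshold are all illustrative assumptions; the paper's actual generalization prediction mechanism and stopping criterion may differ.

```python
# Minimal sketch, assuming: CIFAR-10, ResNet-18, RandAugment as the costly recipe,
# and a validation-accuracy plateau as a stand-in generalization proxy.
# This is NOT the authors' implementation; the thresholds below are illustrative.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

plain_tf = T.Compose([T.ToTensor()])
heavy_tf = T.Compose([T.RandAugment(), T.ToTensor()])  # the "expensive" recipe

train_plain = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=plain_tf)
train_heavy = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=heavy_tf)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=plain_tf)
test_loader = DataLoader(test_set, batch_size=512)

model = torchvision.models.resnet18(num_classes=10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()


def evaluate(loader):
    """Top-1 accuracy of the current model on a loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total


def train_one_epoch(loader):
    model.train()
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        opt.step()


acc_history = []
critical = True  # assume training starts inside the critical period
for epoch in range(90):
    dataset = train_heavy if critical else train_plain
    train_one_epoch(DataLoader(dataset, batch_size=128, shuffle=True))

    acc_history.append(evaluate(test_loader))
    # Hypothetical stopping rule: if validation accuracy improved by less than
    # 0.5 points over the last 3 epochs, declare the critical period over and
    # switch to the cheap pipeline for all remaining epochs.
    if critical and len(acc_history) >= 4 and acc_history[-1] - acc_history[-4] < 0.005:
        critical = False
        print(f"Critical period assumed to end at epoch {epoch}")
```

Once `critical` flips to False, every remaining epoch runs on the plain pipeline, which is where savings in training time, energy, and emissions would come from under this setup.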

Published
30/09/2025
FUKASE, Vinicius Yuiti; GAMA, Heitor; BUENO, Barbara; LIBANIO, Lucas. Towards Efficient Training through Critical Periods. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 218-223.
