Optimizing Early Exits in Deep Neural Networks: How to Handle Buffers?
Abstract
Early-exit Deep Neural Networks (EE-DNNs) insert intermediate branches that enable local inference when confidence exceeds predefined thresholds, reducing reliance on cloud processing. However, fixed thresholds fail to adapt to real-world contextual variations. This work investigates dynamically adaptive thresholds using multi-armed bandits (MABs) to address concept drift caused by contextual changes. Additionally, a finite input buffer is introduced to balance the accuracy-latency trade-off based on both confidence levels and queue size. Experimental results demonstrate that MAB-based thresholds converge rapidly across diverse contexts, while the buffer efficiently balances the accuracy-latency trade-off.
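To make the two mechanisms in the abstract concrete, the following is a minimal sketch, not the implementation evaluated in this work. It assumes a standard UCB1 bandit over a discrete set of candidate thresholds, a hypothetical reward (the branch confidence on a local exit, otherwise an assumed cloud confidence minus an assumed offloading cost), and a hypothetical buffer rule that relaxes the threshold linearly as the queue fills. All names and constants (`UCBThresholdSelector`, `exits_early`, `simulate`, the 0.9 and 0.3 values) are illustrative assumptions.

```python
import math
import random


class UCBThresholdSelector:
    """UCB1 over a discrete set of candidate confidence thresholds (arms)."""

    def __init__(self, thresholds, c=2.0):
        self.thresholds = list(thresholds)
        self.c = c                                  # exploration weight
        self.counts = [0] * len(self.thresholds)    # pulls per arm
        self.means = [0.0] * len(self.thresholds)   # running mean reward per arm
        self.t = 0                                  # total selections so far

    def select(self):
        """Return the index of the threshold to use for the next input."""
        self.t += 1
        # Try every arm once before relying on the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        ucb = [
            m + math.sqrt(self.c * math.log(self.t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        """Incrementally update the mean reward of the chosen arm."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def exits_early(confidence, threshold, queue_len, buffer_size):
    """Hypothetical buffer-aware rule: a fuller queue lowers the effective
    threshold, trading some accuracy for lower latency."""
    occupancy = queue_len / buffer_size
    effective = threshold * (1.0 - 0.5 * occupancy)  # assumed linear relaxation
    return confidence >= effective


def simulate(rounds=2000, seed=0):
    """Toy loop with synthetic confidences standing in for a real EE-DNN."""
    rng = random.Random(seed)
    selector = UCBThresholdSelector([0.5, 0.6, 0.7, 0.8, 0.9])
    for _ in range(rounds):
        arm = selector.select()
        conf = rng.random()  # stand-in for the branch confidence
        if conf >= selector.thresholds[arm]:
            reward = conf            # local exit: reward is the confidence
        else:
            reward = 0.9 - 0.3       # assumed cloud confidence minus offloading cost
        selector.update(arm, reward)
    return selector
```

In this sketch, concept drift would show up as a change in the reward distribution of each arm; a variant suited to drift might discount old rewards or use a sliding window, which plain UCB1 as written here does not do.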
