Adaptive Model Switching for Dynamic Inference Serving in the IoT-edge-cloud Continuum
Abstract
Although Internet of Things (IoT) devices have enabled Deep Learning (DL) applications to operate closer to end-users, their limited processing power prevents them from scaling to modern, computationally heavy DL algorithms. A promising solution is to offload computation to remote edge and cloud servers throughout the IoT-edge-cloud continuum. However, inconsistent network conditions, variable latency, and the distinct accuracy and precision requirements of different DL applications hinder the ability to uphold the necessary performance standards. To address these issues, we propose AdaptSwitch, a dynamic framework that selects both the DL model and the most suitable execution location across the IoT-edge-cloud continuum, based on the application’s accuracy and latency requirements and on the network conditions at each moment. We evaluated AdaptSwitch using real IoT, edge, and cloud devices running state-of-the-art DL models for image classification, while considering network delay data from real-world 5G/Edge/Cloud traffic datasets. Experimental results demonstrate that our approach improves Quality of Experience (QoE) by reducing inference latency while ensuring accuracy requirements are met through adaptive model switching.
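The selection described above can be pictured as a small optimization: among the (model, execution tier) pairs available in the continuum, keep those whose accuracy meets the application's requirement, and pick the one with the lowest end-to-end latency under the current network delays. The sketch below is purely illustrative; the candidate names, numbers, and the greedy policy are assumptions, not the actual AdaptSwitch algorithm.

```python
# Illustrative sketch of an accuracy-constrained, latency-minimizing
# (model, execution location) selection. All values are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    model: str         # DL model name (hypothetical)
    tier: str          # execution location: "iot", "edge", or "cloud"
    accuracy: float    # top-1 accuracy of the model
    compute_ms: float  # inference time of this model on this tier

def select(candidates, net_delay_ms, min_accuracy):
    """Return the feasible (model, tier) pair with the lowest
    end-to-end latency (network delay + compute), or None."""
    feasible = [c for c in candidates if c.accuracy >= min_accuracy]
    if not feasible:
        return None  # no configuration satisfies the accuracy requirement
    return min(feasible, key=lambda c: net_delay_ms[c.tier] + c.compute_ms)

# Hypothetical candidates and current per-tier network delays.
candidates = [
    Candidate("mobilenet_v2", "iot",   0.72, 120.0),
    Candidate("resnet50",     "edge",  0.76,  15.0),
    Candidate("efficientnet", "cloud", 0.80,  10.0),
]
net_delay = {"iot": 0.0, "edge": 8.0, "cloud": 60.0}

best = select(candidates, net_delay, min_accuracy=0.75)
print(best.model, best.tier)  # resnet50 edge, under these assumed numbers
```

Re-running the selection whenever the measured network delays or the application's accuracy target change is what makes the switching adaptive: a delay spike on the cloud link, for instance, can shift the choice to a lighter model on the edge.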
Keywords:
Adaptation models, Cloud computing, Accuracy, Computational modeling, Switches, Internet of Things, Servers, Quality of experience, Modeling, Load modeling, DL, Model, Switching, Inference, Serving, Offloading, Continuum, IoT, Edge, Cloud
Published
24/11/2025
How to Cite
KNORST, Tiago; JORDAN, Michael G.; KOROL, Guilherme; RUTZIG, Mateus Beck; BECK, Antonio Carlos Schneider. Adaptive Model Switching for Dynamic Inference Serving in the IoT-edge-cloud Continuum. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SISTEMAS COMPUTACIONAIS (SBESC), 15., 2025, Campinas/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 85-90. ISSN 2237-5430.
