Advanced Machine Learning for Runtime Data Generation

Bukhtawar Zamir; João R. Campos; Marco Vieira

Bukhtawar Zamir University of Coimbra
João R. Campos University of Coimbra
Marco Vieira University of Coimbra

Resumo

Given the ubiquity of software in everyday critical tasks, ensuring its dependability is of utmost importance. Software faults, which can lead to errors and vulnerabilities, can significantly comprise the target system. Various techniques have been developed to improve the dependability of software-intensive systems, from fault avoidance to fault tolerance. Machine Learning (ML) techniques have been playing a vital role in improving the dependability of systems. Nonetheless, such techniques require significant amounts of data, which are not typically available. To overcome this, various techniques, such as fault injection or intrusion injection, have been proposed to generate realistic data. Still, they are computationally expensive and require considerable expertise. At the same time, a recent growing sub-field of ML is generative models. Generative models offer an innovative solution by creating synthetic data that closely resemble real-world samples. If such models could be used to generate realistic synthetic failure or intrusion data on demand, their value would be significant. Notwithstanding, the feasibility of such an approach has not yet been researched. Generative models have only mostly been used for sequential data (e.g., text or music) or data with high spatial dependency (e.g., images). On the other hand, dependability problems often have high dimensional tabular data, for which generative models are yet to excel, and for which it is also considerably more difficult to assess the representativeness of the generated data. This research will focus on determining the feasibility of using generative techniques to generate runtime data to support dependability research.

Palavras-chave: Machine Learning, Generative Models, Artificial Intelligence