An Empirical Study of Large Language Models as Experts in Software Trustworthiness Assessment
Abstract
As software plays an increasingly central role in daily life, ensuring its trustworthiness is essential. Existing Software Trustworthiness Assessment (STA) techniques often lack theoretical grounding, disregard user expectations, and provide limited actionable guidance. At the same time, Large Language Models (LLMs) have shown strong capabilities in software engineering tasks, but their use for holistic and interpretable STA remains unexplored. This paper presents an exploration of using LLMs to perform context-aware STA, with a specific focus on system software functions written in the C programming language. Our approach involved selecting functions from the Linux Kernel, designing tailored prompts, and executing LLM-based assessments. We report on three practical experiences: two involving expert-based assessments and one comparing results against SCOLP, a state-of-the-art automated technique. The results show that LLM-based categorizations achieve substantial agreement with the most-voted expert rankings, fair agreement with the consensus ranking, and only slight agreement with the automated baseline. These experiences provide insight into the limitations of LLMs in supporting trustworthiness assessments in real-world systems.
Published
October 27, 2025
How to Cite
JANANLOO, Saeed Javani; PEREIRA, José D’Abruzzo; VIEIRA, Marco. An Empirical Study of Large Language Models as Experts in Software Trustworthiness Assessment. In: LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING (LADC), 14., 2025, Valparaíso, Chile. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 75-93.
