An Empirical Study of Large Language Models as Experts in Software Trustworthiness Assessment
Abstract
As software plays an increasingly central role in daily life, ensuring its trustworthiness is essential. Existing Software Trustworthiness Assessment (STA) techniques often lack theoretical grounding, disregard user expectations, and provide limited actionable guidance. At the same time, Large Language Models (LLMs) have shown strong capabilities in software engineering tasks, but their use for holistic and interpretable STA remains unexplored. This paper presents an exploration of using LLMs to perform context-aware STA, with a specific focus on system software functions written in the C programming language. Our approach involved selecting functions from the Linux Kernel, designing tailored prompts, and executing LLM-based assessments. We report on three practical experiences: two involving expert-based assessments and one comparing results against SCOLP, a state-of-the-art automated technique. The results show that LLM-based categorizations achieve substantial agreement with the most-voted expert rankings, fair agreement with the consensus ranking, and only slight agreement with the automated baseline. These experiences provide insight into the limitations of LLMs in supporting trustworthiness assessments in real-world systems.
Published
October 27, 2025
How to Cite
JANANLOO, Saeed Javani; PEREIRA, José D’Abruzzo; VIEIRA, Marco. An Empirical Study of Large Language Models as Experts in Software Trustworthiness Assessment. In: LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING (LADC), 14., 2025, Valparaíso, Chile. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 75-93.
