Solaria-GPT: A Tailored ChatGPT Tool for Usability Inspection

  • Lennon Chaves UFAM
  • Márcia Lima UFAM / UEA
  • Tayana Conte UFAM

Abstract


Usability defects in software systems create difficulties for users as they interact with the software. Usability inspection is key to detecting these defects during software development, so that they can be fixed before they reach end users. It also plays a critical role after implementation, during the software maintenance phase. This paper presents Solaria-GPT, a tool based on a tailored version of ChatGPT, designed to assist software engineers in identifying and classifying usability defects according to Nielsen’s heuristics. Solaria-GPT supports two interactions: detecting usability defects in user interfaces and classifying each defect by the heuristic it violates. To evaluate the tool, we conducted a study assessing the performance of Solaria-GPT on textual and media-based inputs (screenshots and videos), focusing on three metrics: accuracy rate (defects classified under the correct violated heuristic), utility rate (valid defects identified by Solaria-GPT), and new defect rate (new defects identified by Solaria-GPT). The results showed a 96.23% accuracy rate in heuristic classification from textual inputs. Solaria-GPT achieved a utility rate and a new defect rate of 86.67% each for screenshots, and of 16.13% and 87.10%, respectively, for videos. Comparisons with other large language models (Claude, Qwen, Gemini, and DeepSeek) showed that Solaria-GPT outperformed all of them across the evaluated metrics. These findings suggest that Solaria-GPT is a promising tool for improving software usability throughout the software development lifecycle. Demo Video: https://doi.org/10.5281/zenodo.15275798
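The three evaluation metrics above are proportions over reviewer-annotated tool output. The Python sketch below shows one way to compute them. It is illustrative only: the `ReportedDefect` record and its fields are hypothetical, and the denominators (defects with a reference heuristic for accuracy, all reported defects for the other two rates) are assumptions, because the abstract does not state the exact formulas. The `Heuristic` enum lists Nielsen's ten heuristics with shortened names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Heuristic(Enum):
    """Nielsen's ten usability heuristics (names shortened for code)."""
    VISIBILITY_OF_SYSTEM_STATUS = 1
    MATCH_WITH_REAL_WORLD = 2
    USER_CONTROL_AND_FREEDOM = 3
    CONSISTENCY_AND_STANDARDS = 4
    ERROR_PREVENTION = 5
    RECOGNITION_RATHER_THAN_RECALL = 6
    FLEXIBILITY_AND_EFFICIENCY = 7
    AESTHETIC_AND_MINIMALIST_DESIGN = 8
    RECOVER_FROM_ERRORS = 9
    HELP_AND_DOCUMENTATION = 10


@dataclass
class ReportedDefect:
    """Hypothetical record: one defect reported by the tool plus a reviewer's verdict."""
    predicted: Heuristic            # heuristic the tool says is violated
    expected: Optional[Heuristic]   # reviewer's reference heuristic, None if unlabeled
    is_valid: bool                  # reviewer confirmed it is a real usability defect
    is_new: bool                    # assumption: not present in the reference defect list


def percent(hits: int, total: int) -> float:
    """hits / total as a percentage rounded to two decimals; 0.0 when total is 0."""
    return round(100 * hits / total, 2) if total else 0.0


def accuracy_rate(defects: List[ReportedDefect]) -> float:
    # Assumed denominator: only defects that have a reference heuristic.
    labeled = [d for d in defects if d.expected is not None]
    return percent(sum(d.predicted == d.expected for d in labeled), len(labeled))


def utility_rate(defects: List[ReportedDefect]) -> float:
    # Assumed denominator: every defect the tool reported.
    return percent(sum(d.is_valid for d in defects), len(defects))


def new_defect_rate(defects: List[ReportedDefect]) -> float:
    # Assumed denominator: every defect the tool reported.
    return percent(sum(d.is_new for d in defects), len(defects))


# Made-up example data, not results from the study.
sample = [
    ReportedDefect(Heuristic.ERROR_PREVENTION, Heuristic.ERROR_PREVENTION, True, True),
    ReportedDefect(Heuristic.VISIBILITY_OF_SYSTEM_STATUS, Heuristic.RECOVER_FROM_ERRORS, True, False),
    ReportedDefect(Heuristic.HELP_AND_DOCUMENTATION, None, False, False),
    ReportedDefect(Heuristic.CONSISTENCY_AND_STANDARDS, Heuristic.CONSISTENCY_AND_STANDARDS, True, True),
]
print(accuracy_rate(sample), utility_rate(sample), new_defect_rate(sample))  # 66.67 75.0 50.0
```

Keeping each denominator inside its own function makes it straightforward to substitute the study's actual definitions where they differ from these assumptions.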

Keywords: ChatGPT, LLM, Heuristic Evaluation, Usability Inspection

Published: 2025-09-22
CHAVES, Lennon; LIMA, Márcia; CONTE, Tayana. Solaria-GPT: A Tailored ChatGPT Tool for Usability Inspection. In: BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING (SBES), 39., 2025, Recife/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 956-962. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.11455.