Development of a Voice-Based Virtual Assistant Powered by LLMs to Facilitate Interaction of Visually Impaired Students with Operating Systems
Abstract
This article presents a voice-based virtual assistant that integrates open-source, text-focused LLMs with audio input. The system combines LM Studio for local model serving with PostgreSQL for data and context management, and evaluates LLaMA 3, Mistral, and Phi 3 to identify the best-performing model. The results highlight cost efficiency, flexibility, and data privacy. The solution aims to help visually impaired students access information and perform tasks independently through a modular, expandable platform that reduces dependence on external API keys and operational costs.
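The abstract outlines the core architecture: a locally hosted model served by LM Studio answers transcribed voice queries, while PostgreSQL stores the dialogue so context can be carried across turns. The sketch below illustrates one way such a loop could be wired in Python; it is not the authors' implementation, and the endpoint URL, model identifier, table layout, and credentials are assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the authors' code): a transcribed voice query is
# sent to a local LLM exposed through LM Studio's OpenAI-compatible server, and the
# exchange is persisted in PostgreSQL to rebuild conversational context later.
import requests
import psycopg2

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"  # default LM Studio local server
MODEL = "llama-3-8b-instruct"  # assumed identifier of the model loaded in LM Studio

def ask_llm(history, user_text):
    """Send the running conversation plus the new user turn to the local model."""
    messages = history + [{"role": "user", "content": user_text}]
    resp = requests.post(LMSTUDIO_URL, json={"model": MODEL, "messages": messages})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def save_turn(conn, user_text, answer):
    """Store the exchange in an assumed 'dialogue' table for context management."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO dialogue (user_text, assistant_text) VALUES (%s, %s)",
            (user_text, answer),
        )
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=assistant user=assistant")  # assumed credentials
    history = [{"role": "system",
                "content": "You are a voice assistant helping visually impaired students use the operating system."}]
    question = "How do I open the file manager using only the keyboard?"  # would come from speech-to-text
    answer = ask_llm(history, question)
    save_turn(conn, question, answer)
    print(answer)  # in the full pipeline this would be passed to text-to-speech
```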
Keywords:
Chatbots, Pedagogical Agents, LLMs, AI assistant
References
Bala, A. Multimodal LLM using Federated Visual Instruction Tuning for Visually Impaired. IEEE Transactions on Neural Networks and Learning Systems, v.33, n.5, p.2156-2168, 2022.
Borek, C. Comparative evaluation of LLM-based approaches to chatbot creation. Journal of Artificial Intelligence Research, v.65, p.123-145, 2022.
Brown, T.; et al. Language models are few-shot learners. In: Advances in Neural Information Processing Systems, v.33, p.1877-1901, 2020.
Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), p.4171-4186, 2019.
Howard, J.; Ruder, S. Universal Language Model Fine-tuning for Text Classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), p.328-339, 2018.
Klemmer, E.; et al. Evaluating Voice-based Assistant for Visually Impaired Users. Proceedings of the ACM on Human-Computer Interaction, v.4, n.CSCW2, p.1-23, 2020.
LM Studio. Discover, download, and run local LLMs. Available at: [link]. Accessed: 18 Sep. 2023.
nickolaslivero/phidata. Phidata Repository. Available at: [link]. Accessed: 18 Sep. 2023.
Phidata. LLM OS Architecture. Available at: [link]. Accessed: 18 Sep. 2023.
Radford, A.; et al. Language Models are Unsupervised Multitask Learners. OpenAI Blog, 2019.
Rafat, M. I. AI-powered Legal Virtual Assistant: Utilizing LLM Optimized by RAG for Housing Dispute Resolution in Finland. Artificial Intelligence and Law, v.31, p.67-88, 2023.
Simeoni, I.; Torroni, P. Empathic Voice: Enabling Emotional Intelligence in Virtual Assistants. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), p.1021-1032, 2021.
Touvron, H.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971, 2023.
Vaswani, A.; et al. Attention Is All You Need. In: Advances in Neural Information Processing Systems, v.30, p.5998-6008, 2017.
Vu, M. D.; et al. GPTVoiceTasker: AI-Powered Voice Assistant for Smartphones. Mobile Computing and Communications Review, v.27, n.3, p.44-59, 2023.
Published
2024-11-04
How to Cite
LIVERO, Nickolas J. S.; SILVA, Fabio S. Development of a Voice-Based Virtual Assistant Powered by LLMs to Facilitate Interaction of Visually Impaired Students with Operating Systems. In: BRAZILIAN SYMPOSIUM ON COMPUTERS IN EDUCATION (SBIE), 35., 2024, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 2987-2998. DOI: https://doi.org/10.5753/sbie.2024.244662.
