Toward Development of A.D.A. – Advanced Distributed Assistant

Fernando Freire (USP); Thatiane Rosa (USP / IFTO); Guilherme Feulo (USP); Carlos Elmadjian (USP); Renato Cordeiro (USP); Shayenne Moura (USP); Acácio Andrade (USP); Lucy Anne de Omena (USP); Augusto Vicente (USP); Felipe Marques (USP); Aléxia Sheffer (USP); Otávio Hideki (USP); Patrícia Nascimento (USP); Daniel Cordeiro (USP); Alfredo Goldman (USP)

DOI: https://doi.org/10.5753/wscad.2020.14070

Abstract

The A.D.A. (Advanced Distributed Assistant) project aims to build a smart distributed personal assistant, that is, a virtual agent that can interact with the user through voice commands in Portuguese across an ecosystem of devices, such as Internet of Things (IoT) devices. The project is divided into six undergraduate research (scientific initiation) subprojects from different areas of computer science, each co-advised by a graduate student. An open-source proof of concept is being developed to demonstrate the assistant's capabilities and its applications in public and private domains.