Enhancing Designer Knowledge to Dialogue Management: A Comparison between Supervised and Reinforcement Learning Approaches

  • Bruno Eidi Nishimoto USP / Itaú Unibanco
  • Rogers Silva Cristo Itaú Unibanco
  • Alex Fernandes Mansano Itaú Unibanco
  • Eduardo Raul Hruschka USP
  • Vinicius Fernandes Caridá Itaú Unibanco
  • Anna Helena Reali Costa USP


Task-oriented dialogue systems are complex natural language applications employed in various fields such as health care, sales assistance, and digital customer service. Although the literature suggests several approaches to managing this type of dialogue system, only a few of them compare the performance of different techniques. From this perspective, in this paper we present a comparison between supervised learning, using the transformer architecture, and reinforcement learning, using two flavors of Deep Q-Learning (DQN) algorithms. Our experiments use the MultiWOZ dataset and a real-world digital customer service dataset, and show that integrating expert pre-defined rules with DQN outperforms the supervised approaches. Additionally, we propose a method that makes better use of designer knowledge by improving how interactions collected during warm-up are used in the training phase. Our results indicate a reduction in training time when the designer's knowledge, expressed as pre-defined rules, is preserved in memory during the initial steps of the DQN training procedure.
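The warm-up preservation idea described above can be sketched as a replay buffer that keeps the rule-based (designer) transitions protected from eviction during the initial training steps. This is an illustrative sketch only, not the paper's actual implementation; the class name `WarmUpReplayBuffer` and the parameter `protect_steps` are assumptions introduced here for clarity.

```python
import random


class WarmUpReplayBuffer:
    """Replay buffer that seeds and preserves rule-based (designer) transitions.

    During the first `protect_steps` training steps, transitions collected in
    the warm-up phase (generated by pre-defined expert rules) are never
    evicted, so early DQN updates keep sampling the designer's knowledge.
    """

    def __init__(self, capacity, protect_steps):
        self.capacity = capacity
        self.protect_steps = protect_steps
        self.warmup = []    # transitions from the rule-based policy
        self.regular = []   # transitions from the learning agent
        self.step = 0

    def add_warmup(self, transition):
        """Store a transition produced by the designer's rules."""
        self.warmup.append(transition)
        self._evict()

    def add(self, transition):
        """Store a transition produced by the learning agent."""
        self.regular.append(transition)
        self._evict()

    def _evict(self):
        # While over capacity, drop the oldest transition; warm-up
        # transitions are protected during the initial training steps.
        while len(self.warmup) + len(self.regular) > self.capacity:
            if self.regular and (self.step < self.protect_steps or not self.warmup):
                self.regular.pop(0)
            else:
                self.warmup.pop(0)

    def sample(self, batch_size):
        """Sample a minibatch uniformly from both pools."""
        pool = self.warmup + self.regular
        return random.sample(pool, min(batch_size, len(pool)))

    def tick(self):
        """Advance the training-step counter (call once per DQN update)."""
        self.step += 1
```

Under this scheme, early Q-learning updates are biased toward the expert demonstrations, while after `protect_steps` the buffer gradually forgets them in favor of the agent's own experience.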

NISHIMOTO, Bruno Eidi; CRISTO, Rogers Silva; MANSANO, Alex Fernandes; HRUSCHKA, Eduardo Raul; CARIDÁ, Vinicius Fernandes; COSTA, Anna Helena Reali. Enhancing Designer Knowledge to Dialogue Management: A Comparison between Supervised and Reinforcement Learning Approaches. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 19., 2022, Campinas/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 364-376. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2022.227625.