DOI: 10.1145/3592813.3592886
research-article

Assis: Online Semi-Automatic Dialog Annotation Tool

Published: 26 June 2023

ABSTRACT

Context: Task-oriented conversational systems demand a high volume of data to understand human language. One of the major challenges of Natural Language Processing (NLP) is the lack of structured annotated data to improve and refine language models. As a result, institutions often generate or mine their own data and have to annotate it themselves.

Problem: Annotation is a time-consuming and costly process that usually results in errors due to human fatigue and often acts as the blocking phase for many smaller teams developing AI. Companies frequently report data scarcity and poor data quality when developing these systems.

Solution: This paper presents Assis, a modular, adaptable tool for semi-automatic annotation (manual and AI annotation). The tool automates and organizes the intentions and entities in task-oriented conversations. Our proposal combines components that facilitate the visual assimilation of the annotation process. Assis can be embedded with continuously refined language models based on previously annotated sentences.

IS theory: Assis was developed with Design Theory in mind, using its knowledge base to evaluate existing and proposed tools against the goal of facilitating annotation.

Method: We gathered empirical results from real-life case studies, using a feedback form completed after use to measure satisfaction with both the annotation results and the user experience. These results were compared with those of the same study groups annotating without tools or with other software.

Results: During one of the case studies, the tool was used to annotate more than 800 messages, and user feedback reported high satisfaction with the reduction in required time.

Contributions and Impact in the IS area: The tool innovates with its deployless architecture, modularity and adaptability, while introducing two new concepts for text annotation: dialogue topics and entity propagation.

Published in

SBSI '23: Proceedings of the XIX Brazilian Symposium on Information Systems
May 2023, 490 pages

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 181 of 557 submissions, 32%