ABSTRACT
Context: Task-oriented conversational systems demand a high volume of data to understand human language. One of the major challenges of Natural Language Processing (NLP) is the lack of structured annotated data to improve and refine language models; institutions therefore often generate or mine their own data and have to annotate it themselves.
Problem: Annotation is a time-consuming and costly process that is prone to errors caused by human fatigue, and it often becomes the blocking phase for smaller teams developing AI. Companies frequently report data scarcity and poor data quality when developing these systems.
Solution: This paper presents Assis, a modular, adaptable tool for semi-automatic annotation that combines manual and AI annotation. The tool automates and organizes the annotation of intentions and entities in task-oriented conversations. Our proposal combines components that facilitate the visual assimilation of the annotation process. Assis can embed language models that are continuously refined on previously annotated sentences.
IS theory: Assis was developed with Design Theory in mind, drawing on its knowledge base to evaluate existing and proposed tools against the goal of facilitating annotation.
Method: We collected empirical results from real-life case studies, measuring user satisfaction with both the annotation results and the user experience through a feedback form completed after use. These results were compared with those of the same study groups annotating without tools or with other software.
Results: During one of the case studies, the tool was used to annotate more than 800 messages, with user feedback reporting high satisfaction with the reduction in required time.
Contributions and Impact in the IS area: The tool innovates with its deployless architecture, modularity, and adaptability, and introduces two new concepts for text annotation: dialogue topics and entity propagation.
Index Terms
- Assis: Online Semi-Automatic Dialog Annotation Tool