
Assis: Online Semi-Automatic Dialog Annotation Tool

Authors:
Henrique Theodor Schutz Foerste
School of Electrical and Computer Engineering, State University of Campinas, Brazil
ORCID: 0009-0006-6750-6737

Andreis Gustavo Malta Purim
Institute of Computing, State University of Campinas, Brazil
ORCID: 0009-0001-5908-3213

Rafael Roque Souza
Institute of Computing, State University of Campinas, Brazil
ORCID: 0000-0003-1492-5816

Julio Cesar Dos Reis
Institute of Computing, State University of Campinas, Brazil
ORCID: 0000-0002-9545-2098

SBSI '23: Proceedings of the XIX Brazilian Symposium on Information Systems, May 2023, Pages 37–44, https://doi.org/10.1145/3592813.3592886

Published: 26 June 2023

ABSTRACT

Context: Task-oriented conversational systems demand a high volume of data to understand human language. One of the major challenges of Natural Language Processing (NLP) is the lack of structured annotated data to improve and refine language models. As a result, institutions often generate or mine their own data and have to annotate it themselves.

Problem: Annotation is a time-consuming and costly process that usually results in errors due to human fatigue and often acts as the blocking phase for many smaller teams developing AI. Companies frequently report data scarcity and poor data quality when developing these systems.

Solution: This paper presents Assis, a modular, adaptable tool for semi-automatic annotation (manual and AI annotation). The tool automates and organizes the intentions and entities in task-oriented conversations. Our proposal combines components that facilitate the visual assimilation of the annotation process. Assis can be embedded with continuously refined language models based on previously annotated sentences.

IS theory: Assis was developed with Design Theory in mind, using its knowledge base to evaluate how well existing and proposed tools meet the goal of facilitating annotation.

Method: We collected empirical results on user experience in real-life case studies, measuring satisfaction with both the annotation results and the user experience through a feedback form completed after use. These results were compared with the same study groups conducting the annotation without tools or in other software.

Results: During one of the case studies, the tool was used to annotate more than 800 messages, with user feedback reporting high satisfaction with the reduction in required time.

Contributions and Impact in the IS area: The tool innovates with its deployless architecture, modularity and adaptability, while introducing two new concepts for text annotation: dialogue topics and entity propagation.

Index Terms

Assis: Online Semi-Automatic Dialog Annotation Tool
1. Applied computing
  1. Document management and text processing
    1. Document preparation
      1. Annotation
2. Information systems
  1. Information systems applications
    1. Collaborative and social computing systems and tools
      1. Open source software

Published in
SBSI '23: Proceedings of the XIX Brazilian Symposium on Information Systems
May 2023
490 pages
ISBN:9798400707599
DOI:10.1145/3592813
Editors:
Mônica Ximenes C. da Cunha (IFAL)
Marcílio F. de Souza Júnior (UFRPE)
Johnny C. Marques (ITA)
Tadeu M. de Classe (UNIRIO)
Rafael D. Araújo (UFU)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
NLP
active learning
annotation
online
tool
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 181 of 557 submissions, 32%