A Crowdsourcing Tool for Data Augmentation in Visual Question Answering Tasks

  • Ramon Silva CEFET-RJ
  • Augusto Fonseca CEFET-RJ
  • Ronaldo Goldschmidt IME
  • Joel dos Santos CEFET-RJ
  • Eduardo Bezerra CEFET-RJ


Visual Question Answering (VQA) is a task that connects the fields of Computer Vision and Natural Language Processing. Given an image I and a natural language question Q about I, a VQA model must produce a coherent answer R, also in natural language, to Q. A binary visual question is one whose answer belongs to the set {yes, no}. Deep neural networks are currently the state-of-the-art technique for training VQA models. Despite their success, applying neural networks to the VQA task requires a very large amount of data to produce models with adequate accuracy. The datasets currently used to train VQA models are the result of laborious manual labeling by humans. This motivates the study of approaches for augmenting these datasets in order to train more accurate prediction models. This paper describes a crowdsourcing tool that can be used collaboratively to augment an existing VQA dataset for binary questions. Our tool actively integrates candidate items from an external data source in order to optimize the selection of queries to be presented to curators.
Keywords: Crowdsourcing, Human Computation, Data Augmentation, Image Annotation
How to Cite

SILVA, Ramon; FONSECA, Augusto; GOLDSCHMIDT, Ronaldo; DOS SANTOS, Joel; BEZERRA, Eduardo. A Crowdsourcing Tool for Data Augmentation in Visual Question Answering Tasks. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 24., 2018, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 137-140.