Integrating Crowdsourcing and Human Computation for Complex Video Annotation Tasks

  • Marcello N. Amorim UFES
  • Celso A. S. Santos UFES
  • Orivaldo L. Tavares UFES


Video annotation is an activity that supplements this type of multimedia object with additional content or information about its context, nature, content, quality, and other aspects. These annotations are the basis for building a variety of multimedia applications for purposes ranging from entertainment to security. Manual annotation is a strategy that uses people's intelligence and labor in the annotation process and is an alternative for cases where automatic methods cannot be applied. However, manual video annotation can be a costly process: as the amount of content to be annotated increases, so does the annotation workload. Crowdsourcing appears as a viable strategy in this context because it relies on outsourcing the work to a multitude of workers, who perform specific parts of it in a distributed way. However, as the complexity of the required media annotations increases, it becomes necessary to employ skilled workers, or workers willing to perform larger, more complicated, and more time-consuming tasks. This makes crowdsourcing challenging to use, since experts demand higher pay and recruiting tends to be difficult. To overcome this problem, strategies have emerged that decompose the main problem into a set of simpler subtasks suitable for crowdsourcing processes. These smaller tasks are organized in a workflow so that the execution process can be formalized and controlled. In this sense, this thesis presents a new framework that allows the use of crowdsourcing to create applications that require complex video annotation tasks. The framework covers the whole process, from the definition of the problem and the decomposition of the tasks to the construction, execution, and management of the workflow. This framework, called CrowdWaterfall, builds on the strengths of current proposals, incorporating new concepts, techniques, and resources to overcome some of their limitations.
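The decomposition described above can be pictured as a cascade of microtasks, each producing partial annotations that feed the next stage. The sketch below is purely illustrative and is not the CrowdWaterfall implementation; the task names (`segment`, `label`, `review`) and the dictionary-based annotation record are hypothetical, standing in for contributions that real crowd workers would provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A microtask wraps one small unit of crowd work. Here the worker's
# contribution is simulated by a function that enriches the annotation.
@dataclass
class Microtask:
    name: str
    run: Callable[[Dict], Dict]

def run_workflow(tasks: List[Microtask], annotation: Dict) -> Dict:
    """Execute microtasks in sequence, passing the growing annotation along."""
    for task in tasks:
        annotation = task.run(annotation)
    return annotation

# Hypothetical cascade for a video clip: segment it, label the segments,
# then have a reviewer approve the result.
workflow = [
    Microtask("segment", lambda a: {**a, "segments": [(0, 10), (10, 20)]}),
    Microtask("label",   lambda a: {**a, "labels": ["intro", "scene"]}),
    Microtask("review",  lambda a: {**a, "approved": True}),
]

result = run_workflow(workflow, {"video": "clip.mp4"})
```

Because each stage only needs the output of the previous one, every microtask stays small enough for an untrained worker, which is the property that makes the workflow-based decomposition attractive for crowdsourcing.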


How to Cite

AMORIM, Marcello N.; SANTOS, Celso A. S.; TAVARES, Orivaldo L. Integrating Crowdsourcing and Human Computation for Complex Video Annotation Tasks. In: CONCURSO DE TESES E DISSERTAÇÕES - SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 26., 2020, São Luís. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 9-12. ISSN 2596-1683. DOI: