Integrating Crowdsourcing and Human Computation for Complex Video Annotation Tasks

Marcello N. Amorim; Celso A. S. Santos; Orivaldo L. Tavares

doi:10.5753/webmedia_estendido.2020.13053

Marcello N. Amorim UFES
Celso A. S. Santos UFES
Orivaldo L. Tavares UFES

Abstract

Video annotation supplements a video with additional content or information about its context, nature, content, quality, and other aspects. These annotations are the basis for building multimedia applications for purposes ranging from entertainment to security. Manual annotation relies on human intelligence and labor in the annotation process and is an alternative for cases where automatic methods cannot be applied. However, manual video annotation can be costly because the workload grows with the amount of content to be annotated. Crowdsourcing appears as a viable strategy in this context because it outsources the tasks to a multitude of workers, who each perform specific parts of the work in a distributed way. However, as the complexity of the required media annotations increases, it becomes necessary to employ workers who are skilled, or who are willing to perform larger, more complicated, and more time-consuming tasks. This makes crowdsourcing challenging to use, as experts demand higher pay and recruiting them tends to be difficult. To overcome this problem, strategies have emerged that decompose the main problem into a set of simpler subtasks suitable for crowdsourcing. These smaller tasks are organized in a workflow so that the execution process can be formalized and controlled. In this sense, this thesis presents a new framework that allows crowdsourcing to be used to create applications that require complex video annotation tasks. The framework covers the whole process, from the definition of the problem and the decomposition of the tasks to the construction, execution, and management of the workflow. This framework, called CrowdWaterfall, builds on the strengths of current proposals and incorporates new concepts, techniques, and resources to overcome some of their limitations.
