Extracting Textual Features from Video Streaming Services Publications to Predict their Popularity

  • Sidney Loyola de Sá UFF
  • Aline Paes UFF
  • Antonio A. de A. Rocha UFF


The Internet's popularization has increased the amount of content produced and consumed on the Web. To take advantage of this new market, major content producers focused on video streaming services, such as Netflix and Amazon Prime, have emerged. However, despite the large number and diversity of videos made available by these content providers, only a few of them attract most of the users' attention. For example, in the data explored in this paper, only 6% of the videos, the most popular ones, are responsible for 85% of the total views. Finding out in advance which videos will be popular is not trivial, especially because of the large number of influencing variables. Nevertheless, a tool with this ability would be of great value for dimensioning network infrastructure and for properly recommending new content to users. In this work, we propose two approaches to obtaining features for classifying the popularity of a video before it is published. The first builds upon predictive attributes defined by feature engineering. The second leverages word embeddings computed from the descriptions and titles of the videos. We experiment with the proposed approaches on a set of videos from GloboPlay, the largest provider of video streaming services in Latin America. A combination of the engineered features and the embeddings, classified with the Random Forest machine learning algorithm, reached the best result, with an accuracy of 87%.
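The idea of combining engineered features with text embeddings can be illustrated with a minimal sketch. It is not the authors' pipeline: the embedding table, the three engineered attributes, and every function name below are hypothetical, chosen only to show how a title and description might be turned into a single feature vector that a classifier such as Random Forest could then consume.

```python
from statistics import fmean

# Toy 3-dimensional word embeddings. Hypothetical values for illustration;
# a real setup would load pretrained embeddings instead.
TOY_EMBEDDINGS = {
    "final": [0.5, 0.0, 1.0],
    "match": [1.5, 1.0, 0.0],
    "highlights": [1.0, 2.0, 2.0],
}
DIM = 3


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def mean_embedding(text, table=TOY_EMBEDDINGS, dim=DIM):
    """Average the embeddings of known words; zeros if no word is known."""
    vectors = [table[tok] for tok in tokenize(text) if tok in table]
    if not vectors:
        return [0.0] * dim
    return [fmean(component) for component in zip(*vectors)]


def engineered_features(title, description):
    """Hypothetical hand-crafted attributes (assumed, not from the paper)."""
    return [
        float(len(tokenize(title))),                # words in the title
        float(len(tokenize(description))),          # words in the description
        float(any(ch.isdigit() for ch in title)),   # title contains a digit
    ]


def build_feature_vector(title, description):
    """Concatenate engineered attributes with the averaged text embedding."""
    text = title + " " + description
    return engineered_features(title, description) + mean_embedding(text)


vector = build_feature_vector("Final match", "Highlights of the final")
print(vector)  # [2.0, 4.0, 0.0, 0.875, 0.75, 1.0]
```

Averaging word vectors is only one simple way to obtain a fixed-length representation of a text; the resulting vectors, one per video, would form the rows of the training matrix passed to the classifier.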
Keywords: popularity prediction, video, machine learning, word embeddings
How to Cite

DE SÁ, Sidney Loyola; PAES, Aline; ROCHA, Antonio A. de A. Extracting Textual Features from Video Streaming Services Publications to Predict their Popularity. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 1., 2021, Minas Gerais. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 113-120.
