Extracting Textual Features from Video Streaming Services Publications to Predict their Popularity

Sidney Loyola de Sá; Aline Paes; Antonio A. de A. Rocha

Sidney Loyola de Sá UFF
Aline Paes UFF
Antonio A. de A. Rocha UFF

Abstract

The popularization of the Internet has increased the amount of content produced and consumed on the Web. To take advantage of this new market, major content providers focused on video streaming services, such as Netflix and Amazon Prime, have emerged. However, despite the large number and diversity of videos these providers make available, only a few attract most of the users' attention. For example, in the data explored in this paper, the 6% most popular videos account for 85% of the total views. Finding out in advance which videos will be popular is not trivial, especially because of the large number of influencing variables. Nevertheless, a tool with this ability would be of great value for dimensioning network infrastructure and for properly recommending new content to users. In this work, we propose two approaches to obtaining features for classifying the popularity of a video before it is published. The first builds upon predictive attributes defined by feature engineering. The second leverages word embeddings of the videos' titles and descriptions. We experiment with the proposed approaches on a set of videos from GloboPlay, the largest provider of video streaming services in Latin America. Combining the engineered features with the embeddings and training a Random Forest classifier yielded the best result, with an accuracy of 87%.

Keywords: popularity prediction, video, machine learning, word embeddings
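
As a rough illustration of how the two feature sources described in the abstract might be combined, the sketch below concatenates a few hand-crafted attributes with the mean word embedding of a video's title and description. Everything specific here is an assumption made for illustration: the toy embedding table `TOY_EMBEDDINGS`, its three dimensions, the particular engineered attributes (word counts and an exclamation-mark flag), and all function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embedding table. A real system would load
# pretrained word vectors instead of these made-up values.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "goal": [1.0, 0.0, 0.0],
    "final": [0.0, 1.0, 0.0],
    "recipe": [0.0, 0.0, 1.0],
}
EMBEDDING_DIM = 3


def mean_embedding(text: str, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the known tokens; all zeros if none is known."""
    tokens = text.lower().split()
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def engineered_features(title: str, description: str) -> List[float]:
    """Illustrative hand-crafted attributes (assumed, not the paper's set)."""
    return [
        float(len(title.split())),        # number of words in the title
        float(len(description.split())),  # number of words in the description
        1.0 if "!" in title else 0.0,     # whether the title has an exclamation mark
    ]


def build_feature_vector(title: str, description: str) -> List[float]:
    """Concatenate the engineered attributes with the mean text embedding."""
    text = title + " " + description
    return engineered_features(title, description) + mean_embedding(
        text, TOY_EMBEDDINGS, EMBEDDING_DIM
    )


if __name__ == "__main__":
    print(build_feature_vector("Goal final", "recipe"))
```

Vectors built this way, one per video and paired with a popularity label, could then be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`. Averaging is only one simple way to pool word vectors and is used here as an assumption.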
