Predicting the Learning Rate of Gradient Descent for Accelerating Matrix Factorization
Keywords:Gradient Descent, Learning Rate, Matrix Factorization, Recommender Systems
Matrix Factorization (MF) has become the predominant technique in recommender systems. The model parameters are usually learned by means of numerical methods, such as gradient descent. The learning rate of gradient descent is typically set to lower values in order to ensure that the algorithm will not miss a local optimum. As a consequence, the algorithm may take several iterations to converge. Ideally, one wants to find the learning rate that will lead to a local optimum in the first iterations, but that is very difficult to achieve given the high complexity of the search space. Starting with an exploratory analysis on several recommender systems datasets, we observed that there is an overall linear relationship between the learning rate and the number of iterations needed until convergence. Another key observation is that this relationship holds across the different recommender datasets chosen. From this, we propose to use simple linear regression models for predicting, for an unknown dataset, a good learning rate to start with. The idea is to estimate a learning rate that will get us as close as possible to a local optimal in the first iteration, without overshooting it. We evaluate our approach on 8 real-world recommender datasets and compare it against the standard learning algorithm, that uses a fixed learning rate, and adaptive learning rate strategies from the literature. We show that, for some datasets, we can reduce the number of iterations up to 40% when compared to the standard approach.