A Complete Framework for Offline and Counterfactual Evaluations of Interactive Recommendation Systems
Abstract
Interactive recommendation has been recognized as a Multi-Armed Bandit (MAB) problem: items are arms to be pulled (i.e., recommended) and the user’s satisfaction is the reward to be maximized. Despite the advances, there is still a lack of consensus on the best practices to evaluate such solutions. Recently, two complementary frameworks were proposed to evaluate bandit solutions more accurately: iRec and OBP. The first provides a complete set of offline metrics and bandit models, allowing comparisons under several evaluation policies. The second provides a large set of bandit models to be evaluated through several counterfactual estimators. However, there is still room to explore in joining these two frameworks. We propose and evaluate an integration between them, demonstrating the potential and richness of such a combination.
Keywords:
Contextual Bandits, Offline Evaluation, Counterfactual Evaluation
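The counterfactual evaluation mentioned in the abstract typically rests on off-policy estimators such as Inverse Propensity Scoring (IPS), the basic building block in pipelines like OBP. As a minimal sketch (the function name and toy data below are illustrative, not part of either framework), IPS reweights each logged reward by the ratio between the evaluation policy's and the logging policy's probability of the logged action:

```python
def ips_estimate(rewards, logging_probs, eval_probs):
    """Estimate the value of an evaluation policy from logged bandit feedback.

    rewards       -- observed reward for each logged interaction
    logging_probs -- probability the logging policy assigned to the logged action
    eval_probs    -- probability the evaluation policy assigns to the same action
    """
    # Importance weight: how much more (or less) likely the evaluation
    # policy is to take the action the logging policy actually took.
    weights = [pe / pl for pe, pl in zip(eval_probs, logging_probs)]
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)

# Toy log of two interactions: the evaluation policy up-weights the
# action that earned reward 1, so its estimated value exceeds the
# logged average reward of 0.5.
value = ips_estimate(rewards=[1.0, 0.0],
                     logging_probs=[0.5, 0.5],
                     eval_probs=[0.8, 0.2])
# value = (0.8/0.5 * 1.0 + 0.2/0.5 * 0.0) / 2 = 0.8
```

This estimator is unbiased when the logging propensities are known and non-zero, which is the setting assumed by counterfactual evaluation frameworks such as OBP.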
Referências
Marc Abeille and Alessandro Lazaric. 2017. Linear thompson sampling revisited. In Artificial Intelligence and Statistics. PMLR, 176–184. https://doi.org/10.48550/arXiv.1611.06534
Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3, Nov (2002), 397–422. https://doi.org/10.1162/153244303321897663
Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47, 2-3 (2002), 235–256. https://doi.org/10.1023/A:1013689704352
Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. In Advances in neural information processing systems. 2249–2257.
Jaya Kawale, Hung H Bui, Branislav Kveton, Long T Thanh, and Sanjay Chawla. 2015. Efficient thompson sampling for online matrix-factorization recommendation. Advances in Neural Information Processing Systems 28 (2015), 1297–1305. https://doi.org/10.5555/2969239.2969384
Shuai Li, Alexandros Karatzoglou, and Claudio Gentile. 2016. Collaborative filtering bandits. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 539–548.
Yaxu Liu, Jui-Nan Yen, Bowen Yuan, Rundong Shi, Peng Yan, and Chih-Jen Lin. 2022. Practical Counterfactual Policy Learning for Top-K Recommendations. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1141–1151.
Weishen Pan, Sen Cui, Hongyi Wen, Kun Chen, Changshui Zhang, and Fei Wang. 2021. Correcting the User Feedback-Loop Bias for Recommendation Systems. arXiv preprint arXiv:2109.06037 (2021).
Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita. 2020. Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation. arXiv preprint arXiv:2008.07146 (2020).
Javier Sanz-Cruzado, Pablo Castells, and Esther López. 2019. A simple multi-armed nearest-neighbor bandit for interactive recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems. 358–362.
Sulthana Shams, Daron Anderson, and Douglas Leith. 2021. Cluster-Based Bandits: Fast Cold-Start for Recommender System New Users. (2021).
Nicollas Silva, Thiago Silva, Heitor Werneck, Leonardo Rocha, and Adriano Pereira. 2023. User Cold-Start Problem in Multi-Armed Bandits: When the First Recommendations Guide the User’s Experience. ACM Trans. Recomm. Syst. 1, 1 (2023). https://doi.org/10.1145/3554819
Nícollas Silva, Heitor Werneck, Thiago Silva, Adriano C. M. Pereira, and Leonardo Rocha. 2021. A contextual approach to improve the user’s experience in interactive recommendation systems. In WebMedia ’21: Brazilian Symposium on Multimedia and the Web, Belo Horizonte, Minas Gerais, Brazil, November 5-12, 2021, Adriano César Machado Pereira and Leonardo Chaves Dutra da Rocha (Eds.). ACM, 89–96. https://doi.org/10.1145/3470482.3479621
Thiago Silva, Nícollas Silva, Carlos Mito, Adriano C. M. Pereira, and Leonardo Rocha. 2022. Interactive POI Recommendation: applying a Multi-Armed Bandit framework to characterise and create new models for this scenario. In WebMedia ’22: Brazilian Symposium on Multimedia and Web, Curitiba, Brazil, November 7 - 11, 2022, Thiago Henrique Silva, Leyza Baldo Dorini, Jussara M. Almeida, and Humberto Torres Marques-Neto (Eds.). ACM, 211–221. https://doi.org/10.1145/3539637.3557060
Thiago Silva, Nícollas Silva, Heitor Werneck, Carlos Mito, Adriano CM Pereira, and Leonardo Rocha. 2022. Irec: An interactive recommendation framework. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 3165–3175.
Qing Wang, Chunqiu Zeng, Wubai Zhou, Tao Li, S Sitharama Iyengar, Larisa Shwartz, and Genady Ya Grabarnik. 2018. Online interactive collaborative filtering using multi-armed bandit with dependent arms. IEEE Transactions on Knowledge and Data Engineering 31, 8 (2018), 1569–1580.
Qingyun Wu, Naveen Iyer, and Hongning Wang. 2018. Learning contextual bandits in a non-stationary environment. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 495–504.
Yanming Yang, Xin Xia, David Lo, and John Grundy. 2022. A survey on deep learning for software engineering. ACM Computing Surveys (CSUR) 54, 10s (2022), 1–73.
Xiaoxue Zhao, Weinan Zhang, and Jun Wang. 2013. Interactive collaborative filtering. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 1411–1420.
Sijin Zhou, Xinyi Dai, Haokun Chen, Weinan Zhang, Kan Ren, Ruiming Tang, Xiuqiang He, and Yong Yu. 2020. Interactive recommender system via knowledge graph-enhanced reinforcement learning. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 179–188.
Lixin Zou, Long Xia, Yulong Gu, Xiangyu Zhao, Weidong Liu, Jimmy Xiangji Huang, and Dawei Yin. 2020. Neural Interactive Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 749–758.
Published
23/10/2023
How to Cite
ANDRADE, Yan; SILVA, Nícollas; SILVA, Thiago; PEREIRA, Adriano; DIAS, Diego; ALBERGARIA, Elisa T.; ROCHA, Leonardo. A Complete Framework for Offline and Counterfactual Evaluations of Interactive Recommendation Systems. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 29., 2023, Ribeirão Preto/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 193–197.