Extrapolation-Based Data Augmentation for Sequential Recommendation
Abstract
With the advent of deep learning in the recommendation field, considerable work has been, and continues to be, devoted to bringing deep learning-based models to their full potential. One line of work, Self-Supervised Learning (SSL), focuses on extracting as much value as possible from datasets to improve recommendation performance while also mitigating data-related problems, such as data sparsity, that are common in machine learning. One branch of SSL for recommender systems uses predictive strategies to create new labels and training examples, for instance by adding new interactions to a user's interaction history. However, existing models that explore this idea are limited: they add new interactions only at the start of the sequences, ignoring the performance improvements that could be achieved by adding interactions in the middle and at the end. We propose Extrapolation-based Sequence Augmentation for Sequential Recommendation (ESA4SRec), a model that uses the sequence reconstruction capabilities of BERT4Rec to generate new data at any position of a sequence by extrapolating existing knowledge to unknown, novel interactions. The resulting augmented dataset is then used as input to a model-agnostic sequential recommender system. We compare our approach with related models and show that it improves performance over training on the original datasets and achieves the best overall performance. ESA4SRec's code is available at https://github.com/viniciusgm000/ESA4SRec.
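The idea of generating interactions at any position of a sequence can be illustrated with a minimal sketch. This is not the ESA4SRec implementation: the names `augment_sequence`, `toy_fill_mask`, and `MASK` are hypothetical, and `toy_fill_mask` is only a stand-in for a trained bidirectional model such as BERT4Rec, which would instead predict the item most likely to occupy the masked slot given the context on both sides.

```python
import random
from typing import Callable, List, Sequence

MASK = 0  # hypothetical id reserved for the mask token (assumption)


def augment_sequence(
    seq: Sequence[int],
    fill_mask: Callable[[List[int], int], int],
    n_new: int,
    rng: random.Random,
) -> List[int]:
    """Insert n_new predicted items at random slots: start, middle, or end."""
    out = list(seq)
    for _ in range(n_new):
        # randint is inclusive on both ends, so pos == len(out) appends at the end.
        pos = rng.randint(0, len(out))
        # Open a masked slot at pos and let the model fill it from its context.
        masked = out[:pos] + [MASK] + out[pos:]
        out = out[:pos] + [fill_mask(masked, pos)] + out[pos:]
    return out


def toy_fill_mask(masked: List[int], pos: int) -> int:
    """Placeholder predictor: returns the smallest positive id not yet in the sequence.

    A real setup would call a trained masked-item model here (assumption).
    """
    seen = set(masked)
    item = 1
    while item in seen:
        item += 1
    return item


if __name__ == "__main__":
    history = [3, 5, 7]
    print(augment_sequence(history, toy_fill_mask, n_new=2, rng=random.Random(0)))
```

The original interactions keep their relative order; only the inserted slots are filled by the predictor, so the augmented sequences can be passed to any downstream sequential recommender.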
