Multi-agent Perspective of Fake Feedback Attacks on Stochastic Multi-armed Bandits

  • Charles A. N. Costa (UnB)
  • Célia Ghedini Ralha (UnB)

Abstract

This article examines the problem of false feedback attacks on stochastic Multi-Armed Bandit (MAB) algorithms from the perspective of Multi-Agent Systems (MAS). We present the roles, beliefs, interactions, and goals of the agents involved in false feedback attacks, along with the performance metrics used to evaluate the MAB algorithms. Three types of attacks illustrate the problem: a constant attack, an adaptive attack, and a relaxed version of Jun's adversarial attack, all specifically built to exploit vulnerabilities in the ε-Greedy and UCB1 algorithms. Experiments exploiting the vulnerabilities inherent in stochastic MAB algorithms yield results that reveal characteristics useful to designers of MAS requiring defenses against false feedback attacks.
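
To make the attack setting concrete, the sketch below (Python) shows a UCB1 learner on a Bernoulli bandit whose observed rewards pass through a constant false feedback attack before the estimates are updated. This is a minimal sketch, not the article's experimental setup: the environment, the attack strength delta, and helper names such as pull, constant_attack, and ucb1 are illustrative assumptions. An adaptive attack would replace constant_attack with a rule that depends on the learner's current estimates.

    # A minimal sketch, assuming a Bernoulli bandit whose feedback is corrupted
    # before it reaches a UCB1 learner. The environment, the attack strength
    # delta, and the helper names are illustrative assumptions.

    import math
    import random

    def pull(true_means, arm):
        # Stochastic environment: Bernoulli reward for the chosen arm.
        return 1.0 if random.random() < true_means[arm] else 0.0

    def constant_attack(reward, arm, target_arm, delta=0.9):
        # Assumed form of a constant attack: subtract a fixed delta from every
        # non-target arm's observed reward so the learner drifts to target_arm.
        return reward - delta if arm != target_arm else reward

    def ucb1(true_means, horizon, attack=None):
        k = len(true_means)
        counts = [0] * k
        means = [0.0] * k
        pulls = []
        for t in range(1, horizon + 1):
            if t <= k:
                arm = t - 1             # play each arm once to initialize
            else:
                # UCB1 index (Auer et al., 2002): empirical mean + exploration bonus
                arm = max(range(k), key=lambda a: means[a] +
                          math.sqrt(2.0 * math.log(t) / counts[a]))
            reward = pull(true_means, arm)
            if attack is not None:      # the attacker tampers with the feedback
                reward = attack(reward, arm)
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
            pulls.append(arm)
        return pulls

    if __name__ == "__main__":
        true_means = [0.8, 0.5, 0.2]    # arm 0 is optimal; the attacker pushes arm 2
        history = ucb1(true_means, 5000,
                       attack=lambda r, a: constant_attack(r, a, target_arm=2))
        print("fraction of pulls on the target arm:",
              history.count(2) / len(history))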

References

Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2/3):235–256.

Bastani, H. and Bayati, M. (2020). Online decision making with high-dimensional covariates. Operations Research, 68(1):276–294.

Bellifemine, F., Caire, G., and Greenwood, D. (2007). Developing Multi-Agent Systems with JADE. John Wiley & Sons, Ltd, Chichester, West Sussex, England.

Bouneffouf, D., Rish, I., and Aggarwal, C. (2020). Survey on applications of multi-armed and contextual bandits. In 2020 IEEE Congress on Evolutionary Computation (CEC), pages 1–8. IEEE Press.

Guan, Z., Ji, K., Bucci Jr, D. J., Hu, T. Y., Palombo, J., Liston, M., and Liang, Y. (2020). Robust stochastic bandit algorithms under probabilistic unbounded adversarial attack. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4036–4043, New York, USA. AAAI Press.

Jun, K.-S., Li, L., Ma, Y., and Zhu, J. (2018). Adversarial attacks on stochastic bandits. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 31, Montréal, Canada. Curran Associates, Inc.

Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670.

Liu, F. and Shroff, N. (2019). Data poisoning attacks on stochastic bandits. In International Conference on Machine Learning, pages 4042–4050. PMLR.

Lykouris, T., Mirrokni, V., and Leme, R. P. (2018). Stochastic bandits robust to adversarial corruptions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 114–122, Los Angeles, USA. ACM.

Niss, L. and Tewari, A. (2020). What you see may not be what you get: UCB bandit algorithms robust to ε-contamination. In Peters, J. and Sontag, D., editors, Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), volume 124 of Proceedings of Machine Learning Research, pages 450–459, Virtual. PMLR.

Rangi, A., Tran-Thanh, L., Xu, H., and Franceschetti, M. (2022). Saving stochastic bandits from poisoning attacks via limited data verification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8054–8061. AAAI Press.

Schwartz, E. M., Bradlow, E. T., and Fader, P. S. (2017). Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Science, 36(4):500–522.

Shen, W., Wang, J., Jiang, Y.-G., and Zha, H. (2015). Portfolio choices with orthogonal bandit learning. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI).

Slivkins, A. (2019). Introduction to multi-armed bandits. Foundations and Trends® in Machine Learning, 12(1-2):1–286.

Vallée, T., Bonnet, G., and Bourdon, F. (2014). Multi-armed bandit policies for reputation systems. In Demazeau, Y., Zambonelli, F., Corchado, J. M., and Bajo, J., editors, Advances in Practical Applications of Heterogeneous Multi-Agent Systems. The PAAMS Collection, volume 8473 of Lecture Notes in Computer Science, Salamanca, Spain. Springer, Cham.

Vermorel, J. and Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In Proceedings of the 16th European Conference on Machine Learning, ECML'05, pages 437–448, Berlin, Heidelberg. Springer-Verlag.

Wooldridge, M. (2009). An Introduction to Multiagent Systems. Wiley, Chichester, UK, 2nd edition.

Zhang, Y., Bian, J., and Zhu, W. (2013). Trust fraud: A crucial challenge for China's e-commerce market. Electronic Commerce Research and Applications, 12(5):299–308.

Published

14/08/2024

COSTA, Charles A. N.; RALHA, Célia Ghedini. Multi-agent Perspective of Fake Feedback Attacks on Stochastic Multi-armed Bandits. In: WORKSHOP-ESCOLA DE SISTEMAS DE AGENTES, SEUS AMBIENTES E APLICAÇÕES (WESAAC), 18., 2024, Brasília/DF. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 5-16. ISSN 2326-5434. DOI: https://doi.org/10.5753/wesaac.2024.33451.