MARVA: Modular Architecture for Robust Visual Agents

  • Vanessa Schenkel Unisinos
  • Gabriel de Oliveira Ramos Unisinos

Abstract


Generalization in visual reinforcement learning (RL) is challenging: small visual shifts can degrade performance. We present MARVA, a dual-regularization extension of MaDi that combines an adversarial discriminator trained through a gradient reversal layer (GRL) with a contrastive (InfoNCE) loss on masked views. On the walker-walk task, MARVA matches baseline performance in the easier domains and improves robustness in video_hard and DistractingCS.
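The two regularizers named in the abstract can be illustrated with a minimal, framework-free sketch. It rests on our own assumptions: cosine similarity with a temperature of 0.1, in-batch negatives for InfoNCE, and a GRL coefficient `lam`. All function names are hypothetical, and this is not the authors' MARVA implementation. `info_nce` measures how well each embedding of a masked view matches its own counterpart rather than the other samples in the batch. The `grl_*` pair shows what a gradient reversal layer does: it is the identity on the forward pass and returns a sign-flipped, scaled gradient on the backward pass, which pushes the encoder to produce features the discriminator cannot tell apart.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE helper (assumed cosine similarity, in-batch negatives).

    anchors[i] should match positives[i]; every positives[j] with j != i
    acts as a negative. Returns the mean cross-entropy over the batch.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Similarity logits of anchor i against every candidate in the batch.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Cross-entropy with index i as the correct class, via stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def grl_forward(features: List[float]) -> List[float]:
    """Hypothetical gradient reversal layer, forward pass: identity."""
    return list(features)


def grl_backward(upstream_grad: List[float], lam: float = 1.0) -> List[float]:
    """Hypothetical gradient reversal layer, backward pass: flip sign, scale by lam."""
    return [-lam * g for g in upstream_grad]


if __name__ == "__main__":
    clean = [[1.0, 0.0], [0.0, 1.0]]
    masked_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each masked view resembles its own clean view
    masked_swapped = [[0.1, 0.9], [0.9, 0.1]]   # each masked view resembles the wrong clean view
    # True: aligned masked views give a lower contrastive loss.
    print(info_nce(clean, masked_aligned) < info_nce(clean, masked_swapped))
    print(grl_backward([0.5, -2.0], lam=0.1))
```

In an autograd framework the GRL is usually written as a custom operation whose backward pass returns `-lam * grad`; the plain-Python version above only makes that sign flip explicit.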

References

Bertoin, D., Zouitine, A., Zouitine, M., and Rachelson, E. (2022). Look where you look! Saliency-guided Q-networks for generalization in visual reinforcement learning. Advances in Neural Information Processing Systems, 35:30693–30706.

Grooten, B., Tomilin, T., Vasan, G., Taylor, M. E., Mahmood, R. A., Fang, M., Pechenizkiy, M., and Mocanu, D. C. (2024). MaDi: Learning to mask distractions for generalization in visual deep reinforcement learning. In AAMAS’24: 2024 International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS.

Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR.

Hansen, N., Su, H., and Wang, X. (2021). Stabilizing deep Q-learning with ConvNets and vision transformers under data augmentation. Advances in Neural Information Processing Systems, 34:3680–3693.

Laskin, M., Lee, K., Stooke, A., Pinto, L., Abbeel, P., and Srinivas, A. (2020). Reinforcement learning with augmented data. Advances in Neural Information Processing Systems, 33:19884–19895.

Li, B., François-Lavet, V., Doan, T., and Pineau, J. (2021). Domain adversarial reinforcement learning. arXiv preprint arXiv:2102.07097.

Pinto, L., Davidson, J., Sukthankar, R., and Gupta, A. (2017). Robust adversarial reinforcement learning. In International Conference on Machine Learning, pages 2817–2826. PMLR.

Yarats, D., Kostrikov, I., and Fergus, R. (2021). Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. In International Conference on Learning Representations.
Published
12/11/2025
SCHENKEL, Vanessa; RAMOS, Gabriel de Oliveira. MARVA: Modular Architecture for Robust Visual Agents. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 184-187. DOI: https://doi.org/10.5753/eramiars.2025.16789.