Mikayel Samvelyan

Senior Research Scientist — Autonomous Scientific Discovery & Self-Improvement

I'm a Senior Research Scientist at Google DeepMind, where I lead a research effort on autonomous scientific discovery and self-improvement. I'm also a member of the European Laboratory for Learning and Intelligent Systems (ELLIS).

Before joining Google DeepMind, I was a PhD Researcher at Meta FAIR and a PhD student at University College London (UCL). At UCL, I was supervised by Tim Rocktäschel at the UCL DARK Lab and was part of the ELLIS PhD & Postdoc Program. Before that, I earned an MSc in Computer Science from the University of Oxford, where I worked in the Whiteson Research Lab advised by Shimon Whiteson, following Master's and Bachelor's degrees in Informatics and Applied Mathematics from Yerevan State University. I have held research and development roles at Reddit, Mentor, Toptal, and the USC Information Sciences Institute.

Research

My current research centers on autonomous scientific discovery and self-improvement. My long-term goal is to develop methods for autonomous, open-ended learning that exhibit scalable alignment and robustness, particularly as they become increasingly capable of assisting and accelerating scientific discovery and safety-critical research.

My previous research has primarily focused on reinforcement learning (RL), multi-agent learning, and open-endedness. My early work includes widely used tools for multi-agent RL, such as the QMIX method and the SMAC benchmark. Much of my follow-up work has focused on using open-ended learning to train generally capable RL agents and diagnose their robustness. More recently, I applied these techniques to enhance the safety of LLMs with approaches like Rainbow Teaming, which identifies vulnerabilities and generates synthetic data to improve LLM robustness. I also contributed to Meta Llama 3.

News

Highlighted Works

Robust Agents in Open-Ended Worlds

M Samvelyan

PhD Thesis, 2024

@misc{samvelyan_thesis,
      title={Robust Agents in Open-Ended Worlds},
      author={Mikayel Samvelyan},
      year={2025},
      eprint={2512.08139},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.08139},
}
The Llama 3 Herd of Models

Llama Team

arXiv, 2024

@misc{llama3,
   title={The Llama 3 Herd of Models},
   author={{Llama Team}},
   year={2024},
   url={https://llama.meta.com/}
}
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

M Samvelyan*, S Raparthy*, A Lupu*, E Hambro, A Markosyan, M Bhatt, Y Mao, M Jiang, J Parker-Holder, J Foerster, T Rocktäschel, R Raileanu

NeurIPS, 2024

@misc{samvelyan2024rainbow,
   title={Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts},
   author={Mikayel Samvelyan and Sharath Chandra Raparthy and Andrei Lupu and Eric Hambro and Aram H. Markosyan and Manish Bhatt and Yuning Mao and Minqi Jiang and Jack Parker-Holder and Jakob Foerster and Tim Rockt{\"a}schel and Roberta Raileanu},
   year={2024},
   eprint={2402.16822},
   archivePrefix={arXiv},
   primaryClass={cs.CL}
}
Multi-Agent Diagnostics for Robustness via Illuminated Diversity

M Samvelyan*, D Paglieri*, M Jiang, J Parker-Holder, T Rocktäschel

AAMAS, 2024 Oral

@misc{samvelyan2024multiagent,
   title={Multi-Agent Diagnostics for Robustness via Illuminated Diversity},
   author={Mikayel Samvelyan and Davide Paglieri and Minqi Jiang and Jack Parker-Holder and Tim Rockt{\"a}schel},
   year={2024},
   eprint={2401.13460},
   archivePrefix={arXiv},
   primaryClass={cs.LG}
}
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

M Samvelyan, A Khan, M Dennis, M Jiang, J Parker-Holder, J Foerster, R Raileanu, T Rocktäschel

ICLR, 2023

@inproceedings{samvelyan2023maestro,
   title={{MAESTRO}: Open-Ended Environment Design for Multi-Agent Reinforcement Learning},
   author={Mikayel Samvelyan and Akbir Khan and Michael D Dennis and Minqi Jiang and Jack Parker-Holder and Jakob Nicolaus Foerster and Roberta Raileanu and Tim Rockt{\"a}schel},
   booktitle={International Conference on Learning Representations},
   year={2023},
   url={https://openreview.net/forum?id=sKWlRDzPfd7}
}
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

M Samvelyan, R Kirk, V Kurin, J Parker-Holder, M Jiang, E Hambro, F Petroni, H Küttler, E Grefenstette, T Rocktäschel

NeurIPS, 2021

@inproceedings{samvelyan2021minihack,
   title={MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research},
   author={Mikayel Samvelyan and Robert Kirk and Vitaly Kurin and Jack Parker-Holder and Minqi Jiang and Eric Hambro and Fabio Petroni and Heinrich K{\"u}ttler and Edward Grefenstette and Tim Rockt{\"a}schel},
   booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
   year={2021},
   url={https://openreview.net/forum?id=skFwlyefkWJ}
}
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

T Rashid*, M Samvelyan*, C Schroeder de Witt, G Farquhar, J Foerster, S Whiteson

Journal of Machine Learning Research (JMLR), 2020

@article{rashid20monotonic,
   author  = {Tabish Rashid and Mikayel Samvelyan and Christian Schroeder de Witt and Gregory Farquhar and Jakob Foerster and Shimon Whiteson},
   title   = {Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning},
   journal = {Journal of Machine Learning Research},
   year    = {2020},
   volume  = {21},
   number  = {178},
   pages   = {1--51},
}
The StarCraft Multi-Agent Challenge

M Samvelyan*, T Rashid*, C Schroeder de Witt, G Farquhar, N Nardelli, T Rudner, C Hung, P Torr, J Foerster, S Whiteson

AAMAS, 2019

@inproceedings{samvelyan2019smac,
   title = {{The} {StarCraft} {Multi}-{Agent} {Challenge}},
   author = {Samvelyan, Mikayel and Rashid, Tabish and Schroeder de Witt, Christian and Farquhar, Gregory and Nardelli, Nantas and Rudner, Tim G. J. and Hung, Chia-Man and Torr, Philip H. S. and Foerster, Jakob and Whiteson, Shimon},
   booktitle = {Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems},
   pages = {2186--2188},
   year = {2019},
}
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

T Rashid*, M Samvelyan*, C Schroeder de Witt, G Farquhar, J Foerster, S Whiteson

ICML, 2018

@inproceedings{rashid18qmix,
   title = {{QMIX}: {Monotonic} {Value} {Function} {Factorisation} {for} {Deep} {Multi}-{Agent} {Reinforcement} {Learning}},
   author = {Rashid, Tabish and Samvelyan, Mikayel and Schroeder de Witt, Christian and Farquhar, Gregory and Foerster, Jakob and Whiteson, Shimon},
   booktitle = {Proceedings of the 35th International Conference on Machine Learning},
   publisher = {PMLR},
   volume = {80},
   pages = {4295--4304},
   year = {2018},
}

Other Selected Works

See Google Scholar for more publications.

SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms

A Havrilla, E Hughes, M Samvelyan, J Abernethy

arXiv, 2025

@misc{havrilla2025sparq,
      title={SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms},
      author={Alex Havrilla and Edward Hughes and Mikayel Samvelyan and Jacob Abernethy},
      year={2025},
      eprint={2506.06499},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.06499},
}
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX

A Rutherford, B Ellis, M Gallici, J Cook, A Lupu, G Ingvarsson, T Willi, A Khan, C Schroeder de Witt, A Souly, S Bandyopadhyay, M Samvelyan, M Jiang, R Lange, S Whiteson, B Lacerda, N Hawes, T Rocktäschel, C Lu, J Foerster

NeurIPS, 2024

@misc{rutherford2023jaxmarl,
   title={JaxMARL: Multi-Agent RL Environments and Algorithms in JAX},
   author={Alexander Rutherford and Benjamin Ellis and Matteo Gallici and Jonathan Cook and Andrei Lupu and Gardar Ingvarsson and Timon Willi and Akbir Khan and Christian Schroeder de Witt and Alexandra Souly and Saptarashmi Bandyopadhyay and Mikayel Samvelyan and Minqi Jiang and Robert Tjarko Lange and Shimon Whiteson and Bruno Lacerda and Nick Hawes and Tim Rockt{\"a}schel and Chris Lu and Jakob Nicolaus Foerster},
   year={2023},
   eprint={2311.10090},
   archivePrefix={arXiv},
   primaryClass={cs.LG}
}
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

M Matthews, M Beukman, B Ellis, M Samvelyan, M Jackson, S Coward, J Foerster

ICML, 2024 Spotlight

@article{matthews2024craftax,
   title={Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning},
   author={Michael Matthews and Michael Beukman and Benjamin Ellis and Mikayel Samvelyan and Matthew Jackson and Samuel Coward and Jakob Foerster},
   journal={arXiv preprint},
   year={2024},
}
SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning

B Ellis, J Cook, S Moalla, M Samvelyan, M Sun, A Mahajan, J Foerster, S Whiteson

NeurIPS, 2023

@inproceedings{ellis2023smacv2,
   title={{SMAC}v2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning},
   author={Benjamin Ellis and Jonathan Cook and Skander Moalla and Mikayel Samvelyan and Mingfei Sun and Anuj Mahajan and Jakob Nicolaus Foerster and Shimon Whiteson},
   booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
   year={2023},
   url={https://openreview.net/forum?id=5OjLGiJW3u}
}
Mix-ME: Quality-Diversity for Multi-Agent Learning

G Ingvarsson, M Samvelyan, M Flageat, B Lim, A Cully, T Rocktäschel

ALOE Workshop @ NeurIPS, 2023

@inproceedings{ingvarsson2023mixme,
   title={Mix-{ME}: Quality-Diversity for Multi-Agent Learning},
   author={Gar{\dh}ar Ingvarsson and Mikayel Samvelyan and Manon Flageat and Bryan Lim and Antoine Cully and Tim Rockt{\"a}schel},
   booktitle={Second Agent Learning in Open-Endedness Workshop},
   year={2023},
   url={https://openreview.net/forum?id=acD8BxMjwV}
}
GriddlyJS: A Web IDE for Reinforcement Learning

C Bamford, M Jiang, M Samvelyan, T Rocktäschel

NeurIPS, 2022

@inproceedings{bamford2022griddlyjs,
   title={Griddly{JS}: A Web {IDE} for Reinforcement Learning},
   author={Christopher Bamford and Minqi Jiang and Mikayel Samvelyan and Tim Rockt{\"a}schel},
   booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
   year={2022},
   url={https://openreview.net/forum?id=YmacJv0i_UR}
}
Accel
Evolving Curricula with Regret-Based Environment Design

J Parker-Holder*, M Jiang*, M Dennis, M Samvelyan, J Foerster, E Grefenstette, T Rocktäschel

ICML, 2022

@article{parkerholder2022evolving,
   title={Evolving Curricula with Regret-Based Environment Design},
   author={Parker-Holder, Jack and Jiang, Minqi and Dennis, Michael D and Samvelyan, Mikayel and Foerster, Jakob Nicolaus and Grefenstette, Edward and Rockt{\"a}schel, Tim},
   journal={arXiv preprint arXiv:2203.01302},
   year={2022}
}
SkillHack
Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

M Matthews, M Samvelyan, J Parker-Holder, E Grefenstette, T Rocktäschel

CoLLAs, 2022

@misc{matthews2022hierarchical,
   url = {https://arxiv.org/abs/2207.11584},
   author = {Matthews, Michael and Samvelyan, Mikayel and Parker-Holder, Jack and Grefenstette, Edward and Rockt{\"a}schel, Tim},
   keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
   title = {Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning},
   publisher = {arXiv},
   year = {2022},
}
Generalization in Cooperative Multi-Agent Systems

A Mahajan, M Samvelyan, T Gupta, B Ellis, M Sun, T Rocktäschel, S Whiteson

arXiv, 2022

@article{mahajan2022generalization,
   title={Generalization in Cooperative Multi-Agent Systems},
   author={Mahajan, Anuj and Samvelyan, Mikayel and Gupta, Tarun and Ellis, Benjamin and Sun, Mingfei and Rockt{\"a}schel, Tim and Whiteson, Shimon},
   journal={arXiv preprint arXiv:2202.00104},
   year={2022},
}
Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

A Mahajan, M Samvelyan, L Mao, V Makoviychuk, A Garg, J Kossaifi, S Whiteson, Y Zhu, A Anandkumar

ICML, 2021

@inproceedings{mahajan2021tesseract,
   title = {Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning},
   author = {Mahajan, Anuj and Samvelyan, Mikayel and Mao, Lei and Makoviychuk, Viktor and Garg, Animesh and Kossaifi, Jean and Whiteson, Shimon and Zhu, Yuke and Anandkumar, Animashree},
   booktitle = {Proceedings of the 38th International Conference on Machine Learning},
   publisher = {PMLR},
   volume = {139},
   pages = {7301--7312},
   year = {2021},
}
MAVEN: Multi-Agent Variational Exploration

A Mahajan, T Rashid, M Samvelyan, S Whiteson

NeurIPS, 2019

@incollection{mahajan2019maven,
   title = {{MAVEN}: {Multi}-{Agent} {Variational} {Exploration}},
   author = {Mahajan, Anuj and Rashid, Tabish and Samvelyan, Mikayel and Whiteson, Shimon},
   booktitle = {Advances in Neural Information Processing Systems 32},
   pages = {7611--7622},
   year = {2019},
}

Libraries

Blogposts

Teaching

  • Spring 2020 — Data Structures (TA)
  • Fall 2019 — Machine Learning (Lecturer)
  • Fall 2018 — Machine Learning (Lecturer)
  • Fall 2018 — Operating Systems (TA)
  • Fall 2018 — Artificial Intelligence (Guest Lecturer and TA)

Professional Service

Workshop and Competition Organization

Area Chair

Reviewing