Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning

Giovanni Varricchione; Natasha Alechina; Mehdi Dastani; Brian Logan

doi:10.1007/978-3-031-43264-4_21

Giovanni Varricchione (corresponding author), Natasha Alechina, Mehdi Dastani, Brian Logan

Utrecht University

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach yields sample efficiency comparable to, if not better than, that of reward machines generated by hand for multi-agent tasks.

Original language	English
Title of host publication	Multi-Agent Systems - 20th European Conference, EUMAS 2023, Proceedings
Editors	Vadim Malvone, Aniello Murano
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	328-344
Number of pages	17
ISBN (Print)	9783031432637
DOIs	https://doi.org/10.1007/978-3-031-43264-4_21
Publication status	Published - 7 Sept 2023
Event	Proceedings of the 20th European Conference on Multi-Agent Systems, EUMAS 2023 - Naples, Italy; Duration: 14 Sept 2023 → 15 Sept 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14282 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	Proceedings of the 20th European Conference on Multi-Agent Systems, EUMAS 2023
Country/Territory	Italy
City	Naples
Period	14/09/23 → 15/09/23

Bibliographical note

Code is available at github.com/giovannivarr/SynthesisingRMsMARL.

Access to Document

https://doi.org/10.1007/978-3-031-43264-4_21
Licence: Unspecified

Cite this

Varricchione, G., Alechina, N., Dastani, M., & Logan, B. (2023). Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning. In V. Malvone, & A. Murano (Eds.), Multi-Agent Systems - 20th European Conference, EUMAS 2023, Proceedings (pp. 328-344). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14282 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-43264-4_21

@inproceedings{c6b6942ec54b441cb62287bcced73ea0,
title = "Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning",
abstract = "Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents{\textquoteright} environment. We present results suggesting that our automated approach has comparable, if not better, sample efficiency than reward machines generated by hand for multi-agent tasks.",
author = "Giovanni Varricchione and Natasha Alechina and Mehdi Dastani and Brian Logan",
note = "Code is available at github.com/giovannivarr/SynthesisingRMsMARL.; Proceedings of the 20th European Conference on Multi-Agent Systems, EUMAS 2023 ; Conference date: 14-09-2023 Through 15-09-2023",
year = "2023",
month = sep,
day = "7",
doi = "10.1007/978-3-031-43264-4_21",
language = "English",
isbn = "9783031432637",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "328--344",
editor = "Vadim Malvone and Aniello Murano",
booktitle = "Multi-Agent Systems - 20th European Conference, EUMAS 2023, Proceedings",
address = "Germany",
}

TY - GEN
T1 - Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning
AU - Varricchione, Giovanni
AU - Alechina, Natasha
AU - Dastani, Mehdi
AU - Logan, Brian
N1 - Code is available at github.com/giovannivarr/SynthesisingRMsMARL.
PY - 2023/9/7
Y1 - 2023/9/7
N2 - Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach has comparable, if not better, sample efficiency than reward machines generated by hand for multi-agent tasks.
AB - Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach has comparable, if not better, sample efficiency than reward machines generated by hand for multi-agent tasks.
UR - http://www.scopus.com/inward/record.url?scp=85171998898&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-43264-4_21
DO - 10.1007/978-3-031-43264-4_21
M3 - Published conference contribution
AN - SCOPUS:85171998898
SN - 9783031432637
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 328
EP - 344
BT - Multi-Agent Systems - 20th European Conference, EUMAS 2023, Proceedings
A2 - Malvone, Vadim
A2 - Murano, Aniello
PB - Springer Science and Business Media Deutschland GmbH
T2 - Proceedings of the 20th European Conference on Multi-Agent Systems, EUMAS 2023
Y2 - 14 September 2023 through 15 September 2023
ER -