Maximally Permissive Reward Machines

Giovanni Varricchione, Natasha Alechina, Mehdi Dastani, Brian Logan

Research output: Contribution to conference › Unpublished paper › peer-review

Abstract

Reward machines (RMs) allow the definition of rewards for temporally extended tasks and behaviors. Specifying “informative” reward machines can be challenging. One way to address this is to generate a reward machine from a high-level abstract description of the learning environment, using techniques such as AI planning. However, previous planning-based approaches generate a reward machine based on a single (sequential or partial-order) plan, and do not allow maximum flexibility to the learning agent. In this paper we propose a new approach to synthesising reward machines which is based on the set of partial-order plans for a goal. We prove that learning using such “maximally permissive” reward machines results in higher rewards than learning using RMs based on a single plan. We present experimental results which support our theoretical claims by showing that our approach obtains higher rewards than the single-plan approach in practice.
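To make the contrast between a single-plan RM and a more permissive one concrete, here is a toy Python sketch. It is not the construction from the paper. It assumes a single partial-order plan over hypothetical event labels (`get_wood`, `get_iron`, `craft`) with precedence pairs, an RM that pays reward 1 only when the plan is completed, and events with no matching transition self-looping with reward 0. The names `RewardMachine`, `rm_from_sequence`, `rm_from_orderings` and `linearizations` are illustrative only.

```python
from itertools import permutations


class RewardMachine:
    """Minimal RM: transitions map (state, event) -> (next_state, reward).

    Assumption of this sketch: an event with no matching transition
    leaves the state unchanged and yields reward 0.
    """

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions

    def total_reward(self, trace):
        state, total = self.initial, 0.0
        for event in trace:
            state, reward = self.transitions.get((state, event), (state, 0.0))
            total += reward
        return total


def rm_from_sequence(plan):
    """Single-plan RM: rewards the events of one sequential plan, in that exact order."""
    transitions = {}
    for i, event in enumerate(plan):
        reward = 1.0 if i == len(plan) - 1 else 0.0
        transitions[(i, event)] = (i + 1, reward)
    return RewardMachine(0, transitions)


def rm_from_orderings(orderings):
    """Permissive RM: states are sets of completed events, so any given ordering is accepted.

    Assumes every ordering covers the same set of events (e.g. the
    linearizations of one partial-order plan).
    """
    transitions = {}
    for order in orderings:
        done = frozenset()
        for i, event in enumerate(order):
            nxt = done | {event}
            reward = 1.0 if i == len(order) - 1 else 0.0
            transitions[(done, event)] = (nxt, reward)
            done = nxt
    return RewardMachine(frozenset(), transitions)


def linearizations(steps, before):
    """All total orders of `steps` consistent with the precedence pairs (a, b) in `before`."""
    return [order for order in permutations(steps)
            if all(order.index(a) < order.index(b) for a, b in before)]


if __name__ == "__main__":
    steps = ["get_wood", "get_iron", "craft"]
    before = {("get_wood", "craft"), ("get_iron", "craft")}
    orders = linearizations(steps, before)

    single = rm_from_sequence(list(orders[0]))  # commits to get_wood before get_iron
    permissive = rm_from_orderings(orders)      # accepts either order

    trace = ["get_iron", "get_wood", "craft"]
    print(single.total_reward(trace))      # → 0.0
    print(permissive.total_reward(trace))  # → 1.0
```

On this hand-made trace the single-plan RM gives no reward because the agent collected the resources in the "wrong" order, while the set-based RM rewards it. This only illustrates why permissiveness can matter; it says nothing about learning performance. Enumerating linearizations by brute-force permutation is exponential and is meant for tiny examples only.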
Original language: English
Publication status: Accepted/In press, 2 Sept 2024
Event: ECAI 2024: European Conference on Artificial Intelligence, Santiago de Compostela, Spain
Duration: 19 Oct 2024 to 24 Oct 2024
https://www.ecai2024.eu/

Conference

Conference: ECAI 2024
Abbreviated title: ECAI
Country/Territory: Spain
City: Santiago de Compostela
Period: 19/10/24 to 24/10/24
