Abstract
Reward machines (RMs) allow the definition of rewards for temporally extended tasks and behaviors. Specifying “informative” reward machines can be challenging. One way to address this is to generate a reward machine from a high-level abstract description of the learning environment, using techniques such as AI planning. However, previous planning-based approaches generate a reward machine based on a single (sequential or partial-order) plan, and do not allow maximum flexibility to the learning agent. In this paper we propose a new approach to synthesising reward machines which is based on the set of partial-order plans for a goal. We prove that learning using such “maximally permissive” reward machines results in higher rewards than learning using RMs based on a single plan. We present experimental results which support our theoretical claims by showing that our approach obtains higher rewards than the single-plan approach in practice.
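To make the notion of a reward machine concrete, the following is a minimal sketch, assuming the common Mealy-machine formulation (RM states, transitions triggered by high-level propositions, and a reward emitted per transition). The class, state names, and the "coffee then office" task are illustrative, not taken from the paper:

```python
class RewardMachine:
    """Minimal reward machine: a finite-state machine that emits rewards
    as it observes high-level propositions from the environment."""

    def __init__(self, transitions, initial_state):
        # transitions: {(rm_state, proposition): (next_rm_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, proposition):
        """Advance the RM on an observed proposition; return the reward."""
        key = (self.state, proposition)
        if key in self.transitions:
            self.state, reward = self.transitions[key]
            return reward
        return 0.0  # unmodelled propositions leave the RM state unchanged

# Hypothetical temporally extended task: "get coffee, then deliver it to
# the office". Reward is only given once both subgoals happen in order.
rm = RewardMachine(
    transitions={
        ("u0", "coffee"): ("u1", 0.0),  # first subgoal reached, no reward yet
        ("u1", "office"): ("u2", 1.0),  # full task complete
    },
    initial_state="u0",
)
rewards = [rm.step(p) for p in ["office", "coffee", "office"]]
# Visiting the office before getting coffee yields nothing; the ordered
# sequence coffee -> office completes the task.
```

A "maximally permissive" RM in the paper's sense would encode every ordering allowed by the set of partial-order plans, rather than the single fixed ordering shown here.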
Original language | English
---|---
Publication status | Accepted/In press - 2 Sept 2024
Event | ECAI 2024: European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 19 Oct 2024 → 24 Oct 2024, https://www.ecai2024.eu/
Conference
Conference | ECAI 2024
---|---
Abbreviated title | ECAI
Country/Territory | Spain
City | Santiago de Compostela
Period | 19/10/24 → 24/10/24
Internet address | https://www.ecai2024.eu/