Pure-Past Action Masking

Giovanni Varricchione, Natasha Alechina, Mehdi Dastani, Giuseppe De Giacomo, Brian Logan, Giuseppe Perelli

Research output: Chapter in Book/Report/Conference proceeding (published conference contribution)


Abstract

We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraints need not be the same as those used by the learning agent, which allows a clear separation of concerns between the safety constraints and the reward specification of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that PPAM is as expressive as shields, another approach to enforcing non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
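The abstract describes masks driven by PPLTL formulas over the history rather than the current state. What keeps this lightweight is a standard property of pure-past formulas: they can be evaluated incrementally, keeping only the previous step's truth value for each subformula. The sketch below illustrates that idea only; it is not the paper's implementation. It assumes each action is paired with one PPLTL formula and is allowed only when that formula holds on the history of observed labels. The tuple encoding, `PPLTLMonitor`, `masked_greedy`, and the propositions (`key`) and actions (`left`, `right`, `open`) are all hypothetical.

```python
# Hypothetical PPLTL encoding as nested tuples:
#   ("true",), ("ap", name), ("not", f), ("and", f, g), ("or", f, g),
#   ("Y", f) yesterday, ("O", f) once, ("H", f) historically, ("S", f, g) f since g


class PPLTLMonitor:
    """Incrementally evaluates a fixed set of PPLTL formulas over a growing trace.

    Past operators only need the previous step's values, so each step costs
    time linear in the number of distinct subformulas.
    """

    def __init__(self, formulas):
        self.formulas = list(formulas)
        self.prev = None  # subformula -> truth value at the previous step

    def reset(self):
        """Start a new trace (e.g. a new RL episode)."""
        self.prev = None

    def _eval(self, f, props, now):
        if f in now:  # already evaluated at this step
            return now[f]
        op, prev = f[0], self.prev
        if op == "true":
            v = True
        elif op == "ap":
            v = f[1] in props
        elif op == "not":
            v = not self._eval(f[1], props, now)
        elif op in ("and", "or"):
            # Evaluate both sides eagerly so every subformula value is recorded
            a = self._eval(f[1], props, now)
            b = self._eval(f[2], props, now)
            v = (a and b) if op == "and" else (a or b)
        elif op == "Y":
            self._eval(f[1], props, now)  # record f[1] for use at the next step
            v = prev[f[1]] if prev is not None else False
        elif op == "O":
            cur = self._eval(f[1], props, now)
            v = cur or (prev[f] if prev is not None else False)
        elif op == "H":
            cur = self._eval(f[1], props, now)
            v = cur and (prev[f] if prev is not None else True)
        elif op == "S":
            a = self._eval(f[1], props, now)
            b = self._eval(f[2], props, now)
            v = b or (a and (prev[f] if prev is not None else False))
        else:
            raise ValueError(f"unknown operator: {op!r}")
        now[f] = v
        return v

    def step(self, props):
        """Consume the labels of the current step; return each formula's value."""
        now = {}
        for f in self.formulas:
            self._eval(f, props, now)
        self.prev = now
        return {f: now[f] for f in self.formulas}


def masked_greedy(q_values, allowed):
    """Greedy action choice restricted to the unmasked actions."""
    return max(allowed, key=lambda a: q_values[a])


# Hypothetical non-Markovian constraint: "open" is allowed only if the key
# has been picked up at some point in the past (O key).
TRUE = ("true",)
action_formulas = {"left": TRUE, "right": TRUE, "open": ("O", ("ap", "key"))}
monitor = PPLTLMonitor(set(action_formulas.values()))
q = {"left": 0.1, "right": 0.2, "open": 1.0}
for props in [set(), {"key"}, set()]:
    vals = monitor.step(props)
    allowed = [a for a, f in action_formulas.items() if vals[f]]
    print(sorted(allowed), masked_greedy(q, allowed))
# ['left', 'right'] right
# ['left', 'open', 'right'] open
# ['left', 'open', 'right'] open
```

The labels fed to the monitor (`key`) are separate from whatever state features the learner uses. This mirrors, in a simplified way, the separation of concerns the abstract mentions.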
Original language: English
Title of host publication: AAAI Conference and Symposium Proceedings
Publisher: AAAI Press
Pages: 21646-21655
Number of pages: 10
Volume: 38
Edition: 19
ISBN (Electronic): 978-1-57735-887-9
Publication status: Published - 25 Mar 2024
Event: The 38th Annual AAAI Conference on Artificial Intelligence - Vancouver Convention Centre, Vancouver, Canada
Duration: 20 Feb 2024 to 27 Feb 2024
Conference number: 38
https://aaai.org/aaai-conference/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: The 38th Annual AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI
Country/Territory: Canada
City: Vancouver
Period: 20/02/24 to 27/02/24

Funding

This work was supported by the PNRR MUR project PE0000013-FAIR, and partially supported by the ERC Advanced Grant WhiteMech (No. 834228), the EU ICT-48 2020 project TAILOR (No. 952215), the ONRG project N62909-22-1-2005, the INdAM-GNCS project “Strategic Reasoning in Mechanism Design”, and the project OCENW.M.21.377 funded by the Dutch Research Council (NWO). For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

Funder: funder number
Ministero dell'Università e della Ricerca: PE0000013-FAIR
European Research Council: 834228
European Commission: 952215
Office of Naval Research Global: N62909-22-1-2005
Gruppo Nazionale per il Calcolo Scientifico
Istituto Nazionale di Alta Matematica
The Dutch Research Council: OCENW.M.21.377

Keywords

• General
