Pure-Past Action Masking

Giovanni Varricchione, Natasha Alechina, Mehdi Dastani, Giuseppe De Giacomo, Brian Logan, Giuseppe Perelli

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraint need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and reward specifications of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that they are as expressive as shields, another approach to enforce non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
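To make the idea of masking actions with a pure-past specification concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hypothetical tuple encoding of PPLTL formulas, a hypothetical `PastMask` class, and that each step is labelled with the currently observed safety features plus the name of the candidate action. Because every temporal operator looks only at the past, the formula can be evaluated incrementally from the truth values stored at the previous step, without keeping the full history.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical encoding: ("atom", "p"), ("not", f), ("and", f, g), ("or", f, g),
# ("Y", f) yesterday, ("O", f) once, ("H", f) historically, ("S", f, g) since.
Formula = Tuple
Valuation = Dict[Formula, bool]


def step(f: Formula, obs: frozenset, prev: Optional[Valuation], out: Valuation) -> bool:
    """Evaluate f at the current step, given the previous step's subformula values."""
    if f in out:
        return out[f]
    op = f[0]
    # Evaluate all children eagerly so `out` records every subformula for the next step.
    kids = [] if op == "atom" else [step(g, obs, prev, out) for g in f[1:]]
    first = prev is None  # True at the first step of the trace
    if op == "atom":
        v = f[1] in obs
    elif op == "not":
        v = not kids[0]
    elif op == "and":
        v = kids[0] and kids[1]
    elif op == "or":
        v = kids[0] or kids[1]
    elif op == "Y":
        v = False if first else prev[f[1]]
    elif op == "O":
        v = kids[0] or (False if first else prev[f])
    elif op == "H":
        v = kids[0] and (True if first else prev[f])
    elif op == "S":
        v = kids[1] or (kids[0] and (False if first else prev[f]))
    else:
        raise ValueError(f"unknown operator: {op}")
    out[f] = v
    return v


class PastMask:
    """Hypothetical mask: an action is allowed if the formula holds on the extended trace."""

    def __init__(self, formula: Formula) -> None:
        self.formula = formula
        self.prev: Optional[Valuation] = None

    def allowed(self, features: Iterable[str], actions: Iterable[str]) -> List[str]:
        base = frozenset(features)
        return [a for a in actions if step(self.formula, base | {a}, self.prev, {})]

    def commit(self, features: Iterable[str], action: str) -> None:
        out: Valuation = {}
        step(self.formula, frozenset(features) | {action}, self.prev, out)
        self.prev = out


# Example constraint: "enter" is permitted only if "badge" was observed at some point.
spec = ("or", ("not", ("atom", "enter")), ("O", ("atom", "badge")))
mask = PastMask(spec)
print(mask.allowed(set(), ["enter", "wait"]))  # ['wait']
mask.commit({"badge"}, "wait")
print(mask.allowed(set(), ["enter", "wait"]))  # ['enter', 'wait']
```

In the second query the badge is no longer observed, yet `enter` is allowed because the constraint depends on the history. The feature names used by the mask are independent of whatever observation the learning agent receives, which mirrors the separation of concerns described in the abstract.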

Original language	English
Title of host publication	AAAI Conference and Symposium Proceedings
Publisher	AAAI Press
Publication status	Accepted/In press - 18 Feb 2024
Event	The 38th Annual AAAI Conference on Artificial Intelligence - Vancouver Convention Centre, Vancouver, Canada; Duration: 20 Feb 2024 → 27 Feb 2024; Conference number: 38; https://aaai.org/aaai-conference/

Conference

Conference	The 38th Annual AAAI Conference on Artificial Intelligence
Abbreviated title	AAAI
Country/Territory	Canada
City	Vancouver
Period	20/02/24 → 27/02/24
Internet address	https://aaai.org/aaai-conference/

Access to Document

Varricchione_etal_AAAI_Pure_Past_Action_AAM
This manuscript has been made open access under a Creative Commons Attribution (CC BY) licence under the terms of the University of Aberdeen Research Publications Policy. https://creativecommons.org/licenses/by/4.0/
Accepted author manuscript, 431 KB. Licence: CC BY

https://aaai.org/aaai-publications/aaai-conference-proceedings/ Licence: Unspecified

Cite this

@inproceedings{08a7bed2a1ef4f1791ec44e335984cd0,
title = "Pure-Past Action Masking",
abstract = "We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraint need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and reward specifications of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that they are as expressive as shields, another approach to enforce non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.",
author = "Giovanni Varricchione and Natasha Alechina and Mehdi Dastani and {De Giacomo}, Giuseppe and Brian Logan and Giuseppe Perelli",
year = "2024",
month = feb,
day = "18",
language = "English",
booktitle = "AAAI Conference and Symposium Proceedings",
publisher = "AAAI Press",
note = "The 38th Annual AAAI Conference on Artificial Intelligence, AAAI ; Conference date: 20-02-2024 Through 27-02-2024",
url = "https://aaai.org/aaai-conference/",
}

TY - GEN
T1 - Pure-Past Action Masking
AU - Varricchione, Giovanni
AU - Alechina, Natasha
AU - Dastani, Mehdi
AU - De Giacomo, Giuseppe
AU - Logan, Brian
AU - Perelli, Giuseppe
N1 - Conference code: 38
PY - 2024/2/18
Y1 - 2024/2/18
N2 - We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraint need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and reward specifications of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that they are as expressive as shields, another approach to enforce non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
AB - We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraint need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and reward specifications of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that they are as expressive as shields, another approach to enforce non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
M3 - Published conference contribution
BT - AAAI Conference and Symposium Proceedings
PB - AAAI Press
T2 - The 38th Annual AAAI Conference on Artificial Intelligence
Y2 - 20 February 2024 through 27 February 2024
ER -
