TY - JOUR
T1 - Q-Table compression for reinforcement learning
AU - Amado, Leonardo
AU - Meneguzzi, Felipe
N1 - This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001
PY - 2018/12
Y1 - 2018/12
N2 - Reinforcement learning (RL) algorithms are often used to compute agents capable of acting in environments without prior knowledge of the environment dynamics. However, these algorithms struggle to converge in environments with large branching factors and the resulting large state spaces. In this work, we develop an approach to compress the number of entries in a Q-value table using a deep auto-encoder, along with a set of techniques to mitigate the large branching factor problem. We present the application of such techniques in the scenario of a real-time strategy (RTS) game, where both the state space and the branching factor are a problem. We empirically evaluate an implementation of the technique to control agents in an RTS game scenario where classical RL fails, and provide a number of possible avenues of further work on this problem.
AB - Reinforcement learning (RL) algorithms are often used to compute agents capable of acting in environments without prior knowledge of the environment dynamics. However, these algorithms struggle to converge in environments with large branching factors and the resulting large state spaces. In this work, we develop an approach to compress the number of entries in a Q-value table using a deep auto-encoder, along with a set of techniques to mitigate the large branching factor problem. We present the application of such techniques in the scenario of a real-time strategy (RTS) game, where both the state space and the branching factor are a problem. We empirically evaluate an implementation of the technique to control agents in an RTS game scenario where classical RL fails, and provide a number of possible avenues of further work on this problem.
UR - https://doi.org/10.1017/S0269888918000280
U2 - 10.1017/S0269888918000280
DO - 10.1017/S0269888918000280
M3 - Article
VL - 33
SP - 1
EP - 21
JO - The Knowledge Engineering Review
JF - The Knowledge Engineering Review
M1 - e22
ER -