Description
Apologies in advance if this is deliberate behavior, but it seemed odd to me that TicTacToeEnv
allows illegal moves. For example:
env = TicTacToeEnv()
env(1)
env(1)
allows o and x to go in the same square (top-left). Note that a call to:
is_terminated(env)
will now error with the message:
ERROR: KeyError: key TicTacToeEnv([0 1 1; 1 1 1; 1 1 1;;; 1 0 0; 0 0 0; 0 0 0;;; 1 0 0; 0 0 0; 0 0 0], ReinforcementLearningEnvironments.Cross()) not found
An implication of this behavior is:
env = TicTacToeEnv()
env(1)
env(2)
env(1)
leaves the board in a state where there is one o and one x, yet it is o's turn again.
I'm brand new to this package, but it seems to me that the fix should just be to add a check to the top of the function:
function (env::TicTacToeEnv)(action::CartesianIndex{2})
    env.board[action, 1] = false
    env.board[action, Base.to_index(env, env.player)] = true
    env.player = !env.player
end
maybe something like !env.board[action, 1] && error("some message"), since if that square is already false, it means a move has already been played there. Or perhaps some other return value is preferred when an illegal move is played. I'm not very familiar with this package (or topic) yet.
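To make the suggestion concrete, here is a minimal, self-contained sketch of the guard. It deliberately does not use the package's own types: MiniTicTacToe is a hypothetical stand-in I made up for illustration, and I'm assuming (based on the board printed in the error above) that layer 1 marks empty squares and the other two layers hold each player's pieces. The real fix would presumably go in the existing TicTacToeEnv method instead.

```julia
# Hypothetical stand-in for TicTacToeEnv (not the package's type).
# Assumed layout: layer 1 marks empty squares, layers 2 and 3 hold each player's pieces.
mutable struct MiniTicTacToe
    board::BitArray{3}
    player::Int   # 2 or 3: the board layer of the player to move
end

function MiniTicTacToe()
    board = falses(3, 3, 3)
    board[:, :, 1] .= true   # every square starts empty
    return MiniTicTacToe(board, 2)
end

function (env::MiniTicTacToe)(action::CartesianIndex{2})
    # Proposed guard: layer 1 is already false once a square has been played.
    !env.board[action, 1] && error("illegal move: square $(Tuple(action)) is already occupied")
    env.board[action, 1] = false
    env.board[action, env.player] = true
    env.player = env.player == 2 ? 3 : 2   # hand the turn to the other player
    return env
end

# Linear-index convenience, assumed to mirror the env(1) calls in the examples above.
(env::MiniTicTacToe)(action::Int) = env(CartesianIndices((3, 3))[action])

env = MiniTicTacToe()
env(1)
# A second env(1) now throws instead of placing another piece on the top-left square,
# and the turn does not change.
```

Because the check runs before anything is mutated, a rejected move leaves both the board and the player to move untouched, which would also avoid the second problem described above.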
Cheers,
Colin