Description
Describe the bug
When running an env wrapped in TerminateIllegalWrapper without applying the action mask, the agent selection becomes corrupted once an illegal move is made, and the corruption carries over into the next game even after reset() is called.
Here is an example from tictactoe (code below). Notice in the first game that player 1 starts (as expected) and it alternates between players 1 and 2 (as expected) until player 1 makes an illegal move, which is caught by the wrapper.
However, in the second game, player 1 makes two moves in a row, which should not happen. Also note that the move flagged as illegal in the second game (action 0, an empty cell) is not actually illegal per the game rules.
This behaviour has been reported for other games.
New game
--------
calling reset(seed=42)
player_1 is making action: 1 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_2 is making action: 5 current board:[0, 1, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 7 current board:[0, 1, 0, 0, 0, 2, 0, 0, 0]
player_2 is making action: 4 current board:[0, 1, 0, 0, 0, 2, 0, 1, 0]
player_1 is making action: 1 current board:[0, 1, 0, 0, 2, 2, 0, 1, 0]
[WARNING]: Illegal move made, game terminating with current player losing.
obs['action_mask'] contains a mask of all legal moves that can be chosen.
New game
--------
calling reset(seed=42)
player_1 is making action: 5 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 0 current board:[0, 0, 0, 0, 0, 1, 0, 0, 0]
[WARNING]: Illegal move made, game terminating with current player losing.
obs['action_mask'] contains a mask of all legal moves that can be chosen.
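To make the two anomalies in the second game concrete, here is a minimal sketch that checks them against the logged moves. It assumes the board encoding shown in the log (0 means an empty cell); the helper names `is_legal` and `alternates` are hypothetical and are not part of PettingZoo.

```python
# Moves copied from the second game's log: (agent, action, board before the move).
second_game = [
    ("player_1", 5, [0, 0, 0, 0, 0, 0, 0, 0, 0]),
    ("player_1", 0, [0, 0, 0, 0, 0, 1, 0, 0, 0]),
]


def is_legal(board, action):
    """Assumed rule: a tic-tac-toe move is legal when the target cell is empty (0)."""
    return board[action] == 0


def alternates(moves):
    """Return True when no agent acts twice in a row."""
    agents = [agent for agent, _, _ in moves]
    return all(a != b for a, b in zip(agents, agents[1:]))


_, flagged_action, flagged_board = second_game[-1]
print(is_legal(flagged_board, flagged_action))  # True: the flagged move targets an empty cell
print(alternates(second_game))                  # False: player_1 acted twice in a row
```

Both results contradict what the wrapper reported: the move it terminated on is legal under this rule, and the turn order was already broken before that move.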
Code example
from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env()

def do_game(seed):
    print("\nNew game")
    print("--------")
    print(f"calling reset(seed={seed})")
    env.reset(seed)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            env.step(None)
        else:
            mask = observation["action_mask"]
            # this is where you would insert your policy
            action = env.action_space(agent).sample()  # **no action_mask applied**
            print(f"{env.agent_selection} is making action: {action} current board:{env.board}")
            env.step(action)

do_game(42)
do_game(42)
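For comparison, the reproduction script above deliberately ignores `mask`. A possible stopgap while the wrapper misbehaves is to sample only from legal actions. The sketch below is a standalone illustration using the standard library; `sample_legal_action` is a hypothetical helper, not a PettingZoo function, and it assumes the mask is a sequence of 0/1 flags indexed by action, as in `observation["action_mask"]`.

```python
import random


def sample_legal_action(action_mask, rng=random):
    """Pick uniformly among the actions whose mask entry is nonzero."""
    legal = [i for i, allowed in enumerate(action_mask) if allowed]
    if not legal:
        raise ValueError("action mask has no legal moves")
    return rng.choice(legal)


# Example mask: cells 1 and 5 are taken, every other cell is still free.
example_mask = [1, 0, 1, 1, 1, 0, 1, 1, 1]
action = sample_legal_action(example_mask, random.Random(0))
print(action in (0, 2, 3, 4, 6, 7, 8))  # True
```

In the script this would replace the unmasked `sample()` call with something like `sample_legal_action(observation["action_mask"])`. It only avoids triggering the illegal-move path; it does not address the corrupted agent selection itself.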
System info
>>> import sys; sys.version
'3.9.12 (main, Apr 5 2022, 06:56:58) \n[GCC 7.5.0]'
>>> pettingzoo.__version__
'1.24.3'
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo