Skip to content

[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

@dm-ackerman

Description

@dm-ackerman

Describe the bug

When running an env using TerminateIllegalWrapper and not using the action mask, the agent selection becomes corrupted when an illegal move is made.

Here is an example from tictactoe (code below). Notice in the first game that player 1 starts (as expected) and it alternates between players 1 and 2 (as expected) until player 1 makes an illegal move, which is caught by the wrapper.
However, in the second game, player 1 makes two moves in a row. That should not happen. Also note that the illegal move flagged is not actually illegal per the game rules.
This behaviour has been reported for other games.

New game
--------
calling reset(seed=42)
player_1 is making action: 1 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_2 is making action: 5 current board:[0, 1, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 7 current board:[0, 1, 0, 0, 0, 2, 0, 0, 0]
player_2 is making action: 4 current board:[0, 1, 0, 0, 0, 2, 0, 1, 0]
player_1 is making action: 1 current board:[0, 1, 0, 0, 2, 2, 0, 1, 0]
[WARNING]: Illegal move made, game terminating with current player losing. 
obs['action_mask'] contains a mask of all legal moves that can be chosen.

New game
--------
calling reset(seed=42)
player_1 is making action: 5 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 0 current board:[0, 0, 0, 0, 0, 1, 0, 0, 0]
[WARNING]: Illegal move made, game terminating with current player losing. 
obs['action_mask'] contains a mask of all legal moves that can be chosen.

Code example

from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env()

def do_game(seed):
    print("\nNew game")
    print("--------")
    
    print(f"calling reset(seed={seed})")
    env.reset(seed)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            env.step(None)
        else:
            mask = observation["action_mask"]
            # this is where you would insert your policy
            action = env.action_space(agent).sample()  # **no action_mask applied**
            print(f"{env.agent_selection} is making action: {action} current board:{env.board}")
            env.step(action)

do_game(42)
do_game(42)

System info

>>> import sys; sys.version
'3.9.12 (main, Apr  5 2022, 06:56:58) \n[GCC 7.5.0]'
>>> pettingzoo.__version__
'1.24.3'

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions