Description
Describe the bug
The rainy taxi has an 80% chance to move as intended, a 10% chance to go left of course, and a 10% chance to go right.
When going east or west, the left/right options are only calculated if the east/west move is possible. If there is a wall in the way of the east/west move, it will never go left or right (even if those spots are open).
However, when going north or south, the left/right options are always calculated even if the north/south move is not possible. If there is a wall in the way of the north/south move, it will go left or right with 10% chance (if those spots are open).
These should be consistent. I don't know which behavior is "correct", and the original paper does not make it clear. I'm leaning towards the east/west behavior: if the taxi can't move to the desired spot, no movement of any kind happens.
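To make the two candidate rules concrete, here is a minimal sketch on a toy 5x5 grid. It is not the gymnasium implementation: `try_move`, `outcomes`, the `slip_when_blocked` flag, and the choice of side directions (west/east for north/south moves, north/south for east/west moves) are all assumptions for illustration, so the exact left/right cells may differ from what `TaxiEnv` produces.

```python
from typing import List, Tuple

Pos = Tuple[int, int]  # (row, col)

N_ROWS, N_COLS = 5, 5
MOVES = {"SOUTH": (1, 0), "NORTH": (-1, 0), "EAST": (0, 1), "WEST": (0, -1)}
# Assumed side directions for each intended move (hypothetical geometry).
SIDES = {
    "SOUTH": ("WEST", "EAST"),
    "NORTH": ("WEST", "EAST"),
    "EAST": ("NORTH", "SOUTH"),
    "WEST": ("NORTH", "SOUTH"),
}
# Only the wall that matters for the example: between (4, 2) and (4, 3).
WALLS = {frozenset({(4, 2), (4, 3)})}


def try_move(pos: Pos, direction: str) -> Pos:
    """Return the new position, or pos itself if a wall or the border blocks it."""
    dr, dc = MOVES[direction]
    target = (pos[0] + dr, pos[1] + dc)
    in_bounds = 0 <= target[0] < N_ROWS and 0 <= target[1] < N_COLS
    if not in_bounds or frozenset({pos, target}) in WALLS:
        return pos
    return target


def outcomes(pos: Pos, action: str, slip_when_blocked: bool) -> List[Tuple[float, Pos]]:
    """List (probability, new_position) for ahead, first side, second side."""
    ahead = try_move(pos, action)
    blocked = ahead == pos
    result = [(0.8, ahead)]
    for side in SIDES[action]:
        if blocked and not slip_when_blocked:
            # Rule A: a blocked main move suppresses every side slip.
            result.append((0.1, pos))
        else:
            # Rule B: side slips are evaluated regardless of the main move.
            result.append((0.1, try_move(pos, side)))
    return result


for rule in (False, True):
    for action in ("SOUTH", "WEST"):
        print(rule, action, outcomes((4, 3), action, slip_when_blocked=rule))
```

With `slip_when_blocked=False` both blocked actions leave the taxi in place; with `True` both can slip sideways. Either choice is consistent across all four directions, which is the property the current behavior lacks.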
Example:
Consider this case, where the taxi is in the blue spot at (4, 3) and cannot move south or west:
However, a south move gives a chance of moving to a neighboring spot, whereas a west move does not.
The transition options from the code are:
If moving SOUTH, outcomes are
With probability 0.8 move ahead to (4, 3) (no change)
With probability 0.1 move left to (4, 3) (no change)
With probability 0.1 move right to (4, 4) <---- Can move even though main move is forbidden
If moving NORTH, outcomes are
With probability 0.8 move ahead to (3, 3)
With probability 0.1 move left to (4, 3) (no change)
With probability 0.1 move right to (4, 4)
If moving EAST, outcomes are
With probability 0.8 move ahead to (4, 4)
With probability 0.1 move left to (4, 3) (no change)
With probability 0.1 move right to (4, 3) (no change)
If moving WEST, outcomes are
With probability 0.8 move ahead to (4, 3) (no change)
With probability 0.1 move left to (4, 3) (no change)
With probability 0.1 move right to (4, 3) (no change) <---- Can NOT move because main move is forbidden
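The asymmetry in the listing above can also be checked mechanically. The sketch below copies the listed outcomes into a plain dictionary and flags every action whose main move is blocked but whose side outcomes still change the position. The helper name `slips_despite_block` is hypothetical, and the data is hard-coded from the listing rather than read from the environment.

```python
START = (4, 3)  # taxi location used in the example

# (probability, resulting location) in the order: ahead, left, right.
LISTED = {
    "SOUTH": [(0.8, (4, 3)), (0.1, (4, 3)), (0.1, (4, 4))],
    "NORTH": [(0.8, (3, 3)), (0.1, (4, 3)), (0.1, (4, 4))],
    "EAST": [(0.8, (4, 4)), (0.1, (4, 3)), (0.1, (4, 3))],
    "WEST": [(0.8, (4, 3)), (0.1, (4, 3)), (0.1, (4, 3))],
}


def slips_despite_block(listed, start):
    """Return actions whose main move is blocked but a side move still moves."""
    flagged = []
    for action, rows in listed.items():
        (_, ahead), *sides = rows
        main_blocked = ahead == start
        if main_blocked and any(loc != start for _, loc in sides):
            flagged.append(action)
    return flagged


print(slips_despite_block(LISTED, START))  # ['SOUTH']
```

Only SOUTH is flagged; WEST is blocked in the same way but never slips, which is the inconsistency being reported.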
Code example
from gymnasium.envs.toy_text.taxi import TaxiEnv

env = TaxiEnv(is_rainy=True)
taxi_loc = (4, 3)  # blue spot
pax_loc = 0
pax_dest = 1
state = env.encode(taxi_loc[0], taxi_loc[1], pax_loc, pax_dest)
actions = {'SOUTH': 0, 'NORTH': 1, 'EAST': 2, 'WEST': 3}
directions = ["ahead", "left ", "right"]
for action_name, action in actions.items():
    print(f"If moving {action_name}, outcomes are")
    for direction, (prob, new_state, _, _) in zip(directions, env.P[state][action]):
        new_taxi_x, new_taxi_y, _, _ = env.decode(new_state)
        new_taxi_loc = (new_taxi_x, new_taxi_y)
        note = "(no change)" if new_taxi_loc == taxi_loc else ""
        print(f" With probability {prob} move {direction} to {new_taxi_loc} {note}")
System info
installed from source
gymnasium.__version__ = '1.2.3'
Additional context
I can fix this; I just want input on what people consider the "correct" behavior.
Checklist
- I have checked that there is no similar issue in the repo