Open
Description
Proposal
Code Overview
- Q-Learning Agent (`QLearningAgent` class):
  - Implements a Q-learning algorithm with epsilon-greedy exploration
  - Maintains a Q-table to learn state-action values
  - Features include:
    - Epsilon decay for reducing exploration over time
    - Handling of action masks (valid actions)
    - Learning rate and discount factor configuration
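To make the agent description concrete, here is a minimal sketch of what such a class could look like. Only the class name `QLearningAgent` comes from the proposal; the method names (`choose_action`, `update`, `decay_epsilon`), the constructor parameters, and their default values are assumptions chosen for illustration, not the actual implementation.

```python
import numpy as np


class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, n_states, n_actions, learning_rate=0.1, discount_factor=0.99,
                 epsilon=1.0, epsilon_decay=0.995, epsilon_min=0.01, seed=None):
        # All hyperparameter defaults here are assumed values, not taken from the proposal.
        self.q_table = np.zeros((n_states, n_actions))
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.epsilon_min = epsilon_min
        self.rng = np.random.default_rng(seed)

    def choose_action(self, state, action_mask=None):
        """Pick an action epsilon-greedily, restricted to valid actions if a mask is given."""
        n_actions = self.q_table.shape[1]
        if action_mask is None:
            valid = np.arange(n_actions)
        else:
            valid = np.flatnonzero(action_mask)
            if valid.size == 0:  # fall back to all actions if the mask is empty
                valid = np.arange(n_actions)
        if self.rng.random() < self.epsilon:
            return int(self.rng.choice(valid))  # explore
        return int(valid[np.argmax(self.q_table[state, valid])])  # exploit

    def update(self, state, action, reward, next_state, terminated):
        """Apply the one-step Q-learning update for a single transition."""
        best_next = 0.0 if terminated else self.q_table[next_state].max()
        td_target = reward + self.discount_factor * best_next
        td_error = td_target - self.q_table[state, action]
        self.q_table[state, action] += self.learning_rate * td_error

    def decay_epsilon(self):
        """Shrink epsilon multiplicatively, never going below epsilon_min."""
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
```

In this sketch the mask only restricts action selection; the update bootstraps from the maximum over all actions in the next state, which is one simple design choice among several.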
- Training Function (`train_taxi()`):
  - Trains the agent for a specified number of episodes
  - Uses a progress bar to track training
  - Tracks and stores episode rewards
  - Periodically reports average reward and current epsilon value
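The training loop described above could be sketched roughly as follows. The signature of `train_taxi()` is not given in the proposal, so the parameters (`env`, `agent`, `episodes`, `report_every`) are assumptions, as are the agent method names `choose_action`, `update`, and `decay_epsilon`. The environment is assumed to follow the Gymnasium API, where `reset()` returns `(observation, info)` and `step()` returns a five-element tuple. The progress bar uses `tqdm` if it is installed and falls back to a plain `range` otherwise.

```python
import numpy as np


def train_taxi(env, agent, episodes=5000, report_every=500):
    """Train `agent` on `env` and return the list of per-episode total rewards."""
    try:
        from tqdm import trange  # optional dependency, used only for the progress bar
        episode_iter = trange(episodes, desc="Training")
    except ImportError:
        episode_iter = range(episodes)

    episode_rewards = []
    for episode in episode_iter:
        state, info = env.reset()
        total_reward, done = 0.0, False
        while not done:
            # Taxi-v3 exposes valid actions via info["action_mask"]; other envs may not.
            action = agent.choose_action(state, info.get("action_mask"))
            next_state, reward, terminated, truncated, info = env.step(action)
            agent.update(state, action, reward, next_state, terminated)
            state = next_state
            total_reward += reward
            done = terminated or truncated
        agent.decay_epsilon()  # reduce exploration once per episode
        episode_rewards.append(total_reward)

        if (episode + 1) % report_every == 0:
            avg = np.mean(episode_rewards[-report_every:])
            print(f"Episode {episode + 1}: average reward {avg:.2f}, "
                  f"epsilon {agent.epsilon:.3f}")
    return episode_rewards
```

With Gymnasium installed, the environment would typically be created with `gym.make("Taxi-v3")` and the table sized from `env.observation_space.n` and `env.action_space.n`.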
- Testing Function (`test_agent()`):
  - Evaluates the trained agent in the Taxi environment
  - Renders the environment for visual demonstration
  - Prints total reward for each episode
Environment Details
The Taxi-v3 environment is a grid-world problem where an agent must:
- Pick up a passenger at one of four locations
- Drop the passenger at another specified location
- Navigate efficiently while avoiding invalid moves
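For readers unfamiliar with the environment, the sketch below shows how a Taxi-v3 observation maps to the task described above. It assumes the encoding documented for Gymnasium's Taxi-v3: a 5x5 grid, a passenger location index from 0 to 4 (where 4 means the passenger is in the taxi), and a destination index from 0 to 3, giving 500 discrete states. The helper names `encode_state` and `decode_state` are hypothetical and written here only for illustration.

```python
def encode_state(taxi_row, taxi_col, passenger_location, destination):
    """Pack the four state components into a single integer in [0, 499]."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination


def decode_state(state):
    """Invert encode_state: return (taxi_row, taxi_col, passenger_location, destination)."""
    state, destination = divmod(state, 4)
    state, passenger_location = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger_location, destination


# Taxi at row 3, column 1; passenger waiting at location 2; destination is location 0.
state = encode_state(3, 1, 2, 0)
print(state, decode_state(state))
```

This integer is what indexes the rows of the Q-table, which is why a table with 500 rows and 6 columns (one per action) is sufficient for this environment.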
Motivation
This would improve the training agents, and I can extend the same approach to other agents, such as a Cliff Walking agent.
Pitch
No response
Alternatives
No response
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo