Unify training logic inside agent #135

Merged
muralx merged 7 commits into jonbinney:main from muralx:unify_logic
Mar 28, 2025

@muralx muralx commented Mar 28, 2025

One more refactoring :)

  • All training logic is now inside the trainable agent and can be tweaked independently of the environment
  • Added hook methods to the trainable agent, plus calls to them from the standard arena, so the arena can be reused for training
  • Removed most logic from train.py except for model saving and rendering of rewards/losses (done with arena plugins: thanks Cucu for those plugins)
  • Using 1 and -1 for win/loss. In theory we can add something in the agent to scale the numbers if needed
  • I changed the step_rewards to use board_size^2, which seems to work better
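
To illustrate the hook idea from the list above, here is a minimal sketch, not the code in this PR. Only `start_game(game, player_id)` appears in the diff; `is_trainable`, `handle_step_outcome`, `end_game`, `get_action`, `play_game` and the toy `CountdownGame` are hypothetical names chosen for the example, and the game is deliberately not Quoridor.

```python
import random


class Agent:
    """Base agent: every hook is a no-op, so non-learning agents can ignore them."""

    def is_trainable(self):  # hypothetical name
        return False

    def start_game(self, game, player_id):  # signature taken from the diff
        pass

    def handle_step_outcome(self, state, action, reward, done):  # hypothetical hook
        pass

    def end_game(self, game):  # hypothetical hook
        pass

    def get_action(self, game):  # hypothetical name
        raise NotImplementedError


class RandomAgent(Agent):
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def get_action(self, game):
        return self.rng.choice(game.legal_actions())


class RecordingAgent(RandomAgent):
    """Stand-in for a trainable agent: it stores transitions instead of fitting a model."""

    def __init__(self, seed=0):
        super().__init__(seed)
        self.player_id = None
        self.transitions = []

    def is_trainable(self):
        return True

    def start_game(self, game, player_id):
        self.player_id = player_id

    def handle_step_outcome(self, state, action, reward, done):
        self.transitions.append((state, action, reward, done))


class CountdownGame:
    """Toy two-player game: take 1 or 2 tokens; whoever takes the last one wins."""

    def __init__(self, tokens=7):
        self.tokens = tokens
        self.winner = None

    def legal_actions(self):
        return [n for n in (1, 2) if n <= self.tokens]

    def step(self, player_id, action):
        self.tokens -= action
        if self.tokens == 0:
            self.winner = player_id

    def is_over(self):
        return self.winner is not None


def play_game(game, agents):
    """Arena loop: it only calls hooks and knows nothing about training."""
    for player_id, agent in enumerate(agents):
        agent.start_game(game, player_id)
    current = 0
    while not game.is_over():
        agent = agents[current]
        state = game.tokens
        action = agent.get_action(game)
        game.step(current, action)
        done = game.is_over()
        # Win/loss rewards of 1 and -1, as described in the PR; 0 otherwise.
        agent.handle_step_outcome(state, action, 1 if done else 0, done)
        if done:
            agents[1 - current].handle_step_outcome(game.tokens, None, -1, True)
        current = 1 - current
    for agent in agents:
        agent.end_game(game)
    return game.winner


learner = RecordingAgent(seed=42)
winner = play_game(CountdownGame(tokens=7), [learner, RandomAgent(seed=1)])
```

Because the base class hooks do nothing, the same `play_game` loop works for plain matches and for training runs; only the agent decides what to do with each outcome.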

Side note
After the refactoring I was able to train the DExp (Diego Experimental) agent on a 5x5 board with 0 walls, reaching a win ratio of 0.95. In theory, you should be able to reproduce it because of the -i 42 argument.

Training cmd: python3 deep_quoridor/src/train.py -N 5 -W 0 -d 0.9999 -e 1000 -i 42 -s
Play cmd: python3 deep_quoridor/src/play.py -N 5 -W 0 -t 100 -r matchresults -p simple dexppretrained

muralx commented Mar 28, 2025

@alejandromarcu I removed the renderers as parameters of the play and replay methods. Renderers are already part of the arena, so I'm not sure why you moved them to be args of those methods. Please let me know what I missed.

@muralx muralx marked this pull request as ready for review March 28, 2025 16:31
@alejandromarcu alejandromarcu left a comment

Very nice, great use of plug ins!

"""Returns True if the agent is a learning agent, False otherwise."""
return False

def start_game(self, game, player_id):
Nice. I was actually working on an agent and noticed that the agent didn't know whether it's player 1 or 2, so it doesn't explicitly know where to go. I was adding it in the reset method, but this works as well. I think in the future we may want to standardize all this to use Player.ONE from quoridor.py, but we can do that separately.
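
As a rough sketch of what that could look like: the agent stores its side in `start_game` and derives its goal row from it. The `Player` enum below is a stand-in for the one in quoridor.py (its values are assumed), `GoalAwareAgent` and `goal_row` are hypothetical names, and which row each player starts on is an assumption made for the example.

```python
from enum import IntEnum


class Player(IntEnum):
    # Stand-in for the Player enum in quoridor.py; the values are assumed.
    ONE = 0
    TWO = 1


class GoalAwareAgent:
    """Hypothetical agent that remembers its side when the game starts."""

    def __init__(self, board_size):
        self.board_size = board_size
        self.player_id = None

    def start_game(self, game, player_id):
        # Normalise whatever id the arena passes into the shared enum.
        self.player_id = Player(player_id)

    def goal_row(self):
        # Assumption: Player.ONE starts on row 0 and must reach the last row;
        # Player.TWO starts on the last row and must reach row 0.
        if self.player_id is None:
            raise RuntimeError("start_game has not been called yet")
        return self.board_size - 1 if self.player_id == Player.ONE else 0


agent = GoalAwareAgent(board_size=5)
agent.start_game(game=None, player_id=Player.TWO)
```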

@muralx muralx merged commit 3458e19 into jonbinney:main Mar 28, 2025
1 check passed
@muralx muralx deleted the unify_logic branch March 28, 2025 18:27