Unify training logic inside agent #135
Merged
muralx merged 7 commits into jonbinney:main on Mar 28, 2025
Conversation
…he board as divisor for step rewards
Collaborator
Author
@alejandromarcu I removed the renderers as parameters of the play and replay methods. Renderers are already part of the arena, so I'm not sure why you moved them into args of those methods. Please let me know what I missed.
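The refactor being described could look roughly like this; all class and method names below are hypothetical, sketched only to illustrate renderers living on the arena rather than being passed to play():

```python
# Hypothetical sketch (names assumed, not from the PR): the arena owns its
# renderers, so play() needs no renderer arguments.
class ListRenderer:
    """Collects rendered events; stands in for a real renderer."""
    def __init__(self):
        self.events = []

    def render(self, event):
        self.events.append(event)

class Arena:
    def __init__(self, renderers=None):
        # Renderers are configured once, on the arena itself.
        self.renderers = renderers or []

    def _broadcast(self, event):
        for renderer in self.renderers:
            renderer.render(event)

    def play(self):
        # No renderer parameter: every registered renderer is notified.
        self._broadcast("game started")
        self._broadcast("game finished")

renderer = ListRenderer()
arena = Arena(renderers=[renderer])
arena.play()
```

Keeping renderers on the arena means callers configure presentation once instead of threading it through every play/replay call.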
alejandromarcu
approved these changes
Mar 28, 2025
Collaborator
alejandromarcu
left a comment
Very nice, great use of plug-ins!
    """Returns True if the agent is a learning agent, False otherwise."""
    return False

def start_game(self, game, player_id):
Collaborator
Nice! Actually, I was working on an agent and noticed that the agent didn't know whether it is player 1 or 2, so it doesn't explicitly know where to go. I was adding that in the reset method, but this works as well. I think in the future we may want to standardize all this to use Player.ONE from quoridor.py, but we can do that separately.
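A minimal sketch of the idea, assuming the Agent base class from the diff above; DirectionalAgent and its direction convention are made up for illustration:

```python
# Hypothetical sketch: start_game (from the diff above) lets an agent learn
# which player it is before the first move.
class Agent:
    def is_learning(self):
        """Returns True if the agent is a learning agent, False otherwise."""
        return False

    def start_game(self, game, player_id):
        # Remember which side this agent plays.
        self.player_id = player_id

class DirectionalAgent(Agent):
    """Hypothetical agent that derives its goal direction from player_id."""
    def start_game(self, game, player_id):
        super().start_game(game, player_id)
        # Assumed convention: player 0 advances down the board, player 1 up.
        self.goal_direction = 1 if player_id == 0 else -1

agent = DirectionalAgent()
agent.start_game(game=None, player_id=1)
```

Passing player_id at game start (rather than in reset) keeps the identity assignment next to the other per-game setup the agent receives.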
One more refactoring :)
Side note
After the refactoring I was able to train the DExp (Diego Experimental) agent on a 5x5 board with 0 walls, reaching a win rate of 0.95. In theory, you should be able to reproduce it thanks to the fixed seed (-i 42).
Training cmd: python3 deep_quoridor/src/train.py -N 5 -W 0 -d 0.9999 -e 1000 -i 42 -s
Play cmd: python3 deep_quoridor/src/play.py -N 5 -W 0 -t 100 -r matchresults -p simple dexppretrained
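Reproducibility via `-i 42` presumably comes from seeding every RNG the training loop touches. A generic sketch of that pattern (the helper name is made up, not the actual train.py code):

```python
import random

def seed_everything(seed):
    """Hypothetical helper: seed all RNGs a training run might use."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; stdlib RNG is still seeded
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed

# Two runs with the same seed draw identical random sequences.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
```

If exploration, environment shuffling, and weight initialization all draw from seeded RNGs, repeating the command with the same seed should reproduce the trained agent.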