Unify training logic inside agent #135
Merged
muralx merged 7 commits into jonbinney:main on Mar 28, 2025
Conversation
…he board as divisor for step rewards
Collaborator
Author
@alejandromarcu I removed the renderers as parameters of the play and replay methods. Renderers are already part of the arena, so I'm not sure why you moved them into args of those methods. Please let me know what I missed.
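The refactor being described could look roughly like this; all class and method names below are hypothetical, sketched only to illustrate renderers living on the arena rather than being passed to play():

```python
# Hypothetical sketch (names assumed, not from the PR): the arena owns its
# renderers, so play() needs no renderer arguments.
class ListRenderer:
    """Collects rendered events; stands in for a real renderer."""
    def __init__(self):
        self.events = []

    def render(self, event):
        self.events.append(event)

class Arena:
    def __init__(self, renderers=None):
        # Renderers are configured once, on the arena itself.
        self.renderers = renderers or []

    def _broadcast(self, event):
        for renderer in self.renderers:
            renderer.render(event)

    def play(self):
        # No renderer parameter: every registered renderer is notified.
        self._broadcast("game started")
        self._broadcast("game finished")

renderer = ListRenderer()
arena = Arena(renderers=[renderer])
arena.play()
```

Keeping renderers on the arena means callers configure presentation once instead of threading it through every play/replay call.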
alejandromarcu
approved these changes
Mar 28, 2025
Collaborator
alejandromarcu
left a comment
Very nice, great use of plug-ins!
    """Returns True if the agent is a learning agent, False otherwise."""
    return False

def start_game(self, game, player_id):
Collaborator
Nice! Actually, I was working on an agent and noticed that the agent didn't know whether it is player 1 or 2, so it doesn't explicitly know where to go. I was adding that in the reset method, but this works as well. I think in the future we may want to standardize all this to use Player.ONE from quoridor.py, but we can do that separately.
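A minimal sketch of the idea, assuming the Agent base class from the diff above; DirectionalAgent and its direction convention are made up for illustration:

```python
# Hypothetical sketch: start_game (from the diff above) lets an agent learn
# which player it is before the first move.
class Agent:
    def is_learning(self):
        """Returns True if the agent is a learning agent, False otherwise."""
        return False

    def start_game(self, game, player_id):
        # Remember which side this agent plays.
        self.player_id = player_id

class DirectionalAgent(Agent):
    """Hypothetical agent that derives its goal direction from player_id."""
    def start_game(self, game, player_id):
        super().start_game(game, player_id)
        # Assumed convention: player 0 advances down the board, player 1 up.
        self.goal_direction = 1 if player_id == 0 else -1

agent = DirectionalAgent()
agent.start_game(game=None, player_id=1)
```

Passing player_id at game start (rather than in reset) keeps the identity assignment next to the other per-game setup the agent receives.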
One more refactoring :)
Side note
After the refactoring I was able to train the DExp (Diego Experimental) agent on a 5x5 board with 0 walls, reaching a win rate of 0.95. In theory, you should be able to reproduce it thanks to the fixed seed (-i 42).
Training cmd: python3 deep_quoridor/src/train.py -N 5 -W 0 -d 0.9999 -e 1000 -i 42 -s
Play cmd: python3 deep_quoridor/src/play.py -N 5 -W 0 -t 100 -r matchresults -p simple dexppretrained
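Reproducibility via `-i 42` presumably comes from seeding every RNG the training loop touches. A generic sketch of that pattern (the helper name is made up, not the actual train.py code):

```python
import random

def seed_everything(seed):
    """Hypothetical helper: seed all RNGs a training run might use."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; stdlib RNG is still seeded
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed

# Two runs with the same seed draw identical random sequences.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
```

If exploration, environment shuffling, and weight initialization all draw from seeded RNGs, repeating the command with the same seed should reproduce the trained agent.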