ref: remove opponent policy from strategy evolution #440
Merged
Split the game budget between random and minimax opponents internally, eliminating the need to pass an opponent policy as a parameter. This simplifies the strategy scoring interface and reduces parameter complexity in the related functions.
Yagth
reviewed
Apr 28, 2026
Yagth
reviewed
Apr 28, 2026
Yagth
reviewed
Apr 28, 2026
Yagth
requested changes
Apr 28, 2026
Yagth
approved these changes
Apr 29, 2026
Description
This PR refactors the strategy evolution process to address a critical limitation in how strategies are evaluated and evolved. Previously, strategy evolution took one adversary at a time and searched for strategies optimized to win against that particular opponent. This works in isolated environments where the adversary is known beforehand, but it produces strategies that are not versatile enough to perform well against different types of opponents.

To address this, the PR removes the explicit opponent policy parameter from the strategy evolution process and adds a game-splitting function that automatically divides the total number of games between two opponents: a random player and a minimax player. This mixed-opponent evaluation ensures that evolved strategies can both beat weaker random opponents and survive and draw against stronger minimax opponents. The change makes strategy scoring more maintainable while producing more robust, generalizable strategies that can handle diverse opponent types.

All related functions have been updated to work with the new parameter structure, and the test cases have been adjusted accordingly. I tested the functionality with PeTTa v1.0.0 and highly recommend using that version when testing this change.
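As a rough illustration of the mixed-opponent evaluation described above, here is a minimal Python sketch. It is not the code from this PR: the names `split_games` and `score_strategy`, the even split (with any odd game going to the random opponent), and the win/draw/loss scoring of 1.0/0.5/0.0 are all assumptions made for the example.

```python
from typing import Callable, Tuple


def split_games(total_games: int) -> Tuple[int, int]:
    """Divide a game budget between a random and a minimax opponent.

    Assumption: an even split, with the extra game of an odd budget
    going to the random opponent.
    """
    if total_games < 0:
        raise ValueError("total_games must be non-negative")
    minimax_games = total_games // 2
    random_games = total_games - minimax_games
    return random_games, minimax_games


def score_strategy(play_game: Callable[[str], float], total_games: int) -> float:
    """Average outcome of a strategy over both opponent types.

    `play_game` is a hypothetical callback that plays one game against
    the named opponent ("random" or "minimax") and returns 1.0 for a win,
    0.5 for a draw, and 0.0 for a loss. No opponent policy is passed in
    by the caller; the split happens internally.
    """
    if total_games == 0:
        return 0.0
    random_games, minimax_games = split_games(total_games)
    total = 0.0
    for _ in range(random_games):
        total += play_game("random")
    for _ in range(minimax_games):
        total += play_game("minimax")
    return total / total_games
```

With this shape, a strategy that always beats the random player but only draws against minimax scores between 0.5 and 1.0, so evolution is rewarded for doing well against both opponents rather than overfitting to one.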
Motivation and Context
How Has This Been Tested?
Types of changes
Checklist: