[RLlib] Working implementation of pettingzoo_shared_value_function.py #56309
MatthewCWeston wants to merge 27 commits into ray-project:master from MatthewCWeston:pettingzoo_shared_vf
Conversation
Signed-off-by: Matthew <mweston3@illinois.edu>
Code Review
This pull request introduces a working implementation of pettingzoo_shared_value_function.py, providing a valuable example of a multi-agent setup with a shared critic using a MAPPO-style algorithm. The implementation is well-structured across several new files, adhering to RLlib's new API stack conventions. My review focuses on improving maintainability, consistency with best practices, and documentation accuracy. Key suggestions include inheriting MAPPOConfig from PPOConfig to reduce redundancy, using the logging module for error handling instead of print statements, and correcting an inaccurate docstring.
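One of those suggestions, replacing print-based error handling with the logging module, could look roughly like the sketch below. This is a minimal illustration, assuming the error in question is a missing PettingZoo dependency; the actual message and its location in the example file may differ.

```python
import logging

logger = logging.getLogger(__name__)

try:
    # The example targets PettingZoo's Waterworld environment.
    from pettingzoo.sisl import waterworld_v4  # noqa: F401
except ImportError as err:
    # Report through the logging module instead of a bare print().
    logger.error("PettingZoo (sisl extras) is required for this example: %s", err)
    raise
```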
Signed-off-by: Matthew <mweston3@illinois.edu>
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Ran a manual merge on BUILD, since it was updated elsewhere and this PR un-comments the relevant test. Checked everything again and found it working as expected.
Signed-off-by: Matthew <mweston3@illinois.edu>
Signed-off-by: Matthew <mweston3@illinois.edu>
This will be very helpful. A working PettingZoo shared value function example (MAPPO on the new RLlib API) is exactly the kind of guide many of us need for implementing centralized critics.
@pseudo-rnd-thoughts Just checking in - is there anything further that I need to do to prepare this PR for a merge? It's had the "go" label for a while.
Force-pushed from cbaae89 to e7b02c4.
Signed-off-by: Matthew <mweston3@illinois.edu>
Come to think of it, given the degree to which MAPPO has been developed here, it could serve as a general-use algorithm, akin to APPO. It's a pretty ubiquitous MARL benchmark algorithm, after all. I could then solve the issue with file count by porting the algorithm implementation over from rllib/examples to rllib/algorithms. Would this be a desirable change?
…ton/rllib_current_attention into pettingzoo_shared_vf
return MAPPOConfig()
...
class MAPPOConfig(AlgorithmConfig):  # AlgorithmConfig -> PPOConfig -> MAPPO
Is it possible to inherit MAPPOConfig from PPOConfig? I see you have made a comment to that effect but are not inheriting from PPOConfig. There is significant overlap between the MAPPO and PPO implementations (60-70%, according to Copilot).
I tried it out, but, unfortunately, in practice, every method implemented in MAPPOConfig wants to avoid things done in its PPOConfig equivalent, so it doesn't end up saving any space. The natural inheritance hierarchy would go the other way around (which can be done, if we want to add MAPPO's implementation to rllib/algorithms rather than rllib/examples).
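For reference, the alternative raised above would look roughly like the following. This is a purely illustrative sketch; as explained in the reply, the PR keeps AlgorithmConfig as the base class, since each overridden method would otherwise have to undo work done by its PPOConfig counterpart.

```python
from ray.rllib.algorithms.ppo import PPOConfig


class MAPPOConfig(PPOConfig):
    """Hypothetical variant inheriting from PPOConfig instead of AlgorithmConfig."""

    def __init__(self, algo_class=None):
        super().__init__(algo_class=algo_class)
        # MAPPO-specific defaults (e.g., pointing the default learner and RLModule
        # at the shared-critic implementations) would be set here, overriding PPO's.
```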
Signed-off-by: Matthew <mweston3@illinois.edu>
@HassamSheikh Do the changes made above seem about right? Let me know if there are any other changes I should make.
@HassamSheikh No worries if things are busy, but please let me know if there's anything needed on my end to facilitate a merge.
Signed-off-by: Matthew <mweston3@illinois.edu>
…ton/rllib_current_attention into pettingzoo_shared_vf
Signed-off-by: Matthew <mweston3@illinois.edu>
Why are these changes needed?
At present, pettingzoo_shared_value_function.py is a placeholder file that returns an error message when run. This PR implements it in a way that permits direct comparison between itself, pettingzoo_parameter_sharing.py, and pettingzoo_independent_learning.py.

My motivation for this is that, when I first saw the placeholder file name, I hoped to use it as a reference when working on my current project, which requires a shared critic. Having completed that part of my project, I cleaned up and generalized the code so that I could share it with other RLlib developers.
In brief, this PR implements the placeholder example file by including an implementation of MAPPO in the examples section, in a similar vein to how other examples implement VPG, AlphaStar-style league-based training, and MobileNet. This provides a working example of how a shared value function, used in a wide variety of multi-agent reinforcement learning papers, can be implemented in RLlib using the new API stack.
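To make the comparison with the other two PettingZoo examples concrete, running this example follows the usual new-API-stack pattern, roughly as sketched below. The import path for MAPPOConfig is inferred from the file layout listed in the note at the end of this description, and the exact Waterworld and multi-agent settings used by the example may differ.

```python
from pettingzoo.sisl import waterworld_v4

from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
# Path inferred from this PR's file layout (rllib/examples/algorithms/mappo/mappo.py).
from ray.rllib.examples.algorithms.mappo.mappo import MAPPOConfig

# Register the PettingZoo Waterworld env with two pursuer agents.
register_env(
    "waterworld",
    lambda _: ParallelPettingZooEnv(waterworld_v4.parallel_env(n_pursuers=2)),
)

config = (
    MAPPOConfig()
    .environment("waterworld")
    .multi_agent(
        # One policy per agent; the shared critic is added as an extra RLModule.
        policies={"pursuer_0", "pursuer_1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: agent_id,
    )
)
algo = config.build()
print(algo.train())
```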
Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.

Note
Introduces a Torch MAPPO with a shared critic and implements the PettingZoo Waterworld shared-value-function example, activating its CI test.
- Implements examples/multi_agent/pettingzoo_shared_value_function.py using a shared critic (MAPPO) for PettingZoo Waterworld; wires MultiRLModuleSpec with per-agent policies and a shared_critic.
- Updates examples/multi_agent/shared_encoder_cartpole.py (expected reward text).
- Adds examples/algorithms/mappo/:
  - mappo.py + MAPPOConfig (training params, defaults, validators).
  - mappo_learner.py base learner (entropy/KL schedulers, GAE connector hookup).
  - torch/mappo_torch_learner.py (policy loss, critic loss, KL/entropy metrics).
  - default_mappo_rl_module.py + torch/default_mappo_torch_rl_module.py.
  - shared_critic_rl_module.py + torch/shared_critic_torch_rl_module.py with shared_critic_catalog.py.
  - mappo_catalog.py; GAE connector: connectors/general_advantage_estimation.py.
- Enables py_test for examples/multi_agent/pettingzoo_shared_value_function in rllib/BUILD.bazel.

Written by Cursor Bugbot for commit 2ec32d4.
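For readers interested in the shared-critic wiring mentioned in the note above, the MultiRLModuleSpec setup follows roughly the shape sketched below. The concrete RLModule class names and import paths are assumptions derived from this PR's file names; consult the merged files for the exact symbols.

```python
from ray.rllib.core.rl_module.multi_rl_module import MultiRLModuleSpec
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# Class names assumed from the PR's file names (default_mappo_torch_rl_module.py,
# shared_critic_torch_rl_module.py); verify against the merged code.
from ray.rllib.examples.algorithms.mappo.torch.default_mappo_torch_rl_module import (
    DefaultMAPPOTorchRLModule,
)
from ray.rllib.examples.algorithms.mappo.torch.shared_critic_torch_rl_module import (
    SharedCriticTorchRLModule,
)

# One policy module per Waterworld agent, plus a "shared_critic" module that the
# learner uses for the centralized value function.
spec = MultiRLModuleSpec(
    rl_module_specs={
        "pursuer_0": RLModuleSpec(module_class=DefaultMAPPOTorchRLModule),
        "pursuer_1": RLModuleSpec(module_class=DefaultMAPPOTorchRLModule),
        "shared_critic": RLModuleSpec(module_class=SharedCriticTorchRLModule),
    }
)
# `config` here refers to the MAPPOConfig from the earlier sketch.
config = config.rl_module(rl_module_spec=spec)
```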