
[RLlib] Working implementation of pettingzoo_shared_value_function.py#56309

Open
MatthewCWeston wants to merge 27 commits into ray-project:master from MatthewCWeston:pettingzoo_shared_vf

@MatthewCWeston (Contributor) commented Sep 7, 2025

Why are these changes needed?

At present, pettingzoo_shared_value_function.py is a placeholder file that returns an error message when run. This PR implements it in a way that permits direct comparison between itself, pettingzoo_parameter_sharing.py, and pettingzoo_independent_learning.py.
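The three example scripts differ mainly in how agents map to trainable modules. The sketch below is a plain-Python illustration, not the scripts' actual code: the agent IDs and module names (`agent_0`, `shared_policy`, `shared_critic`) are hypothetical, and it assumes that in the independent-learning and parameter-sharing setups each policy module carries its own value head, while the shared-value-function setup moves value estimation into one separate critic module.

```python
from typing import Dict, List

# Hypothetical agent IDs; the real Waterworld example uses its own naming.
AGENT_IDS: List[str] = ["agent_0", "agent_1", "agent_2"]


def independent_learning(agent_ids: List[str]) -> Dict[str, Dict[str, str]]:
    # Each agent has its own policy, and each policy has its own value head.
    return {
        "policy": {aid: f"policy_{aid}" for aid in agent_ids},
        "critic": {aid: f"policy_{aid}" for aid in agent_ids},
    }


def parameter_sharing(agent_ids: List[str]) -> Dict[str, Dict[str, str]]:
    # All agents map to one policy, so its value head is shared as well.
    return {
        "policy": {aid: "shared_policy" for aid in agent_ids},
        "critic": {aid: "shared_policy" for aid in agent_ids},
    }


def shared_value_function(agent_ids: List[str]) -> Dict[str, Dict[str, str]]:
    # Per-agent policies without value heads, plus one critic used by every agent.
    return {
        "policy": {aid: f"policy_{aid}" for aid in agent_ids},
        "critic": {aid: "shared_critic" for aid in agent_ids},
    }


for name, setup in [
    ("independent", independent_learning(AGENT_IDS)),
    ("parameter sharing", parameter_sharing(AGENT_IDS)),
    ("shared value function", shared_value_function(AGENT_IDS)),
]:
    n_policies = len(set(setup["policy"].values()))
    n_critics = len(set(setup["critic"].values()))
    print(f"{name}: policy modules={n_policies}, critics={n_critics}")
```

Only the shared-value-function setup combines one policy per agent with a single critic, which is what makes the three scripts directly comparable.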

My motivation for this is that, when I first saw the placeholder file name, I hoped to use it as a reference when working on my current project, which requires a shared critic. Having completed that part of my project, I cleaned up and generalized the code so that I could share it with other RLlib developers.

In brief, this PR implements the placeholder example by adding an implementation of MAPPO to the examples section, much as other examples implement VPG, AlphaStar-style league-based training, and MobileNet. The result is a working example of how a shared value function, used in a wide variety of multi-agent reinforcement learning papers, can be implemented in RLlib using the new API stack.
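In MAPPO-style training, each agent's policy acts on its own local observation, while the shared critic estimates values from information about all agents (here, simply the concatenation of every agent's observation). The toy sketch below illustrates only that split. It is plain Python with linear models rather than the PR's Torch modules, the agent IDs are hypothetical, and giving the critic one value head per agent is an assumption (a single global value is another common choice).

```python
import math
import random
from typing import Dict, List


class LinearSoftmaxActor:
    """Toy per-agent policy: sees only its own agent's observation."""

    def __init__(self, obs_dim: int, n_actions: int, rng: random.Random) -> None:
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(obs_dim)] for _ in range(n_actions)
        ]

    def action_probs(self, obs: List[float]) -> List[float]:
        logits = [sum(w * o for w, o in zip(row, obs)) for row in self.weights]
        max_logit = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - max_logit) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


class SharedLinearCritic:
    """Toy centralized critic: every value estimate reads ALL agents' observations."""

    def __init__(self, agent_ids: List[str], obs_dim: int, rng: random.Random) -> None:
        self.agent_ids = sorted(agent_ids)  # fixed order keeps the concatenation stable
        in_dim = obs_dim * len(self.agent_ids)
        # One linear value head per agent, all fed the same global input.
        self.heads = {
            aid: [rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for aid in self.agent_ids
        }

    def values(self, obs_by_agent: Dict[str, List[float]]) -> Dict[str, float]:
        global_obs = [x for aid in self.agent_ids for x in obs_by_agent[aid]]
        return {
            aid: sum(w * x for w, x in zip(head, global_obs))
            for aid, head in self.heads.items()
        }


rng = random.Random(0)
agent_ids = ["agent_0", "agent_1"]
actors = {aid: LinearSoftmaxActor(obs_dim=3, n_actions=2, rng=rng) for aid in agent_ids}
critic = SharedLinearCritic(agent_ids, obs_dim=3, rng=rng)

obs = {"agent_0": [0.1, 0.2, 0.3], "agent_1": [0.0, -0.5, 1.0]}
probs = {aid: actors[aid].action_probs(obs[aid]) for aid in agent_ids}
values = critic.values(obs)
print(probs)
print(values)
```

Changing only `agent_1`'s observation leaves `agent_0`'s action distribution untouched but shifts `agent_0`'s value estimate, which is the sense in which the critic is "centralized" while execution stays decentralized.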

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Release tests

Note

Introduces a Torch MAPPO with a shared critic and implements the PettingZoo Waterworld shared-value-function example, activating its CI test.

  • Examples (Multi-agent):
    • Implement examples/multi_agent/pettingzoo_shared_value_function.py using a shared critic (MAPPO) for PettingZoo Waterworld; wires MultiRLModuleSpec with per-agent policies and a shared_critic.
    • Minor doc tweak in examples/multi_agent/shared_encoder_cartpole.py (expected reward text).
  • Algorithm Scaffolding (MAPPO, Torch):
    • Add MAPPO core under examples/algorithms/mappo/:
      • mappo.py + MAPPOConfig (training params, defaults, validators).
      • mappo_learner.py base learner (entropy/KL schedulers, GAE connector hookup).
      • Torch impl: torch/mappo_torch_learner.py (policy loss, critic loss, KL/entropy metrics).
      • Default policy module: default_mappo_rl_module.py + torch/default_mappo_torch_rl_module.py.
      • Shared critic: shared_critic_rl_module.py + torch/shared_critic_torch_rl_module.py with shared_critic_catalog.py.
      • Catalogs: mappo_catalog.py; GAE connector: connectors/general_advantage_estimation.py.
  • Build/Test:
    • Activate py_test for examples/multi_agent/pettingzoo_shared_value_function in rllib/BUILD.bazel.
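The learner pieces listed above follow the standard PPO recipe that MAPPO builds on. GAE turns rewards and critic values into advantages and value targets, the policy loss is a clipped surrogate, and the critic is regressed onto the value targets. Below is a minimal pure-Python sketch of those three formulas, assuming the PR uses the textbook versions. It is not the PR's Torch code, the function names are hypothetical, and details such as value-function clipping, advantage normalization, and the KL/entropy terms are omitted.

```python
import math
from typing import List, Tuple


def compute_gae(
    rewards: List[float],
    values: List[float],
    last_value: float,
    terminated: bool,
    gamma: float = 0.99,
    lambda_: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation over one episode fragment."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Bootstrap from the critic only if the fragment was cut off, not terminated.
    next_value = 0.0 if terminated else last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        gae = delta + gamma * lambda_ * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets used to train the (shared) critic.
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets


def clipped_policy_loss(
    logp_new: List[float],
    logp_old: List[float],
    advantages: List[float],
    clip_param: float = 0.2,
) -> float:
    """PPO-style clipped surrogate, negated so that lower is better."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped_ratio = min(max(ratio, 1.0 - clip_param), 1.0 + clip_param)
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)


def critic_loss(predicted: List[float], targets: List[float]) -> float:
    """Mean squared error between critic predictions and value targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


adv, targets = compute_gae(
    [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], last_value=0.0,
    terminated=True, gamma=1.0, lambda_=1.0,
)
print(adv, targets)  # [3.0, 2.0, 1.0] [3.0, 2.0, 1.0]
```

With `gamma=1.0` and `lambda_=1.0`, GAE reduces to the Monte Carlo return minus the value baseline, which is why the advantages here are the reward-to-go.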

Written by Cursor Bugbot for commit 2ec32d4.

Signed-off-by: Matthew <mweston3@illinois.edu>
@MatthewCWeston MatthewCWeston requested a review from a team as a code owner September 7, 2025 00:57
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a working implementation of pettingzoo_shared_value_function.py, providing a valuable example of a multi-agent setup with a shared critic using a MAPPO-style algorithm. The implementation is well-structured across several new files, adhering to RLlib's new API stack conventions. My review focuses on improving maintainability, consistency with best practices, and documentation accuracy. Key suggestions include inheriting MAPPOConfig from PPOConfig to reduce redundancy, using the logging module for error handling instead of print statements, and correcting an inaccurate docstring.

@ray-gardener ray-gardener bot added the rllib (RLlib related issues), docs (An issue or change related to documentation), and community-contribution (Contributed by the community) labels Sep 7, 2025
Signed-off-by: Matthew <mweston3@illinois.edu>
@MatthewCWeston MatthewCWeston changed the title Working implementation of pettingzoo_shared_value_function.py [RLlib] Working implementation of pettingzoo_shared_value_function.py Sep 10, 2025
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Sep 29, 2025
@MatthewCWeston
Contributor Author

Ran a manual merge on BUILD, since it was updated elsewhere and this PR un-comments the relevant test. Checked everything again and found it working as expected.

cursor[bot]

This comment was marked as outdated.

Signed-off-by: Matthew <mweston3@illinois.edu>
@github-actions github-actions bot added the unstale label (A PR that has been marked unstale. It will not get marked stale again if this label is on it.) and removed the stale label Sep 29, 2025
@HassamSheikh HassamSheikh self-assigned this Oct 9, 2025
@pseudo-rnd-thoughts pseudo-rnd-thoughts added the go label (add ONLY when ready to merge, run all tests) Nov 10, 2025
Signed-off-by: Matthew <mweston3@illinois.edu>
@juanm4morales

This will be very helpful. A working PettingZoo shared value function example (MAPPO on the new RLlib API) is exactly the guide many of us need for implementing centralized critics.

@MatthewCWeston
Contributor Author

@pseudo-rnd-thoughts Just checking in - is there anything further that I need to do to prepare this PR for a merge? It's had the "go" label for a while.

@pseudo-rnd-thoughts pseudo-rnd-thoughts removed the go label Dec 11, 2025
Signed-off-by: Matthew <mweston3@illinois.edu>
@MatthewCWeston
Contributor Author

Come to think of it, given how far MAPPO has been developed here, it could serve as a general-use algorithm, akin to APPO; it's a ubiquitous MARL benchmark algorithm, after all. I could then resolve the file-count issue by porting the algorithm implementation from examples into algorithms, which would make pettingzoo_shared_value_function a single-file, self-contained example script in line with the others.

Would this be a desirable change?

return MAPPOConfig()


class MAPPOConfig(AlgorithmConfig): # AlgorithmConfig -> PPOConfig -> MAPPO
Contributor

Is it possible to have MAPPOConfig inherit from PPOConfig? You've left a comment indicating it, but the class doesn't actually inherit from PPOConfig. There seems to be significant overlap between the MAPPO and PPO implementations (60%-70%, according to Copilot).

Contributor Author

I tried it out, but unfortunately, in practice every method implemented in MAPPOConfig needs to avoid things done in its PPOConfig equivalent, so inheriting doesn't end up saving any code. The natural inheritance hierarchy would go the other way around, with PPOConfig inheriting from MAPPOConfig (which could be done if we want to add MAPPO's implementation to rllib/algorithms rather than rllib/examples).
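The "other way around" hierarchy can be sketched with two hypothetical classes (not RLlib's actual `PPOConfig` or the PR's `MAPPOConfig`). The sketch assumes the main difference is that MAPPO's per-agent modules have no value head because value estimation lives in the shared critic. Under that assumption the MAPPO settings form the smaller base, and a PPO-style config extends it with value-head settings instead of having to suppress them. The attribute names mirror common PPO settings; the values are illustrative only.

```python
class MAPPOConfigSketch:
    """Hypothetical base: settings for per-agent policy modules with no value head."""

    def __init__(self) -> None:
        self.lr = 3e-4  # illustrative value only
        self.clip_param = 0.2  # illustrative value only
        self.entropy_coeff = 0.0

    def validate(self) -> None:
        if self.clip_param <= 0.0:
            raise ValueError("clip_param must be positive")


class PPOConfigSketch(MAPPOConfigSketch):
    """Hypothetical subclass: the same policy settings plus a per-module value head."""

    def __init__(self) -> None:
        super().__init__()
        self.use_critic = True
        # Only meaningful when each module carries its own value head.
        self.vf_loss_coeff = 1.0

    def validate(self) -> None:
        super().validate()  # reuse the shared checks, then add value-head checks
        if self.vf_loss_coeff < 0.0:
            raise ValueError("vf_loss_coeff must be non-negative")


ppo = PPOConfigSketch()
ppo.validate()
print(isinstance(ppo, MAPPOConfigSketch), hasattr(MAPPOConfigSketch(), "vf_loss_coeff"))
```

In this direction every subclass method only adds behavior via `super()`, whereas a MAPPO subclass of PPO would have to override methods just to switch value-head logic off.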

@MatthewCWeston
Contributor Author

@HassamSheikh Do the changes made above seem about right? Let me know if there are any other changes I should make.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@MatthewCWeston
Contributor Author

@HassamSheikh No worries if things are busy, but please let me know if there's anything needed on my end to facilitate a merge.

Signed-off-by: Matthew <mweston3@illinois.edu>

Labels

  • community-contribution: Contributed by the community
  • docs: An issue or change related to documentation
  • rllib: RLlib related issues
  • unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.

5 participants