[RLlib] Working implementation of pettingzoo_shared_value_function.py #56309
MatthewCWeston wants to merge 27 commits into ray-project:master from MatthewCWeston:pettingzoo_shared_vf
Conversation
Signed-off-by: Matthew <mweston3@illinois.edu>
Code Review
This pull request introduces a working implementation of pettingzoo_shared_value_function.py, providing a valuable example of a multi-agent setup with a shared critic using a MAPPO-style algorithm. The implementation is well-structured across several new files, adhering to RLlib's new API stack conventions. My review focuses on improving maintainability, consistency with best practices, and documentation accuracy. Key suggestions include inheriting MAPPOConfig from PPOConfig to reduce redundancy, using the logging module for error handling instead of print statements, and correcting an inaccurate docstring.
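One of those suggestions, replacing print-based error handling with the logging module, could look roughly like the sketch below. This is a minimal illustration, assuming the error in question is a missing PettingZoo dependency; the actual message and its location in the example file may differ.

```python
import logging

logger = logging.getLogger(__name__)

try:
    # The example targets PettingZoo's Waterworld environment.
    from pettingzoo.sisl import waterworld_v4  # noqa: F401
except ImportError as err:
    # Report through the logging module instead of a bare print().
    logger.error("PettingZoo (sisl extras) is required for this example: %s", err)
    raise
```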
Signed-off-by: Matthew <mweston3@illinois.edu>
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Ran a manual merge on BUILD, since it was updated elsewhere and this PR un-comments the relevant test. Checked everything again and found it working as expected.
Signed-off-by: Matthew <mweston3@illinois.edu>
Signed-off-by: Matthew <mweston3@illinois.edu>
This will be very helpful. A working PettingZoo shared value function example (MAPPO on the new RLlib API) is exactly the kind of guide many of us need for implementing centralized critics.
@pseudo-rnd-thoughts Just checking in - is there anything further that I need to do to prepare this PR for a merge? It's had the "go" label for a while.
Force-pushed from cbaae89 to e7b02c4.
Signed-off-by: Matthew <mweston3@illinois.edu>
Come to think of it, given the degree to which MAPPO has been developed here, it could serve as a general-use algorithm, akin to APPO. It's a pretty ubiquitous MARL benchmark algorithm, after all. I could then solve the issue with file count by porting the algorithm implementation over from rllib/examples to rllib/algorithms. Would this be a desirable change?
…ton/rllib_current_attention into pettingzoo_shared_vf
return MAPPOConfig()
...
class MAPPOConfig(AlgorithmConfig):  # AlgorithmConfig -> PPOConfig -> MAPPO
Is it possible to inherit MAPPOConfig from PPOConfig? I see you have made a comment to that effect but are not inheriting from PPOConfig. There is significant overlap between the MAPPO and PPO implementations (60-70%, according to Copilot).
I tried it out, but, unfortunately, in practice, every method implemented in MAPPOConfig wants to avoid things done in its PPOConfig equivalent, so it doesn't end up saving any space. The natural inheritance hierarchy would go the other way around (which can be done, if we want to add MAPPO's implementation to rllib/algorithms rather than rllib/examples).
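For reference, the alternative raised above would look roughly like the following. This is a purely illustrative sketch; as explained in the reply, the PR keeps AlgorithmConfig as the base class, since each overridden method would otherwise have to undo work done by its PPOConfig counterpart.

```python
from ray.rllib.algorithms.ppo import PPOConfig


class MAPPOConfig(PPOConfig):
    """Hypothetical variant inheriting from PPOConfig instead of AlgorithmConfig."""

    def __init__(self, algo_class=None):
        super().__init__(algo_class=algo_class)
        # MAPPO-specific defaults (e.g., pointing the default learner and RLModule
        # at the shared-critic implementations) would be set here, overriding PPO's.
```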
Signed-off-by: Matthew <mweston3@illinois.edu>
@HassamSheikh Do the changes made above seem about right? Let me know if there are any other changes I should make.
@HassamSheikh No worries if things are busy, but please let me know if there's anything needed on my end to facilitate a merge.
Signed-off-by: Matthew <mweston3@illinois.edu>
…ton/rllib_current_attention into pettingzoo_shared_vf
Signed-off-by: Matthew <mweston3@illinois.edu>
Why are these changes needed?
At present, pettingzoo_shared_value_function.py is a placeholder file that returns an error message when run. This PR implements it in a way that permits direct comparison between itself, pettingzoo_parameter_sharing.py, and pettingzoo_independent_learning.py.

My motivation for this is that, when I first saw the placeholder file name, I hoped to use it as a reference when working on my current project, which requires a shared critic. Having completed that part of my project, I cleaned up and generalized the code so that I could share it with other RLlib developers.
In brief, this PR implements the placeholder example file by including an implementation of MAPPO in the examples section, in a similar vein to how other examples implement VPG, AlphaStar-style league-based training, and MobileNet. This provides a working example of how a shared value function, used in a wide variety of multi-agent reinforcement learning papers, can be implemented in RLlib using the new API stack.
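To make the comparison with the other two PettingZoo examples concrete, running this example follows the usual new-API-stack pattern, roughly as sketched below. The import path for MAPPOConfig is inferred from the file layout listed in the note at the end of this description, and the exact Waterworld and multi-agent settings used by the example may differ.

```python
from pettingzoo.sisl import waterworld_v4

from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
# Path inferred from this PR's file layout (rllib/examples/algorithms/mappo/mappo.py).
from ray.rllib.examples.algorithms.mappo.mappo import MAPPOConfig

# Register the PettingZoo Waterworld env with two pursuer agents.
register_env(
    "waterworld",
    lambda _: ParallelPettingZooEnv(waterworld_v4.parallel_env(n_pursuers=2)),
)

config = (
    MAPPOConfig()
    .environment("waterworld")
    .multi_agent(
        # One policy per agent; the shared critic is added as an extra RLModule.
        policies={"pursuer_0", "pursuer_1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: agent_id,
    )
)
algo = config.build()
print(algo.train())
```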
Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.

Note
Introduces a Torch MAPPO with a shared critic and implements the PettingZoo Waterworld shared-value-function example, activating its CI test.
- Implements examples/multi_agent/pettingzoo_shared_value_function.py using a shared critic (MAPPO) for PettingZoo Waterworld; wires MultiRLModuleSpec with per-agent policies and a shared_critic.
- Updates examples/multi_agent/shared_encoder_cartpole.py (expected reward text).
- Adds examples/algorithms/mappo/:
  - mappo.py + MAPPOConfig (training params, defaults, validators).
  - mappo_learner.py base learner (entropy/KL schedulers, GAE connector hookup).
  - torch/mappo_torch_learner.py (policy loss, critic loss, KL/entropy metrics).
  - default_mappo_rl_module.py + torch/default_mappo_torch_rl_module.py.
  - shared_critic_rl_module.py + torch/shared_critic_torch_rl_module.py with shared_critic_catalog.py.
  - mappo_catalog.py; GAE connector: connectors/general_advantage_estimation.py.
- Enables py_test for examples/multi_agent/pettingzoo_shared_value_function in rllib/BUILD.bazel.

Written by Cursor Bugbot for commit 2ec32d4.
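For readers interested in the shared-critic wiring mentioned in the note above, the MultiRLModuleSpec setup follows roughly the shape sketched below. The concrete RLModule class names and import paths are assumptions derived from this PR's file names; consult the merged files for the exact symbols.

```python
from ray.rllib.core.rl_module.multi_rl_module import MultiRLModuleSpec
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# Class names assumed from the PR's file names (default_mappo_torch_rl_module.py,
# shared_critic_torch_rl_module.py); verify against the merged code.
from ray.rllib.examples.algorithms.mappo.torch.default_mappo_torch_rl_module import (
    DefaultMAPPOTorchRLModule,
)
from ray.rllib.examples.algorithms.mappo.torch.shared_critic_torch_rl_module import (
    SharedCriticTorchRLModule,
)

# One policy module per Waterworld agent, plus a "shared_critic" module that the
# learner uses for the centralized value function.
spec = MultiRLModuleSpec(
    rl_module_specs={
        "pursuer_0": RLModuleSpec(module_class=DefaultMAPPOTorchRLModule),
        "pursuer_1": RLModuleSpec(module_class=DefaultMAPPOTorchRLModule),
        "shared_critic": RLModuleSpec(module_class=SharedCriticTorchRLModule),
    }
)
# `config` here refers to the MAPPOConfig from the earlier sketch.
config = config.rl_module(rl_module_spec=spec)
```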