`MarkovVectorEnv` casts infos as a Python list, which throws an error while training with CleanRL's multi-agent PPO code #249

I am running CleanRL's PPO code for a custom PettingZoo environment using the script [here](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_pettingzoo_ma_atari.py). In line 163, the environments are wrapped with Gymnasium's `RecordEpisodeStatistics` wrapper, which is then used in lines 210-215 to log each player's return after the episode has ended.

It turns out that calling `pettingzoo_env_to_vec_env_v1` instantiates the `MarkovVectorEnv` class. There, [in line 59](https://github.com/Farama-Foundation/SuperSuit/blob/master/supersuit/vector/markov_vector_wrapper.py#L59) and also in lines 92 and 101, the infos are cast as a `list` instead of the usual `dict`.

Consequently, the aforementioned Gymnasium wrapper throws an error (tested on PZ's `Pistonball` environment):
```
----> 6     observations, rewards, terminations, truncations, infos = env.step(actions)
      7 env.close()

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/gymnasium/wrappers/record_episode_statistics.py:95, in RecordEpisodeStatistics.step(self, action)
     87 """Steps through the environment, recording the episode statistics."""
     88 (
     89     observations,
     90     rewards,
   (...)
     93     infos,
     94 ) = self.env.step(action)
---> 95 assert isinstance(
     96     infos, dict
     97 ), f"`info` dtype is {type(infos)} while supported dtype is `dict`. This may be due to usage of other wrappers in the wrong order."
     98 self.episode_returns += rewards
     99 self.episode_lengths += 1

AssertionError: `info` dtype is <class 'list'> while supported dtype is `dict`. This may be due to usage of other wrappers in the wrong order.
```

Can this please be fixed? If it matters, I am running the code on Lightning Studio with Python 3.10.
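
In the meantime, one possible stopgap is to convert the list into a `dict` before it reaches `RecordEpisodeStatistics`. Below is a minimal, dependency-free sketch of that conversion. `infos_list_to_dict` is a hypothetical helper name, not part of SuperSuit or Gymnasium, and it assumes each list element is a per-environment info `dict`. Whether a dict of per-environment lists is the layout downstream code expects is also an assumption, and the sketch has not been checked against the real wrappers.

```python
from typing import Any, Dict, List


def infos_list_to_dict(infos: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge a list of per-env info dicts into one dict of per-env lists.

    Hypothetical helper (not a SuperSuit or Gymnasium API). Keys missing
    from a given env's info are filled with None, so every list has
    exactly one entry per env.
    """
    # Collect keys in first-seen order so the output is deterministic.
    keys: List[str] = []
    for info in infos:
        for key in info:
            if key not in keys:
                keys.append(key)
    return {key: [info.get(key) for info in infos] for key in keys}


# Example: two sub-environments, only the first one reports a value.
merged = infos_list_to_dict([{"episode_return": 1.5}, {}])
assert isinstance(merged, dict)  # the type that the failing assertion checks for
assert merged == {"episode_return": [1.5, None]}
```

The conversion would have to be applied to the `infos` returned by `step` and `reset` before the statistics wrapper sees them, for example in a thin wrapper placed between the two. A proper fix would presumably still belong in `MarkovVectorEnv` itself.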
