The Actor Critic Structure in MAA2C

I'm a little confused about your implementation of MAA2C. I don't think the input of the actor network is simply the "joint state" of the agents. According to [1], the critic's input should be the state of the environment (where the agents' joint state is not necessarily defined) plus the joint action of the agents; i.e., the critic here should be a Q-function over joint actions. The actor, on the other hand, should be something like a policy, and I don't quite understand why the actor network is implemented this way. I'd appreciate an explanation.
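To make the structure described in the question concrete, here is a minimal NumPy sketch. It is not the repository's MAA2C code. It assumes discrete actions, one-hot action encoding, and a single tanh hidden layer, and the names `CentralizedQCritic`, `DecentralizedActor`, and the dimension constants are hypothetical. The critic takes the environment state concatenated with the joint action and returns a scalar Q-value. Each actor maps only its own agent's observation to a distribution over that agent's actions.

```python
import numpy as np

rng = np.random.default_rng(0)


def one_hot(index, size):
    """Encode a discrete action as a one-hot vector."""
    v = np.zeros(size)
    v[index] = 1.0
    return v


class CentralizedQCritic:
    """Q(s, a_1, ..., a_n): scores the environment state together with the joint action."""

    def __init__(self, state_dim, n_agents, n_actions, hidden=32):
        self.n_actions = n_actions
        self.input_dim = state_dim + n_agents * n_actions
        self.W1 = rng.normal(scale=0.1, size=(self.input_dim, hidden))
        self.W2 = rng.normal(scale=0.1, size=(hidden, 1))

    def __call__(self, state, joint_action):
        # Concatenate the global state with every agent's one-hot action.
        actions = np.concatenate([one_hot(a, self.n_actions) for a in joint_action])
        x = np.concatenate([state, actions])
        return float((np.tanh(x @ self.W1) @ self.W2)[0])


class DecentralizedActor:
    """pi_i(a_i | o_i): a per-agent policy over that agent's own observation."""

    def __init__(self, obs_dim, n_actions, hidden=32):
        self.W1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.W2 = rng.normal(scale=0.1, size=(hidden, n_actions))

    def __call__(self, obs):
        logits = np.tanh(obs @ self.W1) @ self.W2
        logits = logits - logits.max()  # numerically stable softmax
        probs = np.exp(logits)
        return probs / probs.sum()


# Hypothetical sizes, chosen only for illustration.
STATE_DIM, OBS_DIM, N_AGENTS, N_ACTIONS = 6, 3, 2, 4

actors = [DecentralizedActor(OBS_DIM, N_ACTIONS) for _ in range(N_AGENTS)]
critic = CentralizedQCritic(STATE_DIM, N_AGENTS, N_ACTIONS)

state = rng.normal(size=STATE_DIM)  # global environment state s
observations = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]  # local o_i

# Each agent samples its action from its own policy; together they form the joint action.
policies = [actor(o) for actor, o in zip(actors, observations)]
joint_action = [int(rng.choice(N_ACTIONS, p=p)) for p in policies]
q_value = critic(state, joint_action)

# Each actor's policy-gradient term weights log pi_i(a_i | o_i) by the shared Q(s, a).
log_probs = [np.log(p[a]) for p, a in zip(policies, joint_action)]
weighted = [q_value * lp for lp in log_probs]
```

The point of the split is that only the critic ever sees the state and the joint action, while each actor stays decentralized and needs nothing beyond its own observation at execution time.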
