Description
Motivation
MaskablePPO works well for large discrete action spaces with many invalid actions at each step, while RecurrentPPO gives the agent a memory of previous observations and actions, which improves its decision making. Right now we have to choose between these two algorithms and cannot combine their features, even though having both action masking and sequence processing would greatly improve training in environments where both are helpful.
Feature
MaskableRecurrentPPO - an algorithm combining MaskablePPO and RecurrentPPO. Alternatively, action masking support integrated directly into PPO and RecurrentPPO. A rough usage sketch is below.
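A minimal sketch of what usage could look like, assuming the requested class follows the existing sb3-contrib conventions: the `ActionMasker` wrapper and `mask_fn` pattern are taken from MaskablePPO, and the `MlpLstmPolicy` name from RecurrentPPO. `MaskableRecurrentPPO` itself is hypothetical, it is the feature being requested and does not exist yet, so that part is commented out.

```python
import numpy as np
import gymnasium as gym
from sb3_contrib.common.wrappers import ActionMasker


def mask_fn(env) -> np.ndarray:
    # Return a boolean mask over the discrete action space.
    # Here every action is valid; a real env would compute this from its state.
    return np.ones(env.action_space.n, dtype=bool)


# Same masking setup as for MaskablePPO today.
env = ActionMasker(gym.make("CartPole-v1"), mask_fn)

# Requested API (hypothetical): masking handled like MaskablePPO,
# recurrence handled like RecurrentPPO via an LSTM policy.
# model = MaskableRecurrentPPO("MlpLstmPolicy", env, verbose=1)
# model.learn(total_timesteps=10_000)
```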