Algorithms in qpolgrad have been organized to define functions for loss calculation. Those functions are then called in the update function for the algorithm. A2C and PPO need to be brought up to that same structure.
Specifically:
- Define compute_policy_loss and compute_value_loss functions in A2C and PPO.
- Modify the update rules for both algorithms to call the loss computation functions.
- Update docstrings to reflect your changes! If there aren't docstrings (sorry), add them!
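
For anyone picking this up, here is a rough sketch of the target structure. It is not qpolgrad's actual code: the class names A2CSketch and PPOSketch, the batch dictionary keys, and the clip_ratio parameter are all assumptions made for illustration, and plain Python floats stand in for tensors so the example runs without any dependencies. The loss formulas are the standard A2C and PPO-clip objectives, which may differ in detail from what the repo uses.

```python
import math


def _mean(xs):
    """Average a non-empty list of floats."""
    return sum(xs) / len(xs)


class A2CSketch:
    """Hypothetical stand-in for the A2C class (not qpolgrad's real API)."""

    def compute_policy_loss(self, log_probs, advantages):
        """Return the policy-gradient loss, -mean(log_prob * advantage)."""
        return -_mean([lp * adv for lp, adv in zip(log_probs, advantages)])

    def compute_value_loss(self, values, returns):
        """Return the mean squared error between value estimates and returns."""
        return _mean([(v - r) ** 2 for v, r in zip(values, returns)])

    def update(self, batch):
        """Compute both losses through the helper methods and return them.

        A real update would also backpropagate and step the optimizers.
        """
        policy_loss = self.compute_policy_loss(batch["log_probs"], batch["advantages"])
        value_loss = self.compute_value_loss(batch["values"], batch["returns"])
        return policy_loss, value_loss


class PPOSketch(A2CSketch):
    """Hypothetical PPO variant; reuses compute_value_loss from A2CSketch."""

    def __init__(self, clip_ratio=0.2):
        # clip_ratio is an assumed name for the PPO clipping parameter.
        self.clip_ratio = clip_ratio

    def compute_policy_loss(self, log_probs, old_log_probs, advantages):
        """Return the clipped surrogate loss used by PPO-clip."""
        terms = []
        for lp, old_lp, adv in zip(log_probs, old_log_probs, advantages):
            ratio = math.exp(lp - old_lp)
            clipped = max(1.0 - self.clip_ratio, min(ratio, 1.0 + self.clip_ratio))
            terms.append(min(ratio * adv, clipped * adv))
        return -_mean(terms)

    def update(self, batch):
        """Compute both losses through the helper methods and return them."""
        policy_loss = self.compute_policy_loss(
            batch["log_probs"], batch["old_log_probs"], batch["advantages"]
        )
        value_loss = self.compute_value_loss(batch["values"], batch["returns"])
        return policy_loss, value_loss
```

The point is only the shape: update contains no loss math of its own, so each loss can be tested and documented separately.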