Implementation of Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Added another branch for Soft Actor-Critic Algorithms and Applications -> SAC_V1.
Soft Q-Learning uses the following objective function instead of the conventional expected cumulative return:

![J(\pi) = \sum_{t=0}^{T} E_{(s_t, a_t) \sim \rho_\pi} \big[ r(s_t, a_t) + \alpha H(\pi(\cdot|s_t)) \big]](/alirezakazemipour/SAC/raw/master/Result/Soft_q_learning.jpg)

The entropy term is also maximized, which has two major benefits:
- Exploration is tuned automatically and increased only as much as needed, so the exploration/exploitation trade-off is handled well.
- It prevents the learning procedure from getting stuck in a local optimum, which would otherwise result in a suboptimal policy.
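As an illustration of how this objective is typically optimized in a PyTorch implementation, here is a minimal sketch of a SAC actor loss with the entropy bonus. The names `policy`, `q1`, `q2`, and `alpha` are assumptions made for this sketch, not the repository's actual code.

```python
# Minimal sketch (not the repository's code): the entropy-regularized actor
# objective E[Q(s, a) + alpha * H(pi(.|s))] is maximized by minimizing its
# negative, with the entropy estimated via -log pi(a|s) of a sampled action.
import torch


def actor_loss(policy, q1, q2, states, alpha):
    # Reparameterized sample from the stochastic (squashed Gaussian) policy,
    # together with its log-probability.
    actions, log_probs = policy.sample(states)
    # Take the minimum of the two critics to reduce overestimation bias.
    q_values = torch.min(q1(states, actions), q2(states, actions))
    # alpha * log_probs - Q  is the negative of  Q + alpha * (entropy estimate).
    return (alpha * log_probs - q_values).mean()
```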
| Humanoid-v2 | Walker2d-v2 | Hopper-v2 |
|---|---|---|
| ![](/alirezakazemipour/SAC/raw/master/Result/Humanoid.png) | ![](/alirezakazemipour/SAC/raw/master/Result/Walker2d.png) | ![](/alirezakazemipour/SAC/raw/master/Result/Hopper.png) |

| Humanoid-v2 | Walker2d-v2 | Hopper-v2 |
|---|---|---|
| ![](/alirezakazemipour/SAC/raw/master/Gifs/Humanoid.gif) | ![](/alirezakazemipour/SAC/raw/master/Gifs/Walker2d.gif) | ![](/alirezakazemipour/SAC/raw/master/Gifs/Hopper.gif) |
- gym == 0.17.2
- mujoco-py == 2.0.2.13
- numpy == 1.19.1
- psutil == 5.4.2
- torch == 1.4.0
`pip3 install -r requirements.txt`

`python3 main.py`

- You may use the `Train` flag to specify whether to train your agent when it is `True`, or to test it when the flag is `False`.
- There are some pre-trained weights in the pre-trained models directory; you can test the agent with them by putting them in the root folder of the project and turning the `Train` flag to `False`.
- Humanoid-v2
- Hopper-v2
- Walker2d-v2
- HalfCheetah-v2
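As a quick, illustrative sanity check (not part of the repository), the environments listed above can be created with the pinned gym and mujoco-py versions as follows:

```python
# Illustrative check that the MuJoCo environments are available with
# gym == 0.17.2 and mujoco-py == 2.0.2.13; all of them have continuous
# observation and action spaces, which SAC requires.
import gym

for env_id in ["Humanoid-v2", "Hopper-v2", "Walker2d-v2", "HalfCheetah-v2"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```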
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018
- Soft Actor-Critic Algorithms and Applications, Haarnoja et al., 2018
All credit goes to @pranz24 for his brilliant PyTorch implementation of SAC.
Special thanks to @p-christ for SAC.py