The SAC implementation in this project performs significantly worse than the SAC implementation in Stable-Baselines3. When I train the slide cabinet subtask in the kitchen environment with this project's SAC, training fails to converge and the loss appears to grow exponentially. I have carefully compared this project's agent code with the SAC code in Stable-Baselines3 (both linked below) and could not find a cause for the discrepancy.
https://github.com/clvrai/spirl/blob/master/spirl/rl/agents/ac_agent.py
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/sac/sac.py
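To put a number on "exponentially explode", one option is to fit a straight line to log(loss) against the training step: a roughly constant positive slope means exponential growth. The following is only a minimal sketch, not code from either repository. The function name `log_growth_rate` and the assumption that the losses are available as a plain list of floats are hypothetical.

```python
import math


def log_growth_rate(losses, steps=None):
    """Least-squares slope of log(loss) versus training step.

    Hypothetical helper: assumes `losses` is a list of logged loss values.
    Non-positive and non-finite values are skipped, since log() is undefined
    for them. A clearly positive slope suggests exponential growth.
    """
    if steps is None:
        steps = list(range(len(losses)))
    points = [
        (s, math.log(v))
        for s, v in zip(steps, losses)
        if math.isfinite(v) and v > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two positive, finite loss values")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        raise ValueError("all steps are identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return cov_xy / var_x
```

Running this on the loss curves logged from both implementations would give comparable per-step growth rates, which may be easier to discuss than the raw plots.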