Description
🐛 Bug
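Hello. I am trying to reproduce some algorithms and experiments in order to record data, but training stops with an exception: NaN values appear in the `scale` parameter of a `Normal` distribution (see the tracebacks below) for reasons I cannot identify. Is there any advice on how to solve this?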
To Reproduce
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Relevant log output / Error message
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
File "/share/home/zhangjundong/exp/sac-AntBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
train()
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
exp_manager.learn(model)
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
model.learn(self.n_timesteps, **kwargs)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 307, in learn
return super().learn(
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 219, in train
self.actor.reset_noise()
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/policies.py", line 145, in reset_noise
self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
self.weights_dist = Normal(th.zeros_like(std), std)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 8)) of distribution Normal(loc: torch.Size([300, 8]), scale: torch.Size([300, 8])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
File "/share/home/zhangjundong/exp/sac-HalfCheetahBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
train()
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
exp_manager.learn(model)
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
model.learn(self.n_timesteps, **kwargs)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 307, in learn
return super().learn(
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 219, in train
self.actor.reset_noise()
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/policies.py", line 145, in reset_noise
self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
self.weights_dist = Normal(th.zeros_like(std), std)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 6)) of distribution Normal(loc: torch.Size([300, 6]), scale: torch.Size([300, 6])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
...,
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan]], device='cuda:0',
grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
File "/share/home/zhangjundong/exp/tqc-AntBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
train()
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
exp_manager.learn(model)
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
model.learn(self.n_timesteps, **kwargs)
File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 302, in learn
return super().learn(
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 213, in train
self.actor.reset_noise()
File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/policies.py", line 144, in reset_noise
self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
self.weights_dist = Normal(th.zeros_like(std), std)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 8)) of distribution Normal(loc: torch.Size([300, 8]), scale: torch.Size([300, 8])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
File "/share/home/zhangjundong/exp/tqc-HalfCheetahBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
train()
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
exp_manager.learn(model)
File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
model.learn(self.n_timesteps, **kwargs)
File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 302, in learn
return super().learn(
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 213, in train
self.actor.reset_noise()
File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/policies.py", line 144, in reset_noise
self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
self.weights_dist = Normal(th.zeros_like(std), std)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 6)) of distribution Normal(loc: torch.Size([300, 6]), scale: torch.Size([300, 6])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[0.0026, 0.0041, nan, 0.0036, 0.0046, 0.0034],
[0.0054, 0.0040, nan, 0.0035, 0.0053, 0.0054],
[0.0192, 0.0061, nan, 0.0105, 0.0105, 0.0105],
...,
[0.0257, 0.0262, nan, 0.0058, 0.0023, 0.0098],
[0.1410, 0.0130, nan, 0.1707, 0.1281, 0.0216],
[0.0494, 0.0480, nan, 0.0506, 0.0509, 0.0487]], device='cuda:0',
grad_fn=<ExpBackward0>)
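For reference, here is a minimal sketch in plain Python (no SB3 or PyTorch calls) of what the dumps above seem to show. It assumes the `scale` tensor is the exponential of a log-std parameter, as `grad_fn=<ExpBackward0>` suggests. The helper names `nan_columns` and `satisfies_greater_than_zero` are hypothetical and only illustrate the check. The rows are copied from the visible part of the last dump, and the elided rows are not reconstructed.

```python
import math

# Visible rows of the last tensor dump (TQC, HalfCheetahBulletEnv-v0).
# The rows hidden behind "..." in the log are intentionally left out.
visible_rows = [
    [0.0026, 0.0041, math.nan, 0.0036, 0.0046, 0.0034],
    [0.0054, 0.0040, math.nan, 0.0035, 0.0053, 0.0054],
    [0.0192, 0.0061, math.nan, 0.0105, 0.0105, 0.0105],
    [0.0257, 0.0262, math.nan, 0.0058, 0.0023, 0.0098],
    [0.1410, 0.0130, math.nan, 0.1707, 0.1281, 0.0216],
    [0.0494, 0.0480, math.nan, 0.0506, 0.0509, 0.0487],
]


def nan_columns(rows):
    """Return the indices of columns that contain at least one NaN."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if any(math.isnan(row[j]) for row in rows)]


def satisfies_greater_than_zero(value):
    """Mimic a `value > 0` constraint; every comparison with NaN is False."""
    return value > 0.0


# exp(NaN) is still NaN, so a NaN log-std produces a NaN scale (assumption:
# the scale is computed as exp(log_std)).
nan_std = math.exp(math.nan)

print("columns with NaN:", nan_columns(visible_rows))
print("NaN passes the > 0 check:", satisfies_greater_than_zero(nan_std))
```

In the last run only one column of the visible rows is NaN, whereas in the other three runs every visible entry is NaN.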
System Info
- OS: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17 #1 SMP Mon Oct 19 16:18:59 UTC 2020
- Python: 3.9.18
- Stable-Baselines3: 2.2.1
- PyTorch: 2.1.0+cu121
- GPU Enabled: True
- Numpy: 1.26.1
- Cloudpickle: 3.0.0
- Gymnasium: 0.29.1
- OpenAI Gym: 0.26.2
Checklist
- I have checked that there is no similar issue in the repo
- I have read the SB3 documentation
- I have read the RL Zoo documentation
- I have provided a minimal and working example to reproduce the bug
- I've used the markdown code blocks for both code and stack traces.