
[Bug]: NaN problems with SAC and TQC on AntBulletEnv-v0 and HalfCheetahBulletEnv-v0 #427

Open
@ZJEast


🐛 Bug

Hello. I am trying to reproduce some experiments with these algorithms to record training data, but an exception occurs: NaN values are generated for unknown reasons. Any advice on how to solve this?

To Reproduce

python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
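The four commands differ only in `--algo` and `--env`. Below is a minimal sketch that rebuilds the same argument lists. It assumes the relative path to the zoo's `train.py` shown above. It only constructs the commands and does not launch them; each list could be passed to `subprocess.run` to execute it.

```python
from itertools import product

# Relative path to the RL Zoo entry point, as used in the commands above.
TRAIN_SCRIPT = "../../rl-baselines3-zoo-master/train.py"
ALGOS = ["sac", "tqc"]
ENVS = ["AntBulletEnv-v0", "HalfCheetahBulletEnv-v0"]


def build_commands(n_timesteps: int = 20_000_000, log_dir: str = "tf-logs") -> list[list[str]]:
    """Return one argv list per (algo, env) pair, in the same order as above."""
    commands = []
    for algo, env in product(ALGOS, ENVS):
        commands.append([
            "python", "-u", TRAIN_SCRIPT,
            "--algo", algo,
            "--env", env,
            "--n-timesteps", str(n_timesteps),
            "--tensorboard-log", log_dir,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_commands():
        print(" ".join(cmd))
```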

Relevant log output / Error message

python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/sac-AntBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 307, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 219, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/policies.py", line 145, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 8)) of distribution Normal(loc: torch.Size([300, 8]), scale: torch.Size([300, 8])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo sac --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/sac-HalfCheetahBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 307, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/sac.py", line 219, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/sac/policies.py", line 145, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 6)) of distribution Normal(loc: torch.Size([300, 6]), scale: torch.Size([300, 6])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]], device='cuda:0',
       grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env AntBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/tqc-AntBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 302, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 213, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/policies.py", line 144, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 8)) of distribution Normal(loc: torch.Size([300, 8]), scale: torch.Size([300, 8])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<ExpBackward0>)
python -u ../../rl-baselines3-zoo-master/train.py --algo tqc --env HalfCheetahBulletEnv-v0 --n-timesteps 20000000 --tensorboard-log tf-logs
Traceback (most recent call last):
  File "/share/home/zhangjundong/exp/tqc-HalfCheetahBulletEnv-v0/../../rl-baselines3-zoo-master/train.py", line 4, in <module>
    train()
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
    exp_manager.learn(model)
  File "/share/home/zhangjundong/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 240, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 302, in learn
    return super().learn(
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/tqc.py", line 213, in train
    self.actor.reset_noise()
  File "/share/home/zhangjundong/stable-baselines3-contrib-master/sb3_contrib/tqc/policies.py", line 144, in reset_noise
    self.action_dist.sample_weights(self.log_std, batch_size=batch_size)
  File "/share/home/zhangjundong/stable-baselines3-master/stable_baselines3/common/distributions.py", line 508, in sample_weights
    self.weights_dist = Normal(th.zeros_like(std), std)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/share/home/zhangjundong/.local/lib/python3.9/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter scale (Tensor of shape (300, 6)) of distribution Normal(loc: torch.Size([300, 6]), scale: torch.Size([300, 6])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[0.0026, 0.0041,    nan, 0.0036, 0.0046, 0.0034],
        [0.0054, 0.0040,    nan, 0.0035, 0.0053, 0.0054],
        [0.0192, 0.0061,    nan, 0.0105, 0.0105, 0.0105],
        ...,
        [0.0257, 0.0262,    nan, 0.0058, 0.0023, 0.0098],
        [0.1410, 0.0130,    nan, 0.1707, 0.1281, 0.0216],
        [0.0494, 0.0480,    nan, 0.0506, 0.0509, 0.0487]], device='cuda:0',
       grad_fn=<ExpBackward0>)
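All four tracebacks fail at the same point. In the gSDE path (`reset_noise()` → `sample_weights()`), the standard deviation comes from an exponential of `log_std`, as the `grad_fn=<ExpBackward0>` shows. `Normal` then rejects a `scale` that is not strictly greater than 0.

The stdlib-only sketch below mimics that check with plain floats. It does not call SB3 or PyTorch. It shows two things: an exponential maps NaN to NaN, and a "greater than 0" comparison is False for NaN. It also locates which columns of the last printed tensor contain NaN. The helper names are hypothetical, and only the six rows that were actually printed are used. The elided `...` rows are not reconstructed.

```python
import math

nan = float("nan")

# The six printed rows from the last traceback (TQC on HalfCheetahBulletEnv-v0).
PRINTED_STD_ROWS = [
    [0.0026, 0.0041, nan, 0.0036, 0.0046, 0.0034],
    [0.0054, 0.0040, nan, 0.0035, 0.0053, 0.0054],
    [0.0192, 0.0061, nan, 0.0105, 0.0105, 0.0105],
    [0.0257, 0.0262, nan, 0.0058, 0.0023, 0.0098],
    [0.1410, 0.0130, nan, 0.1707, 0.1281, 0.0216],
    [0.0494, 0.0480, nan, 0.0506, 0.0509, 0.0487],
]


def std_from_log_std(log_std: float) -> float:
    """Plain-float stand-in for exp(log_std): a NaN input gives a NaN output."""
    return math.exp(log_std)


def satisfies_positive_constraint(scale: float) -> bool:
    """Mimic a 'strictly greater than 0' check; any comparison with NaN is False."""
    return scale > 0.0


def nan_columns(rows: list[list[float]]) -> list[int]:
    """Return the indices of columns that contain at least one NaN."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if any(math.isnan(row[j]) for row in rows)]


if __name__ == "__main__":
    print(satisfies_positive_constraint(std_from_log_std(nan)))
    print(nan_columns(PRINTED_STD_ROWS))
```

On the printed rows of the last traceback, only column 2 contains NaN. In the first three tracebacks, every printed entry is already NaN. The tensor has shape (300, 6), so its columns presumably index the six action dimensions. Under that assumption, the TQC run appears to have been caught while a single action dimension's `log_std` had gone NaN. This localizes the failure but does not identify its cause.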

System Info

  • OS: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17 # 1 SMP Mon Oct 19 16:18:59 UTC 2020
  • Python: 3.9.18
  • Stable-Baselines3: 2.2.1
  • PyTorch: 2.1.0+cu121
  • GPU Enabled: True
  • Numpy: 1.26.1
  • Cloudpickle: 3.0.0
  • Gymnasium: 0.29.1
  • OpenAI Gym: 0.26.2

Labels: bug, documentation, enhancement, help wanted
