a problem with ddp strategy #25

@chan-yuu

Traceback (most recent call last):
  File "/home/user/Documents/cyun/navsim/navsim/navsim/planning/script/run_training.py", line 134, in main
    trainer.fit(
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 205, in run
    self.advance()
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 363, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 140, in run
    self.advance(data_fetcher)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 250, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 190, in run
    self._optimizer_step(batch_idx, closure)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 268, in _optimizer_step
    call._call_lightning_module_hook(
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1303, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 152, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 270, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 239, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/amp.py", line 80, in optimizer_step
    closure_result = closure()
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 144, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 129, in closure
    step_output = self._step_fn()
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 318, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in training_step
    return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 642, in __call__
    wrapper_output = wrapper_module(*args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/navsim/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1139, in forward
    if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Epoch 0:   0%| | 1/1330 [00:19<7:08:47,  0.05it/s, v_num=0, train/loss_step=0.180, train/reward

Why does training fail with this error when using the DDP strategy?
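The error message says some parameters of the LightningModule received no gradient from the loss returned by `training_step`. One way to find out which ones is to run a single forward and backward pass on one process (without DDP) and list the trainable parameters whose `.grad` is still `None`. The sketch below uses plain PyTorch; `ToyAgent` and `find_unused_parameters` are hypothetical names made up for illustration and are not part of navsim or Lightning.

```python
import torch
from torch import nn


class ToyAgent(nn.Module):
    """Hypothetical stand-in for a model with a branch that never reaches the loss."""

    def __init__(self) -> None:
        super().__init__()
        self.backbone = nn.Linear(4, 8)
        self.used_head = nn.Linear(8, 1)
        self.unused_head = nn.Linear(8, 1)  # defined, but never called in forward()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.used_head(torch.relu(self.backbone(x)))


def find_unused_parameters(model: nn.Module, loss: torch.Tensor) -> list:
    """Backpropagate `loss` once and return the names of trainable
    parameters that received no gradient at all."""
    model.zero_grad(set_to_none=True)
    loss.backward()
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


model = ToyAgent()
loss = model(torch.randn(2, 4)).mean()
unused = find_unused_parameters(model, loss)
print(unused)
```

If the parameters reported this way are meant to stay unused, the two options quoted in the error message apply: `strategy='ddp_find_unused_parameters_true'` or `strategy=DDPStrategy(find_unused_parameters=True)`. Otherwise, freezing them with `requires_grad_(False)` before training starts, or making sure they contribute to the loss, should let the default `ddp` strategy run.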
