Hi, thanks for your great work!
While training stage 1, the evaluation that runs after a training epoch fails with KeyError: 0. How can I solve it?
Run command (VAD tiny stage 1 on the nuScenes mini dataset):
python -m torch.distributed.run --nproc_per_node=8 --master_port=2333 tools/train.py projects/configs/VAD/VAD_tiny_stage_1.py --launcher pytorch --deterministic --work-dir ./data/output/
Output:
....
2025-01-25 09:08:08,961 - mmdet - INFO - Saving checkpoint at 47 epochs
2025-01-25 09:09:10,091 - mmdet - INFO - Saving checkpoint at 48 epochs
[ ] 0/81, elapsed: 0s, ETA:/usr/local/lib/python3.8/dist-packages/torch/tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
...
self.post_center_range = torch.tensor(
/usr/local/lib/python3.8/dist-packages/torch/tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
[>>>> ] 8/81, 1.9 task/s, elapsed: 4s, ETA: 39s/VAD/projects/mmdet3d_plugin/core/bbox/coders/fut_nms_free_coder.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.post_center_range = torch.tensor(
/VAD/projects/mmdet3d_plugin/core/bbox/coders/map_nms_free_coder.py:82: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.post_center_range = torch.tensor(
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 88/81, 14.5 task/s, elapsed: 6s, ETA: 0s
Traceback (most recent call last):
File "tools/train.py", line 266, in <module>
main()
File "tools/train.py", line 255, in main
custom_train_model(
File "/VAD/projects/mmdet3d_plugin/VAD/apis/train.py", line 21, in custom_train_model
custom_train_detector(
File "/VAD/projects/mmdet3d_plugin/VAD/apis/mmdet_train.py", line 194, in custom_train_detector
runner.run(data_loaders, cfg.workflow)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
self.call_hook('after_train_epoch')
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
getattr(hook, fn_name)(self)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch
self._do_evaluate(runner)
File "/VAD/projects/mmdet3d_plugin/core/evaluation/eval_hooks.py", line 88, in _do_evaluate
key_score = self.evaluate(runner, results)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/hooks/evaluation.py", line 361, in evaluate
eval_res = self.dataloader.dataset.evaluate(
File "/VAD/projects/mmdet3d_plugin/datasets/nuscenes_vad_dataset.py", line 1781, in evaluate
all_metric_dict[key] += results[i]['metric_results'][key]
KeyError: 0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 6263) of binary: /usr/bin/python
/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning:
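The failing line in nuscenes_vad_dataset.py does several lookups at once (all_metric_dict[key], results[i], and ['metric_results'][key]), and the traceback alone does not show which of them raised KeyError: 0. Below is a minimal sketch of how the lookups could be told apart. The helper name locate_key_error and the sample data are hypothetical and not part of VAD; the real structure of results may differ.

```python
def locate_key_error(all_metric_dict, results, i, key):
    """Return which lookup of
        all_metric_dict[key] += results[i]['metric_results'][key]
    would fail, or None if every lookup succeeds.

    Hypothetical debugging helper, not part of the VAD code base.
    """
    # Python evaluates the left-hand target lookup first for `+=`.
    if key not in all_metric_dict:
        return "all_metric_dict[key]"
    try:
        entry = results[i]
    except (KeyError, IndexError, TypeError):
        # For example, results is a dict with string keys but i is an int.
        return "results[i]"
    try:
        metrics = entry["metric_results"]
    except (KeyError, TypeError):
        return "results[i]['metric_results']"
    if key not in metrics:
        return "results[i]['metric_results'][key]"
    return None


# Made-up sample data, only to show the helper's behaviour.
sample_results = [{"metric_results": {"plan_L2_1s": 0.5}}]
print(locate_key_error({"plan_L2_1s": 0.0}, sample_results, 0, "plan_L2_1s"))
print(locate_key_error({"plan_L2_1s": 0.0}, {"bbox_results": []}, 0, "plan_L2_1s"))
```

Calling such a helper just before the failing line (or in a debugger) with the actual values of i and key would show whether the problem is in the shape of results or in the metric keys.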
Thank you.