Question about the loss function #12

@stepwhal

Description

Hi, thanks for sharing the codebase for your work. I am trying to train the network on custom data, but training fails with the following error:

```
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
[... the same assertion repeats for threads [48,0,0] through [62,0,0] ...]
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [1,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
  0%| | 1/15356 [00:01<5:05:31, 1.19s/it]
Traceback (most recent call last):
  File "/media/home/C/NgeNet/train.py", line 224, in <module>
    main()
  File "/media/home/C/NgeNet/train.py", line 124, in main
    loss_dict = model_loss(coords_src=coords_src,
  File "/home/anaconda3/envs/Negnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/home/C/NgeNet/losses/loss.py", line 130, in forward
    overlap_loss_v = 0.5 * self.overlap_loss(ol_scores_src, ol_gt_src) +
  File "/media/home/C/NgeNet/losses/loss.py", line 65, in overlap_loss
    weights[ol_gt > 0.5] = 1 - ratio
RuntimeError: CUDA error: device-side assert triggered

Process finished with exit code 1
```
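For context, this assertion comes from `binary_cross_entropy`, which requires every input value to be a probability in [0, 1]; a NaN (or any out-of-range score) fails the check. Since device-side asserts are reported asynchronously, the traceback above may not point at the exact offending line; rerunning with `CUDA_LAUNCH_BLOCKING=1` gives an accurate location. A minimal sketch (my own illustration, not NgeNet code) that reproduces the same assert:

```python
import torch
import torch.nn.functional as F

# binary_cross_entropy requires its input to lie in [0, 1]; a NaN fails
# the `input_val >= zero && input_val <= one` check on the GPU.
scores = torch.tensor([0.3, float('nan')], device='cuda')
targets = torch.tensor([0.0, 1.0], device='cuda')
loss = F.binary_cross_entropy(scores, targets)
loss.item()  # synchronizes -> RuntimeError: CUDA error: device-side assert triggered
```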

It seems that during training the weight parameters grow too large, which makes the variable `q_feats_local` blow up; the leaky_relu output then becomes NaN, the NaN score reaches the BCE loss (whose input must lie in [0, 1]), and the loss computation and back-propagation fail with the assert above. Have you ever encountered this situation in your experiments?
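In case it is useful, here is a sketch of the workarounds I am trying (names follow the traceback above; whether sanitizing the scores or clipping gradients is the right fix depends on the root cause):

```python
import torch

def sanitize_scores(scores: torch.Tensor) -> torch.Tensor:
    """Replace NaNs and clamp into [0, 1] so the BCE range assert
    cannot fire. A band-aid, not a root-cause fix."""
    scores = torch.where(torch.isnan(scores), torch.zeros_like(scores), scores)
    return scores.clamp(0.0, 1.0)

# Hypothetical use inside the training step, with gradient clipping to
# keep the weights (and hence q_feats_local) from exploding:
#   loss = criterion(sanitize_scores(ol_scores_src), ol_gt_src)
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step()
```

Alternatively, if the overlap head can output raw logits, `torch.nn.BCEWithLogitsLoss` is more numerically stable than a sigmoid followed by `binary_cross_entropy`.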
