Open
Description
It appears that the current version of x-mobility does not support lerobot, which leads to a runtime error during training. The key discrepancy is in the number of semantic labels:
- lerobot provides 15 semantic labels
- The corresponding parquet data only includes 7 semantic labels
When running the training command (detailed below), x-mobility cannot handle the 15 labels from lerobot: any label ID beyond the 7 expected classes is out of range for the loss, which results in a CUDA device-side assert error.
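One way to confirm the mismatch before the loss is reached is to scan the semantic targets for IDs outside the expected class range. The following is only a sketch: the helper name `out_of_range_labels` and the sample mask are hypothetical and not part of x-mobility, and the class count of 7 is taken from the description above.

```python
from collections import Counter
from typing import Dict, Iterable


def out_of_range_labels(labels: Iterable[int], n_classes: int) -> Dict[int, int]:
    """Return {label_id: count} for every id outside [0, n_classes)."""
    counts = Counter(int(v) for v in labels)
    return {k: c for k, c in sorted(counts.items()) if not 0 <= k < n_classes}


# Hypothetical flattened semantic mask. With PyTorch this could be
# `target.flatten().tolist()`, taken on the CPU before the loss call.
mask = [0, 3, 6, 7, 14, 14, 2]
bad = out_of_range_labels(mask, n_classes=7)
print(bad)  # {7: 1, 14: 2}
```

A non-empty result means the targets contain IDs that would fail the `cur_target >= 0 && cur_target < n_classes` check shown in the error log.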
Reproduction Steps
Execute the training script with the specified config and data paths:
python3 train.py -c configs/lerobot_base_train_config.gin \
-d /mnt/data/notebook3/x_mobility_isaac_sim_mobilitygen \
-o /mnt/data/notebook3/mobilitygen_output
Error Log
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stack trace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
/pytorch/aten/src/ATen/native/cuda/NLLLoss2d.cu:73: nll_loss2d_forward_no_reduce_kernel: block: [88,0,0], thread: [963,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
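As the log itself suggests, rerunning with `CUDA_LAUNCH_BLOCKING=1` should make the stack trace point at the failing call rather than a later one. A minimal sketch of a launcher that sets the variable for the same training command follows. The helper `blocking_env` is hypothetical, and the script only starts training if `train.py` is present in the working directory.

```python
import os
import subprocess

# Command copied from the reproduction steps.
TRAIN_CMD = [
    "python3", "train.py",
    "-c", "configs/lerobot_base_train_config.gin",
    "-d", "/mnt/data/notebook3/x_mobility_isaac_sim_mobilitygen",
    "-o", "/mnt/data/notebook3/mobilitygen_output",
]


def blocking_env(base=None):
    """Copy the environment and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__" and os.path.exists("train.py"):
    # check=False: the run is expected to fail; the goal is a usable trace.
    subprocess.run(TRAIN_CMD, env=blocking_env(), check=False)
```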