Description
Hi, I'm looking for help with an issue I'm currently facing. The LiDAR backbone model (ONNX) requires an input of shape 1×5. Since LiDAR data is usually an unstructured point cloud, it has to be voxelized before it is passed to the sparse convolution layers, so that they receive structured data.
Typically, the input to a sparse convolution layer is a pair of tensors: features and coordinates. This LiDAR backbone model, however, expects a single input of shape 1×5, and I am unsure what these dimensions represent.
Does 1×5 mean one voxel with five values: (batch index, x index, y index, z index, mean intensity of the points inside the voxel)?
Or does it mean one voxel with (x index, y index, z index, mean intensity, number of points in the voxel)? Or is it some other set of features?
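To make the two candidate layouts concrete, here is a minimal pure-Python sketch of what I mean. It is only an illustration of my two guesses, not the model's actual preprocessing: the function names (`voxelize`, `layout_batch_first`, `layout_with_count`), the voxel size, and the sample points are all made up, and I am assuming each raw point is (x, y, z, intensity).

```python
from collections import defaultdict


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Group (x, y, z, intensity) points by integer voxel index.

    Returns a dict mapping (ix, iy, iz) to the list of intensities
    of the points that fall inside that voxel.
    """
    cells = defaultdict(list)
    for x, y, z, intensity in points:
        idx = tuple(
            int((c - o) // s)  # floor division gives the voxel index
            for c, o, s in zip((x, y, z), origin, voxel_size)
        )
        cells[idx].append(intensity)
    return cells


def layout_batch_first(cells, batch_index=0):
    """Guess 1: one row per voxel, (batch, ix, iy, iz, mean intensity)."""
    return [
        [batch_index, ix, iy, iz, sum(v) / len(v)]
        for (ix, iy, iz), v in sorted(cells.items())
    ]


def layout_with_count(cells):
    """Guess 2: one row per voxel, (ix, iy, iz, mean intensity, point count)."""
    return [
        [ix, iy, iz, sum(v) / len(v), len(v)]
        for (ix, iy, iz), v in sorted(cells.items())
    ]


# Hypothetical sample: two points share a voxel, one point is alone.
points = [
    (0.1, 0.1, 0.1, 0.25),
    (0.3, 0.2, 0.1, 0.75),
    (1.2, 0.1, 0.1, 1.0),
]
cells = voxelize(points, voxel_size=(0.5, 0.5, 0.5))

for row in layout_batch_first(cells):
    print("guess 1:", row)
for row in layout_with_count(cells):
    print("guess 2:", row)
```

Under either guess, every voxel becomes one row of five values, so a full scan would be N×5 rather than 1×5. That is what makes me unsure how to read the 1×5 shape.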
I am stuck at this point and would like clarification on what exactly the 1×5 input represents. Also, does this mean the LiDAR backbone model processes only a single voxel at a time, or is 1×5 just an example shape that illustrates how the model is executed?
Any guidance or explanation would be greatly appreciated.
Thanks!