What Does the 1×5 Input Dimension Represent in a LiDAR Backbone Model? BEVFusion #342

@dheepak-gan

Hi, I'm looking for help with the LiDAR backbone model (ONNX), which reports a required input dimension of 1×5. LiDAR data normally arrives as an unstructured point cloud, so it has to be voxelized before it is passed to the sparse convolution layers, which expect structured input.

Typically, the input to a sparse convolution layer is features + coordinates. However, in this LiDAR backbone model, the input is expected to be 1×5, and I am unsure what these dimensions represent.
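To make the "features + coordinates" split concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not this model's actual preprocessing: the `voxelize` helper, the mean-pooling of point features, the `(batch, ix, iy, iz)` coordinate order, and the five per-point channels are all hypothetical choices made for the example.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size, range_min, batch_idx=0):
    """Group points into voxels and return (features, coords).

    points:   per-point tuples whose first three entries are x, y, z.
    features: one row per non-empty voxel, the mean of its points (N x C).
    coords:   one row per non-empty voxel, (batch, ix, iy, iz) indices (N x 4).
    The coordinate order is an assumption; real libraries differ.
    """
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each spatial axis.
        key = tuple(floor((p[d] - range_min[d]) / voxel_size[d]) for d in range(3))
        buckets[key].append(p)

    features, coords = [], []
    for key in sorted(buckets):
        pts = buckets[key]
        # Column-wise mean over all points that fell into this voxel.
        features.append([sum(col) / len(pts) for col in zip(*pts)])
        coords.append((batch_idx, *key))
    return features, coords


# Hypothetical points with 5 values each: x, y, z, intensity, one extra channel.
points = [
    (0.10, 0.10, 0.10, 0.50, 0.0),
    (0.30, 0.10, 0.10, 0.70, 0.0),
    (1.20, 0.40, 0.10, 0.20, 0.0),
]
features, coords = voxelize(
    points, voxel_size=(0.5, 0.5, 0.5), range_min=(0.0, 0.0, 0.0)
)
print(len(features), len(features[0]))  # number of voxels, feature columns
print(coords)
```

Under this layout the feature matrix has shape N×5, with one row per non-empty voxel, and the integer indices travel in a separate N×4 array. If the model follows a similar convention, a reported 1×5 shape could simply be a single placeholder row from export time rather than a one-voxel limit, but that is a guess that would need to be checked against the model's export code.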

Does 1×5 mean one voxel with five features: (batch index, x index, y index, z index, and mean intensity of points inside the voxel)?

Or does it mean one voxel with: (x index, y index, z index, mean intensity, and number of points per voxel)? Or is it some other set of features different from these two?

I am stuck at this point and would like clarification on what the 1×5 input exactly represents. Also, does this mean the LiDAR backbone model only processes a single voxel at a time, or is this just an example format of how the model execution is structured?

Any guidance or explanation would be greatly appreciated.
Thanks!
