What Does the 1×5 Input Dimension Represent in a LiDAR Backbone Model? BEVFusion #342

Hi, I am seeking help with an issue I'm currently facing. The LiDAR backbone model (ONNX) requires an input of dimension 1×5. Since LiDAR data is usually an unstructured point cloud, it has to be voxelized before it is passed to the sparse convolution layers, so that the network receives structured data.

Typically, the input to a sparse convolution layer is features + coordinates. However, in this LiDAR backbone model, the input is expected to be 1×5, and I am unsure what these dimensions represent.
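To make the features + coordinates layout concrete, here is a minimal sketch of voxelization in plain Python. The point layout `(x, y, z, intensity)`, the voxel size, the origin, and the function name `voxelize` are all assumptions for illustration; none of them are taken from the BEVFusion code or the ONNX model.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size=(0.5, 0.5, 0.5), origin=(0.0, 0.0, 0.0)):
    """Group (x, y, z, intensity) points into voxels (hypothetical helper).

    Returns three parallel lists:
      coords[i]   = (ix, iy, iz) integer voxel index
      features[i] = per-voxel mean of (x, y, z, intensity)
      counts[i]   = number of points that fell into that voxel
    """
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each spatial axis.
        idx = tuple(floor((p[d] - origin[d]) / voxel_size[d]) for d in range(3))
        buckets[idx].append(p)

    coords, features, counts = [], [], []
    for idx in sorted(buckets):
        pts = buckets[idx]
        n = len(pts)
        coords.append(idx)
        # Mean of every channel over the points inside this voxel.
        features.append(tuple(sum(p[k] for p in pts) / n for k in range(4)))
        counts.append(n)
    return coords, features, counts


# Three example points; the first two share a voxel.
points = [
    (0.1, 0.1, 0.1, 0.2),
    (0.3, 0.2, 0.4, 0.6),
    (1.2, 0.1, 0.1, 1.0),
]
coords, features, counts = voxelize(points)
```

In this sketch the sparse convolution input would be the pair `(features, coords)`, with one row per occupied voxel, which is why a single 1×5 tensor is hard to map onto it.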

Does 1×5 mean one voxel with five features: (batch index, x index, y index, z index, and mean intensity of points inside the voxel)?

Or does it mean one voxel with: (x index, y index, z index, mean intensity, and number of points per voxel)? Or does it contain other features, different from these two options?
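The two candidate interpretations can be written out as 1×5 rows for a single voxel. This is only a sketch of the two guesses above; the helper names `layout_a` and `layout_b` and the example values are hypothetical, and neither layout is confirmed to be what the model expects.

```python
def layout_a(batch_idx, coord, mean_intensity):
    """Guess 1: (batch index, x index, y index, z index, mean intensity)."""
    return [batch_idx, coord[0], coord[1], coord[2], mean_intensity]


def layout_b(coord, mean_intensity, num_points):
    """Guess 2: (x index, y index, z index, mean intensity, point count)."""
    return [coord[0], coord[1], coord[2], mean_intensity, num_points]


# One example voxel (made-up values).
coord = (12, 7, 3)
mean_intensity = 0.4
num_points = 2

# Each candidate is a 1×5 input: one row, five columns.
input_a = [layout_a(0, coord, mean_intensity)]
input_b = [layout_b(coord, mean_intensity, num_points)]
```

Both rows have the same shape, so the shape alone cannot tell the two layouts apart.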

I am stuck at this point and would like clarification on what the 1×5 input exactly represents. Also, does this mean the LiDAR backbone model only processes a single voxel at a time, or is this just an example format of how the model execution is structured?

Any guidance or explanation would be greatly appreciated.
Thanks!
