[Feature] RPC transport splits multimodal tensors incorrectly for multi-image samples

When a sample contains multiple images, the RTensor/RPC transport layer may treat multimodal tensors such as `pixel_values` and `image_grid_thw` as ordinary batched tensors.

Their leading dimension does not match the outer batch layout (it does not equal the number of samples), so they can be split incorrectly, or the batch layout can be inferred incorrectly, during an RPC round-trip.

Expected behavior:
- these multimodal payload tensors should be transportable as non-batched objects
- they should not participate in outer batch layout inference
- they should remain intact if worker methods return mutated batch structures
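
To make the request concrete, here is a minimal sketch of the expected behavior. It is not the project's actual RTensor/RPC code: `NonBatched`, `infer_batch_size`, `split_batch`, and `merge_shards` are hypothetical names invented for illustration, plain nested lists stand in for tensors, and the shapes assume that `image_grid_thw` has one row per image and `pixel_values` one row per patch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NonBatched:
    """Hypothetical marker: the wrapped payload must travel whole."""
    value: Any


def infer_batch_size(batch: dict) -> int:
    """Infer the outer batch size from batched entries only."""
    sizes = {len(v) for v in batch.values() if not isinstance(v, NonBatched)}
    if len(sizes) != 1:
        raise ValueError(f"inconsistent batch sizes: {sorted(sizes)}")
    return sizes.pop()


def split_batch(batch: dict, num_shards: int) -> list:
    """Split batched entries along dim 0; pass NonBatched entries through intact."""
    n = infer_batch_size(batch)
    step = -(-n // num_shards)  # ceiling division
    shards = []
    for start in range(0, n, step):
        shard = {}
        for key, value in batch.items():
            if isinstance(value, NonBatched):
                shard[key] = value  # never sliced
            else:
                shard[key] = value[start:start + step]
        shards.append(shard)
    return shards


def merge_shards(shards: list) -> dict:
    """Re-assemble shards; NonBatched entries are taken from the first shard."""
    merged = {}
    for key, first in shards[0].items():
        if isinstance(first, NonBatched):
            merged[key] = first
        else:
            # Lists stand in for tensors here; real tensors would be concatenated.
            merged[key] = [row for shard in shards for row in shard[key]]
    return merged


# Two samples; sample 0 has two images, sample 1 has one (three images total).
batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6]],
    "image_grid_thw": NonBatched([[1, 2, 2], [1, 2, 2], [1, 4, 2]]),
    "pixel_values": NonBatched([[0.0] * 4 for _ in range(16)]),  # 4 + 4 + 8 patches
}

shards = split_batch(batch, num_shards=2)
restored = merge_shards(shards)
```

In this sketch every shard receives the complete multimodal payload. Without the wrapper, the same data has leading dimensions 2, 3, and 16, so `infer_batch_size` raises instead of guessing a layout. Whether a real fix should ship the whole payload to each worker or slice it per sample is a design choice this sketch does not settle.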

[Feature] RPC transport splits multimodal tensors incorrectly for multi-image samples #1036
