When a sample contains multiple images, the RTensor/RPC transport layer may treat multimodal tensors such as `pixel_values` and `image_grid_thw` as ordinary batched tensors.
Their leading dimension does not match the outer batch dimension, so they can be split incorrectly, or the batch layout can be inferred incorrectly, during an RPC round trip.
Expected behavior:
- These multimodal payload tensors should be transportable as non-batched objects.
- They should not participate in outer batch layout inference.
- They should remain intact when worker methods return mutated batch structures.
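
The sketch below illustrates one way these three expectations could fit together. It is not the actual RTensor/RPC code: `NonBatched`, `infer_batch_size`, and `split_batch` are hypothetical names, plain nested lists stand in for tensors, and copying the whole payload into every chunk is an assumption about what "non-batched" transport would mean here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class NonBatched:
    """Hypothetical marker: the wrapped payload is shipped whole, never split."""
    value: Any


def infer_batch_size(batch: dict) -> int:
    """Infer the outer batch size from batched fields only."""
    sizes = {len(v) for v in batch.values() if not isinstance(v, NonBatched)}
    if len(sizes) != 1:
        raise ValueError(f"inconsistent batch sizes: {sorted(sizes)}")
    return sizes.pop()


def split_batch(batch: dict, num_chunks: int) -> list[dict]:
    """Split batched fields; pass NonBatched payloads to every chunk untouched."""
    n = infer_batch_size(batch)
    step = -(-n // num_chunks)  # ceiling division
    return [
        {
            k: v if isinstance(v, NonBatched) else v[start:start + step]
            for k, v in batch.items()
        }
        for start in range(0, n, step)
    ]


# Two samples holding three images in total. The leading dimensions of the
# multimodal fields (12 patch rows, 3 images) do not match the batch size (2).
batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6]],
    "pixel_values": NonBatched([[0.0] * 4 for _ in range(12)]),
    "image_grid_thw": NonBatched([[1, 2, 2], [1, 2, 2], [1, 2, 2]]),
}

chunks = split_batch(batch, num_chunks=2)
```

With the marker in place, only `input_ids` drives layout inference, and each chunk carries the full `pixel_values` and `image_grid_thw` payloads. Passing the same fields unwrapped makes `infer_batch_size` raise, because the leading dimensions 2, 12, and 3 disagree.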