[Bug] Training/Inference crash with Qwen2/3-VL due to missing mm_token_type_ids in Collator #435

@Wangxiaoxiaoa

Describe the bug

Currently, when using the ROLL framework to train or sample with recent multimodal models such as Qwen2-VL or Qwen3-VL, the pipeline crashes during data collation or the forward pass.

The issue stems from two missing components in the core library:

  1. Collator Support: The DataCollatorWithPaddingForMM does not recognize or pad the mm_token_type_ids field produced by the Qwen Processor. This causes tensor creation errors or dimension mismatches when batching interleaved multimodal data.
  2. Positional Encoding Binding: Specifically for Qwen3-VL, the necessary dynamic methods for vision position ID generation are not bound in model_providers.py, leading to a KeyError when calculating 3D RoPE indices.
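The padding gap in item 1 can be sketched as follows. This is a minimal illustration, not the ROLL implementation: the helper name `pad_mm_token_type_ids`, the pad value of 0, and the convention that 0 marks text tokens are assumptions, and plain Python lists stand in for the tensors a real collator would build.

```python
from typing import Dict, List, Sequence


def pad_mm_token_type_ids(
    features: Sequence[Dict[str, List[int]]],
    pad_value: int = 0,          # assumption: 0 is a safe "text/padding" type id
    padding_side: str = "right",
) -> List[List[int]]:
    """Pad each sample's mm_token_type_ids to the longest input_ids in the batch."""
    if padding_side not in ("left", "right"):
        raise ValueError(f"unsupported padding_side: {padding_side!r}")

    max_len = max(len(f["input_ids"]) for f in features)
    padded = []
    for f in features:
        ids = f["mm_token_type_ids"]
        # The type ids must stay aligned token-for-token with input_ids.
        if len(ids) != len(f["input_ids"]):
            raise ValueError("mm_token_type_ids must have the same length as input_ids")
        fill = [pad_value] * (max_len - len(ids))
        padded.append(fill + ids if padding_side == "left" else ids + fill)
    return padded


# Hypothetical two-sample batch of different lengths.
batch = [
    {"input_ids": [1, 2, 3, 4], "mm_token_type_ids": [0, 1, 1, 0]},
    {"input_ids": [5, 6], "mm_token_type_ids": [1, 0]},
]
print(pad_mm_token_type_ids(batch))
```

Padding on the same side as input_ids keeps the type ids aligned with the tokens they describe, which is what any downstream position-index computation would rely on.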

To Reproduce

Steps to reproduce the behavior:

  1. Initialize an RLVR or SFT pipeline using Qwen3-VL-8B-Base.
  2. Use a multimodal dataset with images.
  3. The pipeline will crash at the first sampling or training step.

Logs/Screenshots (Captured from real environment)

ray.exceptions.RayTaskError(AttributeError): ray::DynamicSamplingScheduler.get_batch() (pid=310859, ip=10.83.115.19)
  File ".../roll/distributed/scheduler/generate_scheduler.py", line 351, in get_batch
    batch = self.collator(samples)
  File ".../roll/datasets/collator.py", line 180, in __call__
    # KeyError triggered during model-specific rope index generation
KeyError: 'get_vision_position_ids'
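The missing binding behind this KeyError can be illustrated with a generic sketch of attaching functions to a model instance at runtime. Everything here is hypothetical: `bind_methods`, `DummyModel`, and the placeholder body of `get_vision_position_ids` are stand-ins and do not reproduce the actual code in model_providers.py or the real Qwen3-VL position logic.

```python
import types


def bind_methods(model, methods):
    """Attach plain functions to a model instance as bound methods."""
    for name, fn in methods.items():
        setattr(model, name, types.MethodType(fn, model))
    return model


class DummyModel:
    # Hypothetical attribute; stands in for a vision config value.
    spatial_merge_size = 2


def get_vision_position_ids(self, grid_h, grid_w):
    # Placeholder logic only: enumerate (row, col) cells of the merged grid.
    h = grid_h // self.spatial_merge_size
    w = grid_w // self.spatial_merge_size
    return [(r, c) for r in range(h) for c in range(w)]


model = bind_methods(DummyModel(), {"get_vision_position_ids": get_vision_position_ids})

# A caller can now guard the lookup instead of failing on a missing name.
if hasattr(model, "get_vision_position_ids"):
    print(model.get_vision_position_ids(4, 4))
```

Binding on the instance keeps the model class untouched, and a `hasattr` check at the call site would turn a missing binding into an explicit, descriptive error rather than a bare KeyError.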

Environment:

  • Hardware: NVIDIA H200
  • Model: Qwen3-VL / Qwen2-VL
  • Transformers Version: 4.45.0+ (and latest)

Additional context

Without proper handling of mm_token_type_ids, the 3D RoPE (Rotary Position Embedding) indices cannot be calculated correctly for multi-image batches, resulting in either a crash or degraded model output.
