Extend `VisionAddOn` Pattern to Qwen2.5VL

Currently, the only multi-modal models that have been migrated to the "unified" architecture are Gemma3 and Pixtral:

https://github.com/lmstudio-ai/mlx-engine/blob/ecc2cf48e449478784ffd21f1d16745e11647e2f/mlx_engine/model_kit/model_kit.py#L35-L38

This issue tracks extending the pattern to Qwen2.5VL and Qwen2VL.
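As a rough illustration of what "extending the pattern" could mean, here is a minimal, self-contained sketch of a registry keyed by model type. The stub classes, their constructor signature, the `select_vision_add_on` helper, and the `"qwen2_5_vl"` key are all assumptions made for illustration; the real add-on classes in `mlx-engine` have their own interfaces, which this sketch does not reproduce.

```python
from typing import Dict, Optional, Type


class BaseVisionAddOn:
    """Hypothetical stand-in for the engine's vision add-on base class."""

    def __init__(self, model_path: str) -> None:
        # Assumed constructor signature; the real one may differ.
        self.model_path = model_path


class Gemma3VisionAddOn(BaseVisionAddOn):
    """Placeholder for the existing Gemma3 add-on."""


class PixtralVisionAddOn(BaseVisionAddOn):
    """Placeholder for the existing Pixtral add-on."""


class Qwen2_5_VLVisionAddOn(BaseVisionAddOn):
    """Placeholder for the add-on proposed in this issue."""


VISION_ADD_ON_MAP: Dict[str, Type[BaseVisionAddOn]] = {
    "gemma3": Gemma3VisionAddOn,
    "pixtral": PixtralVisionAddOn,
    # Assumed key: the `model_type` string expected in the model config.
    "qwen2_5_vl": Qwen2_5_VLVisionAddOn,
}


def select_vision_add_on(model_type: str, model_path: str) -> Optional[BaseVisionAddOn]:
    """Return an add-on instance for a registered model type, else None."""
    add_on_cls = VISION_ADD_ON_MAP.get(model_type)
    return add_on_cls(model_path) if add_on_cls is not None else None
```

With a registry like this, supporting a new model family is a one-line addition to the map once its add-on class exists; model types without an entry fall through to `None`, so the caller can take whatever non-unified path it already has.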

Relevant `mlx-vlm` components:
- https://github.com/Blaizzy/mlx-vlm/tree/2068970094c78878c77fd78677d1316933562ade/mlx_vlm/models/qwen2_5_vl
- https://github.com/Blaizzy/mlx-vlm/tree/main/mlx_vlm/models/qwen2_vl

Relevant `mlx-lm` components:
- https://github.com/ml-explore/mlx-lm/blob/77edf17bc0bf7c9313e0b970490db86a4f64bee4/mlx_lm/models/qwen2.py

This will likely look like:
1. Ensure the Qwen2.5VL text model architecture is implemented correctly in `mlx-lm`, including MRoPE (see https://arxiv.org/abs/2502.13923 for details and [Apply PR #319 fixes to Qwen 2.5VL position id #349](https://github.com/Blaizzy/mlx-vlm/pull/349) for in-progress work in `mlx-vlm`)
2. Implement `Qwen2_5_VLVisionAddOn` and wire it into `ModelKit`
3. Ensure Qwen2.5VL tests in `mlx-engine` still pass
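Since step 1 hinges on MRoPE position IDs, the following pure-Python sketch shows how the three position components (temporal, height, width) could be assigned for a mixed text and image sequence. It follows the scheme described for Qwen2-VL: text tokens share one index across all three components, image tokens vary along the grid, and the text after an image resumes at the largest index used so far plus one. The function name and the `segments` input format are hypothetical, the grid is assumed to be the size after spatial merging, and the time-based scaling of temporal indices that Qwen2.5VL applies to video is not modeled here.

```python
from typing import List, Sequence, Tuple, Union

Segment = Tuple[str, Union[int, Tuple[int, int, int]]]


def mrope_position_ids(segments: Sequence[Segment]) -> Tuple[List[int], List[int], List[int]]:
    """Build (temporal, height, width) position IDs.

    Each segment is ("text", n_tokens) or ("image", (t, h, w)), where
    (t, h, w) is the assumed token grid after spatial merging.
    """
    temporal: List[int] = []
    height: List[int] = []
    width: List[int] = []
    next_pos = 0  # first free position index

    for kind, spec in segments:
        if kind == "text":
            # Text tokens: all three components advance together.
            for i in range(spec):
                p = next_pos + i
                temporal.append(p)
                height.append(p)
                width.append(p)
            next_pos += spec
        elif kind == "image":
            t, h, w = spec
            # Image tokens: each component indexes its own grid axis,
            # offset by the position where the image starts.
            for ti in range(t):
                for hi in range(h):
                    for wi in range(w):
                        temporal.append(next_pos + ti)
                        height.append(next_pos + hi)
                        width.append(next_pos + wi)
            # Following tokens resume after the largest index used.
            next_pos += max(t, h, w)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")

    return temporal, height, width


t_ids, h_ids, w_ids = mrope_position_ids([("text", 2), ("image", (1, 2, 2)), ("text", 2)])
```

A check like this against the reference position IDs produced by `mlx-vlm` could be one way to validate the `mlx-lm` text model before wiring up the add-on.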

For reference, the `model_kit.py` permalink above points at the current map:

	VISION_ADD_ON_MAP = {
	    "gemma3": Gemma3VisionAddOn,
	    "pixtral": PixtralVisionAddOn,
	}
