This guide is the practical checklist for adding new text models and VLMs to mlxcel.
It points to the concrete control surfaces that must stay consistent. If this repository later adds a maintainer workflow document, keep this checklist aligned with it.
- Keep new model additions predictable.
- Reuse existing control-plane helpers instead of adding new one-off branches.
- Add tests alongside the integration points that are easiest to regress.
- Identify the upstream reference:
  - Text models: upstream `mlx-lm` model implementations.
  - VLMs: upstream `mlx-vlm` model implementations.
  - If local `references/` checkouts are not present, clone or inspect the upstream repositories separately.
- Decide whether the architecture is:
  - A brand new model family
  - A format alias of an existing family
  - A VLM wrapper around an existing text model
- Check whether an existing loader helper already matches the new model:
  - `src/model_metadata.rs`
  - `src/loading/mod.rs`
  - `src/loading/vlm.rs`
  - `src/models/mod.rs`
  - Start with the converged registration surface in `src/model_metadata.rs`. Standard text models should extend that registration table first, because `src/loading/config_backed.rs` now consumes the same source of truth.
- Check whether the change also touches shared execution policy:
  - `src/execution/runtime.rs` for device/environment behavior
  - `src/execution/sampling.rs` for user-facing sampling defaults and greedy-vs-sampled assembly
- Add the implementation file under `src/models/`.
- Register the module and re-export in `src/models/mod.rs`.
- Add a `ModelType` variant in `src/models/mod.rs`.
- Extend `get_model_type()` in `src/models/detection.rs`.
  - Prefer shared helpers such as `detect_text_or_vlm()` and `detect_hunyuan_model_type()` when the new model fits an existing pattern.
- Add the corresponding `LoadedModel` variant in `src/loaded_model.rs`.
  - Prefer extending the existing dispatch helpers instead of adding new repeated match tables: `delegate_language_model!` in `src/loaded_model.rs` and `VlmRuntimeRef` in `src/loaded_model_capabilities.rs`.
- Wire loading in `src/loading/mod.rs`.
  - Prefer existing helpers like `load_pair_from_dir()` and `load_owned_model_from_config!`.
  - Update `src/model_metadata.rs` so kind, adapter support, route selection, and standard config-backed registration stay centralized before touching the router.
  - If the model follows the standard text-model path, extend the shared registration surface in `src/model_metadata.rs` instead of adding a parallel entry list in `src/loading/config_backed.rs`.
- If LoRA/adapters are supported, verify `load_model_from_weights()` in `src/loading/mod.rs`.
  - Non-standard adapter paths should extend `src/loading/special.rs` instead of growing `load_model_from_weights()` directly.
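The detection step in the checklist above can be sketched as follows. This is a minimal illustration, not mlxcel's actual code: the `ModelType` variants, the `ConfigSummary` struct, the `"example"` / `"example_v2"` names, and the signature given to `detect_text_or_vlm` are all assumptions made for the example. The real helper in `src/models/detection.rs` may take different arguments.

```rust
/// Hypothetical subset of `ModelType`; the real enum has many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelType {
    Llama,
    ExampleText,
    ExampleVlm,
}

/// Hypothetical, already-parsed view of `config.json` (assumed shape).
pub struct ConfigSummary<'a> {
    pub model_type: &'a str,
    pub has_vision_config: bool,
}

/// Assumed shape of a shared text-or-VLM helper: pick the VLM variant
/// only when the config carries a vision section.
fn detect_text_or_vlm(cfg: &ConfigSummary<'_>, text: ModelType, vlm: ModelType) -> ModelType {
    if cfg.has_vision_config {
        vlm
    } else {
        text
    }
}

/// Sketch of alias-aware detection. Every alias of a family maps to the
/// same arm, so a format alias needs no new variant.
pub fn get_model_type(cfg: &ConfigSummary<'_>) -> Option<ModelType> {
    match cfg.model_type {
        "llama" => Some(ModelType::Llama),
        // "example" and "example_v2" are made-up names for illustration.
        "example" | "example_v2" => Some(detect_text_or_vlm(
            cfg,
            ModelType::ExampleText,
            ModelType::ExampleVlm,
        )),
        _ => None,
    }
}
```

Keeping the alias list and the text-or-VLM decision in one arm is what makes a missing alias or a `vision_config` misclassification easy to cover with a single unit test.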
- Implement or reuse the vision encoder under `src/vision/encoders/`.
- Implement or reuse the connector under `src/vision/connectors/`.
- Implement or reuse the processor under `src/vision/processors/`.
- Add the VLM `ModelType` detection in `src/models/detection.rs`.
  - If the base text family has both text-only and VLM variants, prefer `detect_text_or_vlm()`.
- Add the loader entry in `src/loading/vlm.rs` or the matching family module under `src/loading/`.
  - Prefer shared helpers for config parsing, token defaults, and weight remapping.
  - Keep family-specific assembly grouped with its nearest peers: `src/loading/vlm_qwen.rs`, `src/loading/vlm_llava.rs`, `src/loading/vlm_gemma.rs`, `src/loading/vlm_pixtral.rs`, `src/loading/vlm_siglip.rs`, `src/loading/vlm_special.rs`.
  - Update `src/model_metadata.rs` so the router knows the family is multimodal and adapter loading policy remains explicit.
  - Register the directory entry point in `try_load_vlm_model_from_dir()` in `src/loading/mod.rs` so `load_model()` stays as a thin dispatcher.
- Add or reuse the `LoadedModel` capability helpers in `src/loaded_model_capabilities.rs`.
  - Prefer extending `VlmRuntimeRef` or an existing multimodal helper over adding family-specific getters that only CLI/server use.
- Reuse prompt helpers where possible:
  - Qwen-VL token insertion: `src/multimodal/qwen_vl.rs`
  - Generic image-token block expansion: `src/multimodal/vlm_prompt.rs`
  - Phi3V prompt tag handling: `src/multimodal/phi3v_prompt.rs`
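To make "generic image-token block expansion" concrete, here is a minimal sketch of the idea: each placeholder in the prompt is replaced by as many image tokens as the corresponding image produces. The function name, the placeholder and token strings, and the error type are assumptions for illustration only. The actual helper in `src/multimodal/vlm_prompt.rs` may look quite different.

```rust
/// Replace the i-th occurrence of `placeholder` with `counts[i]` copies
/// of `image_token`. Hypothetical helper; not mlxcel's real API.
pub fn expand_image_tokens(
    prompt: &str,
    placeholder: &str,
    image_token: &str,
    counts: &[usize],
) -> Result<String, String> {
    if placeholder.is_empty() {
        return Err("placeholder must not be empty".to_string());
    }
    // Splitting on the placeholder yields one more part than placeholders.
    let parts: Vec<&str> = prompt.split(placeholder).collect();
    let found = parts.len() - 1;
    if found != counts.len() {
        return Err(format!(
            "prompt has {found} placeholder(s) but {} image(s) were supplied",
            counts.len()
        ));
    }
    let mut out = String::with_capacity(prompt.len());
    for (i, part) in parts.iter().enumerate() {
        out.push_str(part);
        // There is no count after the final part, so `get` returns None there.
        if let Some(&n) = counts.get(i) {
            out.push_str(&image_token.repeat(n));
        }
    }
    Ok(out)
}
```

Rejecting a placeholder/image count mismatch up front is a design choice of this sketch: it keeps the failure at the prompt-preparation edge rather than surfacing later as a shape error in the vision merge.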
Do not create a new file by default. Create one when the family has a distinct control-plane identity.
Create a new loader family module when:
- config normalization is not a small variant of an existing family
- token defaults or weight-key remapping need dedicated tests
- the VLM wrapper uses a different prompt/runtime assembly path
- adding the logic inline would obscure an existing family boundary
Keep the model in an existing module when:
- it is primarily an alias or small config delta
- the same loader tests already express the policy
- the family is still recognizable after the change
If you are unsure, extend the existing family module first and split only when the test file or router starts to lose a clear boundary.
- `src/models/mod.rs`
  - Missing module export or `ModelType` variant
- `src/models/detection.rs`
  - Missing aliases in `get_model_type()`
  - Text/VLM misclassification when `vision_config` is present
- `src/loading/mod.rs`
  - Divergence between `load_model()` and `load_model_from_weights()`
  - Adding a standard config-backed model as a one-off special case instead of the shared loader helpers
  - Adding a VLM directly into `load_model()` instead of `try_load_vlm_model_from_dir()`
- `src/model_metadata.rs`
  - Forgetting to update text/VLM kind, adapter support, route policy, or the shared standard-text registration entry before wiring loaders
- `src/loading/config_backed.rs`
  - Bypassing the shared registration surface and adding new one-off loader logic
  - Forgetting wrapper constructors for models such as `Llama4`, `Gemma3`, or `Ministral3`
- `src/loading/nonstandard.rs`
  - Leaving directory-only loader families in `src/loading/mod.rs` instead of the non-standard registry
- `src/loading/special.rs`
  - Adding adapter/owned-weight special handling inline instead of the special-weight registry
  - Forgetting Qwen3.5 text-config normalization or owned-weight sanitization before construction
Keep src/loading/mod.rs focused on route selection. If a new model family adds
substantial construction logic, prefer a dedicated sibling module and call it
from the router instead of growing load_model() or load_model_from_weights()
inline.
For very large model families, extract internal helper hotspots into a focused sibling helper module when the code changes for different reasons than the main decoder stack. Current examples:
- `src/models/gemma3n_helpers.rs`
- `src/models/llama4_helpers.rs`

- `src/loading/vlm.rs` and sibling family modules under `src/loading/`
  - Wrong default token IDs
  - Missing top-level quantization inheritance
  - Incorrect weight-key remapping between text and vision towers
- `src/loaded_model.rs` / `src/loaded_model_capabilities.rs`
  - Missing dispatch arm for a new variant
  - Missing capability wiring in `VlmRuntimeRef`
  - Updating the all-model dispatch macro but forgetting the multimodal capability switchboard
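Weight-key remapping is one of the easier failure points to pin down with a small pure function. The sketch below shows one way to express prefix-based remapping so it can be unit-tested in isolation. The function name and the prefixes used in the usage note are hypothetical, and the value type is left generic because the real weight container is not described here.

```rust
use std::collections::HashMap;

/// Rewrite checkpoint keys by prefix. `rules` is checked in order and the
/// first matching `(from, to)` pair wins; unmatched keys pass through.
/// Hypothetical helper for illustration only.
pub fn remap_weight_keys<T>(
    weights: HashMap<String, T>,
    rules: &[(&str, &str)],
) -> HashMap<String, T> {
    weights
        .into_iter()
        .map(|(key, value)| {
            let remapped = rules.iter().find_map(|&(from, to)| {
                key.strip_prefix(from).map(|rest| format!("{to}{rest}"))
            });
            (remapped.unwrap_or(key), value)
        })
        .collect()
}
```

With assumed rules such as `("language_model.model.", "model.")` and `("vision_tower.", "vision.")`, a test can assert that text-tower and vision-tower keys land under distinct prefixes and that no key is silently dropped.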
Add tests in the same slice as the model/control-plane change.
- For model detection helpers:
  - Add unit tests near `src/models/detection.rs`
- For sanitization helpers:
  - Add unit tests near `src/models/sanitize.rs`
- For loader normalization or token-default logic:
  - Add tests near `src/loading/tests.rs` or the relevant `src/loading/vlm*_tests.rs`
- For shared vision merge contracts:
  - Add tests near `src/vision/merge_tests.rs`
- For prompt/token expansion logic:
  - Add tests in dedicated helper test files such as `src/multimodal/qwen_vl_tests.rs`, `src/multimodal/vlm_prompt_tests.rs`, `src/multimodal/phi3v_prompt_tests.rs`
- For runtime validation:
  - Run `scripts/run_quality_gate.sh`
  - Add at least one local smoke test when a matching model exists
  - If the slice touched MLX-heavy ignored helper tests, run them explicitly with `--ignored --test-threads=1`
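As an illustration of the split between fast unit tests and ignored MLX-heavy tests, the sketch below pairs a tiny token-default helper with one regular test and one `#[ignore]` test. The helper `default_eos_token_id` and the test names are hypothetical and stand in for real loader logic. The `--ignored --test-threads=1` flags are the standard Rust test-harness options, assuming the tests are run through `cargo test`.

```rust
/// Hypothetical normalization helper standing in for loader logic under test:
/// use the configured token id when present, else the family default.
pub fn default_eos_token_id(configured: Option<u32>, family_default: u32) -> u32 {
    configured.unwrap_or(family_default)
}

#[cfg(test)]
mod token_default_tests {
    use super::*;

    // Fast unit test: runs on every `cargo test`.
    #[test]
    fn falls_back_to_family_default() {
        assert_eq!(default_eos_token_id(None, 2), 2);
        assert_eq!(default_eos_token_id(Some(7), 2), 7);
    }

    // Heavy test: skipped by default. Run explicitly with
    // `cargo test -- --ignored --test-threads=1`.
    #[test]
    #[ignore = "needs a local model and the MLX runtime"]
    fn smoke_loads_local_model() {
        // A real smoke test would load a model here; this body is a placeholder.
        assert_eq!(default_eos_token_id(Some(1), 2), 1);
    }
}
```

Running ignored tests on a single thread avoids several heavy tests competing for the same device memory at once.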
When touching shared functions used by multiple model families, update the local usage comments in the shared helper files, especially under:
- `src/lib/mlxcel-core/src/layers.rs`
- `src/lib/mlxcel-core/src/utils.rs`

Those comments act as the retest list for future changes.
Keep entry-point policy in the shared execution layer when the behavior must be identical across CLI, server, and future frontends.
- `src/execution/runtime.rs`
  - Environment-driven device selection (`MLXCEL_DEVICE`)
  - GPU wired-memory limit setup
- `src/execution/sampling.rs`
  - Centralized `SamplingConfig` assembly from resolved request defaults
  - Shared greedy vs non-greedy branching
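Environment-driven device selection is easiest to keep identical across frontends when the parsing is a pure function and the environment read is a thin wrapper. The sketch below assumes that `MLXCEL_DEVICE` accepts `cpu` and `gpu` and that an unset variable means GPU. Those accepted values and the default are guesses for illustration, not documented behavior, and the `Device` enum and function names are hypothetical.

```rust
use std::env;

/// Hypothetical device enum; the real runtime type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Gpu,
}

/// Pure parsing step, kept separate from the environment read so it can be
/// tested without mutating process state.
pub fn parse_device(raw: Option<&str>) -> Result<Device, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        // Assumed default when the variable is unset or empty.
        None | Some("") => Ok(Device::Gpu),
        Some("cpu") => Ok(Device::Cpu),
        Some("gpu") => Ok(Device::Gpu),
        Some(other) => Err(format!("unsupported MLXCEL_DEVICE value: {other}")),
    }
}

/// Thin wrapper that every frontend would call, so the policy lives in one place.
pub fn device_from_env() -> Result<Device, String> {
    parse_device(env::var("MLXCEL_DEVICE").ok().as_deref())
}
```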
If a new frontend or request type needs different defaults, resolve those
defaults at the edge and keep the final conversion in src/execution/.
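The "resolve at the edge, convert in one place" split can be sketched as two functions. The field set, the `ResolvedSampling` type, and the rule that a non-positive temperature means greedy decoding are assumptions for this example. Only the name `SamplingConfig` comes from this guide, and its real shape may differ.

```rust
/// Hypothetical fully-resolved values produced by a frontend.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ResolvedSampling {
    pub temperature: f32,
    pub top_p: f32,
}

/// Assumed shape of the shared config; the real `SamplingConfig` may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SamplingConfig {
    Greedy,
    Sampled { temperature: f32, top_p: f32 },
}

/// Edge step: each frontend merges its own request overrides with its own defaults.
pub fn resolve(
    request_temperature: Option<f32>,
    request_top_p: Option<f32>,
    defaults: ResolvedSampling,
) -> ResolvedSampling {
    ResolvedSampling {
        temperature: request_temperature.unwrap_or(defaults.temperature),
        top_p: request_top_p.unwrap_or(defaults.top_p),
    }
}

/// Shared step: the greedy-vs-sampled decision is made in exactly one place.
pub fn to_sampling_config(resolved: ResolvedSampling) -> SamplingConfig {
    if resolved.temperature <= 0.0 {
        SamplingConfig::Greedy
    } else {
        SamplingConfig::Sampled {
            temperature: resolved.temperature,
            top_p: resolved.top_p,
        }
    }
}
```

A new frontend with different defaults would only supply a different `defaults` value to `resolve`, leaving `to_sampling_config` untouched.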
Keep CLI-only prompt formatting and terminal output behavior in
src/commands/generate.rs instead of moving it into shared loading or server
modules.
For server-only boot behavior, keep startup policy in src/server/startup.rs
instead of growing src/server/mod.rs:
- API key / chat-template resolution precedence
- startup-time normalization of CLI-compatible flags
- warmup behavior
- Unix-socket vs TCP binding
Keep shared server types in the focused modules as well:
- `src/server/config.rs` for request/default configuration structs
- `src/server/state.rs` for `AppState` and metrics containers
- `src/server/model_provider.rs` for the public request/response channel API
- `src/server/model_worker.rs` for the long-lived worker thread, VLM request prep, and decode state

Keep server edge adapters out of the route files once more than one endpoint needs the same behavior:
- `src/server/chat_request.rs` for OpenAI chat message flattening and prompt fallback assembly
- `src/server/request_options.rs` for request-default merging into `ServerGenerateOptions`
- `src/server/media.rs` for `data:` / `file://` image-source parsing
- `src/server/streaming.rs` for shared SSE channel and `[DONE]` emission helpers

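As an example of an edge adapter worth sharing across endpoints, the sketch below classifies an image source string as a `data:` URL or a `file://` URL. The `ImageSource` enum and `parse_image_source` function are hypothetical names. The sketch handles only base64 `data:` URLs, leaves the payload undecoded, and does no percent-decoding of file paths, so treat it as a starting point rather than a description of `src/server/media.rs`.

```rust
use std::path::PathBuf;

/// Hypothetical classification of an image reference in a request.
#[derive(Debug, PartialEq, Eq)]
pub enum ImageSource {
    /// `data:<mime>;base64,<payload>`; the payload is left undecoded here.
    InlineBase64 { mime: String, payload: String },
    /// `file://<path>`; no percent-decoding is attempted in this sketch.
    LocalFile(PathBuf),
}

pub fn parse_image_source(url: &str) -> Result<ImageSource, String> {
    if let Some(rest) = url.strip_prefix("data:") {
        let (meta, payload) = rest
            .split_once(',')
            .ok_or("data: URL is missing the ',' separator")?;
        let mime = meta
            .strip_suffix(";base64")
            .ok_or("only base64 data: URLs are handled in this sketch")?;
        return Ok(ImageSource::InlineBase64 {
            mime: mime.to_string(),
            payload: payload.to_string(),
        });
    }
    if let Some(path) = url.strip_prefix("file://") {
        if path.is_empty() {
            return Err("file:// URL has an empty path".to_string());
        }
        return Ok(ImageSource::LocalFile(PathBuf::from(path)));
    }
    Err(format!("unsupported image source: {url}"))
}
```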
This section validates that the current architecture actually reduces ambiguity when adding new model support.
Assume a new text model that follows the existing config-backed loading path.
Required surfaces today:
- `src/models/<family>.rs`
- `src/models/mod.rs`
- `src/models/detection.rs`
- `src/model_metadata.rs` through the converged registration surface plus `static_model_descriptor()` / `model_load_policy()`
- `src/loading/config_backed.rs` only if shared config-backed loading behavior itself must change
- `src/loaded_model.rs`
- `src/loaded_model_capabilities.rs` only if the family changes multimodal capability exposure
- tests near `src/models/detection_tests.rs`, `src/models/sanitize_tests.rs`, and `src/loading/tests.rs`

What should not happen:
- no new one-off construction branch inside `load_model()`
- no direct CLI or server changes unless user-visible behavior changes
- no family-specific getter added to `LoadedModel` if an existing capability is enough

Why this is better than the old path:
- route selection is centralized instead of duplicated across multiple loading matches
- adapter support is declared in one policy surface
- standard text constructor registration no longer lives in a separate parallel table
- the expected edit list is short enough to review before coding starts
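The benefit of a single registration table can be illustrated with a small sketch in which kind and adapter support are declared once per family and every consumer looks them up from the same place. The `ModelDescriptor` fields, the `ModelKind` enum, the table entries, and the signature given to `static_model_descriptor` are assumptions for illustration. Only the function name appears in this guide.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelKind {
    Text,
    Vlm,
}

/// Hypothetical descriptor: one row per family, one place to review policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelDescriptor {
    pub name: &'static str,
    pub kind: ModelKind,
    pub supports_adapters: bool,
}

/// Made-up entries; a new family would add exactly one row here.
const DESCRIPTORS: &[ModelDescriptor] = &[
    ModelDescriptor { name: "example_text", kind: ModelKind::Text, supports_adapters: true },
    ModelDescriptor { name: "example_vlm", kind: ModelKind::Vlm, supports_adapters: false },
];

/// Assumed lookup shape; the real function may key on `ModelType` instead.
pub fn static_model_descriptor(name: &str) -> Option<&'static ModelDescriptor> {
    DESCRIPTORS.iter().find(|d| d.name == name)
}

/// Example consumer: the adapter loader asks the table instead of
/// keeping its own list of families.
pub fn adapters_allowed(name: &str) -> bool {
    static_model_descriptor(name).map_or(false, |d| d.supports_adapters)
}
```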
Assume a new VLM family needs its own token defaults and weight-key remapping.
Required surfaces today:
- text model and/or VLM wrapper under `src/models/` or `src/vision/`
- `src/models/mod.rs`
- `src/models/detection.rs`
- `src/loading/vlm_<family>.rs`
- `src/loading/vlm.rs`
- `src/model_metadata.rs` through `static_model_descriptor()` / `model_load_policy()`
- `src/loaded_model.rs` through enum wiring
- `src/loaded_model_capabilities.rs` through `VlmRuntimeRef`
- `src/multimodal/` only if prompt/runtime preparation is truly new
- tests near `src/loading/vlm_<family>_tests.rs` and any new multimodal helper test file

What should not happen:
- no concrete model-type checks added to CLI or server request paths
- no family-specific loading logic added directly to `src/loading/mod.rs`
- no duplicated prompt-rewrite logic across CLI and server

Why this is better than the old path:
- the family router lives in `src/loading/vlm.rs`
- family assembly stays beside peer VLM loaders
- multimodal frontends depend on capabilities, not family names
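"Capabilities, not family names" can be sketched as a frontend that asks the loaded model for an optional multimodal view instead of matching on concrete variants. The `LoadedModel` variants, the single `image_token` field on `VlmRuntimeRef`, and the `vlm_runtime` / `prepare_prompt` functions are hypothetical. The real types in `src/loaded_model_capabilities.rs` will expose more than this.

```rust
/// Hypothetical borrowed view of what a frontend needs from any VLM.
pub struct VlmRuntimeRef<'a> {
    pub image_token: &'a str,
}

/// Hypothetical two-variant stand-in for the real enum.
pub enum LoadedModel {
    ExampleText,
    ExampleVlm { image_token: String },
}

impl LoadedModel {
    /// The only place that knows which variants are multimodal.
    pub fn vlm_runtime(&self) -> Option<VlmRuntimeRef<'_>> {
        match self {
            LoadedModel::ExampleText => None,
            LoadedModel::ExampleVlm { image_token } => Some(VlmRuntimeRef {
                image_token: image_token.as_str(),
            }),
        }
    }
}

/// Frontend code: branches on the capability, never on the family.
pub fn prepare_prompt(model: &LoadedModel, prompt: &str, has_image: bool) -> Result<String, String> {
    match (model.vlm_runtime(), has_image) {
        (Some(rt), true) => Ok(format!("{}{}", rt.image_token, prompt)),
        (None, true) => Err("this model does not accept images".to_string()),
        (_, false) => Ok(prompt.to_string()),
    }
}
```

Adding a new VLM family under this scheme means adding one arm to `vlm_runtime`, while `prepare_prompt` and any other frontend code stay unchanged.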
Before opening a PR for a new model or VLM family, confirm:
- loading policy was updated through `src/model_metadata.rs`
- `LoadedModel` wiring stayed inside `src/loaded_model.rs` and `src/loaded_model_capabilities.rs` rather than creating a new one-off family getter
- CLI and server still depend on shared helpers rather than the concrete model type
- unit tests cover the new policy or normalization logic
- at least one smoke test exists when a local model is available
Prefer small checkpoints that isolate one control-plane surface:
- model detection
- loader normalization
- prompt preparation
- runtime initialization
This keeps regressions searchable and makes future model additions easier to compare against previous slices.