Summary
The candle-transformers/src/models/ directory has grown to more than 70 flat module entries, mixing full-precision and quantized implementations of the same model families. This makes the codebase harder to navigate and maintain.
Proposal: Group related models into family subdirectories, similar to the pattern demonstrated in SmolLM3 (#3180).
Current State
The models/mod.rs currently has a flat structure:
pub mod llama;
pub mod llama2_c;
pub mod llama2_c_weights;
pub mod quantized_llama;
pub mod quantized_llama2_c;
pub mod mistral;
pub mod quantized_mistral;
pub mod mixtral;
pub mod phi;
pub mod phi3;
pub mod quantized_phi;
pub mod quantized_phi3;
pub mod qwen2;
pub mod qwen2_moe;
pub mod qwen3;
pub mod qwen3_moe;
pub mod qwen3_vl;
pub mod quantized_qwen2;
pub mod quantized_qwen3;
// ... 50+ more entries

Problems:
- 70+ flat modules in a single directory
- Full and quantized versions scattered
- No clear model family grouping
- Harder to navigate and discover related implementations
- Difficult to see which models have quantized versions
Proposed Structure
Group models by family in subdirectories, similar to SmolLM3 (#3180):
models/
├── llama/
│ ├── mod.rs # Re-exports for backward compatibility
│ ├── llama.rs # Full precision
│ ├── llama2_c.rs # Llama2.c variant
│ ├── quantized_llama.rs
│ └── quantized_llama2_c.rs
├── mistral/
│ ├── mod.rs
│ ├── mistral.rs
│ ├── mixtral.rs
│ └── quantized_mistral.rs
├── phi/
│ ├── mod.rs
│ ├── phi.rs
│ ├── phi3.rs
│ ├── quantized_phi.rs
│ └── quantized_phi3.rs
├── qwen/
│ ├── mod.rs
│ ├── qwen2.rs
│ ├── qwen2_moe.rs
│ ├── qwen3.rs
│ ├── qwen3_moe.rs
│ ├── qwen3_vl.rs
│ ├── quantized_qwen2.rs
│ └── quantized_qwen3.rs
├── smol/ # Already implemented in #3180
│ ├── mod.rs
│ ├── smollm3.rs
│ └── quantized_smollm3.rs
└── ... other families
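As a compilable sketch of one of the family trees above, the qwen/ grouping can be mocked up with inline modules (in the real tree each `pub mod` would point at its own file; the empty bodies and the `qwen_family` helper are illustrative only):

```rust
// Toy stand-in for the proposed models/qwen/ family module. In the real
// layout each `pub mod` would resolve to qwen2.rs, qwen2_moe.rs, etc.;
// empty bodies are used here so the sketch compiles on its own.
mod qwen {
    pub mod qwen2 {}
    pub mod qwen2_moe {}
    pub mod qwen3 {}
    pub mod qwen3_moe {}
    pub mod qwen3_vl {}
    pub mod quantized_qwen2 {}
    pub mod quantized_qwen3 {}
}

/// Submodule names of the hypothetical family, listed for inspection.
fn qwen_family() -> Vec<&'static str> {
    vec![
        "qwen2", "qwen2_moe", "qwen3", "qwen3_moe", "qwen3_vl",
        "quantized_qwen2", "quantized_qwen3",
    ]
}

fn main() {
    println!("qwen family has {} modules", qwen_family().len());
}
```

Grouping this way makes the quantized/full pairing visible at a glance: every `quantized_*` file sits next to the module it quantizes.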
Benefits
Better Organization
- Related implementations grouped together
- Easy to see all variants of a model family
- Clear separation between families
- Easier to navigate codebase
Better Discoverability
- Users can find all Llama variants in one place
- Clear which models have quantized versions
- Easier to compare implementations within family
- Better for documentation generation
Backward Compatibility
- Re-export from module for existing imports
- No breaking changes for users
- Can migrate incrementally
Backward Compatibility Strategy
The reorganization maintains backward compatibility through re-exports. Using the Llama family as an example:
New Directory Structure
models/llama/
├── mod.rs
├── llama.rs
├── quantized_llama.rs
└── llama2_c.rs
Re-export Pattern
In models/llama/mod.rs:
// Declare submodules
pub mod llama;
pub mod quantized_llama;
pub mod llama2_c;
// Optional: re-export everything for convenience. Note that glob
// re-exports collide when two submodules export the same item name
// (e.g. a `Config` in both llama and llama2_c), in which case the
// ambiguous name errors at the use site; selective re-exports are safer.
pub use llama::*;
pub use quantized_llama::*;
pub use llama2_c::*;

In models/mod.rs:
// New: expose the family module
pub mod llama;
// For backward compatibility, re-export the remaining submodules at the
// top level. Note that `pub use llama::llama;` would clash with
// `pub mod llama;` above; instead, the legacy `models::llama` path keeps
// working because the family mod.rs surfaces its `llama` submodule's
// items at the family level (`pub use llama::*;`).
pub use llama::quantized_llama;
pub use llama::llama2_c;

Three Import Patterns (All Work!)
Pattern 1: Legacy (backward compatible)
use candle_transformers::models::llama; // Old way still works!
use candle_transformers::models::quantized_llama; // Old way still works!

Pattern 2: New nested (explicit)
use candle_transformers::models::llama::llama; // New explicit way
use candle_transformers::models::llama::quantized_llama;

Pattern 3: Import whole family
use candle_transformers::models::llama::*; // Import entire family

SmolLM3 Example
SmolLM3 (#3180) demonstrates this pattern:
Structure:
models/smol/
├── mod.rs
├── smollm3.rs
└── quantized_smollm3.rs
Current models/smol/mod.rs:
pub mod smollm3;
pub mod quantized_smollm3;

In models/mod.rs:
pub mod smol;

Migration Decision
Suggested Model Families
Based on the current modules, these natural groupings exist:
Core LLM Families:
- llama/ - llama, llama2_c, quantized variants
- mistral/ - mistral, mixtral, quantized_mistral
- phi/ - phi, phi3, quantized variants
- qwen/ - qwen2, qwen3, MoE variants, VL, quantized versions
- gemma/ - quantized_gemma3, quantized_recurrent_gemma, paligemma
- mpt/ - mpt, quantized_mpt
- stablelm/ - quantized_stable_lm (if more variants added)
- t5/ - t5, quantized_t5
- olmo/ - olmo, olmo2
Vision/Multimodal:
- llava/ - llava variants
- blip/ - blip, quantized_blip, quantized_blip_text
- clip/ - openclip, mobileclip
- moondream/ - moondream, quantized_moondream
- pixtral/ - pixtral variants
Specialized Architectures:
- mamba/ - mamba variants
- rwkv/ - quantized_rwkv_v5, quantized_rwkv_v6
- mimi/ - mimi variants
Audio/Speech:
- parler_tts/ - parler_tts variants
- metavoice/ - metavoice, quantized_metavoice
Keep Standalone (for now):
- Single-model families or unique architectures that don't fit groups
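The backward-compatibility strategy can be checked end to end in a self-contained toy crate. The module tree below mirrors the proposed llama layout with inline modules; `Config` and `ModelWeights` are illustrative stand-ins for whatever the real files export, and `self::` paths are used so the sketch compiles on any edition:

```rust
// Toy reproduction of the re-export strategy: a `llama` family module
// with nested submodules, plus top-level re-exports for legacy paths.
mod models {
    pub mod llama {
        // In the real tree these would live in llama.rs / quantized_llama.rs.
        pub mod llama {
            #[derive(Debug, Clone, Copy, PartialEq)]
            pub struct Config { pub hidden_size: usize }
        }
        pub mod quantized_llama {
            pub struct ModelWeights { pub n_layers: usize }
        }
        // Surface the full-precision items at the family level so the
        // legacy `models::llama::Config` path keeps resolving.
        pub use self::llama::*;
    }
    // Legacy flat path: `models::quantized_llama` still works.
    pub use self::llama::quantized_llama;
}

fn main() {
    // Pattern 1: legacy paths.
    let legacy = models::llama::Config { hidden_size: 4096 };
    let _w = models::quantized_llama::ModelWeights { n_layers: 32 };
    // Pattern 2: the new nested path names the same type.
    let nested = models::llama::llama::Config { hidden_size: 4096 };
    assert_eq!(legacy, nested);
    println!("legacy and nested paths agree");
}
```

Because the legacy and nested paths resolve to the same items, existing downstream code compiles unchanged while new code can adopt the family-scoped imports incrementally.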
References
- SmolLM3 PR: Add SmolLM3: Full and Quantized Implementation #3180 (demonstrates pattern)