Reorganize Transformers Module by Model Family #3182

@DrJesseGlass

Description

Summary

The candle-transformers/src/models/ directory has grown to contain 70+ flat module entries, mixing full and quantized implementations of the same model families. This makes the codebase harder to navigate and maintain.

Proposal: Group related models into family subdirectories, similar to the pattern demonstrated in SmolLM3 (#3180).

Current State

The models/mod.rs currently has a flat structure:

pub mod llama;
pub mod llama2_c;
pub mod llama2_c_weights;
pub mod quantized_llama;
pub mod quantized_llama2_c;
pub mod mistral;
pub mod quantized_mistral;
pub mod mixtral;
pub mod phi;
pub mod phi3;
pub mod quantized_phi;
pub mod quantized_phi3;
pub mod qwen2;
pub mod qwen2_moe;
pub mod qwen3;
pub mod qwen3_moe;
pub mod qwen3_vl;
pub mod quantized_qwen2;
pub mod quantized_qwen3;
// ... 50+ more entries

Problems:

  • 70+ flat modules in a single directory
  • Full and quantized versions scattered
  • No clear model family grouping
  • Harder to navigate and discover related implementations
  • Difficult to see which models have quantized versions
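The last pain point can be sketched with a few shell commands over the module names. The list below is abridged from the `mod.rs` excerpt above (not the full 70+ entries), so this is an illustration of the discovery problem, not a complete audit:

```shell
# Abridged flat module list from models/mod.rs.
printf '%s\n' llama mistral mixtral phi phi3 qwen2 qwen3 \
  quantized_llama quantized_mistral quantized_phi quantized_phi3 \
  quantized_qwen2 quantized_qwen3 > mods.txt

# Split into quantized (with prefix stripped) and full-precision names.
grep '^quantized_' mods.txt | sed 's/^quantized_//' | sort > quantized.txt
grep -v '^quantized_' mods.txt | sort > full.txt

# Names present in both lists are the models with a quantized variant;
# with the family layout this is visible from the directory tree instead.
comm -12 full.txt quantized.txt
```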

Proposed Structure

Group models by family in subdirectories, similar to SmolLM3 (#3180):

models/
├── llama/
│   ├── mod.rs              # Re-exports for backward compatibility
│   ├── llama.rs            # Full precision
│   ├── llama2_c.rs         # Llama2.c variant
│   ├── quantized_llama.rs
│   └── quantized_llama2_c.rs
├── mistral/
│   ├── mod.rs
│   ├── mistral.rs
│   ├── mixtral.rs
│   └── quantized_mistral.rs
├── phi/
│   ├── mod.rs
│   ├── phi.rs
│   ├── phi3.rs
│   ├── quantized_phi.rs
│   └── quantized_phi3.rs
├── qwen/
│   ├── mod.rs
│   ├── qwen2.rs
│   ├── qwen2_moe.rs
│   ├── qwen3.rs
│   ├── qwen3_moe.rs
│   ├── qwen3_vl.rs
│   ├── quantized_qwen2.rs
│   └── quantized_qwen3.rs
├── smol/                   # Already implemented in #3180
│   ├── mod.rs
│   ├── smollm3.rs
│   └── quantized_smollm3.rs
└── ... other families

Benefits

Better Organization

  • Related implementations grouped together
  • Easy to see all variants of a model family
  • Clear separation between families
  • Easier to navigate codebase

Better Discoverability

  • Users can find all Llama variants in one place
  • Clear which models have quantized versions
  • Easier to compare implementations within family
  • Better for documentation generation

Backward Compatibility

  • Re-export from module for existing imports
  • No breaking changes for users
  • Can migrate incrementally

Backward Compatibility Strategy

The reorganization maintains backward compatibility through re-exports. Using the Llama family as an example:

New Directory Structure

models/llama/
├── mod.rs
├── llama.rs
├── quantized_llama.rs
└── llama2_c.rs

Re-export Pattern

In models/llama/mod.rs:

// Declare submodules
pub mod llama;
pub mod quantized_llama;
pub mod llama2_c;

// Optional: glob re-exports keep legacy item paths (e.g.
// `models::llama::Llama`) working. Caveat: if two variants export the
// same item name (say both define a `Config`), that name becomes
// ambiguous and callers must use the explicit submodule path.
pub use self::llama::*;
pub use self::quantized_llama::*;
pub use self::llama2_c::*;

In models/mod.rs:

// New: expose the family module
pub mod llama;

// For backward compatibility, re-export the remaining submodules at the
// top level. Note that `pub use llama::llama;` would collide with
// `pub mod llama;` itself (E0255); the legacy `models::llama::<Item>`
// paths are preserved by the glob re-exports inside the family module.
pub use llama::quantized_llama;
pub use llama::llama2_c;

Three Import Patterns (All Work!)

Pattern 1: Legacy (backward compatible)

use candle_transformers::models::llama;              // Now the family module; glob re-exports keep old item paths working
use candle_transformers::models::quantized_llama;    // Old flat path still works via re-export

Pattern 2: New nested (explicit)

use candle_transformers::models::llama::llama;       // New explicit way
use candle_transformers::models::llama::quantized_llama;

Pattern 3: Import whole family

use candle_transformers::models::llama::*;           // Import entire family
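The three patterns can be verified with a self-contained sketch that mimics the layout using inline modules. The module and item names below are illustrative stand-ins, not the real candle-transformers API:

```rust
mod models {
    pub mod llama {
        // Full-precision submodule (stand-in for llama.rs).
        pub mod llama {
            pub struct Llama;
            impl Llama {
                pub fn name() -> &'static str {
                    "llama"
                }
            }
        }
        // Quantized submodule (stand-in for quantized_llama.rs).
        pub mod quantized_llama {
            pub struct ModelWeights;
            impl ModelWeights {
                pub fn name() -> &'static str {
                    "quantized_llama"
                }
            }
        }
        // Glob re-exports keep legacy item paths like
        // `models::llama::Llama` resolving.
        pub use self::llama::*;
        pub use self::quantized_llama::*;
    }
    // Backward compatibility: expose the quantized submodule at the top
    // level (re-exporting `llama::llama` here would collide with the
    // family module's own name).
    pub use self::llama::quantized_llama;
}

fn main() {
    // Pattern 1: legacy flat paths.
    assert_eq!(models::llama::Llama::name(), "llama");
    assert_eq!(models::quantized_llama::ModelWeights::name(), "quantized_llama");
    // Pattern 2: explicit nested path.
    assert_eq!(models::llama::llama::Llama::name(), "llama");
    println!("all import patterns resolve");
}
```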

SmolLM3 Example

SmolLM3 (#3180) demonstrates this pattern:

Structure:

models/smol/
├── mod.rs
├── smollm3.rs
└── quantized_smollm3.rs

Current models/smol/mod.rs:

pub mod smollm3;
pub mod quantized_smollm3;

In models/mod.rs:

pub mod smol;

Migration Decision

Suggested Model Families

Based on the current modules, these natural groupings exist:

Core LLM Families:

  • llama/ - llama, llama2_c, quantized variants
  • mistral/ - mistral, mixtral, quantized_mistral
  • phi/ - phi, phi3, quantized variants
  • qwen/ - qwen2, qwen3, MoE variants, VL, quantized versions
  • gemma/ - quantized_gemma3, quantized_recurrent_gemma, paligemma
  • mpt/ - mpt, quantized_mpt
  • stablelm/ - quantized_stable_lm (if more variants added)
  • t5/ - t5, quantized_t5
  • olmo/ - olmo, olmo2

Vision/Multimodal:

  • llava/ - llava variants
  • blip/ - blip, quantized_blip, quantized_blip_text
  • clip/ - openclip, mobileclip
  • moondream/ - moondream, quantized_moondream
  • pixtral/ - pixtral variants

Specialized Architectures:

  • mamba/ - mamba variants
  • rwkv/ - quantized_rwkv_v5, quantized_rwkv_v6
  • mimi/ - mimi variants

Audio/Speech:

  • parler_tts/ - parler_tts variants
  • metavoice/ - metavoice, quantized_metavoice

Keep Standalone (for now):

  • Single-model families or unique architectures that don't fit groups
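The incremental migration for one family could look like the following sketch. File names are taken from the llama group above but are simulated here with empty files so the sequence is self-contained; in the real repo, `git mv` should be used instead of `mv` to preserve history:

```shell
# Simulate the flat layout with empty stand-in files.
mkdir -p models
touch models/llama.rs models/llama2_c.rs models/llama2_c_weights.rs \
      models/quantized_llama.rs models/quantized_llama2_c.rs

# Move the family into its own subdirectory (use `git mv` in the real repo).
mkdir -p models/llama
mv models/llama.rs models/llama/llama.rs
mv models/llama2_c.rs models/llama2_c_weights.rs \
   models/quantized_llama.rs models/quantized_llama2_c.rs models/llama/

# Write the family mod.rs declaring the moved submodules.
cat > models/llama/mod.rs <<'EOF'
pub mod llama;
pub mod llama2_c;
pub mod llama2_c_weights;
pub mod quantized_llama;
pub mod quantized_llama2_c;
EOF

ls models/llama
```

After each family move, `cargo check` on the crate confirms the re-exports keep existing import paths compiling before the next family is touched.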
