1 change: 1 addition & 0 deletions .agents/skills
123 changes: 123 additions & 0 deletions .claude/skills/xtuner-sync-supported-models/SKILL.md
@@ -0,0 +1,123 @@
---
name: xtuner-sync-supported-models
description: Synchronize xtuner's supported model documentation (docs/en/pretrain_sft/advanced_tutorial/model.md and docs/zh_cn/pretrain_sft/advanced_tutorial/model.md) with the actual Config classes defined under xtuner/v1/model/. Use when (1) new TransformerConfig, MoEConfig, or BaseComposeConfig subclasses are added, removed, or renamed in xtuner/v1/model/, (2) existing model configs change their inheritance hierarchy, scale, or HuggingFace counterpart, or (3) a code review or user request points out that model.md is out of sync with the codebase.
---

# Update XTuner Supported Model Docs

Keep the English and Chinese `model.md` files synchronized with the actual Config classes in `xtuner/v1/model/`.

## Scan the Codebase

Run the bundled scan script from the xtuner project root to discover all Config classes and their inheritance:

```bash
python3 .agents/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py
```

The script outputs JSON with two keys:
- `configs`: list of every `*Config` class under `xtuner/v1/model/` with its parent classes and file path
- `children`: parent-to-children mapping for the hierarchy tree
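
For reference, the output has roughly this shape (the keys come from the script itself; the class names and the elided file path below are illustrative only):

```json
{
  "configs": [
    {
      "class": "Qwen3Dense8BConfig",
      "parents": ["Qwen3DenseConfig"],
      "file": "xtuner/v1/model/..."
    }
  ],
  "children": {
    "Qwen3DenseConfig": ["Qwen3Dense8BConfig"]
  }
}
```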

## What to Update

Compare the script output against the two files:
- `docs/en/pretrain_sft/advanced_tutorial/model.md`
- `docs/zh_cn/pretrain_sft/advanced_tutorial/model.md`

Both files share the same structure and must stay in sync:

1. **Base Config Classes** — configs that directly inherit from `TransformerConfig` (or `MoEConfig`) and provide a `from_hf` classmethod for loading HuggingFace weights
2. **Concrete Model Configs** — fixed-scale subclasses of the base configs above
3. **Compose Models** — multimodal configs that inherit from `BaseComposeConfig`
4. **Inheritance Hierarchy** — a text tree showing the full `XTunerBaseModelConfig` hierarchy

### Rules for the Base Config table

Include these direct descendants of `TransformerConfig`/`MoEConfig`:
- `Qwen2DenseConfig`
- `Qwen3DenseConfig`
- `DeepSeekV3Config`
- `GptOssConfig`
- `Qwen3MoEConfig`

Exclude from the base table:
- `MoEConfig` — it is an intermediate base class, not a usable model family
- `Qwen3_5_VLTextMoEConfig` — it is an intermediate base with only one concrete child; its child `Qwen3_5_VLTextMoE35BA3BConfig` belongs under the MoE concrete table
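
As a cross-check, a minimal sketch of deriving the base-table candidates from the scan output (it assumes the script is run from the project root; `EXCLUDED` simply mirrors the exclusions above):

```python
import json
import subprocess

# Run the bundled scan script and parse its JSON output.
raw = subprocess.run(
    ["python3", ".agents/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py"],
    capture_output=True, text=True, check=True,
).stdout
data = json.loads(raw)

# Intermediate bases excluded from the base table (see the rules above).
EXCLUDED = {"MoEConfig", "Qwen3_5_VLTextMoEConfig"}

base_table = sorted(
    cfg["class"]
    for cfg in data["configs"]
    if {"TransformerConfig", "MoEConfig"} & set(cfg["parents"])
    and cfg["class"] not in EXCLUDED
)
print(base_table)  # expected: the five base configs listed above
```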

### Rules for the Concrete Model table

Include every concrete subclass that has fixed parameter defaults. For each row note:
- `Config Class`
- `Base Class / Family`
- `Architecture Type`: `Dense`, `MoE`, `Dense (VL backbone)`, `MoE (VL backbone)`
- `Scale / Notes`: parameter count or total/activated size; for VL backbones note "for multimodal"

`DeepSeekV3Config` appears here even though it has no separate base entry (it is both base and concrete).

### Rules for the Compose Models section

Include two sub-tables:
1. **Compose Base Config Classes** — `Qwen3VLBaseConfig`, `InternVLBaseConfig`, `InternS1BaseConfig`
- `Qwen3VLBaseConfig`: VL model based on Qwen3 text backbone
- `InternVLBaseConfig`: VL model based on InternViT + Qwen3
- `InternS1BaseConfig`: Science multimodal model based on InternViT + Qwen3
2. **Concrete Compose Model Configs** — every subclass of the above bases; for each row note the wrapped `Text Config` and scale

### Rules for the Inheritance Hierarchy tree

Rebuild the tree from `XTunerBaseModelConfig` with two top-level branches:

```text
XTunerBaseModelConfig
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
│       ├── DeepSeekV3Config
│       ├── GptOssConfig
│       │   ├── GptOss21BA3P6Config
│       │   └── GptOss117BA5P8Config
│       ├── Qwen3MoEConfig
│       │   ├── Qwen3MoE30BA3Config
│       │   ├── Qwen3MoE235BA22Config
│       │   ├── Qwen3MoEFoPEConfig
│       │   ├── Qwen3VLTextMoE30BA3Config
│       │   └── Qwen3VLTextMoE235BA22Config
│       └── Qwen3_5_VLTextMoEConfig
│           └── Qwen3_5_VLTextMoE35BA3BConfig
└── BaseComposeConfig
    ├── Qwen3VLBaseConfig
    │   ├── Qwen3VLMoE30BA3Config
    │   ├── Qwen3VLMoE235BA22Config
    │   ├── Qwen3VLDense4BConfig
    │   ├── Qwen3VLDense8BConfig
    │   └── Qwen3_5_BaseConfig
    │       └── Qwen3_5_VLMoE35BA3Config
    ├── InternVLBaseConfig
    │   ├── InternVL3P5Dense8BConfig
    │   ├── InternVL3P5MoE30BA3Config
    │   └── InternVL3P5Dense1BConfig
    └── InternS1BaseConfig
        ├── InternS1Config
        └── InternS1MiniConfig
```

When new configs are added, insert them into the appropriate branch following the same indentation style.
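
To avoid hand-editing mistakes, the tree body can be regenerated from the `children` map in the scan output. A minimal sketch (assuming the JSON has been saved to a hypothetical `scan.json`; note the grouping labels "Dense Models" / "MoE Models (via MoEConfig)" are editorial and must still be inserted by hand):

```python
import json


def render(children: dict[str, list[str]], node: str, prefix: str = "") -> list[str]:
    """Render one branch of the hierarchy in the box-drawing style used above."""
    lines: list[str] = []
    kids = children.get(node, [])
    for i, kid in enumerate(kids):
        last = i == len(kids) - 1
        lines.append(prefix + ("└── " if last else "├── ") + kid)
        lines.extend(render(children, kid, prefix + ("    " if last else "│   ")))
    return lines


with open("scan.json") as f:  # hypothetical: saved output of the scan script
    children = json.load(f)["children"]

print("XTunerBaseModelConfig")
print("\n".join(render(children, "XTunerBaseModelConfig")))
```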

## Translation Notes

Keep the Chinese `model.md` (`docs/zh_cn/...`) structurally identical to the English one. Translate:
- Section headings
- Table header cells
- Description cells (e.g., "Image / Video + Text" → "图像/视频 + 文本")
- Scale descriptions (e.g., "~7B parameters" → "约 7B 参数", "FoPE variant" → "FoPE 变体")

Do **not** translate Config class names, file paths, or code identifiers.
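
A rough structural parity check between the two files might look like this (a sketch: it compares only heading levels and table-row counts, which survive translation):

```python
from pathlib import Path


def skeleton(path: str) -> list[str]:
    """Reduce a markdown file to its heading levels and table rows."""
    out: list[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            out.append(f"H{len(line) - len(line.lstrip('#'))}")
        elif line.lstrip().startswith("|"):
            out.append("ROW")
    return out


en = skeleton("docs/en/pretrain_sft/advanced_tutorial/model.md")
zh = skeleton("docs/zh_cn/pretrain_sft/advanced_tutorial/model.md")
assert en == zh, "EN and ZH model.md structures have diverged"
```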
@@ -0,0 +1,71 @@
#!/usr/bin/env python3
"""Scan xtuner/v1/model for all Config classes and output model info as JSON."""

import json
import re
import sys
from pathlib import Path

# We care about configs that are part of the supported model hierarchy
RELEVANT_BASES = {
    "TransformerConfig",
    "MoEConfig",
    "BaseComposeConfig",
    "XTunerBaseModelConfig",
    # Known intermediate/family bases
    "Qwen2DenseConfig",
    "Qwen3DenseConfig",
    "Qwen3MoEConfig",
    "Qwen3_5_VLTextMoEConfig",
    "GptOssConfig",
    "DeepSeekV3Config",
    "Qwen3VLBaseConfig",
    "Qwen3_5_BaseConfig",
    "InternVLBaseConfig",
    "InternS1BaseConfig",
}


def scan_file(path: Path):
    text = path.read_text()
    # Match class definitions like: class FooConfig(BarConfig):
    pattern = r"^class\s+(\w+Config)\s*\(([^)]+)\):"
    results = []
    for m in re.finditer(pattern, text, re.MULTILINE):
        class_name = m.group(1)
        parents = [p.strip() for p in m.group(2).split(",")]
        results.append({"class": class_name, "parents": parents, "file": str(path)})
    return results

> **Review comment (Contributor):** Claude: Warning — missing type hints on function signatures
>
> Per CLAUDE.md: "All new code must include type hints for function signatures (parameters and return types)."
>
> `scan_file` and `main` are both missing return type annotations:
>
> Suggested change: `def scan_file(path: Path):` → `def scan_file(path: Path) -> list[dict[str, str | list[str]]]:`
>
> And `main` should have `-> None`.


def main():
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    model_dir = root / "xtuner" / "v1" / "model"
    if not model_dir.exists():
        print(f"Model directory not found: {model_dir}", file=sys.stderr)
        sys.exit(1)

    all_configs = []
    for py_file in sorted(model_dir.rglob("*.py")):
        all_configs.extend(scan_file(py_file))

    # Build parent -> children map
    children: dict[str, list[str]] = {}
    for cfg in all_configs:
        for p in cfg["parents"]:
            if p in RELEVANT_BASES or p.endswith("Config"):
                children.setdefault(p, []).append(cfg["class"])

    # Deduplicate
    for k in children:
        children[k] = sorted(set(children[k]))

    output = {
        "configs": all_configs,
        "children": children,
    }
    print(json.dumps(output, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()

> **Review comment on lines +53 to +55 (Contributor):** Claude: Nit — `RELEVANT_BASES` set is redundant
>
> The condition `p in RELEVANT_BASES or p.endswith("Config")` always takes the right branch for every class in the codebase, since all relevant parent classes already end with `Config`. The `RELEVANT_BASES` set could be removed and the condition simplified to just `p.endswith("Config")`.
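
As a quick sanity check on the class-matching regex above, a tiny self-contained example (the sample string is hypothetical, not taken from the repository):

```python
import re

# Same pattern as scan_file above.
pattern = r"^class\s+(\w+Config)\s*\(([^)]+)\):"
sample = "class Qwen3Dense8BConfig(Qwen3DenseConfig):\n    pass\n"

m = re.search(pattern, sample, re.MULTILINE)
assert m is not None
assert m.group(1) == "Qwen3Dense8BConfig"
assert m.group(2) == "Qwen3DenseConfig"
```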
109 changes: 108 additions & 1 deletion docs/en/pretrain_sft/advanced_tutorial/model.md
@@ -1,3 +1,110 @@
# Model

XTuner v1's `TrainEngine` supports a variety of Transformer architectures through different `TransformerConfig` subclasses. The documentation below summarizes the currently supported models (RL-related configs are excluded).

## Base Config Classes

The following table lists the **base config classes** that define each model family. They provide the `from_hf` interface for loading pretrained weights from HuggingFace.

| Base Config Class | Model Family | Architecture Type | HuggingFace Counterpart |
|---|---|---|---|
| `Qwen2DenseConfig` | Qwen2 Dense | Dense | `Qwen2ForCausalLM` |
| `Qwen3DenseConfig` | Qwen3 Dense | Dense | `Qwen3ForCausalLM` |
| `DeepSeekV3Config` | DeepSeek-V3 | MoE | `DeepseekV3ForCausalLM` |
| `GptOssConfig` | GPT-OSS | MoE | `GptOssForCausalLM` |
| `Qwen3MoEConfig` | Qwen3 MoE | MoE | `Qwen3MoeForCausalLM` |
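
A minimal usage sketch of the `from_hf` classmethod (hedged: the import path and argument form are assumptions to verify against `xtuner/v1/model`):

```python
# Assumed import location and argument form -- check against xtuner/v1/model.
from xtuner.v1.model import Qwen3DenseConfig

cfg = Qwen3DenseConfig.from_hf("Qwen/Qwen3-8B")  # assumed: HF repo id or local path
```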

## Concrete Model Configs

The following table lists the **concrete model configs** that inherit from the base classes above. Each config corresponds to a specific model scale or variant.

| Config Class | Base Class / Family | Architecture Type | Scale / Notes |
|---|---|---|---|
| `Qwen2Dense7BConfig` | `Qwen2DenseConfig` | Dense | ~7B parameters |
| `Qwen3Dense8BConfig` | `Qwen3DenseConfig` | Dense | ~8B parameters |
| `Qwen3Dense4BConfig` | `Qwen3DenseConfig` | Dense | ~4B parameters |
| `Qwen3Dense0P6BConfig` | `Qwen3DenseConfig` | Dense | ~0.6B parameters |
| `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~4B parameters, for multimodal |
| `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~8B parameters, for multimodal |
| `DeepSeekV3Config` | — | MoE | ~671B total / ~37B activated |
| `GptOss21BA3P6Config` | `GptOssConfig` | MoE | ~21B total / ~3.6B activated |
| `GptOss117BA5P8Config` | `GptOssConfig` | MoE | ~117B total / ~5.8B activated |
| `Qwen3MoE30BA3Config` | `Qwen3MoEConfig` | MoE | ~30B total / ~3B activated |
| `Qwen3MoE235BA22Config` | `Qwen3MoEConfig` | MoE | ~235B total / ~22B activated |
| `Qwen3MoEFoPEConfig` | `Qwen3MoEConfig` | MoE | FoPE (Frequency-based Position Embedding) variant |
| `Qwen3VLTextMoE30BA3Config` | `Qwen3MoEConfig` | MoE (VL backbone) | ~30B total, for multimodal |
| `Qwen3VLTextMoE235BA22Config` | `Qwen3MoEConfig` | MoE (VL backbone) | ~235B total, for multimodal |
| `Qwen3_5_VLTextMoE35BA3BConfig` | `Qwen3_5_VLTextMoEConfig` | MoE (VL backbone) | ~35B total / ~3B activated, for multimodal |

> **Review comment on lines +27 to +32 (Contributor):** Claude: Critical — incorrect inheritance in table and tree
>
> The "Base Class / Family" for these VL text backbone configs is listed as `Qwen3DenseConfig` / `Qwen3MoEConfig`, but the actual direct parents are the concrete configs:
>
> | Config | Listed parent | Actual parent |
> |---|---|---|
> | `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | `Qwen3Dense4BConfig` |
> | `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | `Qwen3Dense8BConfig` |
> | `Qwen3VLTextMoE30BA3Config` | `Qwen3MoEConfig` | `Qwen3MoE30BA3Config` |
> | `Qwen3VLTextMoE235BA22Config` | `Qwen3MoEConfig` | `Qwen3MoE235BA22Config` |
>
> The same error appears in the inheritance hierarchy tree at the bottom of this file (and in the Chinese version and in SKILL.md).

> **Author reply (Collaborator):** @claude here we only record the base class of such a config, since it indicates the config's family; if you agree with me, please resolve this conversation.

## Compose Models

In addition to pure text models, XTuner also supports **multimodal compose models** that combine a vision encoder, a projector, and a language model. These configs inherit from `BaseComposeConfig` rather than `TransformerConfig` directly, but they wrap the text configs listed above.

### Compose Base Config Classes

| Base Config Class | Model Family | Modality | Description |
|---|---|---|---|
| `Qwen3VLBaseConfig` | Qwen3-VL | Image / Video + Text | VL model based on Qwen3 text backbone |
| `InternVLBaseConfig` | InternVL | Image + Text | VL model based on InternViT + Qwen3 |
| `InternS1BaseConfig` | InternS1 | Image + Text | Science multimodal model based on InternViT + Qwen3 |

### Concrete Compose Model Configs

| Config Class | Compose Base / Family | Text Config | Scale / Notes |
|---|---|---|---|
| `Qwen3VLMoE30BA3Config` | `Qwen3VLBaseConfig` | `Qwen3VLTextMoE30BA3Config` | ~30B total, MoE VL |
| `Qwen3VLMoE235BA22Config` | `Qwen3VLBaseConfig` | `Qwen3VLTextMoE235BA22Config` | ~235B total, MoE VL |
| `Qwen3VLDense4BConfig` | `Qwen3VLBaseConfig` | `Qwen3VLTextDense4BConfig` | ~4B parameters, Dense VL |
| `Qwen3VLDense8BConfig` | `Qwen3VLBaseConfig` | `Qwen3VLTextDense8BConfig` | ~8B parameters, Dense VL |
| `Qwen3_5_VLMoE35BA3Config` | `Qwen3_5_BaseConfig` | `Qwen3_5_VLTextMoE35BA3BConfig` | ~35B total / ~3B activated, MoE VL |
| `InternVL3P5Dense8BConfig` | `InternVLBaseConfig` | `Qwen3Dense8BConfig` | ~8B parameters, Dense VL |
| `InternVL3P5MoE30BA3Config` | `InternVLBaseConfig` | `Qwen3MoE30BA3Config` | ~30B total, MoE VL |
| `InternVL3P5Dense1BConfig` | `InternVLBaseConfig` | `Qwen3Dense0P6BConfig` | ~1B parameters, Dense VL |
| `InternS1Config` | `InternS1BaseConfig` | `Qwen3MoE235BA22Config` | ~235B total, MoE multimodal |
| `InternS1MiniConfig` | `InternS1BaseConfig` | `Qwen3Dense8BConfig` | ~8B parameters, Dense multimodal |

## Inheritance Hierarchy

The following diagram shows the complete inheritance hierarchy of all config classes supported by `TrainEngine`, including both `TransformerConfig` and `BaseComposeConfig` branches.

```text
XTunerBaseModelConfig
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
│       ├── DeepSeekV3Config
│       ├── GptOssConfig
│       │   ├── GptOss21BA3P6Config
│       │   └── GptOss117BA5P8Config
│       ├── Qwen3MoEConfig
│       │   ├── Qwen3MoE30BA3Config
│       │   ├── Qwen3MoE235BA22Config
│       │   ├── Qwen3MoEFoPEConfig
│       │   ├── Qwen3VLTextMoE30BA3Config
│       │   └── Qwen3VLTextMoE235BA22Config
│       └── Qwen3_5_VLTextMoEConfig
│           └── Qwen3_5_VLTextMoE35BA3BConfig
└── BaseComposeConfig
    ├── Qwen3VLBaseConfig
    │   ├── Qwen3VLMoE30BA3Config
    │   ├── Qwen3VLMoE235BA22Config
    │   ├── Qwen3VLDense4BConfig
    │   ├── Qwen3VLDense8BConfig
    │   └── Qwen3_5_BaseConfig
    │       └── Qwen3_5_VLMoE35BA3Config
    ├── InternVLBaseConfig
    │   ├── InternVL3P5Dense8BConfig
    │   ├── InternVL3P5MoE30BA3Config
    │   └── InternVL3P5Dense1BConfig
    └── InternS1BaseConfig
        ├── InternS1Config
        └── InternS1MiniConfig
```

> **Review comment on lines +72 to +82 (Contributor):** Claude: Critical — inheritance tree does not match actual code
>
> The VL text backbone configs are shown as direct children of the family base, but they actually inherit from the concrete scale configs. Suggested change for the Dense branch:
>
>     │   ├── Dense Models
>     │   │   ├── Qwen2DenseConfig
>     │   │   │   └── Qwen2Dense7BConfig
>     │   │   └── Qwen3DenseConfig
>     │   │       ├── Qwen3Dense8BConfig
>     │   │       │   └── Qwen3VLTextDense8BConfig
>     │   │       ├── Qwen3Dense4BConfig
>     │   │       │   └── Qwen3VLTextDense4BConfig
>     │   │       └── Qwen3Dense0P6BConfig
>
> Similarly for the MoE section below, `Qwen3VLTextMoE30BA3Config` should be nested under `Qwen3MoE30BA3Config`, and `Qwen3VLTextMoE235BA22Config` under `Qwen3MoE235BA22Config`.