[levanter] Add generation_config.json support for chat model checkpoints (#4160)
## Summary
- Add `hf_generation_eos_token_ids` config field to `SimpleDPOConfig`,
`SimpleSFTConfig`, `SimpleTrainConfig`, `TrainDpoConfig`, and
`TrainLmConfig`
- When set (e.g. `[128001, 128009]`), write a validated
`generation_config.json` alongside HF checkpoints so vLLM stops on the
right tokens for chat models
- `config.json` is unchanged — pretraining checkpoints are unaffected
- New shared helper `levanter/utils/hf_export.py` with
`build_generation_config()` for validation/normalization (sketched below)
- `LLAMA3_CHAT_STOP_TOKEN_IDS` constant in `experiments/llama.py`
Replaces #4154 (closed). Does **not** modify the tokenizer's `eos_token`
or override `eos_token_id` in `config.json`.
Fixes #4153. Fixes #4159.
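
For reference, a minimal sketch of what the helper's validation/normalization
covers, inferred from the summary and test plan (dedup, sort, auto-add EOS,
error cases). The signature, parameter names, and error messages here are
assumptions, not the PR's actual code:

```python
# Hypothetical sketch of build_generation_config(); names and signature
# are assumed, not copied from levanter/utils/hf_export.py.
from typing import Sequence


def build_generation_config(
    eos_token_ids: Sequence[int], tokenizer_eos_token_id: int
) -> dict:
    ids = list(eos_token_ids)
    if not ids:
        raise ValueError("hf_generation_eos_token_ids must be non-empty")
    if not all(isinstance(t, int) and t >= 0 for t in ids):
        raise ValueError(f"token ids must be non-negative ints, got {ids!r}")
    # Auto-add the tokenizer's EOS so the pre-training stop token still works.
    if tokenizer_eos_token_id not in ids:
        ids.append(tokenizer_eos_token_id)
    # Dedup and sort for a deterministic generation_config.json.
    return {"eos_token_id": sorted(set(ids))}
```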
## Test plan
- [x] 14 unit tests in `test_hf_export.py` — validation, dedup, sort,
auto-add EOS, error cases
- [x] `./infra/pre-commit.py --all-files --fix` passes
- [x] Pre-commit hooks pass on commit
- [x] Verify `generation_config.json` is written when
`hf_generation_eos_token_ids=[128001, 128009]` is set on a DPO run
- [ ] Verify no `generation_config.json` when field is `None` (default)
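
A possible spot-check for the last two items: inspect a saved checkpoint
directory for the file. The checkpoint path and layout here are assumed,
not taken from the PR:

```python
# Hedged verification sketch; adjust ckpt_dir to your run's output layout.
import json
from pathlib import Path

ckpt_dir = Path("checkpoints/step-1000/hf")  # assumed local checkpoint layout
gen_cfg = ckpt_dir / "generation_config.json"
if gen_cfg.exists():
    # When the field is set, expect eos_token_id to include 128001 and 128009.
    print(json.loads(gen_cfg.read_text()))
else:
    print("no generation_config.json (expected when the field is None)")
```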
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Changed file: `docs/tutorials/train-dpo.md` (38 additions, 0 deletions)
@@ -79,6 +79,44 @@ dpo_config = SimpleDPOConfig(

| `model_name_or_path` | HuggingFace model to initialize the policy from. Also used as the reference model unless `reference_model_path` is set separately. |
| `reference_model_path` | Path to the reference model. Defaults to `model_name_or_path`. |
| `validation_split_fraction` | Fraction of training data to hold out for validation (default 0.1). Set to `None` to use a separate validation set. |
| `hf_generation_eos_token_ids` | List of token IDs to write to `generation_config.json` for inference stop conditions. See below. |

### Setting Generation Stop Tokens

Chat models use a turn-boundary token (e.g. `<|eot_id|>`) to end assistant
responses, but the tokenizer's `eos_token` is typically the pre-training
document boundary (`<|end_of_text|>`). Inference tools like vLLM need both
tokens as stop conditions.

Set `hf_generation_eos_token_ids` to write a `generation_config.json` alongside
each saved checkpoint. The tokenizer's `eos_token_id` is auto-added if not
already in the list.

For Llama 3 models, use the predefined constant:

```python
from experiments.llama import LLAMA3_CHAT_STOP_TOKEN_IDS
```
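
The diff view cuts off after the import. A hedged completion showing how the
constant might be wired into a DPO run — the field name comes from the
summary, but the other arguments are illustrative, not from the PR:

```python
# Usage sketch: only hf_generation_eos_token_ids is documented by this PR;
# SimpleDPOConfig's other fields here are illustrative.
from experiments.llama import LLAMA3_CHAT_STOP_TOKEN_IDS

dpo_config = SimpleDPOConfig(
    model_name_or_path="meta-llama/Meta-Llama-3-8B-Instruct",  # example model
    # e.g. [128001, 128009] per the PR summary; the tokenizer's eos_token_id
    # is auto-added if it is not already in the list.
    hf_generation_eos_token_ids=LLAMA3_CHAT_STOP_TOKEN_IDS,
)
```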