Is context length extension training possible with Axolotl? #3306
This may be a silly question, but is context length extension training possible with Axolotl? I would like to extend the maximum supported context length for a model (OLMo 2 32B) that uses RoPE, ideally using QLoRA. Currently the max context length is only 4,096 tokens, but I would like to fine-tune it on documents consisting of 32K tokens.
Hello, you can indeed change RoPE by overriding the model config. Say you're using this model: https://huggingface.co/allenai/OLMo-2-0325-32B-SFT/blob/main/config.json

You can set `rope_scaling` similar to the Olmo 3 32B model: https://huggingface.co/allenai/Olmo-3-32B-Think/blob/main/config.json#L79-L94

```yaml
overrides_of_model_config:
  rope_scaling:
    type: ${ROPE_SCALING_TYPE}
    factor: ${ROPE_SCALING_FACTOR}
  rope_theta: ${ROPE_THETA}

sequence_len: ${NEW_SEQ}
```

Make sure that the OLMo 2 implementation in transformers supports the RoPE scaling config too.

Curious question: why not start from Olmo 3, which already has its context expanded?