
Conversation

@cyberofficial
Contributor

This pull request corrects the [limit] configurations for five models available on the Vultr Inference API to reflect their true, empirically tested context windows. The previous configurations used optimistic limits that could trigger runtime errors, and because official documentation is sparse, the numbers had also fallen out of date.

Methodology:

I ran a series of automated tests against the Vultr Inference API to determine the precise, stable context window for each model. The test script first grew the request size to find a rough upper bound, then used a guided search to narrow down and verify the maximum token limit. The new configurations are based on these stable results.
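
For reference, here is a minimal sketch of that probing approach; the actual test script is not part of this PR, and the endpoint URL, token estimation, and model identifier below are illustrative assumptions.

```python
import os
import requests

# Assumed OpenAI-compatible chat endpoint for Vultr Inference (illustrative).
API_URL = "https://api.vultrinference.com/v1/chat/completions"
API_KEY = os.environ.get("VULTR_API_KEY", "")

def fits(model: str, prompt_tokens: int) -> bool:
    """Return True if a prompt of roughly `prompt_tokens` tokens is accepted."""
    # Filler prompt; assumes ~1 token per repeated word. A real script would
    # count tokens with the model's tokenizer instead of approximating.
    prompt = "hello " * prompt_tokens
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1,
        },
        timeout=120,
    )
    return resp.status_code == 200

def find_limit(model: str, start: int = 1_024) -> int:
    """Double the request size until it fails, then binary-search the boundary."""
    lo = hi = start
    while fits(model, hi):        # grow quickly to bracket the upper bound
        lo, hi = hi, hi * 2
    while lo + 1 < hi:            # narrow down the last accepted size
        mid = (lo + hi) // 2
        if fits(model, mid):
            lo = mid
        else:
            hi = mid
    return lo                     # largest prompt size that was accepted

if __name__ == "__main__":
    print(find_limit("deepseek-r1-distill-qwen-32b"))
```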

Test Results Summary:

| Model Name | Verified Avg. Limit | Safe Limit (Floor) | Hard Limit (Ceiling) |
| --- | --- | --- | --- |
| deepseek-r1-distill-qwen-32b | 130,466 tokens | 130,000 tokens | 131,000 tokens |
| qwen2.5-coder-32b-instruct | 15,940 tokens | 15,000 tokens | 16,000 tokens |
| deepseek-r1-distill-llama-70b | 130,466 tokens | 130,000 tokens | 131,000 tokens |
| gpt-oss-120b | 130,530 tokens | 130,000 tokens | 131,000 tokens |
| kimi-k2-instruct | 63,667 tokens | 63,000 tokens | 64,000 tokens |

Proposed Changes:

The [limit] section for each model file has been updated to suit coding tasks (maximizing input context while reserving a generous output buffer) and to stay within the verified total token limits.
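
As a quick sanity check on the numbers in the diffs below, each new context value is simply the safe floor from the table above minus the reserved output buffer:

```python
# context = safe floor (from the results table) - reserved output buffer
models = {
    # model: (safe_total, reserved_output)
    "deepseek-r1-distill-qwen-32b":  (130_000, 8_192),
    "deepseek-r1-distill-llama-70b": (130_000, 8_192),
    "gpt-oss-120b":                  (130_000, 8_192),
    "kimi-k2-instruct":              (63_000,  4_096),
    "qwen2.5-coder-32b-instruct":    (15_000,  2_048),
}
for name, (total, output) in models.items():
    print(f"{name}: context = {total - output:_}")
# deepseek-r1-distill-qwen-32b:  context = 121_808
# deepseek-r1-distill-llama-70b: context = 121_808
# gpt-oss-120b:                  context = 121_808
# kimi-k2-instruct:              context = 58_904
# qwen2.5-coder-32b-instruct:    context = 12_952
```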


deepseek-r1-distill-qwen-32b.toml & deepseek-r1-distill-llama-70b.toml

 [limit]
-context = 128_000
-output = 32_768
+# VERIFIED TOTAL: ~130k. Reserving 8k for output.
+context = 121_808
+output = 8_192

gpt-oss-120b.toml

 [limit]
-context = 128_000
-output = 131_072
+# VERIFIED TOTAL: ~130k. Reserving 8k for output.
+context = 121_808
+output = 8_192

kimi-k2-instruct.toml

 [limit]
-context = 128_000
-output = 16_384
+# VERIFIED TOTAL: ~63k. Reserving 4k for output.
+context = 58_904
+output = 4_096

qwen2.5-coder-32b-instruct.toml

 [limit]
-context = 128_000
-output = 32_768
+# VERIFIED TOTAL: ~15k. Reserving 2k for output.
+context = 12_952
+output = 2_048

Adjusted the 'context' and 'output' token limits in the TOML configs for deepseek-r1-distill-llama-70b, deepseek-r1-distill-qwen-32b, gpt-oss-120b, kimi-k2-instruct, and qwen2.5-coder-32b-instruct to reflect the verified capacity constraints.
@rekram1-node merged commit cbaf338 into sst:dev on Oct 23, 2025
1 check passed