fix: remove codex_spark quota gating causing 503 for plus plan accounts #513
nguyendkn wants to merge 1 commit
Plus plan accounts do not return `additional_rate_limits` from the ChatGPT upstream API, leaving `additional_usage_history` empty. When the `codex_spark` entry exists in the additional quota registry, the load balancer blocks all requests for `gpt-5.3-codex-spark` with 503 "No fresh additional quota data available", even though the accounts have valid usage quota.

This removes the `codex_spark` entry from the registry so that plus plan accounts are not gated by missing additional quota data.

Also adds `./config:/app/config` volume mount to docker-compose.yml so that config changes persist across container rebuilds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Thanks @nguyendkn for digging into this -- the root cause analysis is correct and the report (
Why I don't want to merge as-is
Removing Concretely, after this PR:
The PR body itself already names the right alternative (option 1 in your "Alternative approaches considered"):
That is the fix we want, with a small refinement so the behavior is plan-aware rather than "if data unavailable, just pass".
Proposed fix shape
In
The split could be driven by a small allow-list on the registry entry itself, e.g.:
{
  "quota_key": "codex_spark",
  "display_label": "GPT-5.3-Codex-Spark",
  "model_ids": ["gpt-5.3-codex-spark"],
  "limit_name_aliases": ["codex_other", "GPT-5.3-Codex-Spark", "gpt-5.3-codex-spark"],
  "metered_feature_aliases": ["codex_bengalfox"],
  "applies_to_plans": ["pro", "prolite", "team", "business", "enterprise"]
}
Then That keeps:
On the docker-compose change
Unrelated to the registry fix but useful on its own. Could you split that into a separate small PR? It's a one-line change and we can land it without entangling it with the eligibility-classifier work.
Tests we'd want
Not closing this
I'm leaving the PR open. Happy to take a new commit in this same branch, or pick it up as a follow-up PR -- whichever is easier on your end. Either way, thanks for catching the underlying issue and writing it up so clearly; the analysis is what made the right fix shape obvious. Holding the merge for now.
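A minimal sketch of how the `applies_to_plans` allow-list proposed above might be consulted before gating. This is an illustration under assumptions, not the project's implementation: `QuotaRegistryEntry`, `gate_applies`, and the "empty allow-list means every plan" rule are hypothetical names and choices; only the JSON field names and plan values come from the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaRegistryEntry:
    """Hypothetical in-memory form of one registry entry."""

    quota_key: str
    model_ids: tuple[str, ...]
    # Assumption: an empty allow-list means the gate applies to every plan,
    # which would preserve today's behavior for entries without the field.
    applies_to_plans: tuple[str, ...] = ()


def gate_applies(entry: QuotaRegistryEntry, model_id: str, plan: str) -> bool:
    """Return True when the additional-quota gate should run for this account."""
    if model_id not in entry.model_ids:
        return False
    if not entry.applies_to_plans:
        return True
    return plan.lower() in entry.applies_to_plans


CODEX_SPARK = QuotaRegistryEntry(
    quota_key="codex_spark",
    model_ids=("gpt-5.3-codex-spark",),
    applies_to_plans=("pro", "prolite", "team", "business", "enterprise"),
)
```

With an entry like this, a plus account would skip the gate and fall back to its standard usage quota, while a pro account would still be filtered on additional-quota data.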
|
Thanks @nguyendkn for the deep root-cause write-up -- the diagnosis is exactly right: That part is correct and well-explained. The proposed fix, though, has a regression I'd like to address before merging.
Concern with the current shape
Removing the For those accounts, the gate is currently doing real work:
With the entry removed, a Pro deployment serving
That's a real regression for Pro/Team users -- the test plan's last unchecked box is exactly this.
Suggested fix: tighten
|
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 554be182a4
-     "metered_feature_aliases": ["codex_bengalfox"]
-   }
- ]
+ []
Keep codex_spark mapped for accounts with extra quota
Replacing the registry with [] removes the only model_ids -> quota_key mapping, so _gated_limit_name_for_model() now returns None for gpt-5.3-codex-spark and _filter_accounts_for_additional_limit() is never applied in app/modules/proxy/load_balancer.py. In deployments that include Pro/Team accounts (where upstream does return additional_rate_limits for this feature), routing will ignore additional-quota exhaustion and continue selecting accounts that should be gated, leading to avoidable upstream rate-limit failures instead of local eligibility filtering.
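To make the point above concrete, here is a small sketch of a `model_ids -> quota_key` lookup. The real `_gated_limit_name_for_model()` is not shown in this thread, so `quota_key_for_model` and `REGISTRY_BEFORE` are hypothetical stand-ins; the entry fields are copied from the registry JSON quoted earlier in the conversation.

```python
from __future__ import annotations

from typing import Optional

# Registry contents before this PR (fields as quoted in the thread).
REGISTRY_BEFORE = [
    {
        "quota_key": "codex_spark",
        "display_label": "GPT-5.3-Codex-Spark",
        "model_ids": ["gpt-5.3-codex-spark"],
        "limit_name_aliases": ["codex_other", "GPT-5.3-Codex-Spark", "gpt-5.3-codex-spark"],
        "metered_feature_aliases": ["codex_bengalfox"],
    }
]

# Registry contents after this PR.
REGISTRY_AFTER: list[dict] = []


def quota_key_for_model(registry: list[dict], model_id: str) -> Optional[str]:
    """Hypothetical lookup: return the quota key gating a model, or None."""
    for entry in registry:
        if model_id in entry.get("model_ids", []):
            return entry["quota_key"]
    return None
```

Under this sketch the lookup yields `"codex_spark"` against the old registry and `None` against `[]`, so a caller that only filters when a key is found would skip the additional-limit filter for every plan, not just plus.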
|
Maintainer cleanup: closing this as stale / no longer merge-ready against the current This PR has been open across substantial main-branch churn and is currently carrying at least one stale signal (conflicts/blocked or failing checks, outstanding If the change is still needed, please open a fresh, focused PR rebased on current |
|
Revived on a maintainer-owned branch as #751 because this PR head has maintainerCanModify=false and cannot be updated here. Keeping credit to the original implementation in the replacement PR. |
Summary
- Remove `codex_spark` entry from `additional_quota_registry.json` to fix 503 errors for plus plan accounts
- Add `./config:/app/config` volume mount to `docker-compose.yml` so config changes persist across container rebuilds

Issue
Plus plan accounts do not return `additional_rate_limits` from the ChatGPT upstream API (`/backend-api/wham/usage`). The upstream response has `additional_rate_limits: None` for plus plan users.

When the `codex_spark` entry exists in the additional quota registry, the load balancer's `_filter_accounts_for_additional_limit()` method checks `additional_usage_history` for fresh quota data. Since plus plan accounts never populate this table (the usage refresh scheduler only writes to `additional_usage_history` when `payload.additional_rate_limits` is not None/empty), all accounts are classified as `"data_unavailable"` by `_additional_quota_eligibility()`.

This results in a 503 error ("No fresh additional quota data available") for every request using `gpt-5.3-codex-spark`.

Even though the accounts have valid primary/secondary usage quota and could serve the request, the additional quota gate blocks them entirely because no `additional_usage_history` rows exist.

Root Cause
The `codex_spark` quota registry entry assumes that upstream accounts return `additional_rate_limits` data, but plus plan accounts do not. This makes the additional quota gate impossible to satisfy for deployments that only have plus plan accounts.

Fix
Remove the `codex_spark` entry from the registry so that `gpt-5.3-codex-spark` is no longer subject to additional quota gating. This allows plus plan accounts to route requests for this model using their standard usage quota.

Alternative Approaches Considered
1. Modify `_additional_quota_eligibility()` to return `"eligible"` instead of `"data_unavailable"` when no `additional_usage_history` rows exist at all. This is a more invasive change and changes the semantics of the eligibility check.

Test Plan
- `gpt-5.5` requests succeed (200) after removing the registry entry
- `gpt-5.3-codex-spark` quota key returns `None` after clearing registry
- `additional_rate_limits`

🤖 Generated with Claude Code
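The failure mode described in the Issue section can be reproduced with a toy classifier. This is a sketch under stated assumptions, not the code in `load_balancer.py`: `UsageRow`, `classify`, `select_accounts`, the `"exhausted"` label, the 300-second freshness window, and the 100% threshold are all hypothetical; only the `"eligible"` and `"data_unavailable"` labels come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRow:
    """Hypothetical stand-in for the latest additional_usage_history row."""

    used_percent: float
    age_seconds: float


def classify(row: Optional[UsageRow], max_age_seconds: float = 300.0) -> str:
    """Classify one account for the additional-quota gate (assumed rules)."""
    if row is None or row.age_seconds > max_age_seconds:
        return "data_unavailable"
    if row.used_percent >= 100.0:
        return "exhausted"
    return "eligible"


def select_accounts(latest_rows: dict[str, Optional[UsageRow]]) -> list[str]:
    """Keep only accounts the gate considers eligible."""
    return [acct for acct, row in latest_rows.items() if classify(row) == "eligible"]


# A plus-only deployment never writes history rows, so nothing survives the
# filter and the proxy has no account to route to (surfaced as the 503).
plus_only = {"plus-1": None, "plus-2": None}
mixed = {"plus-1": None, "pro-1": UsageRow(used_percent=40.0, age_seconds=30.0)}
```

In this sketch `select_accounts(plus_only)` is empty, while `select_accounts(mixed)` keeps only the pro account, which is why the gate cannot be satisfied in deployments that contain plus accounts alone.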