Skip to content

fix: remove codex_spark quota gating causing 503 for plus plan accounts#513

Closed
nguyendkn wants to merge 1 commit into
Soju06:mainfrom
nguyendkn:fix/remove-codex-spark-quota-gating
Closed

fix: remove codex_spark quota gating causing 503 for plus plan accounts#513
nguyendkn wants to merge 1 commit into
Soju06:mainfrom
nguyendkn:fix/remove-codex-spark-quota-gating

Conversation

@nguyendkn

Copy link
Copy Markdown

Summary

  • Remove codex_spark entry from additional_quota_registry.json to fix 503 errors for plus plan accounts
  • Add ./config:/app/config volume mount to docker-compose.yml so config changes persist across container rebuilds

Issue

Plus plan accounts do not return additional_rate_limits from the ChatGPT upstream API (/backend-api/wham/usage). The upstream response has additional_rate_limits: None for plus plan users.

When the codex_spark entry exists in the additional quota registry, the load balancer's _filter_accounts_for_additional_limit() method checks additional_usage_history for fresh quota data. Since plus plan accounts never populate this table (the usage refresh scheduler only writes to additional_usage_history when payload.additional_rate_limits is not None/empty), all accounts are classified as "data_unavailable" by _additional_quota_eligibility().

This results in a 503 error for every request using gpt-5.3-codex-spark:

Blocked gated model routing model=gpt-5.3-codex-spark limit_name=codex_spark reason=additional_quota_data_unavailable freshness_since=2026-04-29T03:31:41.106110 candidate_accounts=9 fresh_accounts=0

Even though the accounts have valid primary/secondary usage quota and could serve the request, the additional quota gate blocks them entirely because no additional_usage_history rows exist.

Root Cause

The codex_spark quota registry entry assumes that upstream accounts return additional_rate_limits data, but plus plan accounts do not. This makes the additional quota gate impossible to satisfy for deployments that only have plus plan accounts.

Fix

Remove the codex_spark entry from the registry so that gpt-5.3-codex-spark is no longer subject to additional quota gating. This allows plus plan accounts to route requests for this model using their standard usage quota.

Alternative Approaches Considered

  1. Make additional quota gating optional when data is unavailable — Would require changes to _additional_quota_eligibility() to return "eligible" instead of "data_unavailable" when no additional_usage_history rows exist at all. This is a more invasive change and changes the semantics of the eligibility check.
  2. Add a config flag to disable additional quota gating — Would add complexity for a case that only affects plus plan accounts.
  3. Upgrade accounts to pro/team — Not always feasible for self-hosted deployments.

Test Plan

  • Verified gpt-5.5 requests succeed (200) after removing the registry entry
  • Verified gpt-5.3-codex-spark quota key returns None after clearing registry
  • Verified config persists across container rebuilds with the new volume mount
  • Confirm no regression for pro/team plan accounts that do return additional_rate_limits

🤖 Generated with Claude Code

Plus plan accounts do not return `additional_rate_limits` from the
ChatGPT upstream API, leaving `additional_usage_history` empty.
When the `codex_spark` entry exists in the additional quota registry,
the load balancer blocks all requests for `gpt-5.3-codex-spark` with
503 "No fresh additional quota data available", even though the
accounts have valid usage quota.

This removes the `codex_spark` entry from the registry so that plus
plan accounts are not gated by missing additional quota data.

Also adds `./config:/app/config` volume mount to docker-compose.yml
so that config changes persist across container rebuilds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Soju06

Soju06 commented May 11, 2026

Copy link
Copy Markdown
Owner

Thanks @nguyendkn for digging into this -- the root cause analysis is correct and the report (additional_rate_limits: None for Plus accounts → all-accounts data_unavailable → 503 on gpt-5.3-codex-spark) matches what we see in our own dev instance. The Plus-account 503 path is a real regression and we do want to land a fix for it before the next release. But the specific change in this PR is too broad for what the actual bug is, so I'd like to ask for a narrower take.

Why I don't want to merge as-is

Removing codex_spark from additional_quota_registry.json doesn't fix the eligibility check -- it removes the quota gate entirely, including for Pro / Team accounts where the gate is the only thing keeping us from over-spending on a feature with a real upstream limit.

Concretely, after this PR:

The PR body itself already names the right alternative (option 1 in your "Alternative approaches considered"):

Make additional quota gating optional when data is unavailable -- Would require changes to _additional_quota_eligibility() to return "eligible" instead of "data_unavailable" when no additional_usage_history rows exist at all.

That is the fix we want, with a small refinement so the behavior is plan-aware rather than "if data unavailable, just pass".

Proposed fix shape

In app/modules/proxy/load_balancer.py::_additional_quota_eligibility() (or whichever spot owns the data_unavailable classification), distinguish two cases:

  1. Account plan known not to populate additional_rate_limits (today: Plus, Free, and any plan upstream does not return additional_rate_limits for) → classify as eligible (or a new not_subject_to_quota state) and let the request through using the primary/secondary quota the proxy already tracks. Route as usual.
  2. Account plan that does populate additional_rate_limits, but rows are missing or stale (today: Pro / Team with an empty additional_usage_history) → keep the current data_unavailable behavior so we don't burn Pro accounts at an undefined rate. The eligibility check should not relax just because the table happens to be empty for a Pro account; that case usually means the usage refresh hasn't run yet.

The split could be driven by a small allow-list on the registry entry itself, e.g.:

{
  "quota_key": "codex_spark",
  "display_label": "GPT-5.3-Codex-Spark",
  "model_ids": ["gpt-5.3-codex-spark"],
  "limit_name_aliases": ["codex_other", "GPT-5.3-Codex-Spark", "gpt-5.3-codex-spark"],
  "metered_feature_aliases": ["codex_bengalfox"],
  "applies_to_plans": ["pro", "prolite", "team", "business", "enterprise"]
}

Then _additional_quota_eligibility() returns not_applicable (treated as eligible) for any account whose plan is not in applies_to_plans, and keeps the existing data_unavailable / eligible / exhausted classification for plans that should have additional-rate-limits rows.

That keeps:

  • the gate working for Pro / Team where it actually represents real upstream spend,
  • the dashboard / metering visible for accounts that have the data,
  • Plus accounts routing without ever being blocked by an empty additional-rate-limits table,
  • the config file as the single source of truth (no special-case code paths per plan).

On the docker-compose change

Unrelated to the registry fix but useful on its own. Could you split that into a separate small PR? It's a one-line change and we can land it without entangling it with the eligibility-classifier work.

Tests we'd want

  • _additional_quota_eligibility() returns not_applicable / eligible for Plus account against codex_spark even with empty additional_usage_history.
  • _additional_quota_eligibility() returns data_unavailable for Pro account against codex_spark with empty additional_usage_history (no behavior change for the case the original gate cares about).
  • _additional_quota_eligibility() returns exhausted for Pro account once the row exists and shows the quota at 100%.
  • Integration: gpt-5.3-codex-spark routes successfully on a Plus-only deployment.
  • Integration: gpt-5.3-codex-spark 503s for a Pro deployment whose additional_usage_history is empty (the original safety net).

Not closing this

I'm leaving the PR open. Happy to take a new commit in this same branch, or pick it up as a follow-up PR -- whichever is easier on your end. Either way, thanks for catching the underlying issue and writing it up so clearly; the analysis is what made the right fix shape obvious.

Holding the merge for now.

@Soju06

Soju06 commented May 15, 2026

Copy link
Copy Markdown
Owner

Thanks @nguyendkn for the deep root-cause write-up — the diagnosis is exactly right: _additional_quota_eligibility returns data_unavailable for accounts that never populated additional_usage_history (which is the entire Plus tier today), the load balancer treats that as a hard block, and the request 503s with additional_quota_data_unavailable.

That part is correct and well-explained. The proposed fix, though, has a regression I'd like to address before merging.

Concern with the current shape

Removing the codex_spark entry from config/additional_quota_registry.json entirely solves the Plus-only deployment case but disables the additional-quota gate for everyone, including Pro / Team / Business accounts that do populate additional_usage_history with real codex-spark quota windows.

For those accounts, the gate is currently doing real work:

  • It detects when a Pro account's codex-spark window is exhausted (primary_entry / secondary_entry past 100%).
  • It excludes those accounts from gpt-5.3-codex-spark routing until the window resets.
  • It surfaces the situation in dashboards / request logs as additional_quota_data_unavailable vs quota_exhausted.

With the entry removed, a Pro deployment serving gpt-5.3-codex-spark would:

  • Route to a codex-spark-exhausted account.
  • Get the upstream usage_limit_reached envelope.
  • Failover, retry, possibly hit the same condition on the next account.

That's a real regression for Pro/Team users — the test plan's last unchecked box is exactly this.

Suggested fix: tighten data_unavailable instead of removing the entry

The cleaner shape — and the one your "Alternative Approaches Considered #1" gestures at — is to teach _additional_quota_eligibility to distinguish:

  • The registry expects this quota for this account's plan, and we have history rows but they're stale → still data_unavailable (today's behavior, real freshness signal worth preserving).
  • The registry expects this quota, but the account's plan never populates additional_usage_history for it → return eligible (no real signal to apply), with the route falling back to the primary/secondary quota gate that's already there.

Concretely, the check at app/modules/proxy/load_balancer.py:1288:

if latest_primary_entry is None and latest_secondary_entry is None:
    return "data_unavailable"

can be plan-aware. The metadata to drive that decision already exists — the registry entry can carry an explicit list of plans that produce additional_rate_limits for the quota, and the eligibility check can short-circuit to eligible when the account's plan isn't in that list (instead of failing closed).

Sketch of the registry entry shape:

{
  "quota_key": "codex_spark",
  "display_label": "GPT-5.3-Codex-Spark",
  "model_ids": ["gpt-5.3-codex-spark"],
  "limit_name_aliases": ["codex_other", "GPT-5.3-Codex-Spark", "gpt-5.3-codex-spark"],
  "metered_feature_aliases": ["codex_bengalfox"],
  "plan_visibility": ["pro", "team", "business", "enterprise"]
}

And in _additional_quota_eligibility:

if plan_visibility is not None and account_plan not in plan_visibility:
    return "eligible"  # plan doesn't surface this quota at all
if latest_primary_entry is None and latest_secondary_entry is None:
    return "data_unavailable"

This keeps Pro/Team behavior unchanged, fixes the Plus-only deployment case, and leaves the registry as the SSOT for which quotas apply to which plans.

Smaller separable issue

The ./config:/app/config volume mount in docker-compose.yml is unrelated to the quota gate fix — it's a separate "config edits should survive container rebuild" change. Even if we land it, it should be in its own PR with its own commit message so the changelog stays clean. Mind splitting that out?

Open question

Is there a strong reason to keep the original "remove the entry entirely" shape if Pro/Team users are present in your deployment, or is the Plus-only fix scope something you'd be OK widening to the plan-aware classifier?

Either path is fine — I just want the regression for Pro/Team to be explicit one way or the other. If you'd prefer not to expand scope, we can also park this PR and I can open a separate plan-aware-classifier PR that lands the Plus-only fix without the registry change.

@Komzpa

Komzpa commented May 16, 2026

Copy link
Copy Markdown
Collaborator

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 554be182a4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"metered_feature_aliases": ["codex_bengalfox"]
}
]
[]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep codex_spark mapped for accounts with extra quota

Replacing the registry with [] removes the only model_ids -> quota_key mapping, so _gated_limit_name_for_model() now returns None for gpt-5.3-codex-spark and _filter_accounts_for_additional_limit() is never applied in app/modules/proxy/load_balancer.py. In deployments that include Pro/Team accounts (where upstream does return additional_rate_limits for this feature), routing will ignore additional-quota exhaustion and continue selecting accounts that should be gated, leading to avoidable upstream rate-limit failures instead of local eligibility filtering.

Useful? React with 👍 / 👎.

@Komzpa Komzpa added 🤖 codex: needs work [@codex review] raised an issue 🤖 codex: ok [@codex review] says no issues found. and removed 🤖 codex: needs work [@codex review] raised an issue 🤖 codex: ok [@codex review] says no issues found. labels May 16, 2026
@Soju06

Soju06 commented May 21, 2026

Copy link
Copy Markdown
Owner

Maintainer cleanup: closing this as stale / no longer merge-ready against the current main.

This PR has been open across substantial main-branch churn and is currently carrying at least one stale signal (conflicts/blocked or failing checks, outstanding codex: needs work, draft/old branch state, or overlap with newer/current implementation). Keeping it open is making the PR queue harder to reason about.

If the change is still needed, please open a fresh, focused PR rebased on current main with green CI and a clean Codex review.

@Soju06 Soju06 closed this May 21, 2026
@Komzpa Komzpa added the stale No response from reporter; scheduled for close label May 21, 2026
@Komzpa

Komzpa commented May 21, 2026

Copy link
Copy Markdown
Collaborator

Revived on a maintainer-owned branch as #751 because this PR head has maintainerCanModify=false and cannot be updated here. Keeping credit to the original implementation in the replacement PR.

@Komzpa Komzpa added superseded Replaced by another PR, issue track, or merged implementation and removed 🤖 codex: needs work [@codex review] raised an issue labels May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale No response from reporter; scheduled for close superseded Replaced by another PR, issue track, or merged implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants