fix: remove codex_spark quota gating causing 503 for plus plan accounts by nguyendkn · Pull Request #513 · Soju06/codex-lb

nguyendkn · 2026-04-29T04:00:24Z

Summary

Remove codex_spark entry from additional_quota_registry.json to fix 503 errors for plus plan accounts
Add ./config:/app/config volume mount to docker-compose.yml so config changes persist across container rebuilds

Issue

Plus plan accounts do not return additional_rate_limits from the ChatGPT upstream API (/backend-api/wham/usage). The upstream response has additional_rate_limits: None for plus plan users.

When the codex_spark entry exists in the additional quota registry, the load balancer's _filter_accounts_for_additional_limit() method checks additional_usage_history for fresh quota data. Since plus plan accounts never populate this table (the usage refresh scheduler only writes to additional_usage_history when payload.additional_rate_limits is not None/empty), all accounts are classified as "data_unavailable" by _additional_quota_eligibility().

This results in a 503 error for every request using gpt-5.3-codex-spark:

Blocked gated model routing model=gpt-5.3-codex-spark limit_name=codex_spark reason=additional_quota_data_unavailable freshness_since=2026-04-29T03:31:41.106110 candidate_accounts=9 fresh_accounts=0

Even though the accounts have valid primary/secondary usage quota and could serve the request, the additional quota gate blocks them entirely because no additional_usage_history rows exist.

Root Cause

The codex_spark quota registry entry assumes that upstream accounts return additional_rate_limits data, but plus plan accounts do not. This makes the additional quota gate impossible to satisfy for deployments that only have plus plan accounts.

Fix

Remove the codex_spark entry from the registry so that gpt-5.3-codex-spark is no longer subject to additional quota gating. This allows plus plan accounts to route requests for this model using their standard usage quota.

Alternative Approaches Considered

Make additional quota gating optional when data is unavailable — Would require changes to _additional_quota_eligibility() to return "eligible" instead of "data_unavailable" when no additional_usage_history rows exist at all. This is a more invasive change and changes the semantics of the eligibility check.
Add a config flag to disable additional quota gating — Would add complexity for a case that only affects plus plan accounts.
Upgrade accounts to pro/team — Not always feasible for self-hosted deployments.

Test Plan

Verified gpt-5.5 requests succeed (200) after removing the registry entry
Verified gpt-5.3-codex-spark quota key returns None after clearing registry
Verified config persists across container rebuilds with the new volume mount
Confirm no regression for pro/team plan accounts that do return additional_rate_limits

🤖 Generated with Claude Code

Plus plan accounts do not return `additional_rate_limits` from the ChatGPT upstream API, leaving `additional_usage_history` empty. When the `codex_spark` entry exists in the additional quota registry, the load balancer blocks all requests for `gpt-5.3-codex-spark` with 503 "No fresh additional quota data available", even though the accounts have valid usage quota. This removes the `codex_spark` entry from the registry so that plus plan accounts are not gated by missing additional quota data. Also adds `./config:/app/config` volume mount to docker-compose.yml so that config changes persist across container rebuilds. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Soju06 · 2026-05-11T02:08:45Z

Thanks @nguyendkn for digging into this -- the root cause analysis is correct and the report (additional_rate_limits: None for Plus accounts → all-accounts data_unavailable → 503 on gpt-5.3-codex-spark) matches what we see in our own dev instance. The Plus-account 503 path is a real regression and we do want to land a fix for it before the next release. But the specific change in this PR is too broad for what the actual bug is, so I'd like to ask for a narrower take.

Why I don't want to merge as-is

Removing codex_spark from additional_quota_registry.json doesn't fix the eligibility check -- it removes the quota gate entirely, including for Pro / Team accounts where the gate is the only thing keeping us from over-spending on a feature with a real upstream limit.

Concretely, after this PR:

Plus accounts can now route gpt-5.3-codex-spark (good, that's the bug).
Pro / Team accounts also route gpt-5.3-codex-spark with no quota gating at all (regression). The upstream Codex Spark window is real for Pro accounts (see Prolite accounts show zero aggregate remaining credits #557 / fix(accounts): recover quota status from usage refresh #559 / fix(proxy): narrow budget-safe gate to primary usage #561 for how seriously we treat the secondary windows in general). Without registry coverage, dashboard usage badges for Codex Spark also disappear and operators can't see when they're approaching the limit.
The registry was the only place enforcing the gate; the surrounding code reads additional_quota_registry to even know that the quota key exists. So removing the entry is not a flag-off -- it's a feature-off for everyone who used to benefit from it.

The PR body itself already names the right alternative (option 1 in your "Alternative approaches considered"):

Make additional quota gating optional when data is unavailable -- Would require changes to _additional_quota_eligibility() to return "eligible" instead of "data_unavailable" when no additional_usage_history rows exist at all.

That is the fix we want, with a small refinement so the behavior is plan-aware rather than "if data unavailable, just pass".

Proposed fix shape

In app/modules/proxy/load_balancer.py::_additional_quota_eligibility() (or whichever spot owns the data_unavailable classification), distinguish two cases:

Account plan known not to populate additional_rate_limits (today: Plus, Free, and any plan upstream does not return additional_rate_limits for) → classify as eligible (or a new not_subject_to_quota state) and let the request through using the primary/secondary quota the proxy already tracks. Route as usual.
Account plan that does populate additional_rate_limits, but rows are missing or stale (today: Pro / Team with an empty additional_usage_history) → keep the current data_unavailable behavior so we don't burn Pro accounts at an undefined rate. The eligibility check should not relax just because the table happens to be empty for a Pro account; that case usually means the usage refresh hasn't run yet.

The split could be driven by a small allow-list on the registry entry itself, e.g.:

{
  "quota_key": "codex_spark",
  "display_label": "GPT-5.3-Codex-Spark",
  "model_ids": ["gpt-5.3-codex-spark"],
  "limit_name_aliases": ["codex_other", "GPT-5.3-Codex-Spark", "gpt-5.3-codex-spark"],
  "metered_feature_aliases": ["codex_bengalfox"],
  "applies_to_plans": ["pro", "prolite", "team", "business", "enterprise"]
}

Then _additional_quota_eligibility() returns not_applicable (treated as eligible) for any account whose plan is not in applies_to_plans, and keeps the existing data_unavailable / eligible / exhausted classification for plans that should have additional-rate-limits rows.

That keeps:

the gate working for Pro / Team where it actually represents real upstream spend,
the dashboard / metering visible for accounts that have the data,
Plus accounts routing without ever being blocked by an empty additional-rate-limits table,
the config file as the single source of truth (no special-case code paths per plan).

On the docker-compose change

Unrelated to the registry fix but useful on its own. Could you split that into a separate small PR? It's a one-line change and we can land it without entangling it with the eligibility-classifier work.

Tests we'd want

_additional_quota_eligibility() returns not_applicable / eligible for Plus account against codex_spark even with empty additional_usage_history.
_additional_quota_eligibility() returns data_unavailable for Pro account against codex_spark with empty additional_usage_history (no behavior change for the case the original gate cares about).
_additional_quota_eligibility() returns exhausted for Pro account once the row exists and shows the quota at 100%.
Integration: gpt-5.3-codex-spark routes successfully on a Plus-only deployment.
Integration: gpt-5.3-codex-spark 503s for a Pro deployment whose additional_usage_history is empty (the original safety net).

Not closing this

I'm leaving the PR open. Happy to take a new commit in this same branch, or pick it up as a follow-up PR -- whichever is easier on your end. Either way, thanks for catching the underlying issue and writing it up so clearly; the analysis is what made the right fix shape obvious.

Holding the merge for now.

Soju06 · 2026-05-15T11:01:05Z

Thanks @nguyendkn for the deep root-cause write-up — the diagnosis is exactly right: _additional_quota_eligibility returns data_unavailable for accounts that never populated additional_usage_history (which is the entire Plus tier today), the load balancer treats that as a hard block, and the request 503s with additional_quota_data_unavailable.

That part is correct and well-explained. The proposed fix, though, has a regression I'd like to address before merging.

Concern with the current shape

Removing the codex_spark entry from config/additional_quota_registry.json entirely solves the Plus-only deployment case but disables the additional-quota gate for everyone, including Pro / Team / Business accounts that do populate additional_usage_history with real codex-spark quota windows.

For those accounts, the gate is currently doing real work:

It detects when a Pro account's codex-spark window is exhausted (primary_entry / secondary_entry past 100%).
It excludes those accounts from gpt-5.3-codex-spark routing until the window resets.
It surfaces the situation in dashboards / request logs as additional_quota_data_unavailable vs quota_exhausted.

With the entry removed, a Pro deployment serving gpt-5.3-codex-spark would:

Route to a codex-spark-exhausted account.
Get the upstream usage_limit_reached envelope.
Failover, retry, possibly hit the same condition on the next account.

That's a real regression for Pro/Team users — the test plan's last unchecked box is exactly this.

Suggested fix: tighten `data_unavailable` instead of removing the entry

The cleaner shape — and the one your "Alternative Approaches Considered #1" gestures at — is to teach _additional_quota_eligibility to distinguish:

The registry expects this quota for this account's plan, and we have history rows but they're stale → still data_unavailable (today's behavior, real freshness signal worth preserving).
The registry expects this quota, but the account's plan never populates additional_usage_history for it → return eligible (no real signal to apply), with the route falling back to the primary/secondary quota gate that's already there.

Concretely, the check at app/modules/proxy/load_balancer.py:1288:

if latest_primary_entry is None and latest_secondary_entry is None:
    return "data_unavailable"

can be plan-aware. The metadata to drive that decision already exists — the registry entry can carry an explicit list of plans that produce additional_rate_limits for the quota, and the eligibility check can short-circuit to eligible when the account's plan isn't in that list (instead of failing closed).

Sketch of the registry entry shape:

{
  "quota_key": "codex_spark",
  "display_label": "GPT-5.3-Codex-Spark",
  "model_ids": ["gpt-5.3-codex-spark"],
  "limit_name_aliases": ["codex_other", "GPT-5.3-Codex-Spark", "gpt-5.3-codex-spark"],
  "metered_feature_aliases": ["codex_bengalfox"],
  "plan_visibility": ["pro", "team", "business", "enterprise"]
}

And in _additional_quota_eligibility:

if plan_visibility is not None and account_plan not in plan_visibility:
    return "eligible"  # plan doesn't surface this quota at all
if latest_primary_entry is None and latest_secondary_entry is None:
    return "data_unavailable"

This keeps Pro/Team behavior unchanged, fixes the Plus-only deployment case, and leaves the registry as the SSOT for which quotas apply to which plans.

Smaller separable issue

The ./config:/app/config volume mount in docker-compose.yml is unrelated to the quota gate fix — it's a separate "config edits should survive container rebuild" change. Even if we land it, it should be in its own PR with its own commit message so the changelog stays clean. Mind splitting that out?

Open question

Is there a strong reason to keep the original "remove the entry entirely" shape if Pro/Team users are present in your deployment, or is the Plus-only fix scope something you'd be OK widening to the plan-aware classifier?

Either path is fine — I just want the regression for Pro/Team to be explicit one way or the other. If you'd prefer not to expand scope, we can also park this PR and I can open a separate plan-aware-classifier PR that lands the Plus-only fix without the registry change.

Komzpa · 2026-05-16T17:52:30Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 554be182a4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T17:55:26Z

-    "metered_feature_aliases": ["codex_bengalfox"]
-  }
-]
+[]


Keep codex_spark mapped for accounts with extra quota

Replacing the registry with [] removes the only model_ids -> quota_key mapping, so _gated_limit_name_for_model() now returns None for gpt-5.3-codex-spark and _filter_accounts_for_additional_limit() is never applied in app/modules/proxy/load_balancer.py. In deployments that include Pro/Team accounts (where upstream does return additional_rate_limits for this feature), routing will ignore additional-quota exhaustion and continue selecting accounts that should be gated, leading to avoidable upstream rate-limit failures instead of local eligibility filtering.

Useful? React with 👍 / 👎.

Soju06 · 2026-05-21T06:10:14Z

Maintainer cleanup: closing this as stale / no longer merge-ready against the current main.

This PR has been open across substantial main-branch churn and is currently carrying at least one stale signal (conflicts/blocked or failing checks, outstanding codex: needs work, draft/old branch state, or overlap with newer/current implementation). Keeping it open is making the PR queue harder to reason about.

If the change is still needed, please open a fresh, focused PR rebased on current main with green CI and a clean Codex review.

Komzpa · 2026-05-21T13:19:59Z

Revived on a maintainer-owned branch as #751 because this PR head has maintainerCanModify=false and cannot be updated here. Keeping credit to the original implementation in the replacement PR.

Komzpa mentioned this pull request May 6, 2026

Tested live aggregate stack and suggested merge order #554

Closed

chatgpt-codex-connector Bot reviewed May 16, 2026

View reviewed changes

Komzpa added 🤖 codex: needs work [@codex review] raised an issue 🤖 codex: ok [@codex review] says no issues found. and removed 🤖 codex: needs work [@codex review] raised an issue 🤖 codex: ok [@codex review] says no issues found. labels May 16, 2026

Soju06 closed this May 21, 2026

Komzpa added the stale No response from reporter; scheduled for close label May 21, 2026

Komzpa mentioned this pull request May 21, 2026

fix(proxy): make Codex Spark quota gating plan-aware #751

Merged

Komzpa added superseded Replaced by another PR, issue track, or merged implementation and removed 🤖 codex: needs work [@codex review] raised an issue labels May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: remove codex_spark quota gating causing 503 for plus plan accounts#513

fix: remove codex_spark quota gating causing 503 for plus plan accounts#513
nguyendkn wants to merge 1 commit into
Soju06:mainfrom
nguyendkn:fix/remove-codex-spark-quota-gating

nguyendkn commented Apr 29, 2026

Uh oh!

Soju06 commented May 11, 2026

Uh oh!

Soju06 commented May 15, 2026

Uh oh!

Komzpa commented May 16, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 16, 2026

Uh oh!

Soju06 commented May 21, 2026

Uh oh!

Komzpa commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

nguyendkn commented Apr 29, 2026

Summary

Issue

Root Cause

Fix

Alternative Approaches Considered

Test Plan

Uh oh!

Soju06 commented May 11, 2026

Why I don't want to merge as-is

Proposed fix shape

On the docker-compose change

Tests we'd want

Not closing this

Uh oh!

Soju06 commented May 15, 2026

Concern with the current shape

Suggested fix: tighten data_unavailable instead of removing the entry

Smaller separable issue

Open question

Uh oh!

Komzpa commented May 16, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

Soju06 commented May 21, 2026

Uh oh!

Komzpa commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Suggested fix: tighten `data_unavailable` instead of removing the entry