docs: document effective token budget enforcement behavior #2774

Merged
lpcox merged 3 commits into main from docs/effective-token-budget-enforcement
May 9, 2026

lpcox (Collaborator) commented May 8, 2026

Summary

Documents the runtime behavior when apiProxy.maxEffectiveTokens is configured. This behavior was previously implemented but undocumented — users and downstream tools (like gh-aw) had no spec to reference for detecting or handling budget exhaustion.

Changes

Spec (docs/awf-config-spec.md)

  • Added §10 Effective Token Budget Enforcement with normative language covering:
    • §10.1 Token weighting formula (input ×1, cache-read ×0.1, output ×4, reasoning ×4)
    • §10.2 Model multiplier semantics
    • §10.3 Enforcement behavior (HTTP 429, error type effective_tokens_limit_exceeded, WebSocket rejection), with reached or exceeded (>=) wording aligned to runtime behavior
    • §10.4 Threshold tracking (50%, 75%, 90%, 95%) via thresholds_crossed
    • §10.5 Introspection via /reflect endpoint
  • Corrected /reflect example values so percent_used and thresholds_crossed are internally consistent.
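As a rough illustration of the §10.1 to §10.3 items listed above, the weighting and the reached-or-exceeded check could be sketched as follows. The weights are the ones quoted in this summary. The `Usage` type, the function names, the default multiplier of 1.0, and applying the multiplier as a single factor on the weighted sum are assumptions made for this sketch, not taken from the spec text.

```python
from dataclasses import dataclass

# Weights as quoted in the PR summary for §10.1.
WEIGHTS = {"input": 1.0, "cache_read": 0.1, "output": 4.0, "reasoning": 4.0}


@dataclass
class Usage:
    """Token counts for one request (hypothetical container type)."""
    input: int = 0
    cache_read: int = 0
    output: int = 0
    reasoning: int = 0


def effective_tokens(usage: Usage, multiplier: float = 1.0) -> float:
    """Weighted token count, scaled by a per-model multiplier.

    Assumption: the multiplier scales the whole weighted sum and
    defaults to 1.0 when no multiplier is configured for the model.
    """
    weighted = (
        usage.input * WEIGHTS["input"]
        + usage.cache_read * WEIGHTS["cache_read"]
        + usage.output * WEIGHTS["output"]
        + usage.reasoning * WEIGHTS["reasoning"]
    )
    return weighted * multiplier


def budget_reached(total: float, max_effective_tokens: float) -> bool:
    """Reached-or-exceeded (>=) check, matching the wording in §10.3."""
    return total >= max_effective_tokens
```

Under these assumptions, 1,000 input, 2,000 cache-read, 100 output, and 50 reasoning tokens count as 1,800 effective tokens at a multiplier of 1.0.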

Schema (docs/awf-config.schema.json + src/awf-config-schema.json)

  • Enhanced maxEffectiveTokens description to reference HTTP 429 status, error type, and spec §10
  • Enhanced modelMultipliers description to clarify default behavior and reference spec §10.2

API Proxy Sidecar docs (docs/api-proxy-sidecar.md)

  • Added comprehensive Effective token budget section with:
    • Configuration examples with model multipliers
    • Token weighting table and formula
    • Enforcement behavior and error response format (using reached or exceeded semantics)
    • Threshold tracking table
    • /reflect endpoint introspection examples
    • Detection code sample for agents/orchestrators
  • Corrected wording to match current implementation: threshold state is surfaced via /reflect; no effective_tokens_threshold log event is documented.
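For downstream tools, detection along the lines described above might look like the sketch below. Only the HTTP 429 status and the `effective_tokens_limit_exceeded` error type come from this PR. The layout of the response body is not shown here, so the helper (a hypothetical name) searches the parsed JSON for the error type instead of assuming a specific field path.

```python
import json
from typing import Any

# Error type named in the PR summary.
ERROR_TYPE = "effective_tokens_limit_exceeded"


def _contains_error_type(node: Any) -> bool:
    """Return True if any string value in the JSON tree equals ERROR_TYPE."""
    if isinstance(node, str):
        return node == ERROR_TYPE
    if isinstance(node, dict):
        return any(_contains_error_type(v) for v in node.values())
    if isinstance(node, list):
        return any(_contains_error_type(v) for v in node)
    return False


def is_budget_exhausted(status: int, body: str) -> bool:
    """Classify a proxy response as effective-token budget exhaustion.

    Assumption: the body is JSON with the error type somewhere inside it.
    Non-JSON bodies fall back to a plain substring match.
    """
    if status != 429:
        return False
    try:
        return _contains_error_type(json.loads(body))
    except json.JSONDecodeError:
        return ERROR_TYPE in body
```

Checking the status first keeps an ordinary upstream rate limit (also a 429) from being mistaken for budget exhaustion.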

Motivation

  • Issue #2769 ("ET budget detection patterns in gh-aw#31094 don't match AWF log messages") identified that gh-aw's ET budget detection patterns don't match AWF's error format. This PR documents the canonical format so downstream tools can align
  • Users configuring maxEffectiveTokens had no docs explaining what happens when the limit is reached
  • The weighting formula and model multiplier mechanics were only discoverable by reading source code

Testing

Documentation-only changes. Markdown lint passes for updated docs.

Add §10 to awf-config-spec.md documenting the normative behavior when
apiProxy.maxEffectiveTokens is configured:
- Token weighting formula (input ×1, cache ×0.1, output ×4, reasoning ×4)
- Model multiplier semantics
- HTTP 429 rejection with error type 'effective_tokens_limit_exceeded'
- WebSocket rejection behavior
- Threshold warning emissions (50%, 75%, 90%, 95%)
- /reflect endpoint introspection schema

Update schema descriptions in both docs/ and src/ schemas to reference
the enforcement behavior, HTTP 429 status, and spec section.

Add comprehensive 'Effective token budget' section to api-proxy-sidecar.md
with configuration examples, enforcement details, detection patterns,
and introspection instructions.

Relates-to: #2769

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lpcox requested a review from Mossaka as a code owner May 8, 2026 23:46
Copilot AI review requested due to automatic review settings May 8, 2026 23:46
github-actions Bot (Contributor) commented May 8, 2026

Documentation Preview

Documentation build failed for this PR. View logs.

Built from commit 509deb2

Copilot AI (Contributor) left a comment

Pull request overview

This PR documents the API proxy’s effective token budget behavior (apiProxy.maxEffectiveTokens) across the config spec, schemas, and sidecar documentation, so downstream tools (e.g., gh-aw) have a canonical reference for enforcement semantics and the 429 error format.

Changes:

  • Added a new normative spec section describing effective-token weighting, model multipliers, enforcement, and /reflect introspection.
  • Updated both schema JSON files’ field descriptions to reference HTTP 429 + error type and point to spec §10.
  • Expanded the api-proxy sidecar docs with configuration examples, weighting details, enforcement behavior, and detection guidance.
Summary per file:

| File | Description |
|------|-------------|
| src/awf-config-schema.json | Updated descriptions for maxEffectiveTokens and modelMultipliers to reference 429/error type and spec §10. |
| docs/awf-config.schema.json | Kept docs schema in sync with the source schema description updates. |
| docs/awf-config-spec.md | Added new normative §10 covering effective token budget enforcement and /reflect semantics. |
| docs/api-proxy-sidecar.md | Added an “Effective token budget” section with examples, enforcement behavior, and client detection guidance. |

Copilot's findings

Comments suppressed due to low confidence (1)

docs/awf-config-spec.md:420

  • §10.4 states the proxy SHOULD emit structured log warnings at 50/75/90/95%. In the current implementation, thresholds are tracked in-memory (for /reflect) but no log event is emitted on threshold crossings (there’s no logRequest call for thresholds). Either remove/soften this logging requirement or implement the warning logs to match the spec.
The proxy SHOULD emit structured log warnings when the cumulative effective
tokens cross the following percentage thresholds of `maxEffectiveTokens`:

| Threshold | Log level |
|-----------|-----------|
| 50% | `warn` |
| 75% | `warn` |
| 90% | `warn` |
| 95% | `warn` |

Each threshold MUST be emitted at most once per run.
  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment thread docs/awf-config-spec.md Outdated
"total_effective_tokens": 456.78,
"remaining_effective_tokens": 543.22,
"percent_used": 45.68,
"thresholds_crossed": [50]
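The fragment above appears to be the version the review marked as outdated: 45.68 percent used sits below the lowest threshold, yet `thresholds_crossed` lists 50. A minimal sketch of the consistency rule follows. It assumes a threshold counts as crossed once `percent_used` is at or above it, and that the budget equals total plus remaining. Both are assumptions of this sketch, and the function names are hypothetical.

```python
# Thresholds listed in §10.4 of the PR summary.
THRESHOLDS = (50, 75, 90, 95)


def percent_used(total: float, remaining: float) -> float:
    """Percent of the budget consumed, rounded to two decimals."""
    budget = total + remaining  # assumption: budget = used + remaining
    return round(100 * total / budget, 2)


def thresholds_crossed(percent: float) -> list[int]:
    """Thresholds at or below the current usage percentage."""
    return [t for t in THRESHOLDS if percent >= t]


pct = percent_used(456.78, 543.22)
print(pct, thresholds_crossed(pct))
```

With the quoted totals this yields 45.68 and an empty list, which is the kind of internal consistency the corrected example is described as having.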
Comment thread docs/awf-config-spec.md
Comment on lines +381 to +386
2. **Pre-request check**: Before forwarding each subsequent request to the
upstream provider, the proxy checks whether the cumulative total has
reached or exceeded `maxEffectiveTokens`.

3. **Rejection**: When the budget is exceeded, the proxy MUST reject the
request with:
Comment thread docs/api-proxy-sidecar.md Outdated
Comment on lines +628 to +637
The proxy emits structured log warnings as usage approaches the limit:

| Threshold | Warning emitted |
|-----------|-----------------|
| 50% | Yes (once) |
| 75% | Yes (once) |
| 90% | Yes (once) |
| 95% | Yes (once) |

These appear in the api-proxy container logs as `effective_tokens_threshold` events.

lpcox (Collaborator, Author) commented May 8, 2026

@copilot address the review feedback

Copilot AI (Contributor) commented May 8, 2026

@copilot address the review feedback

Addressed in commit 904fa19. Updated the spec and sidecar docs to use “reached or exceeded” semantics, fixed the /reflect example thresholds to match percent_used, and replaced threshold log-event wording with threshold tracking via /reflect.

Copilot finished work on behalf of lpcox May 8, 2026 23:54

Add 'Runtime JSONL Schemas' section referencing schemas/audit.schema.json
and schemas/token-usage.schema.json with:
- Schema-to-JSONL-file mapping table
- Versioning policy (_schema wire field, prefix matching)
- Published locations (versioned release assets + latest main branch URLs)
- Link to schemas/README.md in Informative References

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot (Contributor) commented May 9, 2026

Smoke Test Results

✅ GitHub MCP: Last 2 merged PRs retrieved
✅ Playwright: Page title contains 'GitHub'
✅ File Writing: Test file created
✅ Bash Verification: File read successful

Status: PASS

💥 [THE END] — Illustrated by Smoke Claude

github-actions Bot (Contributor) commented May 9, 2026

Smoke Test Codex: FAIL
PR titles: fix: align ET budget error strings with gh-aw detection patterns; refactor: remove dead exports flagged by export audit
✅ GitHub PR review
❌ SafeInputs GH CLI: safeinputs-gh missing
✅ Playwright title contains GitHub
❌ Tavily search: no tools exposed
✅ File write + bash readback
✅ Discussion comment + AWF build
Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

github-actions Bot (Contributor) commented May 9, 2026

Chroot Smoke Test Results

| Runtime | Host Version | Chroot Version | Match? |
|---------|--------------|----------------|--------|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.14.1 | v20.20.2 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot environment.

Tested by Smoke Chroot

github-actions Bot (Contributor) commented May 9, 2026

Smoke Test Results

| Check | Result |
|-------|--------|
| Redis PING | ❌ timeout (no response) |
| PostgreSQL pg_isready | ❌ no response |
| PostgreSQL SELECT 1 | ❌ connection failed |

Overall: FAIL. host.docker.internal is not reachable from this runner environment. Service containers may not be running or the host alias is not resolving.

🔌 Service connectivity validated by Smoke Services

github-actions Bot (Contributor) commented May 9, 2026

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx passed ✅ PASS
Node.js execa passed ✅ PASS
Node.js p-limit passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #2774 · ● 924.9K ·

lpcox merged commit 3708f24 into main May 9, 2026
65 of 70 checks passed
lpcox deleted the docs/effective-token-budget-enforcement branch May 9, 2026 00:20
github-actions Bot (Contributor) commented May 9, 2026

🔥 Smoke Test: Copilot BYOK (Offline) Mode

| Test | Result |
|------|--------|
| 1. GitHub MCP | ✅ PR #2774 retrieved ("docs: document effective token budget enforcement behavior") |
| 2. GitHub.com HTTP | ⚠️ Pre-step output not substituted (${{ steps.smoke-data.outputs.SMOKE_HTTP_CODE }}) |
| 3. File write/read | ⚠️ Pre-step output not substituted; file path unavailable |
| 4. BYOK inference | ✅ Responding via api-proxy → api.githubcopilot.com |

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com.

Overall: PARTIAL — workflow template variables were not substituted; tests 2 & 3 cannot be verified. Author: @lpcox.

🔑 BYOK report filed by Smoke Copilot BYOK

github-actions Bot (Contributor) commented May 9, 2026

🤖 Smoke Test Results

Test Status
GitHub MCP connectivity
GitHub.com HTTP connectivity
File write/read

PR: docs: document effective token budget enforcement behavior (@lpcox)

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot
