docs: document effective token budget enforcement behavior #2774
Add §10 to awf-config-spec.md documenting the normative behavior when apiProxy.maxEffectiveTokens is configured:
- Token weighting formula (input ×1, cache ×0.1, output ×4, reasoning ×4)
- Model multiplier semantics
- HTTP 429 rejection with error type 'effective_tokens_limit_exceeded'
- WebSocket rejection behavior
- Threshold warning emissions (50%, 75%, 90%, 95%)
- /reflect endpoint introspection schema

Update schema descriptions in both docs/ and src/ schemas to reference the enforcement behavior, HTTP 429 status, and spec section.

Add comprehensive 'Effective token budget' section to api-proxy-sidecar.md with configuration examples, enforcement details, detection patterns, and introspection instructions.

Relates-to: #2769
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
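For readers who want the weighting in concrete terms, here is a minimal Python sketch of the formula listed in the commit message. The weights come from that list; everything else is an assumption for illustration: the `Usage` field names are hypothetical, "cache" is treated as a single count, and the model multiplier is assumed to scale the weighted sum with a default of 1.

```python
from dataclasses import dataclass

# Weights as listed in the commit message: input x1, cache x0.1, output x4, reasoning x4.
WEIGHTS = {"input": 1.0, "cache": 0.1, "output": 4.0, "reasoning": 4.0}


@dataclass
class Usage:
    """Token counts for one response (field names are hypothetical)."""
    input: int = 0
    cache: int = 0
    output: int = 0
    reasoning: int = 0


def effective_tokens(usage: Usage, model_multiplier: float = 1.0) -> float:
    """Weighted token sum, scaled by an assumed per-model multiplier (default 1)."""
    weighted = (
        usage.input * WEIGHTS["input"]
        + usage.cache * WEIGHTS["cache"]
        + usage.output * WEIGHTS["output"]
        + usage.reasoning * WEIGHTS["reasoning"]
    )
    return weighted * model_multiplier
```

Under these assumptions, a response with 100 input, 1000 cache, 50 output, and 10 reasoning tokens counts as 100 + 100 + 200 + 40 = 440 effective tokens, or 880 with a multiplier of 2.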
Pull request overview
This PR documents the API proxy’s effective token budget behavior (apiProxy.maxEffectiveTokens) across the config spec, schemas, and sidecar documentation, so downstream tools (e.g., gh-aw) have a canonical reference for enforcement semantics and the 429 error format.
Changes:
- Added a new normative spec section describing effective-token weighting, model multipliers, enforcement, and `/reflect` introspection.
- Updated both schema JSON files’ field descriptions to reference HTTP 429 + error type and point to spec §10.
- Expanded the api-proxy sidecar docs with configuration examples, weighting details, enforcement behavior, and detection guidance.
Show a summary per file
| File | Description |
|---|---|
| `src/awf-config-schema.json` | Updated descriptions for maxEffectiveTokens and modelMultipliers to reference 429/error type and spec §10. |
| `docs/awf-config.schema.json` | Kept docs schema in sync with the source schema description updates. |
| `docs/awf-config-spec.md` | Added new normative §10 covering effective token budget enforcement and /reflect semantics. |
| `docs/api-proxy-sidecar.md` | Added an “Effective token budget” section with examples, enforcement behavior, and client detection guidance. |
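As an illustration of the client detection guidance mentioned in the table, the following Python sketch shows one way a downstream tool might recognize a budget rejection. Only the HTTP 429 status and the `effective_tokens_limit_exceeded` error type come from this PR; the response body layout is not shown on this page, so the function checks two assumed shapes, and the function name is hypothetical.

```python
import json

BUDGET_ERROR_TYPE = "effective_tokens_limit_exceeded"


def is_budget_exhausted(status: int, body: str) -> bool:
    """Return True if an HTTP response looks like a budget rejection.

    Assumes the error type appears either at the top level ("type") or
    nested under "error" ({"error": {"type": ...}}); the exact body layout
    is an assumption, so both are checked.
    """
    if status != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    nested = error.get("type") if isinstance(error, dict) else None
    return BUDGET_ERROR_TYPE in (payload.get("type"), nested)
```

Checking the error type rather than the status alone lets a client tell a budget rejection apart from an ordinary 429 rate limit. Because the budget is cumulative, retrying after a budget rejection is unlikely to help.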
Copilot's findings
Comments suppressed due to low confidence (1)
docs/awf-config-spec.md:420
- §10.4 states the proxy SHOULD emit structured log warnings at 50/75/90/95%. In the current implementation, thresholds are tracked in-memory (for `/reflect`) but no log event is emitted on threshold crossings (there’s no `logRequest` call for thresholds). Either remove/soften this logging requirement or implement the warning logs to match the spec.
The proxy SHOULD emit structured log warnings when the cumulative effective
tokens cross the following percentage thresholds of `maxEffectiveTokens`:
| Threshold | Log level |
|-----------|-----------|
| 50% | `warn` |
| 75% | `warn` |
| 90% | `warn` |
| 95% | `warn` |
Each threshold MUST be emitted at most once per run.
- Files reviewed: 4/4 changed files
- Comments generated: 3
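The "at most once per run" rule in the quoted §10.4 text can be sketched as a small in-memory tracker. This is a hypothetical illustration, not the proxy's code: the class and method names are invented, and a threshold is assumed to count as crossed once usage is at or above it.

```python
THRESHOLDS = (50, 75, 90, 95)


class ThresholdTracker:
    """Remembers which percentage thresholds have already been crossed."""

    def __init__(self, max_effective_tokens: float) -> None:
        if max_effective_tokens <= 0:
            raise ValueError("max_effective_tokens must be positive")
        self.max_effective_tokens = max_effective_tokens
        self.crossed: list[int] = []

    def update(self, total_effective_tokens: float) -> list[int]:
        """Return thresholds newly crossed by this total (each reported once)."""
        percent = 100 * total_effective_tokens / self.max_effective_tokens
        newly = [t for t in THRESHOLDS if percent >= t and t not in self.crossed]
        self.crossed.extend(newly)
        return newly
```

A caller could log a warning for each value returned by `update`, or only expose `crossed` through an introspection endpoint, which is what the review comment describes the current implementation as doing.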
"total_effective_tokens": 456.78,
"remaining_effective_tokens": 543.22,
"percent_used": 45.68,
"thresholds_crossed": [50]
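The `/reflect` fields quoted above are related arithmetically, which makes an example payload easy to sanity-check. The sketch below is a hypothetical checker written for illustration. It assumes the budget is 1000 (inferred from 456.78 + 543.22), that values are rounded to two decimals, that remaining tokens never go below zero, and that a threshold is listed once usage is at or above it.

```python
THRESHOLDS = (50, 75, 90, 95)


def check_reflect_snapshot(snapshot: dict, max_effective_tokens: float) -> list[str]:
    """Return a list of inconsistencies found in a /reflect-style payload."""
    problems = []
    total = snapshot["total_effective_tokens"]

    # Remaining budget is assumed to be max - total, clamped at zero.
    expected_remaining = max(max_effective_tokens - total, 0)
    if abs(snapshot["remaining_effective_tokens"] - expected_remaining) > 0.01:
        problems.append("remaining_effective_tokens does not match max - total")

    expected_percent = 100 * total / max_effective_tokens
    if abs(snapshot["percent_used"] - expected_percent) > 0.01:
        problems.append("percent_used does not match total / max")

    expected_crossed = [t for t in THRESHOLDS if expected_percent >= t]
    if list(snapshot["thresholds_crossed"]) != expected_crossed:
        problems.append("thresholds_crossed does not match percent_used")
    return problems
```

Run against the quoted values, this flags only `thresholds_crossed`: at 45.68% used, the 50% threshold has not been reached, so an empty list would be the consistent value.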
2. **Pre-request check**: Before forwarding each subsequent request to the
   upstream provider, the proxy checks whether the cumulative total has
   reached or exceeded `maxEffectiveTokens`.

3. **Rejection**: When the budget is exceeded, the proxy MUST reject the
   request with:
The proxy emits structured log warnings as usage approaches the limit:

| Threshold | Warning emitted |
|-----------|-----------------|
| 50% | Yes (once) |
| 75% | Yes (once) |
| 90% | Yes (once) |
| 95% | Yes (once) |

These appear in the api-proxy container logs as `effective_tokens_threshold` events.
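The pre-request check and rejection steps quoted in the review snippets above can be sketched as a small gate. This is an illustration under stated assumptions, not the proxy's implementation: the class name, method names, and the rejection body layout are hypothetical. Only the `>=` comparison, the 429 status, and the error type come from this PR.

```python
from typing import Optional


class EffectiveTokenBudget:
    """Tracks cumulative effective tokens and gates new requests."""

    def __init__(self, max_effective_tokens: float) -> None:
        self.max_effective_tokens = max_effective_tokens
        self.total = 0.0

    def record(self, effective_tokens: float) -> None:
        """Add the effective tokens consumed by a completed response."""
        self.total += effective_tokens

    def check(self) -> Optional[dict]:
        """Return None if the next request may be forwarded, else a rejection."""
        if self.total >= self.max_effective_tokens:  # "reached or exceeded"
            return {
                "status": 429,
                # Body layout is an assumption; only the type string is documented.
                "body": {"error": {"type": "effective_tokens_limit_exceeded"}},
            }
        return None
```

Because the check runs before forwarding and usage is recorded afterwards, a request that starts under budget may push the total past the limit. In this sketch it is the following request that gets rejected.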
@copilot address the review feedback
Add 'Runtime JSONL Schemas' section referencing schemas/audit.schema.json and schemas/token-usage.schema.json with:
- Schema-to-JSONL-file mapping table
- Versioning policy (_schema wire field, prefix matching)
- Published locations (versioned release assets + latest main branch URLs)
- Link to schemas/README.md in Informative References

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
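To make the "prefix matching" versioning policy concrete, here is a hedged Python sketch of how a consumer might filter JSONL records by their `_schema` field. The `_schema` field name comes from the commit message; the identifier format used in the usage note (`token-usage/v1`) is purely hypothetical, since the actual values are not shown on this page.

```python
import json


def matches_schema(line: str, expected_prefix: str) -> bool:
    """Return True if a JSONL line's `_schema` value starts with the prefix."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    schema = record.get("_schema") if isinstance(record, dict) else None
    return isinstance(schema, str) and schema.startswith(expected_prefix)
```

With hypothetical identifiers, a reader that expects `token-usage/v1` would accept a record tagged `token-usage/v1.2` and skip one tagged `token-usage/v2`. Tolerating newer minor revisions is the usual purpose of prefix matching.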
Smoke Test Results
✅ GitHub MCP: Last 2 merged PRs retrieved
Status: PASS
Smoke Test Codex: FAIL
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
See Network Configuration for more information.
Chroot Smoke Test Results
Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot environment.
Smoke Test Results
Overall: FAIL —
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
🔥 Smoke Test: Copilot BYOK (Offline) Mode
Running in BYOK offline mode (
Overall: PARTIAL. Workflow template variables were not substituted; tests 2 & 3 cannot be verified.
Author:
🤖 Smoke Test Results
PR: docs: document effective token budget enforcement behavior (@lpcox)
Overall: PASS
Summary
Documents the runtime behavior when `apiProxy.maxEffectiveTokens` is configured. This behavior was previously implemented but undocumented: users and downstream tools (like gh-aw) had no spec to reference for detecting or handling budget exhaustion.

Changes
Spec (`docs/awf-config-spec.md`)
- [...] `effective_tokens_limit_exceeded`, WebSocket rejection), with reached or exceeded (`>=`) wording aligned to runtime behavior
- [...] `thresholds_crossed`
- [...] `/reflect` endpoint
- [...] `/reflect` example values so `percent_used` and `thresholds_crossed` are internally consistent.

Schema (`docs/awf-config.schema.json` + `src/awf-config-schema.json`)
- [...] `maxEffectiveTokens` description to reference HTTP 429 status, error type, and spec §10
- [...] `modelMultipliers` description to clarify default behavior and reference spec §10.2

API Proxy Sidecar docs (`docs/api-proxy-sidecar.md`)
- [...] `/reflect` endpoint introspection examples
- [...] `/reflect`; no `effective_tokens_threshold` log event is documented.

Motivation
- `maxEffectiveTokens` had no docs explaining what happens when the limit is reached

Testing
Documentation-only changes. Markdown lint passes for updated docs.