
fix(overflow): graceful finish_reason=length; drop arbitrary 1024 default#136

Merged
Arthur-Ficial merged 2 commits into main from fix/streaming-overflow-and-dynamic-default on Apr 27, 2026

fix(overflow): graceful finish_reason=length; drop arbitrary 1024 default#136
Arthur-Ficial merged 2 commits intomainfrom
fix/streaming-overflow-and-dynamic-default

Conversation

@Arthur-Ficial
Owner

Summary

  • Root-cause fix for output-side context overflow. When the on-device model runs into the 4096-token ceiling after producing some content, that throw is now treated as a clean finish_reason: "length" instead of a server error. The new pure helper StreamErrorResolver decides: empty prev + overflow → still a 400 (prompt-side); non-empty prev + overflow → graceful truncation. Refusal, guardrail, decoding, rate-limit etc. stay fatal regardless.
  • Drop the arbitrary 1024 default. With graceful overflow in place, omitted max_tokens flows through as nil on both CLI and server. FoundationModels uses whatever room is left in the 4096-token window. This is drop-in OpenAI semantics and full window utilisation - no fallback constant. BodyLimits.defaultMaxResponseTokens is removed. Single source of truth: both surfaces pass max_tokens through unchanged.
  • CLI/server unified path. Non-streaming responses now route through collectStream too, so both surfaces and both modes get identical overflow handling for free.
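The resolver's decision matrix described above can be sketched in pure Swift. Type and case names below mirror the PR's descriptions (StreamErrorResolver, contextOverflow, prev), but the exact signatures and the simplified error enum are assumptions; the real types live in Sources/Core/Chat/StreamOutcome.swift.

```swift
// Simplified stand-in for ApfelError's classified cases (assumed shape).
enum ApfelErrorKind {
    case contextOverflow, refusal, guardrail, decoding, rateLimit
}

enum StreamErrorResolution: Equatable {
    case truncated(content: String)  // graceful finish_reason: "length"
    case fatal                       // rethrow as HTTP 400 / CLI error
}

enum StreamErrorResolver {
    static func resolve(error: ApfelErrorKind, prev: String) -> StreamErrorResolution {
        // Only an output-side overflow (some content already produced) is graceful.
        if case .contextOverflow = error, !prev.isEmpty {
            return .truncated(content: prev)
        }
        // Prompt-side overflow (empty prev) and every other error stay fatal.
        return .fatal
    }
}
```

Being a pure function of (error, prev), the matrix is exhaustively testable without a model or a server.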

Why

Issue #128 surfaced as "request hangs ~50 s and ends in [context overflow]". PR #129 patched the symptom with a 512-token cap. Issue #130 / PR #133 raised it to 1024 across CLI + server. Both made the cap load-bearing for correctness. The actual root cause: an output-side end condition was being reported as an exception. The 7-whys analysis is in the PR thread; the punchline is that hitting the ceiling mid-stream is OpenAI's finish_reason: "length", not an HTTP 400.

With this PR the cap is no longer load-bearing. It becomes optional - clients pass max_tokens when they want a tighter latency budget, otherwise they get the OpenAI-spec behaviour of "use what's left".
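The pass-through can be sketched as a one-liner. The helper function here is illustrative, not the project's actual API; `GenerationOptions(maximumResponseTokens:)` is Apple's FoundationModels initializer as documented, and requires an Apple Intelligence device to run.

```swift
import FoundationModels  // Apple Intelligence; on-device only

// Omitted max_tokens stays nil all the way down: no `?? 1024` fallback,
// so FoundationModels uses whatever room is left in the 4096-token window.
func generationOptions(maxTokens: Int?) -> GenerationOptions {
    GenerationOptions(maximumResponseTokens: maxTokens)
}
```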

What changed

| Area | File | Change |
| --- | --- | --- |
| New pure logic | Sources/Core/Chat/StreamOutcome.swift | StreamOutcome, StreamErrorResolution, StreamErrorResolver |
| Constant removed | Sources/Core/Chat/BodyLimits.swift | defaultMaxResponseTokens deleted |
| Streaming helper | Sources/Session.swift | collectStream now returns StreamOutcome; output-side overflow → .length |
| Server streaming SSE | Sources/Handlers.swift | catch-block applies the resolver, emits length-finish chunk + [DONE] |
| Server non-streaming | Sources/Handlers.swift | routes through collectStream for the same semantics |
| CLI non-streaming | Sources/Session.swift | same: routes through collectStream |
| CLI surfacing | Sources/CLI.swift | stderr warning when finish_reason=length |
| Tracking type | Sources/Core/ToolCallHandler.swift | ProcessPromptResult.finishReason |
| Server SSOT | Sources/Handlers.swift | drops ?? BodyLimits.defaultMaxResponseTokens |
| CLI SSOT | Sources/main.swift | drops ?? BodyLimits.defaultMaxResponseTokens |
| Docs | README.md, docs/openai-api-compatibility.md | reflect the new dynamic default |

Tests

Unit (TDD red → green, all in swift run apfel-tests)

  • New StreamErrorResolverTests.swift covering every ApfelError case × empty/non-empty prev. Only .contextOverflow with non-empty prev is graceful; everything else is fatal.
  • New StreamOutcome Equatable / Hashable / Sendable round-trip tests.
  • BodyLimitsTests.swift updated; new regression guard that source no longer references defaultMaxResponseTokens.
  • CLIServerParityTests.swift rewritten to assert both surfaces pass max_tokens through unchanged (no ?? <constant>).
  • ApfelCorePublicAPIUsageTests.swift extended to lock down the new public types.

Integration (Apple Intelligence required)

  • New: test_omitted_max_tokens_non_streaming_returns_200 — request without max_tokens returns HTTP 200 with usable content and finish_reason ∈ {stop, length}.
  • New: test_omitted_max_tokens_streaming_completes_with_done — streaming ends with [DONE], no error payload, terminal finish_reason ∈ {stop, length}.
  • New: test_omitted_max_tokens_does_not_reference_a_default_constant — source-level lock that the fallback constant stays gone.
  • Adjusted: MCP test timeouts (30 s → 120 s on CLI side; 10 s → 30 s on the hanging-server test, with explicit max_tokens=128 so the test isolates MCP timing from model latency).

Test plan

  • swift run apfel-tests — 597 unit tests pass
  • Three previously failing MCP integration tests pass after the timeout/max_tokens adjustments
  • Three new overflow integration tests pass
  • make test — full unit + integration suite (in progress; updating PR description after it lands)
  • make preflight before release

…length"; drop arbitrary 1024 default

Root-cause fix for #128: when the on-device model runs into the 4096-token
ceiling after producing some content, that throw is now a clean
finish_reason: "length" instead of an HTTP 400 / stream error. The new pure
helper StreamErrorResolver branches on prev: empty -> still 400
(prompt-side overflow); non-empty -> graceful truncation. Refusal,
guardrail, decoding, rate-limit etc. stay fatal regardless of prev.

With graceful overflow handling in place, the load-bearing 1024 default
cap is no longer needed. Both CLI and server now pass max_tokens through
unchanged: omitted -> nil -> FoundationModels uses whatever room is left
in the 4096-token window. Drop-in OpenAI semantics, full window
utilisation, no fallback constant. SSOT: both surfaces share one rule
(no rule).

Non-streaming responses now route through collectStream too, so both
modes get identical overflow handling for free.

Tests:
- New StreamErrorResolverTests covers every ApfelError variant x empty/
  non-empty prev. Only contextOverflow + non-empty prev is graceful.
- StreamOutcome Equatable / Hashable / Sendable round-trip tests.
- BodyLimitsTests gains a regression guard that the constant stays gone.
- CLIServerParityTests rewritten to assert both surfaces drop the ?? fallback.
- ApfelCorePublicAPIUsageTests locks down the new public types.
- Integration: 3 new tests for omitted max_tokens (non-stream / stream /
  source-level lock). MCP test timeouts adjusted; the hanging-server
  test gains explicit max_tokens=128 so it isolates MCP timing from
  model-wandering latency.

584 unit + 257 integration tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner Author

@Arthur-Ficial Arthur-Ficial left a comment


Automated first-pass review — PR #136

Nice root-cause analysis. The 7-whys chain from #128 → #129 → #130 → #133 → here is well-traced, and the fix is at the right level: FoundationModels throwing on output-side context overflow is morally finish_reason: "length", not a server error. The StreamErrorResolver captures this distinction cleanly.

Findings

| Sev | Area | Summary |
| --- | --- | --- |
| P2 | Handlers.swift:184-187, Session.swift:344,377 | MCP auto-execute paths still use session.respond(to:) directly, not collectStream. Output-side overflow during MCP re-prompt would still surface as a fatal error. Low practical risk (tool-call JSON is short; re-prompt creates fresh context), but worth noting for consistency. Follow-up PR material. |
| P2 | StreamOutcome.swift:1-23 | 22-line file header comment. Style nit per project conventions ("default to writing no comments", "one short line max"). |

What I verified (clean)

  • StreamErrorResolver logic: Only contextOverflow + non-empty prev is graceful. Every other ApfelError case stays fatal regardless of prev. Correct.
  • StreamOutcome type: Sendable, Equatable, Hashable — all clean. Public on ApfelCore, consistent with FinishReason and ApfelError visibility.
  • ProcessPromptResult backward-compat: Default .stop for finishReason — existing callers unaffected.
  • collectStream catch block: Classifies via ApfelError.classify() then routes through StreamErrorResolver. Prompt-side overflow (empty prev) still throws. Output-side overflow returns StreamOutcome(content:, finishReason: .length).
  • Server streaming handler (Handlers.swift:549): Applies StreamErrorResolver directly in the catch block. Emits length-finish chunk → optional usage chunk → [DONE]. The defer { continuation.finish() } at line 447 ensures the stream is closed on all paths.
  • Server non-streaming handler (Handlers.swift:376-386): Uses outcome.finishReason to decide finishReason. When .length and no tool calls → FinishReason.length.openAIValue. Otherwise defers to FinishReasonResolver.resolve(). Correct priority.
  • CLI warning (CLI.swift:73): Only fires for finish_reason == .length. Appropriate — stderr, yellow, actionable message.
  • MCP tool re-prompt resets to .stop (Session.swift:269): Correct — the re-prompt produces a fresh natural reply.
  • Retry interaction: collectStream returns .length as a success (not a throw), so withRetry sees a normal return and doesn't retry. Context overflow without content throws, but isRetryable is false for .contextOverflow. Consistent with pre-PR behavior.
  • BodyLimits.defaultMaxResponseTokens: Fully removed from source. Parity tests guard against reintroduction via source scanning. Integration tests add a third layer of source-level locking.
  • Test coverage: StreamErrorResolverTests covers every ApfelError case × both prev states (28+ assertions). BodyLimitsTests updated. CLIServerParityTests rewritten. ApfelCorePublicAPIUsageTests extended. 3 new integration tests. Registered in main.swift. All thorough.
  • MCP test timeout adjustments: 30s → 120s (CLI), 10s → 30s (server) — reasonable given unbounded generation window.
  • Docs: README, docs/openai-api-compatibility.md — consistently updated. "Picking a value" table simplified. CLI parity section rewritten.
  • No security concerns: No new network code, no new URL handling, no new auth, no @unchecked Sendable, no new dependencies.
  • No .version, BuildInfo.swift, or README badge edits (release workflow outputs left untouched).
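The terminal SSE sequence verified above (length-finish chunk → [DONE]) can be sketched as follows. The helper name and the exact JSON layout are illustrative, not the project's actual code in Handlers.swift; the chunk follows the OpenAI streaming wire shape.

```swift
// On graceful truncation: emit one terminal chunk with an empty delta and
// finish_reason "length", then the [DONE] terminator OpenAI clients expect.
func emitGracefulLengthFinish(send: (String) -> Void) {
    let chunk = #"{"choices":[{"index":0,"delta":{},"finish_reason":"length"}]}"#
    send("data: \(chunk)\n\n")
    send("data: [DONE]\n\n")
}
```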

Architecture notes

The unification of non-streaming through collectStream (both CLI processPrompt and server nonStreamingResponse) is a good simplification — one code path, identical overflow semantics. The tradeoff is that non-streaming now internally streams and accumulates, but FoundationModels' streaming and non-streaming APIs should produce equivalent content.

The StreamErrorResolver being a pure enum with a static method is the right shape — no state, easily testable, easily inlineable. The decision matrix (prev × error type) is small and complete.

Summary

Solid root-cause fix. The two P2s are follow-up material, not blockers. CI is still running (build-and-test + apfelcore-public-api in progress); those need to go green before merge. make test and make preflight still pending per the PR description — full local qualification is the real gate per project conventions.

Automated review — cannot approve or merge. Franz reviews and merges.


Generated by Claude Code

…pi-breaking-changes passes

Two follow-up fixes after CI feedback:

1. BodyLimits.defaultMaxResponseTokens is now @available(*, deprecated)
   with value 0 instead of being deleted outright. The constant is no
   longer consulted anywhere; the stub exists purely to preserve the
   ApfelCore API surface for one release. Slated for removal in 2.0.0.

2. ProcessPromptResult gains an explicit init(content:toolLog:) that
   delegates to the three-argument init, so the pre-1.3.3 init signature
   stays in the API surface.

Also fold in the cap-hit finish_reason fix: collectStream now resolves
.stop vs .length via FinishReasonResolver after natural stream completion.
This makes finish_reason: "length" fire on cap-hit too (not just on
output-side overflow), which is what OpenAI clients expect.

End-to-end verified:
- swift package diagnose-api-breaking-changes v1.3.2 -> "No breaking
  changes detected in ApfelCore"
- 597 unit tests pass
- apfel --max-tokens 20 ... -> stderr warning fires, finish_reason=length
Owner Author

@Arthur-Ficial Arthur-Ficial left a comment


Re-review after a5a815c (API stability fix)

The second commit (fix(api-stability): keep deprecated stubs so swift-package-diagnose-api-breaking-changes passes) cleanly solves the CI gate by keeping defaultMaxResponseTokens as a deprecated symbol rather than removing it outright.

What changed in the new commit

| File | Change |
| --- | --- |
| Sources/Core/Chat/BodyLimits.swift | defaultMaxResponseTokens kept as @available(*, deprecated) with value 0 |
| Sources/Core/ToolCallHandler.swift | ProcessPromptResult backward-compat 2-arg init preserved (delegates to 3-arg) |
| Tests/apfelTests/BodyLimitsTests.swift | Now asserts the @available(*, deprecated) annotation exists via source scanning |

Why this is correct

  • Value of 0 (not 1024): If anyone references it despite the deprecation, the validator rejects <= 0 with a clear error. Fail-loud is safer than silently applying the old cap with wrong semantics.
  • ProcessPromptResult is package access — the back-compat init is courtesy for internal callers, not a public API concern. Appropriate defensiveness.
  • The apfelcore-public-api CI check (which runs swift package diagnose-api-breaking-changes) should now pass since no symbols were removed.
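The stub pattern can be sketched like this; the container type and deprecation message are assumptions, the real declaration lives in Sources/Core/Chat/BodyLimits.swift.

```swift
public enum BodyLimits {
    // Kept only so `swift package diagnose-api-breaking-changes` sees no
    // removed symbol. Value 0 fails loudly in the validator (<= 0 rejected)
    // if anything still consults it. Slated for removal in 2.0.0.
    @available(*, deprecated, message: "No longer consulted; removed in 2.0.0")
    public static let defaultMaxResponseTokens = 0
}
```

Fail-loud via 0 beats silently reinstating the old 1024 cap with the wrong semantics.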

Previous findings still valid

The two P2s from the first review remain follow-up material:

  1. MCP auto-execute paths still use session.respond(to:) directly (low practical risk — tool-call JSON is short, re-prompt creates fresh context).
  2. StreamOutcome.swift file header is 22 lines vs project style preference for minimal comments.

Neither blocks merge.

Verified clean (re-confirmed on updated tree)

  • collectStream catch → StreamErrorResolver.truncated only for contextOverflow + non-empty prev
  • FinishReasonResolver.resolve handles nil maxTokens correctly (falls through to .stop) ✓
  • Server non-streaming routes through collectStream → gets overflow handling for free ✓
  • Server streaming catch applies StreamErrorResolver before other error branches ✓
  • CLI emits stderr warning for finish_reason=length
  • SessionOptions.maxTokens: Int? passes nil through to GenerationOptions.maximumResponseTokens
  • ProcessPromptResult.finishReason resets to .stop after MCP tool re-prompt ✓
  • New test suite registered in Tests/apfelTests/main.swift
  • Integration tests have appropriate timeouts (120s CLI, 30s server) ✓
  • No .version, BuildInfo.swift, or README badge edits ✓
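The nil-maxTokens rule re-confirmed above can be sketched as a pure function. The signature of the project's FinishReasonResolver is not shown in this PR, so this shape is a guess that matches the described behavior (nil cap falls through to .stop).

```swift
enum FinishReason: String { case stop, length }

enum FinishReasonResolver {
    static func resolve(tokensUsed: Int, maxTokens: Int?) -> FinishReason {
        // A nil cap can never be hit: omitted max_tokens falls through to .stop.
        guard let cap = maxTokens else { return .stop }
        return tokensUsed >= cap ? .length : .stop
    }
}
```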

CI

Both build-and-test and apfelcore-public-api are in progress. The second commit specifically targets the API check. Wait for green before merge.


Generated by Claude Code — re-review on synchronize event


@Arthur-Ficial Arthur-Ficial merged commit f72cb5b into main Apr 27, 2026
2 checks passed
@Arthur-Ficial Arthur-Ficial deleted the fix/streaming-overflow-and-dynamic-default branch April 27, 2026 07:44
