fix(overflow): graceful finish_reason=length; drop arbitrary 1024 default#136
Conversation
…length"; drop arbitrary 1024 default Root-cause fix for #128: when the on-device model runs into the 4096-token ceiling after producing some content, that throw is now a clean finish_reason: "length" instead of an HTTP 400 / stream error. The new pure helper StreamErrorResolver branches on prev: empty -> still 400 (prompt-side overflow); non-empty -> graceful truncation. Refusal, guardrail, decoding, rate-limit etc. stay fatal regardless of prev. With graceful overflow handling in place, the load-bearing 1024 default cap is no longer needed. Both CLI and server now pass max_tokens through unchanged: omitted -> nil -> FoundationModels uses whatever room is left in the 4096-token window. Drop-in OpenAI semantics, full window utilisation, no fallback constant. SSOT: both surfaces share one rule (no rule). Non-streaming responses now route through collectStream too, so both modes get identical overflow handling for free. Tests: - New StreamErrorResolverTests covers every ApfelError variant x empty/ non-empty prev. Only contextOverflow + non-empty prev is graceful. - StreamOutcome Equatable / Hashable / Sendable round-trip tests. - BodyLimitsTests gains a regression guard that the constant stays gone. - CLIServerParityTests rewritten to assert both surfaces drop the ?? fallback. - ApfelCorePublicAPIUsageTests locks down the new public types. - Integration: 3 new tests for omitted max_tokens (non-stream / stream / source-level lock). MCP test timeouts adjusted; the hanging-server test gains explicit max_tokens=128 so it isolates MCP timing from model-wandering latency. 584 unit + 257 integration tests, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Arthur-Ficial
left a comment
Automated first-pass review — PR #136
Nice root-cause analysis. The 7-whys chain from #128 → #129 → #130 → #133 → here is well-traced, and the fix is at the right level: FoundationModels throwing on output-side context overflow is morally `finish_reason: "length"`, not a server error. The `StreamErrorResolver` captures this distinction cleanly.
Findings
| Sev | Area | Summary |
|---|---|---|
| P2 | `Handlers.swift:184-187`, `Session.swift:344,377` | MCP auto-execute paths still use `session.respond(to:)` directly, not `collectStream`. Output-side overflow during MCP re-prompt would still surface as a fatal error. Low practical risk (tool-call JSON is short; re-prompt creates fresh context), but worth noting for consistency. Follow-up PR material. |
| P2 | `StreamOutcome.swift:1-23` | 22-line file header comment. Style nit per project conventions ("default to writing no comments", "one short line max"). |
What I verified (clean)
- `StreamErrorResolver` logic: Only `contextOverflow` + non-empty `prev` is graceful. Every other `ApfelError` case stays fatal regardless of `prev`. Correct.
- `StreamOutcome` type: `Sendable`, `Equatable`, `Hashable` — all clean. Public on ApfelCore, consistent with `FinishReason` and `ApfelError` visibility.
- `ProcessPromptResult` backward-compat: Default `.stop` for `finishReason` — existing callers unaffected.
- `collectStream` catch block: Classifies via `ApfelError.classify()` then routes through `StreamErrorResolver`. Prompt-side overflow (empty `prev`) still throws. Output-side overflow returns `StreamOutcome(content:, finishReason: .length)`.
- Server streaming handler (`Handlers.swift:549`): Applies `StreamErrorResolver` directly in the catch block. Emits length-finish chunk → optional usage chunk → `[DONE]`. The `defer { continuation.finish() }` at line 447 ensures the stream is closed on all paths.
- Server non-streaming handler (`Handlers.swift:376-386`): Uses `outcome.finishReason` to decide `finishReason`. When `.length` and no tool calls → `FinishReason.length.openAIValue`. Otherwise defers to `FinishReasonResolver.resolve()`. Correct priority.
- CLI warning (`CLI.swift:73`): Only fires for `finish_reason == .length`. Appropriate — stderr, yellow, actionable message.
- MCP tool re-prompt resets to `.stop` (`Session.swift:269`): Correct — the re-prompt produces a fresh natural reply.
- Retry interaction: `collectStream` returns `.length` as a success (not a throw), so `withRetry` sees a normal return and doesn't retry. Context overflow without content throws, but `isRetryable` is `false` for `.contextOverflow`. Consistent with pre-PR behavior.
- `BodyLimits.defaultMaxResponseTokens`: Fully removed from source. Parity tests guard against reintroduction via source scanning. Integration tests add a third layer of source-level locking.
- Test coverage: `StreamErrorResolverTests` covers every `ApfelError` case × both prev states (28+ assertions). `BodyLimitsTests` updated. `CLIServerParityTests` rewritten. `ApfelCorePublicAPIUsageTests` extended. 3 new integration tests. Registered in `main.swift`. All thorough.
- MCP test timeout adjustments: `30s → 120s` (CLI), `10s → 30s` (server) — reasonable given unbounded generation window.
- Docs: README, `docs/openai-api-compatibility.md` — consistently updated. "Picking a value" table simplified. CLI parity section rewritten.
- No security concerns: No new network code, no new URL handling, no new auth, no `@unchecked Sendable`, no new dependencies.
- No `.version`, `BuildInfo.swift`, or README badge edits (release workflow outputs left untouched).
Architecture notes
The unification of non-streaming through collectStream (both CLI processPrompt and server nonStreamingResponse) is a good simplification — one code path, identical overflow semantics. The tradeoff is that non-streaming now internally streams and accumulates, but FoundationModels' streaming and non-streaming APIs should produce equivalent content.
The StreamErrorResolver being a pure enum with a static method is the right shape — no state, easily testable, easily inlineable. The decision matrix (prev × error type) is small and complete.
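The prev × error-type matrix is small enough to sketch directly. The following is a minimal, hypothetical Swift model of the rule as the PR describes it — the real ApfelCore types differ, and `ApfelError` is reduced here to a few representative cases:

```swift
// Hypothetical sketch of the resolver rule, not the actual ApfelCore source.
enum ApfelErrorSketch: Error, Equatable {
    case contextOverflow, refusal, guardrailViolation, rateLimited
}

enum StreamResolutionSketch: Equatable {
    case truncated(content: String)   // surfaces as finish_reason: "length"
    case fatal(ApfelErrorSketch)      // surfaces as HTTP 400 / stream error
}

enum StreamErrorResolverSketch {
    static func resolve(_ error: ApfelErrorSketch, prev: String) -> StreamResolutionSketch {
        // Only an output-side overflow with content already produced is graceful.
        if error == .contextOverflow, !prev.isEmpty {
            return .truncated(content: prev)
        }
        // Prompt-side overflow (empty prev) and every other error stay fatal.
        return .fatal(error)
    }
}
```

Being a stateless enum with one static function, the whole decision matrix is exhaustively testable with a handful of assertions.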
Summary
Solid root-cause fix. The two P2s are follow-up material, not blockers. CI is still running (build-and-test + apfelcore-public-api in progress); those need to go green before merge. `make test` and `make preflight` still pending per the PR description — full local qualification is the real gate per project conventions.
Automated review — cannot approve or merge. Franz reviews and merges.
Generated by Claude Code
…pi-breaking-changes passes

Two follow-up fixes after CI feedback:

1. `BodyLimits.defaultMaxResponseTokens` is now `@available(*, deprecated)` with value 0 instead of being deleted outright. The constant is no longer consulted anywhere; the stub exists purely to preserve the ApfelCore API surface for one release. Slated for removal in 2.0.0.
2. `ProcessPromptResult` gains an explicit `init(content:toolLog:)` that delegates to the three-argument init, so the pre-1.3.3 init signature stays in the API surface.

Also fold in the cap-hit finish_reason fix: `collectStream` now resolves `.stop` vs `.length` via `FinishReasonResolver` after natural stream completion. This makes `finish_reason: "length"` fire on cap-hit too (not just on output-side overflow), which is what OpenAI clients expect.

End-to-end verified:
- `swift package diagnose-api-breaking-changes v1.3.2` -> "No breaking changes detected in ApfelCore"
- 597 unit tests pass
- `apfel --max-tokens 20 ...` -> stderr warning fires, `finish_reason=length`
Arthur-Ficial
left a comment
Re-review after a5a815c (API stability fix)
The second commit (fix(api-stability): keep deprecated stubs so swift-package-diagnose-api-breaking-changes passes) cleanly solves the CI gate by keeping defaultMaxResponseTokens as a deprecated symbol rather than removing it outright.
What changed in the new commit
| File | Change |
|---|---|
| `Sources/Core/Chat/BodyLimits.swift` | `defaultMaxResponseTokens` kept as `@available(*, deprecated)` with value 0 |
| `Sources/Core/ToolCallHandler.swift` | `ProcessPromptResult` backward-compat 2-arg init preserved (delegates to 3-arg) |
| `Tests/apfelTests/BodyLimitsTests.swift` | Now asserts the `@available(*, deprecated)` annotation exists via source scanning |
Why this is correct
- Value of `0` (not `1024`): If anyone references it despite the deprecation, the validator rejects `<= 0` with a clear error. Fail-loud is safer than silently applying the old cap with wrong semantics.
- `ProcessPromptResult` is `package` access — the back-compat init is courtesy for internal callers, not a public API concern. Appropriate defensiveness.
- The `apfelcore-public-api` CI check (which runs `swift package diagnose-api-breaking-changes`) should now pass since no symbols were removed.
Previous findings still valid
The two P2s from the first review remain follow-up material:
- MCP auto-execute paths still use `session.respond(to:)` directly (low practical risk — tool-call JSON is short, re-prompt creates fresh context).
- `StreamOutcome.swift` file header is 22 lines vs project style preference for minimal comments.
Neither blocks merge.
Verified clean (re-confirmed on updated tree)
- `collectStream` catch → `StreamErrorResolver` → `.truncated` only for `contextOverflow` + non-empty `prev` ✓
- `FinishReasonResolver.resolve` handles `nil` maxTokens correctly (falls through to `.stop`) ✓
- Server non-streaming routes through `collectStream` → gets overflow handling for free ✓
- Server streaming catch applies `StreamErrorResolver` before other error branches ✓
- CLI emits stderr warning for `finish_reason=length` ✓
- `SessionOptions.maxTokens: Int?` passes `nil` through to `GenerationOptions.maximumResponseTokens` ✓
- `ProcessPromptResult.finishReason` resets to `.stop` after MCP tool re-prompt ✓
- New test suite registered in `Tests/apfelTests/main.swift` ✓
- Integration tests have appropriate timeouts (120s CLI, 30s server) ✓
- No `.version`, `BuildInfo.swift`, or README badge edits ✓
CI
Both `build-and-test` and `apfelcore-public-api` are in progress. The second commit specifically targets the API check. Wait for green before merge.
Generated by Claude Code — re-review on synchronize event
Summary
finish_reason: "length"instead of a server error. The new pure helperStreamErrorResolverdecides: emptyprev+ overflow → still a 400 (prompt-side); non-emptyprev+ overflow → graceful truncation. Refusal, guardrail, decoding, rate-limit etc. stay fatal regardless.max_tokensflows through asnilon both CLI and server. FoundationModels uses whatever room is left in the 4096-token window. This is drop-in OpenAI semantics and full window utilisation - no fallback constant.BodyLimits.defaultMaxResponseTokensis removed. Single source of truth: both surfaces passmax_tokensthrough unchanged.collectStreamtoo, so both surfaces and both modes get identical overflow handling for free.Why
Issue #128 surfaced as "request hangs ~50 s and ends in `[context overflow]`". PR #129 patched the symptom with a 512-token cap. Issue #130 / PR #133 raised it to 1024 across CLI + server. Both made the cap load-bearing for correctness. The actual root cause: an output-side end condition was being reported as an exception. The 7-whys analysis is in the PR thread; the punchline is that hitting the ceiling mid-stream is OpenAI's `finish_reason: "length"`, not an HTTP 400.

With this PR the cap is no longer load-bearing. It becomes optional - clients pass `max_tokens` when they want a tighter latency budget, otherwise they get the OpenAI-spec behaviour of "use what's left".

What changed
| File | Change |
|---|---|
| `Sources/Core/Chat/StreamOutcome.swift` | New: `StreamOutcome`, `StreamErrorResolution`, `StreamErrorResolver` |
| `Sources/Core/Chat/BodyLimits.swift` | `defaultMaxResponseTokens` deleted |
| `Sources/Session.swift` | `collectStream` now returns `StreamOutcome`; output-side overflow → `.length` |
| `Sources/Handlers.swift` | Streaming catch applies the resolver; emits length-finish chunk → `[DONE]` |
| `Sources/Handlers.swift` | Non-streaming routes through `collectStream` for the same semantics |
| `Sources/Session.swift` | CLI `processPrompt` unified on `collectStream` |
| `Sources/CLI.swift` | stderr warning on `finish_reason=length` |
| `Sources/Core/ToolCallHandler.swift` | `ProcessPromptResult.finishReason` added, defaulting to `.stop` |
| `Sources/Handlers.swift` | `?? BodyLimits.defaultMaxResponseTokens` fallback removed |
| `Sources/main.swift` | `?? BodyLimits.defaultMaxResponseTokens` fallback removed |
| `README.md`, `docs/openai-api-compatibility.md` | Docs updated |
Unit (TDD red → green, all in `swift run apfel-tests`):

- `StreamErrorResolverTests.swift` covering every `ApfelError` case × empty/non-empty `prev`. Only `.contextOverflow` with non-empty `prev` is graceful; everything else is fatal.
- `StreamOutcome` Equatable / Hashable / Sendable round-trip tests.
- `BodyLimitsTests.swift` updated; new regression guard that source no longer references `defaultMaxResponseTokens`.
- `CLIServerParityTests.swift` rewritten to assert both surfaces pass `max_tokens` through unchanged (no `?? <constant>`).
- `ApfelCorePublicAPIUsageTests.swift` extended to lock down the new public types.
Integration (Apple Intelligence required):

- `test_omitted_max_tokens_non_streaming_returns_200` — request without `max_tokens` returns HTTP 200 with usable content and `finish_reason ∈ {stop, length}`.
- `test_omitted_max_tokens_streaming_completes_with_done` — streaming ends with `[DONE]`, no `error` payload, terminal `finish_reason ∈ {stop, length}`.
- `test_omitted_max_tokens_does_not_reference_a_default_constant` — source-level lock that the fallback constant stays gone.
- MCP test timeouts adjusted (the hanging-server test gains explicit `max_tokens=128` so the test isolates MCP timing from model latency).
- `swift run apfel-tests` — 597 unit tests pass
- MCP integration tests re-run after the `max_tokens` adjustments
- `make test` — full unit + integration suite (in progress; updating PR description after it lands)
- `make preflight` before release