
fix(overflow): graceful finish_reason=length; drop arbitrary 1024 default#136

Merged
Arthur-Ficial merged 2 commits into main from fix/streaming-overflow-and-dynamic-default on Apr 27, 2026

fix(overflow): graceful finish_reason=length; drop arbitrary 1024 default#136
Arthur-Ficial merged 2 commits intomainfrom
fix/streaming-overflow-and-dynamic-default

Conversation

@Arthur-Ficial
Owner

Summary

  • Root-cause fix for output-side context overflow. When the on-device model runs into the 4096-token ceiling after producing some content, that throw is now treated as a clean finish_reason: "length" instead of a server error. The new pure helper StreamErrorResolver decides: empty prev + overflow → still a 400 (prompt-side); non-empty prev + overflow → graceful truncation. Refusal, guardrail, decoding, rate-limit etc. stay fatal regardless.
  • Drop the arbitrary 1024 default. With graceful overflow in place, omitted max_tokens flows through as nil on both CLI and server. FoundationModels uses whatever room is left in the 4096-token window. This is drop-in OpenAI semantics and full window utilisation - no fallback constant. BodyLimits.defaultMaxResponseTokens is removed. Single source of truth: both surfaces pass max_tokens through unchanged.
  • CLI/server unified path. Non-streaming responses now route through collectStream too, so both surfaces and both modes get identical overflow handling for free.
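The resolver's decision matrix described above can be sketched in pure Swift. Type and case names below mirror the PR's descriptions (StreamErrorResolver, contextOverflow, prev), but the exact signatures and the simplified error enum are assumptions; the real types live in Sources/Core/Chat/StreamOutcome.swift.

```swift
// Simplified stand-in for ApfelError's classified cases (assumed shape).
enum ApfelErrorKind {
    case contextOverflow, refusal, guardrail, decoding, rateLimit
}

enum StreamErrorResolution: Equatable {
    case truncated(content: String)  // graceful finish_reason: "length"
    case fatal                       // rethrow as HTTP 400 / CLI error
}

enum StreamErrorResolver {
    static func resolve(error: ApfelErrorKind, prev: String) -> StreamErrorResolution {
        // Only an output-side overflow (some content already produced) is graceful.
        if case .contextOverflow = error, !prev.isEmpty {
            return .truncated(content: prev)
        }
        // Prompt-side overflow (empty prev) and every other error stay fatal.
        return .fatal
    }
}
```

Being a pure function of (error, prev), the matrix is exhaustively testable without a model or a server.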

Why

Issue #128 surfaced as "request hangs ~50 s and ends in [context overflow]". PR #129 patched the symptom with a 512-token cap. Issue #130 / PR #133 raised it to 1024 across CLI + server. Both made the cap load-bearing for correctness. The actual root cause: an output-side end condition was being reported as an exception. The 7-whys analysis is in the PR thread; the punchline is that hitting the ceiling mid-stream is OpenAI's finish_reason: "length", not an HTTP 400.

With this PR the cap is no longer load-bearing. It becomes optional - clients pass max_tokens when they want a tighter latency budget, otherwise they get the OpenAI-spec behaviour of "use what's left".
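The pass-through can be sketched as a one-liner. The helper function here is illustrative, not the project's actual API; `GenerationOptions(maximumResponseTokens:)` is Apple's FoundationModels initializer as documented, and requires an Apple Intelligence device to run.

```swift
import FoundationModels  // Apple Intelligence; on-device only

// Omitted max_tokens stays nil all the way down: no `?? 1024` fallback,
// so FoundationModels uses whatever room is left in the 4096-token window.
func generationOptions(maxTokens: Int?) -> GenerationOptions {
    GenerationOptions(maximumResponseTokens: maxTokens)
}
```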

What changed

| Area | File | Change |
| --- | --- | --- |
| New pure logic | Sources/Core/Chat/StreamOutcome.swift | StreamOutcome, StreamErrorResolution, StreamErrorResolver |
| Constant removed | Sources/Core/Chat/BodyLimits.swift | defaultMaxResponseTokens deleted |
| Streaming helper | Sources/Session.swift | collectStream now returns StreamOutcome; output-side overflow → .length |
| Server streaming SSE | Sources/Handlers.swift | catch-block applies the resolver, emits length-finish chunk + [DONE] |
| Server non-streaming | Sources/Handlers.swift | routes through collectStream for the same semantics |
| CLI non-streaming | Sources/Session.swift | same: routes through collectStream |
| CLI surfacing | Sources/CLI.swift | stderr warning when finish_reason=length |
| Tracking type | Sources/Core/ToolCallHandler.swift | ProcessPromptResult.finishReason |
| Server SSOT | Sources/Handlers.swift | drops ?? BodyLimits.defaultMaxResponseTokens |
| CLI SSOT | Sources/main.swift | drops ?? BodyLimits.defaultMaxResponseTokens |
| Docs | README.md, docs/openai-api-compatibility.md | reflect the new dynamic default |

Tests

Unit (TDD red → green, all in swift run apfel-tests)

  • New StreamErrorResolverTests.swift covering every ApfelError case × empty/non-empty prev. Only .contextOverflow with non-empty prev is graceful; everything else is fatal.
  • New StreamOutcome Equatable / Hashable / Sendable round-trip tests.
  • BodyLimitsTests.swift updated; new regression guard that source no longer references defaultMaxResponseTokens.
  • CLIServerParityTests.swift rewritten to assert both surfaces pass max_tokens through unchanged (no ?? <constant>).
  • ApfelCorePublicAPIUsageTests.swift extended to lock down the new public types.

Integration (Apple Intelligence required)

  • New: test_omitted_max_tokens_non_streaming_returns_200 — request without max_tokens returns HTTP 200 with usable content and finish_reason ∈ {stop, length}.
  • New: test_omitted_max_tokens_streaming_completes_with_done — streaming ends with [DONE], no error payload, terminal finish_reason ∈ {stop, length}.
  • New: test_omitted_max_tokens_does_not_reference_a_default_constant — source-level lock that the fallback constant stays gone.
  • Adjusted: MCP test timeouts (30 s → 120 s on CLI side; 10 s → 30 s on the hanging-server test, with explicit max_tokens=128 so the test isolates MCP timing from model latency).

Test plan

  • swift run apfel-tests — 597 unit tests pass
  • Three previously failing MCP integration tests pass after the timeout/max_tokens adjustments
  • Three new overflow integration tests pass
  • make test — full unit + integration suite (in progress; updating PR description after it lands)
  • make preflight before release

…length"; drop arbitrary 1024 default

Root-cause fix for #128: when the on-device model runs into the 4096-token
ceiling after producing some content, that throw is now a clean
finish_reason: "length" instead of an HTTP 400 / stream error. The new pure
helper StreamErrorResolver branches on prev: empty -> still 400
(prompt-side overflow); non-empty -> graceful truncation. Refusal,
guardrail, decoding, rate-limit etc. stay fatal regardless of prev.

With graceful overflow handling in place, the load-bearing 1024 default
cap is no longer needed. Both CLI and server now pass max_tokens through
unchanged: omitted -> nil -> FoundationModels uses whatever room is left
in the 4096-token window. Drop-in OpenAI semantics, full window
utilisation, no fallback constant. SSOT: both surfaces share one rule
(no rule).

Non-streaming responses now route through collectStream too, so both
modes get identical overflow handling for free.

Tests:
- New StreamErrorResolverTests covers every ApfelError variant x empty/
  non-empty prev. Only contextOverflow + non-empty prev is graceful.
- StreamOutcome Equatable / Hashable / Sendable round-trip tests.
- BodyLimitsTests gains a regression guard that the constant stays gone.
- CLIServerParityTests rewritten to assert both surfaces drop the ?? fallback.
- ApfelCorePublicAPIUsageTests locks down the new public types.
- Integration: 3 new tests for omitted max_tokens (non-stream / stream /
  source-level lock). MCP test timeouts adjusted; the hanging-server
  test gains explicit max_tokens=128 so it isolates MCP timing from
  model-wandering latency.

584 unit + 257 integration tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner Author

@Arthur-Ficial Arthur-Ficial left a comment


Automated first-pass review — PR #136

Nice root-cause analysis. The 7-whys chain from #128 → #129 → #130 → #133 → here is well-traced, and the fix is at the right level: FoundationModels throwing on output-side context overflow is morally finish_reason: "length", not a server error. The StreamErrorResolver captures this distinction cleanly.

Findings

| Sev | Area | Summary |
| --- | --- | --- |
| P2 | Handlers.swift:184-187, Session.swift:344,377 | MCP auto-execute paths still use session.respond(to:) directly, not collectStream. Output-side overflow during MCP re-prompt would still surface as a fatal error. Low practical risk (tool-call JSON is short; re-prompt creates fresh context), but worth noting for consistency. Follow-up PR material. |
| P2 | StreamOutcome.swift:1-23 | 22-line file header comment. Style nit per project conventions ("default to writing no comments", "one short line max"). |

What I verified (clean)

  • StreamErrorResolver logic: Only contextOverflow + non-empty prev is graceful. Every other ApfelError case stays fatal regardless of prev. Correct.
  • StreamOutcome type: Sendable, Equatable, Hashable — all clean. Public on ApfelCore, consistent with FinishReason and ApfelError visibility.
  • ProcessPromptResult backward-compat: Default .stop for finishReason — existing callers unaffected.
  • collectStream catch block: Classifies via ApfelError.classify() then routes through StreamErrorResolver. Prompt-side overflow (empty prev) still throws. Output-side overflow returns StreamOutcome(content:, finishReason: .length).
  • Server streaming handler (Handlers.swift:549): Applies StreamErrorResolver directly in the catch block. Emits length-finish chunk → optional usage chunk → [DONE]. The defer { continuation.finish() } at line 447 ensures the stream is closed on all paths.
  • Server non-streaming handler (Handlers.swift:376-386): Uses outcome.finishReason to decide finishReason. When .length and no tool calls → FinishReason.length.openAIValue. Otherwise defers to FinishReasonResolver.resolve(). Correct priority.
  • CLI warning (CLI.swift:73): Only fires for finish_reason == .length. Appropriate — stderr, yellow, actionable message.
  • MCP tool re-prompt resets to .stop (Session.swift:269): Correct — the re-prompt produces a fresh natural reply.
  • Retry interaction: collectStream returns .length as a success (not a throw), so withRetry sees a normal return and doesn't retry. Context overflow without content throws, but isRetryable is false for .contextOverflow. Consistent with pre-PR behavior.
  • BodyLimits.defaultMaxResponseTokens: Fully removed from source. Parity tests guard against reintroduction via source scanning. Integration tests add a third layer of source-level locking.
  • Test coverage: StreamErrorResolverTests covers every ApfelError case × both prev states (28+ assertions). BodyLimitsTests updated. CLIServerParityTests rewritten. ApfelCorePublicAPIUsageTests extended. 3 new integration tests. Registered in main.swift. All thorough.
  • MCP test timeout adjustments: 30s → 120s (CLI), 10s → 30s (server) — reasonable given unbounded generation window.
  • Docs: README, docs/openai-api-compatibility.md — consistently updated. "Picking a value" table simplified. CLI parity section rewritten.
  • No security concerns: No new network code, no new URL handling, no new auth, no @unchecked Sendable, no new dependencies.
  • No .version, BuildInfo.swift, or README badge edits (release workflow outputs left untouched).
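The terminal SSE sequence verified above (length-finish chunk → [DONE]) can be sketched as follows. The helper name and the exact JSON layout are illustrative, not the project's actual code in Handlers.swift; the chunk follows the OpenAI streaming wire shape.

```swift
// On graceful truncation: emit one terminal chunk with an empty delta and
// finish_reason "length", then the [DONE] terminator OpenAI clients expect.
func emitGracefulLengthFinish(send: (String) -> Void) {
    let chunk = #"{"choices":[{"index":0,"delta":{},"finish_reason":"length"}]}"#
    send("data: \(chunk)\n\n")
    send("data: [DONE]\n\n")
}
```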

Architecture notes

The unification of non-streaming through collectStream (both CLI processPrompt and server nonStreamingResponse) is a good simplification — one code path, identical overflow semantics. The tradeoff is that non-streaming now internally streams and accumulates, but FoundationModels' streaming and non-streaming APIs should produce equivalent content.

The StreamErrorResolver being a pure enum with a static method is the right shape — no state, easily testable, easily inlineable. The decision matrix (prev × error type) is small and complete.

Summary

Solid root-cause fix. The two P2s are follow-up material, not blockers. CI is still running (build-and-test + apfelcore-public-api in progress); those need to go green before merge. make test and make preflight still pending per the PR description — full local qualification is the real gate per project conventions.

Automated review — cannot approve or merge. Franz reviews and merges.


Generated by Claude Code

…pi-breaking-changes passes

Two follow-up fixes after CI feedback:

1. BodyLimits.defaultMaxResponseTokens is now @available(*, deprecated)
   with value 0 instead of being deleted outright. The constant is no
   longer consulted anywhere; the stub exists purely to preserve the
   ApfelCore API surface for one release. Slated for removal in 2.0.0.

2. ProcessPromptResult gains an explicit init(content:toolLog:) that
   delegates to the three-argument init, so the pre-1.3.3 init signature
   stays in the API surface.

Also fold in the cap-hit finish_reason fix: collectStream now resolves
.stop vs .length via FinishReasonResolver after natural stream completion.
This makes finish_reason: "length" fire on cap-hit too (not just on
output-side overflow), which is what OpenAI clients expect.

End-to-end verified:
- swift package diagnose-api-breaking-changes v1.3.2 -> "No breaking
  changes detected in ApfelCore"
- 597 unit tests pass
- apfel --max-tokens 20 ... -> stderr warning fires, finish_reason=length
Owner Author

@Arthur-Ficial Arthur-Ficial left a comment


Re-review after a5a815c (API stability fix)

The second commit (fix(api-stability): keep deprecated stubs so swift-package-diagnose-api-breaking-changes passes) cleanly solves the CI gate by keeping defaultMaxResponseTokens as a deprecated symbol rather than removing it outright.

What changed in the new commit

| File | Change |
| --- | --- |
| Sources/Core/Chat/BodyLimits.swift | defaultMaxResponseTokens kept as @available(*, deprecated) with value 0 |
| Sources/Core/ToolCallHandler.swift | ProcessPromptResult backward-compat 2-arg init preserved (delegates to 3-arg) |
| Tests/apfelTests/BodyLimitsTests.swift | Now asserts the @available(*, deprecated) annotation exists via source scanning |

Why this is correct

  • Value of 0 (not 1024): If anyone references it despite the deprecation, the validator rejects <= 0 with a clear error. Fail-loud is safer than silently applying the old cap with wrong semantics.
  • ProcessPromptResult is package access — the back-compat init is courtesy for internal callers, not a public API concern. Appropriate defensiveness.
  • The apfelcore-public-api CI check (which runs swift package diagnose-api-breaking-changes) should now pass since no symbols were removed.
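The stub pattern can be sketched like this; the container type and deprecation message are assumptions, the real declaration lives in Sources/Core/Chat/BodyLimits.swift.

```swift
public enum BodyLimits {
    // Kept only so `swift package diagnose-api-breaking-changes` sees no
    // removed symbol. Value 0 fails loudly in the validator (<= 0 rejected)
    // if anything still consults it. Slated for removal in 2.0.0.
    @available(*, deprecated, message: "No longer consulted; removed in 2.0.0")
    public static let defaultMaxResponseTokens = 0
}
```

Fail-loud via 0 beats silently reinstating the old 1024 cap with the wrong semantics.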

Previous findings still valid

The two P2s from the first review remain follow-up material:

  1. MCP auto-execute paths still use session.respond(to:) directly (low practical risk — tool-call JSON is short, re-prompt creates fresh context).
  2. StreamOutcome.swift file header is 22 lines vs project style preference for minimal comments.

Neither blocks merge.

Verified clean (re-confirmed on updated tree)

  • collectStream catch → StreamErrorResolver.truncated only for contextOverflow + non-empty prev
  • FinishReasonResolver.resolve handles nil maxTokens correctly (falls through to .stop) ✓
  • Server non-streaming routes through collectStream → gets overflow handling for free ✓
  • Server streaming catch applies StreamErrorResolver before other error branches ✓
  • CLI emits stderr warning for finish_reason=length
  • SessionOptions.maxTokens: Int? passes nil through to GenerationOptions.maximumResponseTokens
  • ProcessPromptResult.finishReason resets to .stop after MCP tool re-prompt ✓
  • New test suite registered in Tests/apfelTests/main.swift
  • Integration tests have appropriate timeouts (120s CLI, 30s server) ✓
  • No .version, BuildInfo.swift, or README badge edits ✓
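The nil-maxTokens rule re-confirmed above can be sketched as a pure function. The signature of the project's FinishReasonResolver is not shown in this PR, so this shape is a guess that matches the described behavior (nil cap falls through to .stop).

```swift
enum FinishReason: String { case stop, length }

enum FinishReasonResolver {
    static func resolve(tokensUsed: Int, maxTokens: Int?) -> FinishReason {
        // A nil cap can never be hit: omitted max_tokens falls through to .stop.
        guard let cap = maxTokens else { return .stop }
        return tokensUsed >= cap ? .length : .stop
    }
}
```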

CI

Both build-and-test and apfelcore-public-api are in progress. The second commit specifically targets the API check. Wait for green before merge.


Generated by Claude Code — re-review on synchronize event


@Arthur-Ficial Arthur-Ficial merged commit f72cb5b into main Apr 27, 2026
2 checks passed
@Arthur-Ficial Arthur-Ficial deleted the fix/streaming-overflow-and-dynamic-default branch April 27, 2026 07:44
