feat(openaicompat): reasoning_content + provider quirks for Kimi K2.5 & GLM 5.0 by danshapiro · Pull Request #1 · prime-radiant-inc/serf

danshapiro · 2026-03-26T06:33:40Z

Summary

Parse reasoning_content in non-streaming and streaming Chat Completions responses, emitting unified ContentThinking / StreamEventReasoning* types
Round-trip reasoning_content in outgoing assistant messages for multi-turn reasoning preservation
Extract native reasoning_tokens from completion_tokens_details; fall back to chars/4 estimation
Add ProviderQuirks system with named presets (kimi-k2.5, glm-5) configurable via OPENAI_COMPATIBLE_PROVIDER_QUIRKS env var
Quirk behaviors: locked hyperparameters, tool_choice restriction, stop sequence limits, empty content stripping, JSON schema downgrade, finish reason mapping

Test plan

47/47 tests pass (24 pre-existing + 23 new)
Integration tests exercise full Kimi K2.5 and GLM 5.0 conversation flows with httptest mocks
go vet ./llm/providers/openaicompat/ clean

🤖 Generated with Claude Code

Extends the openaicompat adapter with: - reasoning_content parsing (non-streaming and streaming) - reasoning_content round-tripping in outgoing messages - Usage.ReasoningTokens extraction and estimation - ProviderQuirks system for per-provider behavioral overrides - Named presets for kimi-k2.5 and glm-5 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix two bugs in the openaicompat-reasoning plan: 1. FinishReasonMap implementation: the original approach passed the mapped value through NormalizeFinishReason, which would overwrite .Raw with the mapped value instead of preserving the original provider value (e.g., "sensitive"). The fix constructs FinishReason explicitly when a quirk mapping is applied, preserving the original raw value for diagnostics. 2. Streaming reasoning-to-tool-call transition: added logic to close REASONING_END before the first tool call starts, matching Anthropic's event ordering. Without this, reasoning would remain "open" until [DONE] when tool calls follow reasoning with no text in between (common for Kimi K2.5). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…provider quirks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ntent Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…n outgoing messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ter support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…schema, and finish reason quirks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…v var config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d verify reasoning in final response Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…narios Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The past-viewer transcript pane was empty because serveReplay decoded top-level {kind, data} fields, but transcript entries are {kind:"entry", seq, turn:{kind, message, ...}} — the data field was always missing. Even with the right schema, transcript turns are message-shaped objects, not the runtime SSE event stream the renderer listens for. Translate each turn into the equivalent renderer events: header → SESSION_START; USER_INPUT → USER_INPUT; ASSISTANT text part → ASSISTANT_TEXT_START + ASSISTANT_TEXT_END; ASSISTANT tool_call part → TOOL_CALL_START; TOOL_RESULTS → TOOL_CALL_END (using a call_id → tool_name map carried across turns). Closes kata #1.

…ests The previous rendering produced three redundant elements per task transition: a steering "current task" divider, a tool-call card ("tasks update · 1 updated"), and a body row ("✓ #1 → done"). A reader couldn't follow the narrative. This rewrite is a synthesis of six parallel design treatments (ambient-strip, timeline-lane, status-line-min, collapsible-group, action-narrative, kanban-beam), landing on the action-narrative direction. Conversation transcript changes: - "current task" steering is suppressed entirely (it fired before nearly every turn and duplicated information available elsewhere). Its task title still seeds the description cache so subsequent updates can name the task. - task_list tool calls render as a single line of system prose: "marked \"Understand task\" done" "added 3 tasks: \"Phase 1\" · \"Phase 2\" · \"Phase 3\"" "marked \"X\" done — took longer than expected, started \"Y\"" Multiple updates collapse into one comma-joined sentence (cap 4). action=view is suppressed entirely. - full-list steering (post-compaction) collapses to one "task list reloaded · N items" pointer that opens the side panel. - loop / read-only / all-done / unknown steering still render as the existing collapsible divider (these are exceptional events). Tasks button changes: - Shows a live progress badge next to the icon (e.g. "3/7"), updated by a 5s background poller and immediately after each task_list call. Visible without opening the panel. Tasks side panel changes: - EVERY task is expandable now (not just notes-having ones). Click a row to reveal a metadata <dl> with type, status, depends_on, reasoning_effort, full prompt, and notes log. Chevron rotates on open. Description stays in the row summary. Description cache: - Client-side Map seeded from steering full-list parses, task_list-append args (best-effort), and the /tasks poll. Used by the system-line renderer to print "marked \"X\" done" instead of "marked #N done". Falls back to "#N" when uncached. JSDOM tests under cmd/serf-hub/jstest: - test-renderer.js: replays the captured event sequence from Jesse's screenshot scenario; asserts 0 steering, 0 tool-call cards, 1 system-line ("marked \"Understand task\" done"). - test-renderer-advanced.js: append (3 tasks named), multi-update with notes, full-list pointer, loop steering kept, view action suppressed, missing-description #N fallback. - test-sidebar.js: every task expandable; metadata dl populated with prompt/type/depends_on/reasoning_effort/notes; badge shows 1/3 for the test fixture.

…te (PRI-1975) Done-bar #1 is "go test -race ./... green as a PERMANENT gate." The module is race-clean (Phase 0 + the two hub-race fixes), but CI ran `go test -short` without -race, so a future race could land green. Switch the CI test step to `go test -race -short -count=1 ./...` so the race detector runs on every build. -short is kept (CI skips resource-dependent tests); the full `go test -race ./...` remains the local dev gate. Verified `go test -race -short ./...` green locally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Final review caught three salvage artifacts in MODEL-FACING rendered text (the design doc was right; the render strings were stale): the elided-turns marker pointed at find_session_transcripts(transcript_ref) for an outline (find takes no ref — that's read format=outline); the header said format=transcript_jsonl (now jsonl); the hard-cap note said full_result_for (now expand_turn). Tests asserting the old strings updated. #1 fired on the default read of any long session. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A practical, example-driven guide for authoring hooks in a serf plugin: discovery (--plugin-dir, manifest path / inline / default hooks.json), the nine events serf fires and their tiers, matcher semantics with worked examples, the command handler (shell vs exec-form args, shell selection, env vars), the input JSON fields, exit-code semantics per event, the Go RE2 caveat, the loud misconfiguration warnings, and a copy-pasteable example hooks.json. The shell->Bash tool-name caveat is called out prominently as 'the #1 mistake'. Linked from 07 (the contract) and from architecture.md's hooks-package bullet.

…broken anchor Adversarial review found, and I verified against the code, three real defects: - hooks.md listed `stopReason` in the "fields serf honors" block, but the output parser never parses it — `parsedHookOutput` has no such field (the only StopReason in the tree is the goal subsystem). It was wrongly lifted from old 07's *reserved* universal-JSON block into the *shipped* list. Removed it. `continue`/`suppressOutput` are assigned to `parsedHookOutput` but absent from `RunResult`/`PreToolUseResult`, so they never reach a caller — reframed both docs to say they are parsed but not acted on, not honored. - The "Output is capped at 10,000 characters" claim has no basis: no length cap exists anywhere in the hook output path. Removed from hooks.md and 07. - The shell->Bash gotcha anchor `#the-1-mistake-shellbash` is wrong; the GitHub slug for "## The #1 mistake: `shell` -> `Bash`" is `#the-1-mistake-shell--bash` (the arrow between spaces yields a double hyphen, matching the Phase-B/C convention already used). Fixed all four links (3 in hooks.md, 1 new cross-doc link in 07).

Dan Shapiro and others added 13 commits March 25, 2026 23:30

test-plan: concrete test plan for openaicompat reasoning_content and …

0d41148

…provider quirks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): parse reasoning_content in non-streaming responses

704ac73

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): emit reasoning events from streaming reasoning_co…

960f48b

…ntent Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): round-trip ContentThinking as reasoning_content i…

1bf9679

…n outgoing messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): extract and estimate reasoning_tokens in usage

24b8e8d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): add ProviderQuirks struct with locked hyperparame…

2a1fe87

…ter support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): implement tool_choice, stop, empty content, json_…

1650f23

…schema, and finish reason quirks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): add QuirksPreset for kimi-k2.5 and glm-5, with en…

4c09758

…v var config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(openaicompat): apply quirk finish reason mapping in streaming an…

74cda43

…d verify reasoning in final response Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test(openaicompat): add integration tests for kimi-k2.5 and glm-5 sce…

8a3e829

…narios Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore(openaicompat): add doc comment to NewFromEnv

49b706a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(openaicompat): reasoning_content + provider quirks for Kimi K2.5 & GLM 5.0#1

feat(openaicompat): reasoning_content + provider quirks for Kimi K2.5 & GLM 5.0#1
danshapiro wants to merge 13 commits into
prime-radiant-inc:mainfrom
danshapiro:openaicompat-reasoning

danshapiro commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant