Skip to content

feat(openaicompat): reasoning_content + provider quirks for Kimi K2.5 & GLM 5.0#1

Open
danshapiro wants to merge 13 commits into
prime-radiant-inc:mainfrom
danshapiro:openaicompat-reasoning
Open

feat(openaicompat): reasoning_content + provider quirks for Kimi K2.5 & GLM 5.0#1
danshapiro wants to merge 13 commits into
prime-radiant-inc:mainfrom
danshapiro:openaicompat-reasoning

Conversation

@danshapiro

Copy link
Copy Markdown

Summary

  • Parse reasoning_content in non-streaming and streaming Chat Completions responses, emitting unified ContentThinking / StreamEventReasoning* types
  • Round-trip reasoning_content in outgoing assistant messages for multi-turn reasoning preservation
  • Extract native reasoning_tokens from completion_tokens_details; fall back to chars/4 estimation
  • Add ProviderQuirks system with named presets (kimi-k2.5, glm-5) configurable via OPENAI_COMPATIBLE_PROVIDER_QUIRKS env var
  • Quirk behaviors: locked hyperparameters, tool_choice restriction, stop sequence limits, empty content stripping, JSON schema downgrade, finish reason mapping

Test plan

  • 47/47 tests pass (24 pre-existing + 23 new)
  • Integration tests exercise full Kimi K2.5 and GLM 5.0 conversation flows with httptest mocks
  • go vet ./llm/providers/openaicompat/ clean

🤖 Generated with Claude Code

Dan Shapiro and others added 13 commits March 25, 2026 23:30
Extends the openaicompat adapter with:
- reasoning_content parsing (non-streaming and streaming)
- reasoning_content round-tripping in outgoing messages
- Usage.ReasoningTokens extraction and estimation
- ProviderQuirks system for per-provider behavioral overrides
- Named presets for kimi-k2.5 and glm-5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix two bugs in the openaicompat-reasoning plan:

1. FinishReasonMap implementation: the original approach passed the
   mapped value through NormalizeFinishReason, which would overwrite
   .Raw with the mapped value instead of preserving the original
   provider value (e.g., "sensitive"). The fix constructs FinishReason
   explicitly when a quirk mapping is applied, preserving the original
   raw value for diagnostics.

2. Streaming reasoning-to-tool-call transition: added logic to close
   REASONING_END before the first tool call starts, matching
   Anthropic's event ordering. Without this, reasoning would remain
   "open" until [DONE] when tool calls follow reasoning with no text
   in between (common for Kimi K2.5).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…provider quirks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n outgoing messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ter support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…schema, and finish reason quirks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…v var config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d verify reasoning in final response

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…narios

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
obra added a commit that referenced this pull request May 11, 2026
The past-viewer transcript pane was empty because serveReplay decoded
top-level {kind, data} fields, but transcript entries are
{kind:"entry", seq, turn:{kind, message, ...}} — the data field was
always missing. Even with the right schema, transcript turns are
message-shaped objects, not the runtime SSE event stream the renderer
listens for.

Translate each turn into the equivalent renderer events:
header → SESSION_START; USER_INPUT → USER_INPUT;
ASSISTANT text part → ASSISTANT_TEXT_START + ASSISTANT_TEXT_END;
ASSISTANT tool_call part → TOOL_CALL_START;
TOOL_RESULTS → TOOL_CALL_END (using a call_id → tool_name map carried
across turns).

Closes kata #1.
obra added a commit that referenced this pull request May 11, 2026
…ests

The previous rendering produced three redundant elements per task
transition: a steering "current task" divider, a tool-call card
("tasks update · 1 updated"), and a body row ("✓ #1 → done"). A
reader couldn't follow the narrative. This rewrite is a synthesis of
six parallel design treatments (ambient-strip, timeline-lane,
status-line-min, collapsible-group, action-narrative, kanban-beam),
landing on the action-narrative direction.

Conversation transcript changes:
  - "current task" steering is suppressed entirely (it fired before
    nearly every turn and duplicated information available elsewhere).
    Its task title still seeds the description cache so subsequent
    updates can name the task.
  - task_list tool calls render as a single line of system prose:
    "marked \"Understand task\" done"
    "added 3 tasks: \"Phase 1\" · \"Phase 2\" · \"Phase 3\""
    "marked \"X\" done — took longer than expected, started \"Y\""
    Multiple updates collapse into one comma-joined sentence (cap 4).
    action=view is suppressed entirely.
  - full-list steering (post-compaction) collapses to one
    "task list reloaded · N items" pointer that opens the side panel.
  - loop / read-only / all-done / unknown steering still render as
    the existing collapsible divider (these are exceptional events).

Tasks button changes:
  - Shows a live progress badge next to the icon (e.g. "3/7"),
    updated by a 5s background poller and immediately after each
    task_list call. Visible without opening the panel.

Tasks side panel changes:
  - EVERY task is expandable now (not just notes-having ones). Click
    a row to reveal a metadata <dl> with type, status, depends_on,
    reasoning_effort, full prompt, and notes log. Chevron rotates on
    open. Description stays in the row summary.

Description cache:
  - Client-side Map seeded from steering full-list parses,
    task_list-append args (best-effort), and the /tasks poll. Used
    by the system-line renderer to print "marked \"X\" done" instead
    of "marked #N done". Falls back to "#N" when uncached.

JSDOM tests under cmd/serf-hub/jstest:
  - test-renderer.js: replays the captured event sequence from
    Jesse's screenshot scenario; asserts 0 steering, 0 tool-call
    cards, 1 system-line ("marked \"Understand task\" done").
  - test-renderer-advanced.js: append (3 tasks named), multi-update
    with notes, full-list pointer, loop steering kept, view action
    suppressed, missing-description #N fallback.
  - test-sidebar.js: every task expandable; metadata dl populated
    with prompt/type/depends_on/reasoning_effort/notes; badge shows
    1/3 for the test fixture.
obra added a commit that referenced this pull request Jun 4, 2026
…te (PRI-1975)

Done-bar #1 is "go test -race ./... green as a PERMANENT gate." The module is
race-clean (Phase 0 + the two hub-race fixes), but CI ran `go test -short` without
-race, so a future race could land green. Switch the CI test step to
`go test -race -short -count=1 ./...` so the race detector runs on every build.
-short is kept (CI skips resource-dependent tests); the full `go test -race ./...`
remains the local dev gate. Verified `go test -race -short ./...` green locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
obra added a commit that referenced this pull request Jun 7, 2026
Final review caught three salvage artifacts in MODEL-FACING rendered text (the design
doc was right; the render strings were stale): the elided-turns marker pointed at
find_session_transcripts(transcript_ref) for an outline (find takes no ref — that's
read format=outline); the header said format=transcript_jsonl (now jsonl); the
hard-cap note said full_result_for (now expand_turn). Tests asserting the old strings
updated. #1 fired on the default read of any long session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
obra added a commit that referenced this pull request Jun 8, 2026
A practical, example-driven guide for authoring hooks in a serf plugin:
discovery (--plugin-dir, manifest path / inline / default hooks.json), the nine
events serf fires and their tiers, matcher semantics with worked examples, the
command handler (shell vs exec-form args, shell selection, env vars), the input
JSON fields, exit-code semantics per event, the Go RE2 caveat, the loud
misconfiguration warnings, and a copy-pasteable example hooks.json.

The shell->Bash tool-name caveat is called out prominently as 'the #1 mistake'.

Linked from 07 (the contract) and from architecture.md's hooks-package bullet.
obra added a commit that referenced this pull request Jun 8, 2026
…broken anchor

Adversarial review found, and I verified against the code, three real defects:

- hooks.md listed `stopReason` in the "fields serf honors" block, but the output
  parser never parses it — `parsedHookOutput` has no such field (the only
  StopReason in the tree is the goal subsystem). It was wrongly lifted from old
  07's *reserved* universal-JSON block into the *shipped* list. Removed it.
  `continue`/`suppressOutput` are assigned to `parsedHookOutput` but absent from
  `RunResult`/`PreToolUseResult`, so they never reach a caller — reframed both
  docs to say they are parsed but not acted on, not honored.
- The "Output is capped at 10,000 characters" claim has no basis: no length cap
  exists anywhere in the hook output path. Removed from hooks.md and 07.
- The shell->Bash gotcha anchor `#the-1-mistake-shellbash` is wrong; the GitHub
  slug for "## The #1 mistake: `shell` -> `Bash`" is `#the-1-mistake-shell--bash`
  (the arrow between spaces yields a double hyphen, matching the Phase-B/C
  convention already used). Fixed all four links (3 in hooks.md, 1 new cross-doc
  link in 07).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant