feat: multiturn synthetic user Runner#1441
Conversation
Removes the /respond SDK module and its supporting wire types (RespondRequest/Response, SyntheticUserDriverConfig, ConversationTurn, the nested SyntheticUserInfo model). Per-turn synthetic-user invocation moves to OSS at libs/core/kiln_ai/synthetic_user/ in a subsequent commit. Collapses SyntheticUserCase.synthetic_user_info to a single tagged blob string: <persona>...</persona><goal>...</goal><behavior_guidance>...</behavior_guidance> The server treats the blob as opaque; the local player parses it. Adds a typed `code` literal on /generate's 502 response (llm_unavailable | upstream_invalid_output) so callers can discriminate between transient model failures and unparseable model output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OSS-side per-turn synthetic-user invocation — the replacement for kiln_server's removed /respond endpoint. Lives in libs/core/kiln_ai/synthetic_user/ so the runner can call the LLM using the user's own provider keys rather than a hosted endpoint. Modules: - models — Pydantic SyntheticUserInfo (parsed form) + SyntheticUserDriverConfig. - parser — tagged-blob ↔ SyntheticUserInfo. Required: <persona>, <goal>; optional: <behavior_guidance>. Unknown tags ignored (forward-compat). - role_swap — flips eval-frame user/assistant labels into LLM-frame labels; raises on system/tool roles (the driver filters those upstream) and on non-string content. - prompt — persona-playing system prompt. No <DONE>/<CANCEL> guidance: drive loop is fixed-length; SU stays engaged across the conversation. - driver — SyntheticUserDriver. Parses the blob once at construction, renders the system prompt once, builds the adapter once. respond() filters visible roles, role-swaps, prepends the system prompt as prior_trace[0], calls adapter.invoke_returning_run_output (in-memory — the SU never persists a TaskRun), returns the raw string. 56 unit tests covering: parser roundtrip / required-tag enforcement / whitespace / unknown-tag forward-compat; role_swap empty/alternating/ preserves-order/raises-on-system-or-tool; prompt structural assertions (persona/goal/conventions present, behavior_guidance only when set, no <DONE>/<CANCEL>); driver happy path, role-swap shape, custom visible_roles, ends-on-assistant invariant, non-string output guard, parse-error on construction, adapter reuse across turns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up an OpenAPI description on GenerateSyntheticUsersResponse.cases documenting the strict-N batch contract. No shape change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thin async wrapper around the SDK's /v1/synthetic_user/generate endpoint. The SDK now parses 401/422/500/502 into typed response models, so the wrapper switches on the parsed type rather than reading raw bytes — 502 surfaces its typed `code` literal (llm_unavailable | upstream_invalid_output) directly to callers. No retry loop. /generate is a once-per-batch authoring call; kiln_server's pipeline already retries transient provider failures internally before returning 502, so a 502 reaching us is a genuine per-batch failure that should propagate. Drops the v1 client's SyntheticUserTransientError + backoff machinery. No /respond. Per-turn synthetic-user invocation lives at libs/core/kiln_ai/synthetic_user/ and runs locally with the user's keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an explicit "your entire output is the user's next message, verbatim
and nothing else: no narration, no meta-commentary, no quotes, no labels
like 'User:'" clause to the persona-playing system prompt.
A team running similar SU-driven evals reported the persona-playing
model frequently breaks character — narrating ("I would now ask..."),
self-evaluating, or labeling its output. This clause pins that down at
the prompt boundary so we don't end up reaching for post-processing
band-aids later.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drive_loop.py:
- drive_case(*, case, target_invoker, su_driver, turns, on_turn) runs the
loop for exactly `turns` iterations — no early termination, no
stop_signal plumbing. Returns DriveCaseResult(chain) with the persisted
TaskRun chain.
- TargetInvoker + TurnHook Protocols. The SU driver does all role
filtering / role swap / invariant checks internally; the drive loop
passes the cumulative trace as-is.
runner.py:
- run_cases_batch is an async generator yielding typed BatchEvents
(BatchStartedEvent / TurnCompletedEvent / CaseCompletedEvent /
CaseFailedEvent / BatchCompletedEvent). No stop_signal/stop_reason
fields — drive loop is fixed-length.
- Constructs a SyntheticUserDriver per case; a malformed
synthetic_user_info blob surfaces as a CaseFailedEvent for that case
alone (other cases continue).
- _make_target_invoker / _build_input_source / _tag_leaf patterns kept
from the prior v1 commits (target persistence + SU attribution
unchanged). input_source now carries the opaque blob on the root run
+ slim {batch_tag, turn_index} on subsequent turns.
- Per-case try/except now WRAPS _tag_leaf too, so a save_to_file failure
surfaces as case_failed instead of silently disappearing into
asyncio.gather(return_exceptions=True). Same try also wraps the
target_invoker construction.
- Case tasks are kicked off before the first BatchStartedEvent yield and
the entire drain loop is inside a try/finally that cancels them on
consumer disconnect — fixes the v1 issue where browser disconnect kept
the request alive for the full duration of every in-flight case.
14 tests cover: input validation, happy-path event stream, leaf tagging,
auto-generated batch_tag, malformed blob → case_failed, target invoke
failure → case_failed, tag-save failure → case_failed, concurrency
semaphore enforcing max-in-flight, root vs slim input_source
attribution, and consumer cancellation propagating to case tasks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two routes for the multi-turn synthetic-user data-generation pipeline: - POST .../multiturn_sdg/generate_cases (sync JSON) - POST .../multiturn_sdg/run_cases_batch (SSE via CancellableStreamingResponse) Wires connect_multiturn_sdg_api into desktop_server.make_app and registers the Multiturn SDG tag in kiln_server's tags_metadata so the regenerated api_schema.d.ts surfaces the routes in the typed client. Both routes guard task.turn_mode == multiturn before doing any upstream work and route SyntheticUserClient typed errors through to faithful HTTP statuses (401/422/502 preserved, not collapsed). The SSE route threads build_save_context(request) into run_cases_batch and uses an isinstance whitelist on the JSON encoder so future Pydantic types on the wire need explicit review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename total_cost -> target_total_cost on CaseCompletedEvent and BatchCompletedEvent. The runner only sees target adapter spend; the SU driver's per-turn cost isn't rolled up here. Old name was misleading in a beta where users pick the SU model. - Thread an optional save_context through run_cases_batch and wrap the leaf-tag save. Adapter writes inside adapter.invoke still bypass — a kiln_ai-side gap shared with the chat SSE pattern, documented in the runner docstring. - Add a re-run idempotency test for _tag_leaf to lock in the spec's "set-union + sort, preserves pre-existing tags" contract. - Drop the dead UNSET/None branch in client._code_or_default; the remaining one-liner has identical behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename DEFAULT_TURNS -> MAX_TURNS_DEFAULT to match spec naming. - Name asyncio.create_task instances so debug dumps point at this code. - Pre-assert non-empty seed_prompt in drive_case (assert-loud invariant). - Document invariants on _make_target_invoker (sequential-per-case), _tag_leaf (one-writer-per-leaf), and _close_when_done (final put on cancel path goes into the void). - Drop the unreachable generic fallback in _to_http_exception; tighten the param type to the two real subclasses so the type checker enforces exhaustiveness at the call site. - Log a warning in _format_validation_detail when every item is skipped so a silent SDK shape drift surfaces. - Tests: parameterize turns<1 with negatives, lock in _event_to_payload's unregistered-event guard, and couple the auto-batch_tag test to the public regex instead of the implementation. - Stale "Phase 3" docstring scrub + f-string cosmetic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root TaskRun's input_source.properties now carries the decomposed SU case context — persona, goal, behavior_guidance (when present), seed_prompt — instead of the opaque tagged blob. Lets dataset readers and eval tooling inspect SU attribution by direct property access rather than re-parsing the XML each time. The blob is losslessly reconstructable from these fields via build_synthetic_user_info if a downstream tool needs the original wire form. Parse happens once per case in _build_input_source on the root turn; the SU driver constructor already validated the blob, so the re-parse here can't surface a new error class. behavior_guidance is omitted when the parser returns None (the DataSource validator rejects empty strings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SyntheticUserDriver.respond now returns (message, cost) — the per-call cost is read from the in-memory TaskRun's usage.cost (the only place SU spend surfaces, since SU turns aren't persisted as TaskRuns). drive_case accumulates su_total_cost across turns and exposes it on DriveCaseResult. The runner adds it to the leaf's cumulative_usage.cost to produce an honest CaseCompletedEvent.total_cost — renamed from target_total_cost since the field now reports total spend, not just the target adapter's. BatchCompletedEvent.total_cost sums across successful cases the same way. Matters now because the SU model is user-selectable: someone picking Sonnet for higher-quality probes would have had ~half their spend invisible under the old target-only total. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…source Every input to the filter has stronger upstream protection now: seed_prompt is asserted non-empty in drive_case; persona and goal are required-non-empty by parse_synthetic_user_info; behavior_guidance is already conditionally skipped if None; the remaining keys are Pydantic- validated or non-string. The filter was guarding nothing. The DataSource validator stays as the real backstop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure relocation + boundary update; behavior unchanged.
run_cases_batch and drive_case now live at
libs/core/kiln_ai/synthetic_user/{runner,drive_loop}.py alongside the
existing SyntheticUserDriver. Same neighborhood as EvalRunner /
RagJobRunner / ExtractorRunner — runners belong in libs/core.
To make libs/core SDK-agnostic, introduce a small
kiln_ai.synthetic_user.SyntheticUserCase Pydantic model (two fields,
field-identical to the kiln_server SDK's case shape). The
multiturn_sdg_api route validates dicts straight into the libs/core type
via Pydantic, so the runner never sees the SDK class. The SDK case is
still used for `/generate_cases` output via `to_dict()` — nothing
about that pro-gated authoring path changes.
Tests move with the code. studio_server keeps only the SDK-wrapper
SyntheticUserClient and the FastAPI route, which is exactly the
established shape for eval_api driving EvalRunner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WalkthroughThis PR introduces a complete multi-turn synthetic data generation pipeline enabling local execution of multi-turn synthetic user conversations against target tasks. The implementation includes a batch runner with concurrent case execution, per-turn drivers that invoke adapters, FastAPI endpoints exposing generation and execution, an SDK client wrapper for upstream case generation, comprehensive event streaming via SSE, and end-to-end test coverage. ChangesMulti-turn Synthetic Data Generation Pipeline
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces multi-turn synthetic data generation (SDG) capabilities, adding FastAPI routes, a local synthetic-user driver, client wrappers, and comprehensive unit tests, alongside updates to tracking models. The review feedback highlights several critical issues: multiple model files (chat_session_list_item.py, kiln_base_model.py, task_output.py, task_output_rating.py, and task_run.py) use datetime.datetime.fromisoformat without importing the datetime module, which will cause runtime NameErrors. Additionally, manually overriding the Content-Type header with a hardcoded boundary in the prompt optimization endpoint is fragile and should be removed, and role_swap.py needs to gracefully handle None content in assistant messages to prevent crashes during tool-use turns.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
📊 Coverage ReportOverall Coverage: 92% Diff: origin/leonard/kil-632-feat-multiturn-task...HEAD
Summary
Line-by-lineView line-by-line diff coverageapp/desktop/studio_server/synthetic_user/client.pyLines 169-178 169 parts: list[str] = []
170 skipped = 0
171 for item in detail:
172 if not isinstance(item, ValidationError):
! 173 skipped += 1
! 174 continue
175 loc = ".".join(str(x) for x in item.loc)
176 parts.append(f"{loc}: {item.msg}")
177 if not parts:
178 # The SDK's HTTPValidationError.detail had items the SDK couldn'tLines 178-187 178 # The SDK's HTTPValidationError.detail had items the SDK couldn't
179 # parse as ValidationError — a shape we don't expect today. Log
180 # so we can spot the discrepancy if it ever appears in the wild,
181 # instead of silently returning the empty fallback.
! 182 if skipped:
! 183 logger.warning(
184 "HTTPValidationError carried %d non-ValidationError detail item(s); "
185 "raw detail repr: %r",
186 skipped,
187 detail,Lines 185-191 185 "raw detail repr: %r",
186 skipped,
187 detail,
188 )
! 189 return "Validation error (no detail)."
190 return "Validation error: " + "; ".join(parts)libs/core/kiln_ai/synthetic_user/drive_loop.pyLines 97-105 97 # Assert-loud on missing seed. An empty string would silently flow
98 # into the target adapter and surface as a confusing model-side error
99 # rather than a clean "the case is malformed" signal.
100 if not case.seed_prompt:
! 101 raise ValueError("case.seed_prompt must be a non-empty string")
102
103 user_msg: str = case.seed_prompt
104 prev_run: TaskRun | None = None
105 prev_trace: list[ChatCompletionMessageParam] | None = Nonelibs/core/kiln_ai/synthetic_user/driver.pyLines 114-122 114 swapped = role_swap(visible)
115 last = swapped[-1]
116 user_input = last["content"]
117 if not isinstance(user_input, str):
! 118 raise RuntimeError(
119 "synthetic user input must be a plain string after role_swap"
120 )
121
122 system_msg: ChatCompletionSystemMessageParam = {libs/core/kiln_ai/synthetic_user/role_swap.pyLines 41-49 41 # the target. Narrowing here lets us assign into the swapped wrapper
42 # type without a cast.
43 content = msg["content"]
44 if not isinstance(content, str):
! 45 raise ValueError(
46 f"role_swap requires string content for role {role!r}; "
47 f"got {type(content).__name__}"
48 )
49 if role == "user":libs/core/kiln_ai/synthetic_user/runner.pyLines 416-424 416 missing (defensive against fakes in unit tests that don't populate it).
417 """
418 usage = getattr(run, "cumulative_usage", None)
419 if usage is None:
! 420 return 0.0
421 return float(getattr(usage, "cost", None) or 0.0)
422
423
424 def _tag_leaf(leaf: TaskRun, batch_tag: str) -> None:
|
… role_swap Tool-using targets emit assistant turns with content=None and tool_calls set — pure tool dispatches, not user-facing speech. Pre-this-fix, those hit role_swap's strict-content invariant and crashed the SU run. Gemini's suggestion (coerce None → "") would have let them through but degraded the SU LLM's conversation view to consecutive user turns with empty content — silently worse than the crash. The right place to filter is at the driver, next to the existing visible_message_roles filter — "what's visible to the SU" is the driver's responsibility. role_swap stays strict on None content (the trip wire for any caller bypassing the driver's filter). Filter predicate: drop assistant turns where content is None. Keep assistant turns that carry text alongside tool_calls — the text is user-facing speech the SU should respond to. Addresses gemini-code-assist comment on PR #1441 / role_swap.py without applying the suggested empty-string coercion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…context Fix comment numbering in driver.py (4→5), correct "greedy" to "non-greedy" in parser.py, remove inaccurate drive-loop claim from studio_server __init__. Strip historical /respond migration references, remove app-layer concerns (SSE, @no_write_lock) from SDK-level docstrings, deduplicate cost-attribution explanations across driver/runner/drive_loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ask' into dchiang/multiturn-synthetic-user
Stray U+200B (zero-width space) between "disables/" and "spinners" in a comment tripped eslint no-irregular-whitespace. Likely a paste artifact from Leonard's recent commit; fixed in passing during the merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||
| headers["Content-Type"] = "multipart/form-data; boundary=+++" | ||
|
|
||
| _kwargs["headers"] = headers |
There was a problem hiding this comment.
All the changes under /api_client are files copied from the new server SDK. No need to review those.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
libs/core/kiln_ai/synthetic_user/runner.py (1)
57-66: ⚖️ Poor tradeoff
TurnCompletedEvent.cumulative_costomits SU-driver spend whileCaseCompletedEvent.total_costincludes it.A live cost ticker driven off
cumulative_costwill undercount during turns, then jump up whencase_completedaddsresult.su_total_cost. This matches the documented "honest totals only at case end" intent, so it's not a bug — just flagging the per-turn vs per-case inconsistency in case the UI relies on a smooth running total. Threading the running SU cost intoon_turnwould remove the jump.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libs/core/kiln_ai/synthetic_user/runner.py` around lines 57 - 66, TurnCompletedEvent.cumulative_cost currently excludes SU-driver spend while CaseCompletedEvent.total_cost includes it, causing per-turn cost undercounts then a jump at case completion; update the on-turn flow to thread the running SU cost into each TurnCompletedEvent so cumulative_cost reflects assistant+SU spend per turn (adjust the code paths that construct TurnCompletedEvent and any function handling on_turn to accept and pass the incremental su_running_cost), and ensure CaseCompletedEvent.total_cost still aggregates final su_total_cost so the live ticker remains smooth and consistent with the end-of-case total.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/web_ui/src/lib/api_schema.d.ts`:
- Around line 17079-17104: The OpenAPI docs currently advertise
stream_run_cases_batch
(stream_run_cases_batch_api_projects__project_id__tasks__task_id__multiturn_sdg_run_cases_batch_post)
as returning "application/json" but the route actually returns a
StreamingResponse with media_type="text/event-stream"; update the FastAPI route
in app/desktop/studio_server/multiturn_sdg_api.py to declare the 200 response
content type as "text/event-stream" (e.g., add responses={200: {"content":
{"text/event-stream": {"schema": {"type":"string"}}}}} or set
response_class/response_model metadata appropriately) so the OpenAPI spec
reflects SSE, then run app/web_ui/src/lib/generate_schema.sh to regenerate
app/web_ui/src/lib/api_schema.d.ts; do not manually edit the generated TS file.
---
Nitpick comments:
In `@libs/core/kiln_ai/synthetic_user/runner.py`:
- Around line 57-66: TurnCompletedEvent.cumulative_cost currently excludes
SU-driver spend while CaseCompletedEvent.total_cost includes it, causing
per-turn cost undercounts then a jump at case completion; update the on-turn
flow to thread the running SU cost into each TurnCompletedEvent so
cumulative_cost reflects assistant+SU spend per turn (adjust the code paths that
construct TurnCompletedEvent and any function handling on_turn to accept and
pass the incremental su_running_cost), and ensure CaseCompletedEvent.total_cost
still aggregates final su_total_cost so the live ticker remains smooth and
consistent with the end-of-case total.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dd8cddc7-7358-4d89-a388-06e6d09f5738
⛔ Files ignored due to path filters (20)
app/desktop/studio_server/api_client/kiln_ai_server_client/api/jobs/start_prompt_optimization_job_v1_jobs_prompt_optimization_job_start_post.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/api/jobs/start_sample_job_v1_jobs_sample_job_start_post.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/api/synthetic_user/__init__.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/api/synthetic_user/generate_v1_synthetic_user_generate_post.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/__init__.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/chat_completion_assistant_message_param_wrapper.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/chat_session_list_item.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/generate_synthetic_users_request.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/generate_synthetic_users_response.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/generate_v1_synthetic_user_generate_post_response_401.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/generate_v1_synthetic_user_generate_post_response_500.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/generate_v1_synthetic_user_generate_post_response_502.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/generate_v1_synthetic_user_generate_post_response_502_code.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/kiln_base_model.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/message_usage.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/synthetic_user_case.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/task_output.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/task_output_rating.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/task_run.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**app/desktop/studio_server/api_client/kiln_ai_server_client/models/usage.pyis excluded by!app/desktop/studio_server/api_client/kiln_ai_server_client/**
📒 Files selected for processing (26)
app/desktop/desktop_server.pyapp/desktop/studio_server/multiturn_sdg_api.pyapp/desktop/studio_server/synthetic_user/__init__.pyapp/desktop/studio_server/synthetic_user/client.pyapp/desktop/studio_server/synthetic_user/test_client.pyapp/desktop/studio_server/test_multiturn_sdg_api.pyapp/web_ui/src/lib/api_schema.d.tsapp/web_ui/src/lib/ui/conversation/multiturn_composer.sveltelibs/core/kiln_ai/synthetic_user/__init__.pylibs/core/kiln_ai/synthetic_user/case.pylibs/core/kiln_ai/synthetic_user/drive_loop.pylibs/core/kiln_ai/synthetic_user/driver.pylibs/core/kiln_ai/synthetic_user/models.pylibs/core/kiln_ai/synthetic_user/parser.pylibs/core/kiln_ai/synthetic_user/prompt.pylibs/core/kiln_ai/synthetic_user/role_swap.pylibs/core/kiln_ai/synthetic_user/runner.pylibs/core/kiln_ai/synthetic_user/test_case.pylibs/core/kiln_ai/synthetic_user/test_drive_loop.pylibs/core/kiln_ai/synthetic_user/test_driver.pylibs/core/kiln_ai/synthetic_user/test_models.pylibs/core/kiln_ai/synthetic_user/test_parser.pylibs/core/kiln_ai/synthetic_user/test_prompt.pylibs/core/kiln_ai/synthetic_user/test_role_swap.pylibs/core/kiln_ai/synthetic_user/test_runner.pylibs/server/kiln_server/server.py
| stream_run_cases_batch_api_projects__project_id__tasks__task_id__multiturn_sdg_run_cases_batch_post: { | ||
| parameters: { | ||
| query?: never; | ||
| header?: never; | ||
| path: { | ||
| /** @description ID of the project containing the target task. */ | ||
| project_id: string; | ||
| /** @description ID of the target task. Must be a multi-turn task. */ | ||
| task_id: string; | ||
| }; | ||
| cookie?: never; | ||
| }; | ||
| requestBody: { | ||
| content: { | ||
| "application/json": components["schemas"]["RunCasesBatchApiInput"]; | ||
| }; | ||
| }; | ||
| responses: { | ||
| /** @description Successful Response */ | ||
| 200: { | ||
| headers: { | ||
| [name: string]: unknown; | ||
| }; | ||
| content: { | ||
| "application/json": unknown; | ||
| }; |
There was a problem hiding this comment.
run_cases_batch response media type is mis-modeled as JSON instead of SSE.
stream_run_cases_batch is typed with 200 -> application/json, but the backend route returns StreamingResponse(..., media_type="text/event-stream") (see app/desktop/studio_server/multiturn_sdg_api.py). This weakens the generated client contract for streaming and can break typed frontend consumption.
Please update the backend route OpenAPI metadata/response docs to advertise text/event-stream, then regenerate app/web_ui/src/lib/api_schema.d.ts via app/web_ui/src/lib/generate_schema.sh rather than editing this file directly.
Based on learnings: "app/web_ui/src/lib/api_schema.d.ts is auto-generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend … then re-generate the TS types."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/web_ui/src/lib/api_schema.d.ts` around lines 17079 - 17104, The OpenAPI
docs currently advertise stream_run_cases_batch
(stream_run_cases_batch_api_projects__project_id__tasks__task_id__multiturn_sdg_run_cases_batch_post)
as returning "application/json" but the route actually returns a
StreamingResponse with media_type="text/event-stream"; update the FastAPI route
in app/desktop/studio_server/multiturn_sdg_api.py to declare the 200 response
content type as "text/event-stream" (e.g., add responses={200: {"content":
{"text/event-stream": {"schema": {"type":"string"}}}}} or set
response_class/response_model metadata appropriately) so the OpenAPI spec
reflects SSE, then run app/web_ui/src/lib/generate_schema.sh to regenerate
app/web_ui/src/lib/api_schema.d.ts; do not manually edit the generated TS file.
What does this PR do?
kiln_ai.synthetic_user.runner—drive_case+run_cases_batchSyntheticUserCasecontractSyntheticUserClientwrapping kiln_server/v1/synthetic_user/generategenerate_cases(sync) +run_cases_batch(SSE)Pipeline
/generate(pro-gated, kiln-AI keys)synthetic_user_case+synthetic_user_batch:<tag>asyncio.Semaphore(4); stream BatchEvents over SSENotable
total_costhonestly sums target adapter + SU driver spendNUM_CASES_MAX=10,MAX_TURNS_DEFAULT=5,CONCURRENCY=4Flow
Test plan
_smoke.py, untracked): 3 hand-crafted cases → 3 persisted chains, $0.04 totalRelated Issues
Contributor License Agreement
I, @, confirm that I have read and agree to the Contributors License Agreement.
Checklists