feat: multiturn tasks#1409
Conversation
Introduce the backbone for multiturn tasks at the data layer: - Add TurnMode enum (single_turn, multiturn) in datamodel_enums. - Add Task.turn_mode field defaulting to single_turn, with a model_validator rejecting multiturn combined with input_json_schema or output_json_schema. - Enforce turn_mode immutability in task_api.update_task (400 on attempted change post-creation), with string-to-enum coercion for patch payloads. - Add backend pytest coverage for defaults, JSON migration of legacy tasks without the field, forbidden schema combinations, and the update_task immutability guard. - Add Phase 1 spec artifacts under specs/projects/multiturn_tasks/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build on the Phase 1 data model to plumb continuation runs through the run endpoint and exclude interior nodes from list views and dataset splits: - Add RunTaskRequest.parent_task_run_id field on the run endpoint, so callers can mark a run as a continuation of an existing one. - Plumb parent_task_run_id through run_task: load the parent TaskRun, pass its trace as prior_trace and the run itself as parent_task_run to the adapter; reject single-turn tasks with 400 and missing parent ids with 404. - Filter get_runs_summary down to leaves only — drop runs whose ids appear as parent_task_run_id on any other run, so the runs table shows one row per conversation. - Apply the same leaf-only filter in DatasetSplit.build_split_contents so dataset splits sample whole conversations rather than interior turns. - Backend pytest coverage for the continuation flow, validation errors, both leaf filters, and regenerate api_schema.d.ts. - Mark Phase 2 complete in the implementation plan and add the Phase 2 spec artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface the Phase 1 data model in the task creation/edit UI:
- Add a radio-style turn_mode picker (Single-turn / Multiturn) as a new
"Part 2: Task Type" section in edit_task.svelte, with renumbered
Input/Output/Requirements parts. Defaults to single_turn for new
tasks and reflects task.turn_mode for existing ones.
- Conditionally hide the input + output schema editors when multiturn
is selected, replacing them with short explanatory notes
("Multiturn tasks use plain-text input." / "Structured output is not
supported for multiturn tasks yet."), and skip the schema fields in
the create POST body so the backend doesn't get a stale string.
- Guard inputSchemaSection / outputSchemaSection access in has_edits
with optional chaining so multiturn forms (where the bind:this
targets don't mount) don't blow up.
- Add a read_only_turn_mode prop wired by load_task_editor.svelte so
the edit page renders turn_mode as an immutable label with a hint,
while clone mode still gets the editable picker (cloned task gets a
fresh id, so a new turn_mode is allowed).
- Export TurnMode from $lib/types so the component can reference the
literal-union type.
- New vitest suite edit_task.test.ts covering default render, toggle
behavior, conditional schema sections, read-only mode, and POST
payload contents for both turn modes.
- Mark Phase 3 complete in the implementation plan and add the Phase 3
spec artifact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frontend-only phase that completes the multiturn-task UX end-to-end,
building on the Phase 1-3 data model, run plumbing, and task-edit UI:
- Branch /run send-success on multiturn tasks to goto the dataset
run-detail page for the newly-created TaskRun, leaving single-turn
flows untouched.
- Two-column multiturn layout on the dataset run-detail page: left
column shows ConversationView + composer (RunInputForm wrapped in
FormContainer with keyboard submit); right column shows the existing
PropertyList plus a SavedRunConfigurationsDropdown +
RunConfigComponent so the user can switch model/prompt before each
Send. Single-turn run-detail rendering preserved unchanged.
- New conversation_view.svelte + message_block.svelte under
lib/ui/conversation/: flat thread of role-based message blocks
rendered from TaskRun.trace, with reasoning + tool calls collapsed
inside the assistant block and a wrap fix so long messages do not
blow out the column.
- New multiturn_send.ts helper that owns the run-config-component
resolve/build, POST to /run with parent_task_run_id, and post-send
cleanup; covered by multiturn_send.test.ts.
- URL replaceState on each Send so back-button stays sane while the
conversation grows.
- Run-config picker is wrapped in {#key run.id} so it re-seeds from
the new run's config after every Send instead of sticking to the
initial selection.
- Frontend test coverage for conversation_view, message_block, and
multiturn_send.
- Mark Phase 4 complete in the implementation plan and add the
Phase 4 spec artifact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eat-multiturn-task # Conflicts: # app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
Replace the multiturn ConversationView with the shared Trace component used by single-turn runs so messages render in collapsible role-labeled blocks (system, user, assistant, tool). Add a markdown_content flag to Trace that renders non-tool message content (and reasoning) via ChatMarkdown, preserving the markdown rendering the multiturn UI previously had. The plaintext input form continues to render below the trace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GET /api/projects/{project_id}/tasks/{task_id}/runs/{run_id}/ancestors
which walks parent_task_run_id from leaf to root and returns the ordered
chain with a chain_broken flag. Defensively handles missing parents,
cycles, runaway depth, and chains longer than the leaf trace's user-message
count. Includes pytest coverage and regenerated OpenAPI types.
This is Phase 1 of the multiturn trace forking project; the endpoint will
back the frontend fork affordance in Phase 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the multiturn-trace-forking project. Adds the user-facing fork affordance on the run-detail page so users can branch a multiturn conversation from any eligible past user turn. - Extract the inline append composer into a reusable `MultiturnComposer` and add a fork mode (prefill, context strip, Cancel with dirty-confirm dialog, dirty-switch confirmation when changing fork target mid-edit). - Add `ForkIcon.svelte` to match the existing icon conventions. - Extend `trace.svelte` with optional `forkable_run_ids`, `truncate_at_trace_index`, and `on_fork` props; render the fork button inside the expanded user block body so it only appears once a turn is opened. - Wire fork state in the run-detail `+page.svelte`: ancestors fetch, `compute_forkable_run_ids`, fork-target state, and non-blocking banners for `chain_broken` and ancestors load failures. - Extract `RunSidebar` and mount it on both the single-turn and multiturn run pages so Rating, Tags, and Usage have parity. - Add vitest suites for `fork_helpers`, `trace`, and `multiturn_composer`. - Update `functional_spec.md` and `ui_design.md` to reflect the in-expanded-body fork-button placement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis pull request introduces multiturn task support: a ChangesMultiturn task and run system
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Code Review
This pull request implements multiturn task support across the stack, introducing task turn modes, conversation-based run views, and a forking mechanism for trace branches. Key backend additions include ancestor chain traversal and cascade deletion logic, while the frontend sees new components for conversation management and a refactored run sidebar. Review feedback highlights opportunities for performance optimization in subtask usage calculations and disk I/O during deletions. It also identifies a UX bug regarding event propagation on the fork button, an architectural violation in component dependencies, and a missing reactive dependency in the ratings UI.
| # the loader for each ancestor. We need the full chain view here. | ||
| all_runs = task.filter_runs(include_intermediate_runs=True, readonly=True) | ||
| children_by_parent: Dict[str, list[str]] = {} | ||
| for r in all_runs: | ||
| if r.parent_task_run_id and r.id is not None: | ||
| children_by_parent.setdefault(r.parent_task_run_id, []).append(str(r.id)) | ||
|
|
||
| to_delete: list[TaskRun] = [target] | ||
| deleted_ids: set[str] = set() | ||
| if target.id is not None: | ||
| deleted_ids.add(str(target.id)) | ||
|
|
||
| visited: set[str] = set(deleted_ids) | ||
| current = target | ||
| for _ in range(_MAX_ANCESTOR_DEPTH): | ||
| parent_id = current.parent_task_run_id | ||
| if parent_id is None: | ||
| break | ||
| if parent_id in visited: | ||
| # Cycle: stop here, but everything queued so far is still valid. | ||
| break | ||
| parent = TaskRun.from_id_and_parent_path(parent_id, task.path) | ||
| if parent is None: | ||
| # Chain broken: stop the cascade, don't 500. | ||
| break | ||
| live_children = [ | ||
| cid | ||
| for cid in children_by_parent.get(parent_id, []) | ||
| if cid not in deleted_ids | ||
| ] | ||
| if live_children: | ||
| # A sibling branch survives — keep this parent. | ||
| break | ||
| to_delete.append(parent) | ||
| if parent.id is not None: | ||
| deleted_ids.add(str(parent.id)) | ||
| visited.add(str(parent.id)) | ||
| current = parent |
There was a problem hiding this comment.
Redundant disk I/O in _collect_cascade_delete_runs. The function already loads all runs for the task into all_runs at line 93. Instead of calling TaskRun.from_id_and_parent_path in a loop (which hits the filesystem repeatedly), you should use a lookup dictionary built from the all_runs list already in memory.
There was a problem hiding this comment.
Addressed — _collect_cascade_delete_runs now builds a runs_by_id dict alongside children_by_parent from the single all_runs load, and the ancestor walk uses runs_by_id.get(parent_id) instead of TaskRun.from_id_and_parent_path.
📊 Coverage ReportOverall Coverage: 92% Diff: origin/main...HEAD
Summary
Line-by-lineView line-by-line diff coveragelibs/core/kiln_ai/utils/dataset_import.pyLines 356-364 356 (`tags -> 0: ...`) when constructing TaskRuns downstream.
357 """
358 for tag in tags:
359 if not tag:
! 360 raise KilnInvalidImportFormat(
361 "Tags cannot be empty strings.",
362 row_number=row_number,
363 )
364 if " " in tag:Lines 392-400 392
393 messages: list[ValidatedMessage] = []
394 for k, msg in enumerate(trace, start=1):
395 if not isinstance(msg, dict):
! 396 raise KilnInvalidImportFormat(
397 f"message {k}: must be a JSON object.",
398 row_number=row_number,
399 )Lines 447-455 447 if role == "assistant":
448 rc = msg.get("reasoning_content")
449 if rc is not None:
450 if not isinstance(rc, str):
! 451 raise KilnInvalidImportFormat(
452 f"message {k}: 'reasoning_content' must be a string.",
453 row_number=row_number,
454 )
455 if not rc:Lines 559-567 559 with open(dataset_path, "r", newline="", encoding="utf-8") as csvfile:
560 reader = csv.DictReader(csvfile)
561
562 if not reader.fieldnames:
! 563 raise KilnInvalidImportFormat(
564 "CSV file appears to be empty or missing headers"
565 )
566
567 actual_headers = set(reader.fieldnames)Lines 574-582 574 f"Got: {', '.join(sorted(actual_headers))}."
575 )
576 missing_headers = required_headers - actual_headers
577 if missing_headers:
! 578 raise KilnInvalidImportFormat(
579 f"Missing required headers: {', '.join(missing_headers)}. "
580 f"Required headers are: {', '.join(required_headers)}"
581 )Lines 581-589 581 )
582
583 unknown_headers = actual_headers - (required_headers | optional_headers)
584 if unknown_headers:
! 585 logger.warning(
586 f"Unknown headers in CSV file will be ignored: {', '.join(unknown_headers)}"
587 )
588
589 for row_number, row in enumerate(reader, start=2):Lines 593-603 593 **row,
594 "tags": deserialize_tags(row.get("tags")),
595 }
596 )
! 597 except ValidationError as e:
! 598 logger.warning(f"Invalid row {row_number}: {row}", exc_info=True)
! 599 raise KilnInvalidImportFormat(
600 format_validation_error(e),
601 row_number=row_number,
602 ) from elibs/server/kiln_server/run_api.pyLines 73-82 73 chain.append(parent)
74 if parent.id is not None:
75 visited.add(str(parent.id))
76 current = parent
! 77 chain.reverse()
! 78 return chain, True
79
80
81 def _collect_cascade_delete_runs(
82 task: Task,Lines 147-155 147
148
149 def _count_user_messages(trace: list[Any] | None) -> int:
150 if not trace:
! 151 return 0
152 return sum(1 for m in trace if isinstance(m, dict) and m.get("role") == "user")
153
154
155 def deep_update(Lines 707-715 707 status_code=404,
708 detail=f"Parent run not found. ID: {request.parent_task_run_id}",
709 )
710 if not parent_task_run.trace:
! 711 raise HTTPException(
712 status_code=400,
713 detail="Parent run cannot be continued because it has no trace.",
714 )
715 prior_trace = parent_task_run.trace
|
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (1)
1-15: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd TypeScript script block per coding guidelines.
The component should include a
<script lang="ts">block for consistency with the project's TypeScript requirement, even if currently empty. This ensures the guideline is met and makes future prop additions straightforward. As per coding guidelines, all frontend code inapp/web_ui/**/*.svelteshould use TypeScript.📝 Proposed addition
+<script lang="ts"> + export let class: string | undefined = undefined; +</script> + <svg - class="w-full h-full" + class="w-full h-full {class}" fill="currentColor" viewBox="0 0 1024 1024" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" >🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/web_ui/src/lib/ui/icons/info_circle_icon.svelte` around lines 1 - 15, The SVG Svelte component file lacks the required TypeScript script block; add a top-level <script lang="ts"> block in app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (even if empty) to satisfy the project's TypeScript guideline for Svelte components and make future prop typing easier; place it before the SVG markup and include any future exports (e.g., export let ...) there when needed.
🧹 Nitpick comments (6)
app/desktop/studio_server/test_eval_api.py (1)
1570-1570: ⚡ Quick winAssert
readonlyis forwarded tofilter_runs.This test now mocks
filter_runs, but it doesn’t assert the expected call signature. Add an assertion to ensureruns_in_filter(..., readonly=True)actually callsmock_task.filter_runs(readonly=True).Suggested patch
result = runs_in_filter(mock_task, "tag::some_filter", readonly=True) # Verify the results assert len(result) == 2 assert result[0].id == "run1" assert result[1].id == "run3" + mock_task.filter_runs.assert_called_once_with(readonly=True) # Verify the filter was called for each run assert mock_filter.call_count == 3 mock_dataset_filter_from_id.assert_called_once_with("tag::some_filter")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/desktop/studio_server/test_eval_api.py` at line 1570, The test sets mock_task.filter_runs.return_value but doesn't assert that the readonly flag is forwarded; update the test to assert that runs_in_filter (the code under test) calls mock_task.filter_runs with readonly=True by adding an assertion like mock_task.filter_runs.assert_called_once_with(readonly=True) (or assert_called_with(readonly=True) if multiple calls are expected) after the invocation so the test verifies the call signature.app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts (1)
38-65: 💤 Low valueConsider logging when ancestor count exceeds user turns.
The suffix-alignment logic at lines 55–60 handles the edge case where
ancestors.length > total_turns(which would be a data inconsistency) by allowingoffsetto be negative. Whenoffset + kis negative for smallk,user_trace_indices[offset + k]returnsundefined, and line 60 guards against that. While this is safe, silently ignoring the mismatch could make debugging harder if the data model is violated. Adding aconsole.warnwhenoffset < 0would surface this edge case without breaking functionality.📝 Optional improvement to log edge case
const total_turns = user_trace_indices.length if (total_turns === 0 || ancestors.length === 0) { return result } // Suffix-align the ancestor list against the user messages: the last // ancestor (the leaf) corresponds to the last user message. const offset = total_turns - ancestors.length + if (offset < 0) { + console.warn( + "More ancestors than user turns in trace — data inconsistency", + { ancestors_count: ancestors.length, user_turns: total_turns }, + ) + } for (let k = 0; k < ancestors.length; k++) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts around lines 38 - 65, The function compute_forkable_run_ids silently tolerates ancestors.length > total_turns by letting offset be negative; add a warning when this happens to surface the data inconsistency: inside compute_forkable_run_ids, after computing const offset = total_turns - ancestors.length, check if offset < 0 and emit a console.warn (or use an existing logger) that includes values for total_turns, ancestors.length, and maybe a brief context (e.g., project/run IDs if available) so developers can detect when ancestors and user_trace_indices are misaligned; keep the existing guard for user_trace_indices[offset + k] so behavior is unchanged.app/web_ui/src/lib/ui/trace/trace.svelte (2)
37-37: 💤 Low value
messageExpandedis initialized once and not re-synced withtracelength.If the
traceprop is replaced with a different-length array (e.g., page-level state swaps the trace without unmountingTrace),messageExpandedkeeps its old length, which can desyncbind:checked={messageExpanded[index]}against the new messages and persist stale expansion state across runs. In the multiturn page today the parent doesrun = nullbefore navigating, so this component is currently torn down and remounted — but that's a fragile contract for a reusable UI component.Consider re-syncing reactively when the trace identity changes:
♻️ Suggested change
- // Track collapsed state for each message (true = expanded, false = collapsed) - let messageExpanded: boolean[] = trace.map(() => false) + // Track collapsed state for each message (true = expanded, false = collapsed). + // Re-sync whenever the trace identity or length changes so we don't carry + // stale expansion state across run navigations. + let messageExpanded: boolean[] = trace.map(() => false) + $: messageExpanded = + messageExpanded.length === trace.length + ? messageExpanded + : trace.map(() => false)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/web_ui/src/lib/ui/trace/trace.svelte` at line 37, The component initializes messageExpanded once from trace but doesn't resync when trace is replaced, causing stale expansion state; update the component to recompute messageExpanded whenever the trace identity or its length changes (e.g., add a reactive statement that sets messageExpanded = trace.map(() => false) when trace changes or when trace.length differs) so bind:checked={messageExpanded[index]} stays in sync; reference the existing messageExpanded and trace variables and ensure any user expansion state is preserved or intentionally reset per the desired behavior.
263-265: 💤 Low valueCost formatting hardcodes USD and 6 decimals.
$${usage.cost.toFixed(6)}will render values like$0.000000for zero-cost runs and assumes USD. Ifcostis ever expressed in a different currency or accumulates significant whole-dollar values, this display becomes misleading. Consider routing through a shared formatter (similar toformatLatency) so currency/precision live in one place and can be locale-aware later.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/web_ui/src/lib/ui/trace/trace.svelte` around lines 263 - 265, The UI hardcodes USD and 6 decimal places when showing usage.cost (lines.push(`Cost: $${usage.cost.toFixed(6)}`)); extract this into a shared formatter (e.g., formatCost) similar to formatLatency, and replace the direct string construction with a call to formatCost(usage.cost, { currency?: 'USD' }) so currency and precision are centralized and locale-aware; update trace.svelte to use formatCost for all cost displays and add tests/usages where appropriate.app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte (2)
478-500: 💤 Low valueReplace
setTimeout(..., 100)with Svelte'stick()for the raw-data scroll.The 100 ms delay is a magic number guessing how long Svelte takes to render the now-unhidden
#multiturn_raw_datablock.await tick()is the deterministic primitive for "wait until reactive DOM updates have flushed" and avoids both the flicker if rendering takes longer than 100 ms and the unnecessary wait if it takes less.♻️ Suggested change
- function multiturn_toggle_raw_data() { - multiturn_show_raw_data = !multiturn_show_raw_data - if (multiturn_show_raw_data) { - setTimeout(() => { - const rawDataElement = document.getElementById("multiturn_raw_data") - if (rawDataElement) { - rawDataElement.scrollIntoView({ - behavior: "smooth", - block: "start", - }) - } - }, 100) - } - } + async function multiturn_toggle_raw_data() { + multiturn_show_raw_data = !multiturn_show_raw_data + if (!multiturn_show_raw_data) return + await tick() + document.getElementById("multiturn_raw_data")?.scrollIntoView({ + behavior: "smooth", + block: "start", + }) + }(Plus
import { tick } from "svelte"at the top.)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte around lines 478 - 500, The setTimeout(…, 100) in multiturn_toggle_raw_data is a brittle race; replace it with Svelte's tick() to wait for DOM updates: import { tick } from "svelte", then after setting multiturn_show_raw_data to true await tick() and only then query document.getElementById("multiturn_raw_data") and call scrollIntoView as currently done; keep the same scroll options and remove the setTimeout wrapper.
548-555: 💤 Low value
ancestors_loaded_for_run_idis not reset alongside the other ancestor state.The reset block at lines 541‑546 clears
ancestors,ancestors_chain_broken,ancestors_load_failedandfork_targeton everyrun_idchange, but intentionally leavesancestors_loaded_for_run_idunchanged so the trigger conditionancestors_loaded_for_run_id !== run_idcan fire the fetch. That works today, but it's worth a one‑line comment near line 535 explaining why this single piece of state is deliberately excluded from the reset — otherwise a future change that "completes" the reset by addingancestors_loaded_for_run_id = nullwould silently re‑fetch on every load even whentask.turn_mode !== "multiturn".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte around lines 548 - 555, Add a one-line comment next to the reset block that clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target explaining that ancestors_loaded_for_run_id is intentionally NOT reset so the reactive check (ancestors_loaded_for_run_id !== run_id) can trigger load_ancestors(project_id, task_id, run_id) only when needed; mention that clearing ancestors_loaded_for_run_id would cause unintended re-fetches even when task.turn_mode !== "multiturn".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/web_ui/src/lib/api_schema.d.ts`:
- Around line 9804-9811: The ancestors property description is ambiguous about
ordering when chain_broken=true (it currently says root→leaf but also “from the
leaf back”); update the OpenAPI schema so the canonical ordering is explicit
(e.g., always root-to-leaf, and when chain_broken=true the array contains only
the intact suffix ordered root-to-leaf for that suffix) and mention that
chain_broken indicates a truncated prefix was dropped; modify the
TaskRunAncestors/ancestors description accordingly and then regenerate the
client schema by running the app/web_ui/src/lib/generate_schema.sh script so
app/web_ui/src/lib/api_schema.d.ts reflects the clarified wording.
In `@app/web_ui/src/lib/ui/run_sidebar.svelte`:
- Around line 201-217: In extract_subtask_references, avoid destructuring the
result of message.kiln_task_tool_data.split(":::") directly; first call const
parts = message.kiln_task_tool_data.split(":::") and validate parts.length >= 4
and that parts[0], parts[2], and parts[3] are non-empty before assigning
project_id, task_id, run_id (or pushing into references). This ensures malformed
kiln_task_tool_data does not produce undefined values or invalid API calls and
only well-formed triples are added to the references array.
In `@libs/server/kiln_server/run_api.py`:
- Around line 617-632: When task.turn_mode == TurnMode.multiturn, require
request.parent_task_run_id to be present: if request.parent_task_run_id is None
then raise HTTPException(status_code=400, detail="parent_task_run_id is required
for multiturn tasks.") before calling TaskRun.from_id_and_parent_path; keep the
existing validation that parent_task_run_id is invalid for non-multiturn tasks
(the current check that raises when task.turn_mode != TurnMode.multiturn) and
preserve the subsequent lookup using TaskRun.from_id_and_parent_path and raising
404 if not found.
---
Outside diff comments:
In `@app/web_ui/src/lib/ui/icons/info_circle_icon.svelte`:
- Around line 1-15: The SVG Svelte component file lacks the required TypeScript
script block; add a top-level <script lang="ts"> block in
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (even if empty) to satisfy
the project's TypeScript guideline for Svelte components and make future prop
typing easier; place it before the SVG markup and include any future exports
(e.g., export let ...) there when needed.
---
Nitpick comments:
In `@app/desktop/studio_server/test_eval_api.py`:
- Line 1570: The test sets mock_task.filter_runs.return_value but doesn't assert
that the readonly flag is forwarded; update the test to assert that
runs_in_filter (the code under test) calls mock_task.filter_runs with
readonly=True by adding an assertion like
mock_task.filter_runs.assert_called_once_with(readonly=True) (or
assert_called_with(readonly=True) if multiple calls are expected) after the
invocation so the test verifies the call signature.
In `@app/web_ui/src/lib/ui/trace/trace.svelte`:
- Line 37: The component initializes messageExpanded once from trace but doesn't
resync when trace is replaced, causing stale expansion state; update the
component to recompute messageExpanded whenever the trace identity or its length
changes (e.g., add a reactive statement that sets messageExpanded = trace.map(()
=> false) when trace changes or when trace.length differs) so
bind:checked={messageExpanded[index]} stays in sync; reference the existing
messageExpanded and trace variables and ensure any user expansion state is
preserved or intentionally reset per the desired behavior.
- Around line 263-265: The UI hardcodes USD and 6 decimal places when showing
usage.cost (lines.push(`Cost: $${usage.cost.toFixed(6)}`)); extract this into a
shared formatter (e.g., formatCost) similar to formatLatency, and replace the
direct string construction with a call to formatCost(usage.cost, { currency?:
'USD' }) so currency and precision are centralized and locale-aware; update
trace.svelte to use formatCost for all cost displays and add tests/usages where
appropriate.
In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte:
- Around line 478-500: The setTimeout(…, 100) in multiturn_toggle_raw_data is a
brittle race; replace it with Svelte's tick() to wait for DOM updates: import {
tick } from "svelte", then after setting multiturn_show_raw_data to true await
tick() and only then query document.getElementById("multiturn_raw_data") and
call scrollIntoView as currently done; keep the same scroll options and remove
the setTimeout wrapper.
- Around line 548-555: Add a one-line comment next to the reset block that
clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target
explaining that ancestors_loaded_for_run_id is intentionally NOT reset so the
reactive check (ancestors_loaded_for_run_id !== run_id) can trigger
load_ancestors(project_id, task_id, run_id) only when needed; mention that
clearing ancestors_loaded_for_run_id would cause unintended re-fetches even when
task.turn_mode !== "multiturn".
In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts:
- Around line 38-65: The function compute_forkable_run_ids silently tolerates
ancestors.length > total_turns by letting offset be negative; add a warning when
this happens to surface the data inconsistency: inside compute_forkable_run_ids,
after computing const offset = total_turns - ancestors.length, check if offset <
0 and emit a console.warn (or use an existing logger) that includes values for
total_turns, ancestors.length, and maybe a brief context (e.g., project/run IDs
if available) so developers can detect when ancestors and user_trace_indices are
misaligned; keep the existing guard for user_trace_indices[offset + k] so
behavior is unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8909dd12-8a4b-4e3b-b18d-9c52e28e87b6
📒 Files selected for processing (48)
app/desktop/studio_server/agent_api.pyapp/desktop/studio_server/eval_api.pyapp/desktop/studio_server/finetune_api.pyapp/desktop/studio_server/test_data_gen_api.pyapp/desktop/studio_server/test_eval_api.pyapp/desktop/studio_server/test_finetune_api.pyapp/desktop/studio_server/test_repair_api.pyapp/web_ui/src/lib/api_schema.d.tsapp/web_ui/src/lib/types.tsapp/web_ui/src/lib/ui/conversation/multiturn_composer.svelteapp/web_ui/src/lib/ui/conversation/multiturn_composer.test.tsapp/web_ui/src/lib/ui/icons/fork_icon.svelteapp/web_ui/src/lib/ui/icons/info_circle_icon.svelteapp/web_ui/src/lib/ui/run_sidebar.svelteapp/web_ui/src/lib/ui/trace/trace.svelteapp/web_ui/src/lib/ui/trace/trace.test.tsapp/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelteapp/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.test.tsapp/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.tsapp/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/multiturn_send.test.tsapp/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/multiturn_send.tsapp/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/+page.svelteapp/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelteapp/web_ui/src/routes/(app)/run/+page.svelteapp/web_ui/src/routes/(app)/run/run.svelteapp/web_ui/src/routes/(app)/run/run_input_form.svelteapp/web_ui/src/routes/(app)/settings/edit_task/[project_id]/[task_id]/load_task_editor.svelteapp/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/[spec_id]/[eval_id]/create_eval_config/get_eval_steps.test.tsapp/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelteapp/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.test.tslibs/core/kiln_ai/adapters/eval/eval_runner.pylibs/core/kiln_ai/adapters/fine_tune/dataset_formatter.pylibs/core/kiln_ai/adapters/fine_tune/test_dataset_formatter.pylibs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.pylibs/core/kiln_ai/adapters/prompt_builders.pylibs/core/kiln_ai/cli/commands/test_package_project.pylibs/core/kiln_ai/datamodel/datamodel_enums.pylibs/core/kiln_ai/datamodel/dataset_split.pylibs/core/kiln_ai/datamodel/task.pylibs/core/kiln_ai/datamodel/test_dataset_split.pylibs/core/kiln_ai/datamodel/test_example_models.pylibs/core/kiln_ai/datamodel/test_models.pylibs/core/kiln_ai/datamodel/test_task.pylibs/core/kiln_ai/utils/test_dataset_import.pylibs/server/kiln_server/run_api.pylibs/server/kiln_server/task_api.pylibs/server/kiln_server/test_run_api.pylibs/server/kiln_server/test_task_api.py
|
After merging #1415 - we can ditch |
Add multiturn CSV import path with parsing, validation, and TaskRun chain materialization. `import_csv` dispatches on `task.turn_mode`. Introduces `ImportResult` and adds `imported_conversation_count` to the bulk upload response. Covered by unit and endpoint tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make the dataset upload dialog turn-mode aware: multiturn tasks get a tailored title, help block describing the `trace`/`tags` columns with an inline example, an alternation/no-system-messages callout, and a link to a new `static/sample_multiturn.csv` asset. Adds Vitest coverage for the single-turn, multiturn, and null-task branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eat-multiturn-task
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@libs/core/kiln_ai/datamodel/task.py`:
- Around line 283-286: The docstring for filter_runs is inconsistent with
runs(): update the docstring to state that runs() returns leaf-only runs by
default and only includes intermediate (non-leaf) turns when
include_intermediate_runs=True; reference the runs() behavior and the
include_intermediate_runs parameter in the filter_runs docstring so it
accurately describes filtering semantics for multiturn tasks using filter_runs
and runs().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 85623d66-30e5-4884-b457-8c88c1e1776a
📒 Files selected for processing (12)
app/desktop/studio_server/finetune_api.pyapp/desktop/studio_server/test_finetune_api.pyapp/web_ui/src/lib/api_schema.d.tsapp/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.sveltelibs/core/kiln_ai/adapters/fine_tune/dataset_formatter.pylibs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.pylibs/core/kiln_ai/datamodel/task.pylibs/core/kiln_ai/datamodel/test_models.pylibs/core/kiln_ai/datamodel/test_task.pylibs/core/kiln_ai/utils/test_dataset_import.pylibs/server/kiln_server/run_api.pylibs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (8)
- libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
- app/desktop/studio_server/test_finetune_api.py
- app/web_ui/src/lib/api_schema.d.ts
- app/desktop/studio_server/finetune_api.py
- libs/core/kiln_ai/utils/test_dataset_import.py
- app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
- libs/server/kiln_server/run_api.py
- libs/server/kiln_server/test_run_api.py
…-AI/Kiln into leonard/kil-658-multiturn-support-csv
…-AI/Kiln into leonard/kil-658-multiturn-support-csv
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@libs/server/kiln_server/run_api.py`:
- Around line 439-445: The batch delete endpoint currently calls run.delete()
directly and must be changed to use the same cascade behavior as the single-run
delete: use the same helper _collect_cascade_delete_runs (the code that builds
runs_to_delete) to expand each target run into its cascade set and then delete
the resulting runs in that set rather than deleting individual runs directly;
update the POST /runs/delete handler (the batch deletion logic that currently
iterates runs and calls run.delete()) to call task_and_run_from_id for each id,
call _collect_cascade_delete_runs(task, run) to gather deletions, deduplicate
the combined set, and then delete each run in that unified runs_to_delete
collection so both endpoints behave identically.
- Around line 633-641: After loading the parent via
TaskRun.from_id_and_parent_path, ensure you reject continuation requests when
the parent has no trace: check parent_task_run.trace (referenced as prior_trace
in this block) and if it is None raise an HTTPException (4xx) with a clear
message including request.parent_task_run_id rather than proceeding; place this
check immediately after the parent_task_run is validated and before any use of
prior_trace or calling the adapter so a child cannot be persisted as a
continuation of a parent with no trace.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4509f62f-81ff-43d2-b916-5d5f4c348454
📒 Files selected for processing (16)
app/desktop/studio_server/finetune_api.pyapp/desktop/studio_server/test_data_gen_api.pyapp/desktop/studio_server/test_finetune_api.pyapp/desktop/studio_server/test_repair_api.pylibs/core/kiln_ai/adapters/fine_tune/dataset_formatter.pylibs/core/kiln_ai/adapters/fine_tune/test_dataset_formatter.pylibs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.pylibs/core/kiln_ai/cli/commands/test_package_project.pylibs/core/kiln_ai/datamodel/dataset_split.pylibs/core/kiln_ai/datamodel/task.pylibs/core/kiln_ai/datamodel/test_example_models.pylibs/core/kiln_ai/datamodel/test_models.pylibs/core/kiln_ai/datamodel/test_task.pylibs/core/kiln_ai/utils/test_dataset_import.pylibs/server/kiln_server/run_api.pylibs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (6)
- libs/core/kiln_ai/datamodel/test_example_models.py
- libs/core/kiln_ai/datamodel/test_models.py
- app/desktop/studio_server/test_finetune_api.py
- libs/core/kiln_ai/datamodel/dataset_split.py
- app/desktop/studio_server/test_data_gen_api.py
- app/desktop/studio_server/test_repair_api.py
…-AI/Kiln into leonard/kil-658-multiturn-support-csv
…-AI/Kiln into leonard/kil-658-multiturn-support-csv
…ln-AI/Kiln into leonard/kil-658-multiturn-support-csv
…task-2 refactor: multiturn UI more like chat
…rt-csv feat: multiturn csv import
…eat-multiturn-task
…-AI/Kiln into leonard/kil-632-feat-multiturn-task
…eat-multiturn-task
…eat-multiturn-task
- Hide the redundant "Plaintext Input" header in the composer and use a "Write a message…" placeholder (RunInputForm gains label/placeholder/ hide_label props). - Disable composer autofocus so loading a turn no longer yanks the viewport to the textbox. - Restyle the fork context strip as a status header (label + helper text) instead of a chat-bubble container. - Make the multiturn view a chat layout on xl+: scrollable transcript with the composer pinned to the bottom, Options column kept at natural height. "Show Raw Data" stays under the composer; transcript has no horizontal scroll and uses the Assistant scrollbar styling. - Scroll the transcript to the latest turn on load and after sending, re-pinning to the bottom across late markdown reflow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On send, append the user's message to the transcript optimistically so it shows immediately, disable the composer input/Send while the run is in flight, then redirect to the new run. The run-load reactive clears the placeholder once the real trace arrives (and on_send_settled clears it on error), avoiding a duplicate bubble. Append-mode only; fork mode just locks the input and navigates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Tag multiturn turns as "manual_run" instead of "multiturn_run" so they aren't singled out from other manually-created runs. - In the multiturn sidebar, title the usage section "Final Turn Usage" and drop the "Total" prefix from the Cost/Tokens/Latency line items (it's the final turn's run, not a whole-conversation aggregate). Single-turn is unchanged. - Open Raw Data in a modal instead of expanding inline and reflowing the chat column. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lish
- Move the per-message usage/fork actions out of the chat bubbles into a
dedicated row below each message that fades in on hover/focus (Claude
desktop style). Space is reserved so hovering never reflows the
conversation, and bubbles stay clean (no always-empty line).
- Label the actions ("Usage" with a new dollar icon component, "Fork");
usage is right-aligned to the bubble's edge.
- Add a height prop to RunInputForm; the composer uses a shorter textarea.
- Fix assistant bubbles (esp. collapsed thinking/tool blocks) shrinking
to content width instead of filling their allocated 70%.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Disable the Evals and Synthetic Data screens for multi-turn tasks (warning + no action button + no redirect into the flow), matching the fine-tuning behavior. - Remove a stray zero-width space in a comment that failed the web UI no-irregular-whitespace lint check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…send - Show the assistant ChatLoading indicator below the optimistic message while a turn is generating, kept up continuously across the redirect until the new run renders. - Stop nulling run / flipping loading on send so the transcript stays mounted during navigation instead of flashing the full-page spinner; keep the composer disabled (busy) until the new run loads to prevent a double-send against the stale parent. - Fix the cleared composer flagging "Required" after a successful send by keying the plaintext field so clear_input resets its validation state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the current task is multi-turn, the /run page now renders a chat-style new-conversation layout (empty transcript + pinned MultiturnComposer + Options sidebar) mirroring the in-run multiturn page, instead of the Input/Run/Output layout. The first message starts a root run (optimistic message + loading indicator) then redirects to the dataset run page. send_multiturn gains an allow_root_turn flag so a parentless first turn is permitted from /run; the in-chat composer leaves it false. Single-turn tasks keep the existing layout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What does this PR do?
Add support for multiturn tasks:
Tasknow has aturn_mode: 'single_turn' | 'multiturn'task.filter_runs()instead oftask.runs()to filter out intermediateTaskRunsTaskRunsChecklists