feat: multiturn tasks by leonardmq · Pull Request #1409 · Kiln-AI/Kiln

leonardmq · 2026-05-14T08:32:50Z

What does this PR do?

Add support for multiturn tasks:

datamodel:
- feat: Task now has a turn_mode: 'single_turn' | 'multiturn'
- refactor: migrate callers to use task.filter_runs() instead of task.runs() to filter out intermediate TaskRuns
UI:
- task creation: can pick between multiturn and single turn
- now supports multiturn conversations
- allow forking conversations
- dataset only shows leaf TaskRuns
- finetuning: not supported for now

Checklists

Tests have been run locally and passed
New tests have been added to any work in /lib

Introduce the backbone for multiturn tasks at the data layer: - Add TurnMode enum (single_turn, multiturn) in datamodel_enums. - Add Task.turn_mode field defaulting to single_turn, with a model_validator rejecting multiturn combined with input_json_schema or output_json_schema. - Enforce turn_mode immutability in task_api.update_task (400 on attempted change post-creation), with string-to-enum coercion for patch payloads. - Add backend pytest coverage for defaults, JSON migration of legacy tasks without the field, forbidden schema combinations, and the update_task immutability guard. - Add Phase 1 spec artifacts under specs/projects/multiturn_tasks/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Build on the Phase 1 data model to plumb continuation runs through the run endpoint and exclude interior nodes from list views and dataset splits: - Add RunTaskRequest.parent_task_run_id field on the run endpoint, so callers can mark a run as a continuation of an existing one. - Plumb parent_task_run_id through run_task: load the parent TaskRun, pass its trace as prior_trace and the run itself as parent_task_run to the adapter; reject single-turn tasks with 400 and missing parent ids with 404. - Filter get_runs_summary down to leaves only — drop runs whose ids appear as parent_task_run_id on any other run, so the runs table shows one row per conversation. - Apply the same leaf-only filter in DatasetSplit.build_split_contents so dataset splits sample whole conversations rather than interior turns. - Backend pytest coverage for the continuation flow, validation errors, both leaf filters, and regenerate api_schema.d.ts. - Mark Phase 2 complete in the implementation plan and add the Phase 2 spec artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surface the Phase 1 data model in the task creation/edit UI: - Add a radio-style turn_mode picker (Single-turn / Multiturn) as a new "Part 2: Task Type" section in edit_task.svelte, with renumbered Input/Output/Requirements parts. Defaults to single_turn for new tasks and reflects task.turn_mode for existing ones. - Conditionally hide the input + output schema editors when multiturn is selected, replacing them with short explanatory notes ("Multiturn tasks use plain-text input." / "Structured output is not supported for multiturn tasks yet."), and skip the schema fields in the create POST body so the backend doesn't get a stale string. - Guard inputSchemaSection / outputSchemaSection access in has_edits with optional chaining so multiturn forms (where the bind:this targets don't mount) don't blow up. - Add a read_only_turn_mode prop wired by load_task_editor.svelte so the edit page renders turn_mode as an immutable label with a hint, while clone mode still gets the editable picker (cloned task gets a fresh id, so a new turn_mode is allowed). - Export TurnMode from $lib/types so the component can reference the literal-union type. - New vitest suite edit_task.test.ts covering default render, toggle behavior, conditional schema sections, read-only mode, and POST payload contents for both turn modes. - Mark Phase 3 complete in the implementation plan and add the Phase 3 spec artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Frontend-only phase that completes the multiturn-task UX end-to-end, building on the Phase 1-3 data model, run plumbing, and task-edit UI: - Branch /run send-success on multiturn tasks to goto the dataset run-detail page for the newly-created TaskRun, leaving single-turn flows untouched. - Two-column multiturn layout on the dataset run-detail page: left column shows ConversationView + composer (RunInputForm wrapped in FormContainer with keyboard submit); right column shows the existing PropertyList plus a SavedRunConfigurationsDropdown + RunConfigComponent so the user can switch model/prompt before each Send. Single-turn run-detail rendering preserved unchanged. - New conversation_view.svelte + message_block.svelte under lib/ui/conversation/: flat thread of role-based message blocks rendered from TaskRun.trace, with reasoning + tool calls collapsed inside the assistant block and a wrap fix so long messages do not blow out the column. - New multiturn_send.ts helper that owns the run-config-component resolve/build, POST to /run with parent_task_run_id, and post-send cleanup; covered by multiturn_send.test.ts. - URL replaceState on each Send so back-button stays sane while the conversation grows. - Run-config picker is wrapped in {#key run.id} so it re-seeds from the new run's config after every Send instead of sticking to the initial selection. - Frontend test coverage for conversation_view, message_block, and multiturn_send. - Mark Phase 4 complete in the implementation plan and add the Phase 4 spec artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eat-multiturn-task # Conflicts: # app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte

Replace the multiturn ConversationView with the shared Trace component used by single-turn runs so messages render in collapsible role-labeled blocks (system, user, assistant, tool). Add a markdown_content flag to Trace that renders non-tool message content (and reasoning) via ChatMarkdown, preserving the markdown rendering the multiturn UI previously had. The plaintext input form continues to render below the trace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds GET /api/projects/{project_id}/tasks/{task_id}/runs/{run_id}/ancestors which walks parent_task_run_id from leaf to root and returns the ordered chain with a chain_broken flag. Defensively handles missing parents, cycles, runaway depth, and chains longer than the leaf trace's user-message count. Includes pytest coverage and regenerated OpenAPI types. This is Phase 1 of the multiturn trace forking project; the endpoint will back the frontend fork affordance in Phase 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 2 of the multiturn-trace-forking project. Adds the user-facing fork affordance on the run-detail page so users can branch a multiturn conversation from any eligible past user turn. - Extract the inline append composer into a reusable `MultiturnComposer` and add a fork mode (prefill, context strip, Cancel with dirty-confirm dialog, dirty-switch confirmation when changing fork target mid-edit). - Add `ForkIcon.svelte` to match the existing icon conventions. - Extend `trace.svelte` with optional `forkable_run_ids`, `truncate_at_trace_index`, and `on_fork` props; render the fork button inside the expanded user block body so it only appears once a turn is opened. - Wire fork state in the run-detail `+page.svelte`: ancestors fetch, `compute_forkable_run_ids`, fork-target state, and non-blocking banners for `chain_broken` and ancestors load failures. - Extract `RunSidebar` and mount it on both the single-turn and multiturn run pages so Rating, Tags, and Usage have parity. - Add vitest suites for `fork_helpers`, `trace`, and `multiturn_composer`. - Update `functional_spec.md` and `ui_design.md` to reflect the in-expanded-body fork-button placement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-14T08:33:09Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

This pull request introduces multiturn task support: a turn_mode field controls whether tasks run single-turn or allow conversational continuations. The backend implements run ancestry tracking via parent_task_run_id, ancestor querying, and cascade deletion. The frontend provides task editor UI for turn mode selection, a new MultiturnComposer component for fork/append modes with discard confirmation, trace forking via ancestor navigation, and a delegated RunSidebar for rating/feedback management.

Changes

Multiturn task and run system

Layer / File(s)	Summary
Core data model: TurnMode enum and task validation `libs/core/kiln_ai/datamodel/datamodel_enums.py`, `libs/core/kiln_ai/datamodel/task.py`	Introduces `TurnMode` enum (`single_turn`, `multiturn`) and adds `turn_mode` field to `Task` with validation rejecting structured schemas for multiturn tasks.
API schema and type definitions `app/web_ui/src/lib/api_schema.d.ts`, `app/web_ui/src/lib/types.ts`	Auto-generated OpenAPI schema adds `TurnMode` enum, `TaskRunAncestor`, `TaskRunAncestorsResponse` types, `parent_task_run_id` field to `RunTaskRequest`, and ancestors endpoint definition; TypeScript type aliases exported.
Server: run ancestry endpoint, parent-child execution, cascade deletion `libs/server/kiln_server/run_api.py`, `libs/server/kiln_server/task_api.py`	Implements `GET /ancestors` endpoint with cycle detection and chain-broken tracking; adds `parent_task_run_id` support for run continuation passing `prior_trace` to adapter; implements cascade-delete for ancestor runs; validates task `turn_mode` immutability on PATCH.
Server tests: ancestry, continuation, cascade deletion `libs/server/kiln_server/test_run_api.py`, `libs/server/kiln_server/test_task_api.py`	Tests multiturn run continuation, ancestors endpoint edge cases (missing parents, cycles, chain truncation), cascade deletion semantics, and task turn_mode change rejection.
Datamodel: Task.turn_mode and dataset splits `libs/core/kiln_ai/datamodel/task.py`, `libs/core/kiln_ai/datamodel/dataset_split.py`, `libs/core/kiln_ai/datamodel/test_dataset_split.py`, `libs/core/kiln_ai/datamodel/test_task.py`	Adds `turn_mode` validation to Task; updates DatasetSplit methods to include intermediate multiturn runs when resolving references; adds test helpers and regression tests for leaf-only vs all-run split inclusion.
Backend: dataset formatter and test run lookups `libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py`, multiple test files across `adapters/`, `cli/`, `utils/`	Updates dataset formatter to resolve split entries using intermediate runs; sweeps test files to use `include_intermediate_runs=True` when fetching task runs.
Frontend task editor: turn_mode UI and validation `app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte`, `app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.test.ts`, `app/web_ui/src/routes/(app)/settings/edit_task/[project_id]/[task_id]/load_task_editor.svelte`	Adds "Part 2: Task Type" section with single-turn/multiturn radio controls (read-only when cloning); hides schema inputs for multiturn; includes tests for toggle behavior, read-only mode, and payload construction.
Frontend: fine-tune pages, single-turn routing, task loading `app/web_ui/src/routes/(app)/fine_tune/...`, `app/web_ui/src/routes/(app)/run/+page.svelte`	Fine-tune pages load task and display unsupported warnings for multiturn; single-turn run page routes to dataset run page for multiturn; task loading store helpers added to routes.
Frontend: send_multiturn implementation and tests `app/web_ui/src/routes/(app)/dataset/[...]/run/multiturn_send.ts`, `app/web_ui/src/routes/(app)/dataset/[...]/run/multiturn_send.test.ts`	Implements `send_multiturn` helper POSTing to run endpoint with `parent_task_run_id`, run config, and tags; tests cover request construction, success/error flows, model validation, and input preservation.
Frontend: MultiturnComposer component and tests `app/web_ui/src/lib/ui/conversation/multiturn_composer.svelte`, `app/web_ui/src/lib/ui/conversation/multiturn_composer.test.ts`	Svelte component supporting append and fork modes with dirty-state tracking, discard confirmation dialog, `is_dirty()` and `request_swap()` exports; tests validate UI rendering and interaction across modes.
Frontend: fork helpers for ancestry navigation `app/web_ui/src/routes/(app)/dataset/[...]/run/fork_helpers.ts`, `app/web_ui/src/routes/(app)/dataset/[...]/run/fork_helpers.test.ts`	Implements `fork_target_from_user_block`, `compute_forkable_run_ids`, and `content_string_from_user_message` helpers for trace forking; tests cover all scenarios including edge cases.
Frontend: trace component enhancements `app/web_ui/src/lib/ui/trace/trace.svelte`, `app/web_ui/src/lib/ui/trace/trace.test.ts`	Adds optional Markdown rendering for non-tool messages, fork button on user turns, usage tooltips (latency/tokens/cost), and trace truncation; tests cover rendering rules, fork affordance conditions, and truncation.
Frontend: RunSidebar component for rating and feedback `app/web_ui/src/lib/ui/run_sidebar.svelte`	New component providing task-rating selection, requirement ratings, feedback management (load/sort/submit/delete), and recursive subtask usage aggregation with async safety guards.
Frontend: run.svelte refactored to use RunSidebar `app/web_ui/src/routes/(app)/run/run.svelte`, `app/web_ui/src/routes/(app)/run/run_input_form.svelte`	Simplifies run.svelte by delegating sidebar to `RunSidebar` component; removes local rating/feedback/usage logic; adds textarea seed/focus methods to run_input_form.
Frontend: dataset run page multiturn layout `app/web_ui/src/routes/(app)/dataset/[...]/run/+page.svelte`	Adds conditional multiturn layout (trace, composer, raw-data, options) vs single-turn; includes ancestor loading, forkable-run-id computation, fork handling, and handle_send() navigation.
Frontend: UI icons and backend finetune rejection `app/web_ui/src/lib/ui/icons/fork_icon.svelte`, `app/web_ui/src/lib/ui/icons/info_circle_icon.svelte`, `app/desktop/studio_server/finetune_api.py`	New fork and info-circle icons; desktop finetune API rejects multiturn tasks with HTTP 400.
Backend desktop tests and config updates `app/desktop/studio_server/test_*.py`, `app/web_ui/tests/e2e/...`, `libs/server/kiln_server/utils/agent_checks/annotations/...`	Desktop tests updated for intermediate run handling; e2e test expectations updated for Part section headings; agent policy annotation added for ancestors endpoint.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Kiln-AI/Kiln#1415: Introduces ParentOfRelationship and include_intermediate_runs parameter to control leaf-vs-intermediate run handling, which this PR relies on throughout the multiturn API/UI.
Kiln-AI/Kiln#1176: Refactors backend data model to persist parent_task_run_id field that this PR builds upon for ancestry and continuation.
Kiln-AI/Kiln#601: Previous trace/run UI refactoring that the main PR extends with markdown rendering, fork affordances, and sidebar extraction.

Suggested reviewers

scosman
chiang-daniel

🐰 A multiturn maze of traces deep,
Where runs can fork and ancestors sleep,
With composers dancing, sidebars bright,
Conversations bloom—single turn takes flight! ✨

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch leonard/kil-632-feat-multiturn-task

…eat-multiturn-task

gemini-code-assist

Code Review

This pull request implements multiturn task support across the stack, introducing task turn modes, conversation-based run views, and a forking mechanism for trace branches. Key backend additions include ancestor chain traversal and cascade deletion logic, while the frontend sees new components for conversation management and a refactored run sidebar. Review feedback highlights opportunities for performance optimization in subtask usage calculations and disk I/O during deletions. It also identifies a UX bug regarding event propagation on the fork button, an architectural violation in component dependencies, and a missing reactive dependency in the ratings UI.

gemini-code-assist · 2026-05-14T08:35:34Z

+    # the loader for each ancestor. We need the full chain view here.
+    all_runs = task.filter_runs(include_intermediate_runs=True, readonly=True)
+    children_by_parent: Dict[str, list[str]] = {}
+    for r in all_runs:
+        if r.parent_task_run_id and r.id is not None:
+            children_by_parent.setdefault(r.parent_task_run_id, []).append(str(r.id))
+
+    to_delete: list[TaskRun] = [target]
+    deleted_ids: set[str] = set()
+    if target.id is not None:
+        deleted_ids.add(str(target.id))
+
+    visited: set[str] = set(deleted_ids)
+    current = target
+    for _ in range(_MAX_ANCESTOR_DEPTH):
+        parent_id = current.parent_task_run_id
+        if parent_id is None:
+            break
+        if parent_id in visited:
+            # Cycle: stop here, but everything queued so far is still valid.
+            break
+        parent = TaskRun.from_id_and_parent_path(parent_id, task.path)
+        if parent is None:
+            # Chain broken: stop the cascade, don't 500.
+            break
+        live_children = [
+            cid
+            for cid in children_by_parent.get(parent_id, [])
+            if cid not in deleted_ids
+        ]
+        if live_children:
+            # A sibling branch survives — keep this parent.
+            break
+        to_delete.append(parent)
+        if parent.id is not None:
+            deleted_ids.add(str(parent.id))
+            visited.add(str(parent.id))
+        current = parent


Redundant disk I/O in _collect_cascade_delete_runs. The function already loads all runs for the task into all_runs at line 93. Instead of calling TaskRun.from_id_and_parent_path in a loop (which hits the filesystem repeatedly), you should use a lookup dictionary built from the all_runs list already in memory.

Addressed — _collect_cascade_delete_runs now builds a runs_by_id dict alongside children_by_parent from the single all_runs load, and the ancestor walk uses runs_by_id.get(parent_id) instead of TaskRun.from_id_and_parent_path.

github-actions · 2026-05-14T08:37:09Z

📊 Coverage Report

Overall Coverage: 92%

Diff: origin/main...HEAD

app/desktop/studio_server/finetune_api.py (100%)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (100%)
libs/core/kiln_ai/datamodel/datamodel_enums.py (100%)
libs/core/kiln_ai/datamodel/dataset_split.py (100%)
libs/core/kiln_ai/datamodel/task.py (100%)
libs/core/kiln_ai/datamodel/task_run.py (100%)
libs/core/kiln_ai/utils/dataset_import.py (93.4%): Missing lines 360,396,451,563,578,585,597-599
libs/server/kiln_server/run_api.py (96.9%): Missing lines 77-78,151,711
libs/server/kiln_server/task_api.py (100%)

Summary

Total: 308 lines
Missing: 13 lines
Coverage: 95%

Line-by-line

View line-by-line diff coverage

libs/core/kiln_ai/utils/dataset_import.py

Lines 356-364

  356     (`tags -> 0: ...`) when constructing TaskRuns downstream.
  357     """
  358     for tag in tags:
  359         if not tag:
! 360             raise KilnInvalidImportFormat(
  361                 "Tags cannot be empty strings.",
  362                 row_number=row_number,
  363             )
  364         if " " in tag:

Lines 392-400

  392 
  393     messages: list[ValidatedMessage] = []
  394     for k, msg in enumerate(trace, start=1):
  395         if not isinstance(msg, dict):
! 396             raise KilnInvalidImportFormat(
  397                 f"message {k}: must be a JSON object.",
  398                 row_number=row_number,
  399             )

Lines 447-455

  447         if role == "assistant":
  448             rc = msg.get("reasoning_content")
  449             if rc is not None:
  450                 if not isinstance(rc, str):
! 451                     raise KilnInvalidImportFormat(
  452                         f"message {k}: 'reasoning_content' must be a string.",
  453                         row_number=row_number,
  454                     )
  455                 if not rc:

Lines 559-567

  559     with open(dataset_path, "r", newline="", encoding="utf-8") as csvfile:
  560         reader = csv.DictReader(csvfile)
  561 
  562         if not reader.fieldnames:
! 563             raise KilnInvalidImportFormat(
  564                 "CSV file appears to be empty or missing headers"
  565             )
  566 
  567         actual_headers = set(reader.fieldnames)

Lines 574-582

  574                 f"Got: {', '.join(sorted(actual_headers))}."
  575             )
  576         missing_headers = required_headers - actual_headers
  577         if missing_headers:
! 578             raise KilnInvalidImportFormat(
  579                 f"Missing required headers: {', '.join(missing_headers)}. "
  580                 f"Required headers are: {', '.join(required_headers)}"
  581             )

Lines 581-589

  581             )
  582 
  583         unknown_headers = actual_headers - (required_headers | optional_headers)
  584         if unknown_headers:
! 585             logger.warning(
  586                 f"Unknown headers in CSV file will be ignored: {', '.join(unknown_headers)}"
  587             )
  588 
  589         for row_number, row in enumerate(reader, start=2):

Lines 593-603

  593                         **row,
  594                         "tags": deserialize_tags(row.get("tags")),
  595                     }
  596                 )
! 597             except ValidationError as e:
! 598                 logger.warning(f"Invalid row {row_number}: {row}", exc_info=True)
! 599                 raise KilnInvalidImportFormat(
  600                     format_validation_error(e),
  601                     row_number=row_number,
  602                 ) from e

libs/server/kiln_server/run_api.py

Lines 73-82

  73         chain.append(parent)
  74         if parent.id is not None:
  75             visited.add(str(parent.id))
  76         current = parent
! 77     chain.reverse()
! 78     return chain, True
  79 
  80 
  81 def _collect_cascade_delete_runs(
  82     task: Task,

Lines 147-155

  147 
  148 
  149 def _count_user_messages(trace: list[Any] | None) -> int:
  150     if not trace:
! 151         return 0
  152     return sum(1 for m in trace if isinstance(m, dict) and m.get("role") == "user")
  153 
  154 
  155 def deep_update(

Lines 707-715

  707                     status_code=404,
  708                     detail=f"Parent run not found. ID: {request.parent_task_run_id}",
  709                 )
  710             if not parent_task_run.trace:
! 711                 raise HTTPException(
  712                     status_code=400,
  713                     detail="Parent run cannot be continued because it has no trace.",
  714                 )
  715             prior_trace = parent_task_run.trace

📊 HTML Coverage Report - Interactive coverage report
📈 Diff Coverage Report - Detailed diff analysis
Github Actions Run - View the full coverage report

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (1)
1-15: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add TypeScript script block per coding guidelines.

The component should include a <script lang="ts"> block for consistency with the project's TypeScript requirement, even if currently empty. This ensures the guideline is met and makes future prop additions straightforward. As per coding guidelines, all frontend code in app/web_ui/**/*.svelte should use TypeScript.
📝 Proposed addition
+<script lang="ts">
+  export let class: string | undefined = undefined;
+</script>
+
 <svg
-  class="w-full h-full"
+  class="w-full h-full {class}"
   fill="currentColor"
   viewBox="0 0 1024 1024"
   xmlns="http://www.w3.org/2000/svg"
   aria-hidden="true"
 >
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/web_ui/src/lib/ui/icons/info_circle_icon.svelte` around lines 1 - 15, The
SVG Svelte component file lacks the required TypeScript script block; add a
top-level <script lang="ts"> block in
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (even if empty) to satisfy
the project's TypeScript guideline for Svelte components and make future prop
typing easier; place it before the SVG markup and include any future exports
(e.g., export let ...) there when needed.

🧹 Nitpick comments (6)

app/desktop/studio_server/test_eval_api.py (1)
1570-1570: ⚡ Quick win

Assert readonly is forwarded to filter_runs.

This test now mocks filter_runs, but it doesn’t assert the expected call signature. Add an assertion to ensure runs_in_filter(..., readonly=True) actually calls mock_task.filter_runs(readonly=True).
Suggested patch
         result = runs_in_filter(mock_task, "tag::some_filter", readonly=True)

         # Verify the results
         assert len(result) == 2
         assert result[0].id == "run1"
         assert result[1].id == "run3"

+        mock_task.filter_runs.assert_called_once_with(readonly=True)
         # Verify the filter was called for each run
         assert mock_filter.call_count == 3
         mock_dataset_filter_from_id.assert_called_once_with("tag::some_filter")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/desktop/studio_server/test_eval_api.py` at line 1570, The test sets
mock_task.filter_runs.return_value but doesn't assert that the readonly flag is
forwarded; update the test to assert that runs_in_filter (the code under test)
calls mock_task.filter_runs with readonly=True by adding an assertion like
mock_task.filter_runs.assert_called_once_with(readonly=True) (or
assert_called_with(readonly=True) if multiple calls are expected) after the
invocation so the test verifies the call signature.
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts (1)
38-65: 💤 Low value

Consider logging when ancestor count exceeds user turns.

The suffix-alignment logic at lines 55–60 handles the edge case where ancestors.length > total_turns (which would be a data inconsistency) by allowing offset to be negative. When offset + k is negative for small k, user_trace_indices[offset + k] returns undefined, and line 60 guards against that. While this is safe, silently ignoring the mismatch could make debugging harder if the data model is violated. Adding a console.warn when offset < 0 would surface this edge case without breaking functionality.
📝 Optional improvement to log edge case
   const total_turns = user_trace_indices.length
   if (total_turns === 0 || ancestors.length === 0) {
     return result
   }
   // Suffix-align the ancestor list against the user messages: the last
   // ancestor (the leaf) corresponds to the last user message.
   const offset = total_turns - ancestors.length
+  if (offset < 0) {
+    console.warn(
+      "More ancestors than user turns in trace — data inconsistency",
+      { ancestors_count: ancestors.length, user_turns: total_turns },
+    )
+  }
   for (let k = 0; k < ancestors.length; k++) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts
around lines 38 - 65, The function compute_forkable_run_ids silently tolerates
ancestors.length > total_turns by letting offset be negative; add a warning when
this happens to surface the data inconsistency: inside compute_forkable_run_ids,
after computing const offset = total_turns - ancestors.length, check if offset <
0 and emit a console.warn (or use an existing logger) that includes values for
total_turns, ancestors.length, and maybe a brief context (e.g., project/run IDs
if available) so developers can detect when ancestors and user_trace_indices are
misaligned; keep the existing guard for user_trace_indices[offset + k] so
behavior is unchanged.
app/web_ui/src/lib/ui/trace/trace.svelte (2)
37-37: 💤 Low value

messageExpanded is initialized once and not re-synced with trace length.

If the trace prop is replaced with a different-length array (e.g., page-level state swaps the trace without unmounting Trace), messageExpanded keeps its old length, which can desync bind:checked={messageExpanded[index]} against the new messages and persist stale expansion state across runs. In the multiturn page today the parent does run = null before navigating, so this component is currently torn down and remounted — but that's a fragile contract for a reusable UI component.

Consider re-syncing reactively when the trace identity changes:
♻️ Suggested change
-  // Track collapsed state for each message (true = expanded, false = collapsed)
-  let messageExpanded: boolean[] = trace.map(() => false)
+  // Track collapsed state for each message (true = expanded, false = collapsed).
+  // Re-sync whenever the trace identity or length changes so we don't carry
+  // stale expansion state across run navigations.
+  let messageExpanded: boolean[] = trace.map(() => false)
+  $: messageExpanded =
+    messageExpanded.length === trace.length
+      ? messageExpanded
+      : trace.map(() => false)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/web_ui/src/lib/ui/trace/trace.svelte` at line 37, The component
initializes messageExpanded once from trace but doesn't resync when trace is
replaced, causing stale expansion state; update the component to recompute
messageExpanded whenever the trace identity or its length changes (e.g., add a
reactive statement that sets messageExpanded = trace.map(() => false) when trace
changes or when trace.length differs) so bind:checked={messageExpanded[index]}
stays in sync; reference the existing messageExpanded and trace variables and
ensure any user expansion state is preserved or intentionally reset per the
desired behavior.
263-265: 💤 Low value

Cost formatting hardcodes USD and 6 decimals.

$${usage.cost.toFixed(6)} will render values like $0.000000 for zero-cost runs and assumes USD. If cost is ever expressed in a different currency or accumulates significant whole-dollar values, this display becomes misleading. Consider routing through a shared formatter (similar to formatLatency) so currency/precision live in one place and can be locale-aware later.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/web_ui/src/lib/ui/trace/trace.svelte` around lines 263 - 265, The UI
hardcodes USD and 6 decimal places when showing usage.cost (lines.push(`Cost:
$${usage.cost.toFixed(6)}`)); extract this into a shared formatter (e.g.,
formatCost) similar to formatLatency, and replace the direct string construction
with a call to formatCost(usage.cost, { currency?: 'USD' }) so currency and
precision are centralized and locale-aware; update trace.svelte to use
formatCost for all cost displays and add tests/usages where appropriate.
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte (2)
478-500: 💤 Low value

Replace setTimeout(..., 100) with Svelte's tick() for the raw-data scroll.

The 100 ms delay is a magic number guessing how long Svelte takes to render the now-unhidden #multiturn_raw_data block. await tick() is the deterministic primitive for "wait until reactive DOM updates have flushed" and avoids both the flicker if rendering takes longer than 100 ms and the unnecessary wait if it takes less.
♻️ Suggested change
-  function multiturn_toggle_raw_data() {
-    multiturn_show_raw_data = !multiturn_show_raw_data
-    if (multiturn_show_raw_data) {
-      setTimeout(() => {
-        const rawDataElement = document.getElementById("multiturn_raw_data")
-        if (rawDataElement) {
-          rawDataElement.scrollIntoView({
-            behavior: "smooth",
-            block: "start",
-          })
-        }
-      }, 100)
-    }
-  }
+  async function multiturn_toggle_raw_data() {
+    multiturn_show_raw_data = !multiturn_show_raw_data
+    if (!multiturn_show_raw_data) return
+    await tick()
+    document.getElementById("multiturn_raw_data")?.scrollIntoView({
+      behavior: "smooth",
+      block: "start",
+    })
+  }
(Plus import { tick } from "svelte" at the top.)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
around lines 478 - 500, The setTimeout(…, 100) in multiturn_toggle_raw_data is a
brittle race; replace it with Svelte's tick() to wait for DOM updates: import {
tick } from "svelte", then after setting multiturn_show_raw_data to true await
tick() and only then query document.getElementById("multiturn_raw_data") and
call scrollIntoView as currently done; keep the same scroll options and remove
the setTimeout wrapper.
548-555: 💤 Low value

ancestors_loaded_for_run_id is not reset alongside the other ancestor state.

The reset block at lines 541‑546 clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target on every run_id change, but intentionally leaves ancestors_loaded_for_run_id unchanged so the trigger condition ancestors_loaded_for_run_id !== run_id can fire the fetch. That works today, but it's worth a one‑line comment near line 535 explaining why this single piece of state is deliberately excluded from the reset — otherwise a future change that "completes" the reset by adding ancestors_loaded_for_run_id = null would silently re‑fetch on every load even when task.turn_mode !== "multiturn".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
around lines 548 - 555, Add a one-line comment next to the reset block that
clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target
explaining that ancestors_loaded_for_run_id is intentionally NOT reset so the
reactive check (ancestors_loaded_for_run_id !== run_id) can trigger
load_ancestors(project_id, task_id, run_id) only when needed; mention that
clearing ancestors_loaded_for_run_id would cause unintended re-fetches even when
task.turn_mode !== "multiturn".

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/web_ui/src/lib/api_schema.d.ts`:
- Around line 9804-9811: The ancestors property description is ambiguous about
ordering when chain_broken=true (it currently says root→leaf but also “from the
leaf back”); update the OpenAPI schema so the canonical ordering is explicit
(e.g., always root-to-leaf, and when chain_broken=true the array contains only
the intact suffix ordered root-to-leaf for that suffix) and mention that
chain_broken indicates a truncated prefix was dropped; modify the
TaskRunAncestors/ancestors description accordingly and then regenerate the
client schema by running the app/web_ui/src/lib/generate_schema.sh script so
app/web_ui/src/lib/api_schema.d.ts reflects the clarified wording.

In `@app/web_ui/src/lib/ui/run_sidebar.svelte`:
- Around line 201-217: In extract_subtask_references, avoid destructuring the
result of message.kiln_task_tool_data.split(":::") directly; first call const
parts = message.kiln_task_tool_data.split(":::") and validate parts.length >= 4
and that parts[0], parts[2], and parts[3] are non-empty before assigning
project_id, task_id, run_id (or pushing into references). This ensures malformed
kiln_task_tool_data does not produce undefined values or invalid API calls and
only well-formed triples are added to the references array.

In `@libs/server/kiln_server/run_api.py`:
- Around line 617-632: When task.turn_mode == TurnMode.multiturn, require
request.parent_task_run_id to be present: if request.parent_task_run_id is None
then raise HTTPException(status_code=400, detail="parent_task_run_id is required
for multiturn tasks.") before calling TaskRun.from_id_and_parent_path; keep the
existing validation that parent_task_run_id is invalid for non-multiturn tasks
(the current check that raises when task.turn_mode != TurnMode.multiturn) and
preserve the subsequent lookup using TaskRun.from_id_and_parent_path and raising
404 if not found.

---

Outside diff comments:
In `@app/web_ui/src/lib/ui/icons/info_circle_icon.svelte`:
- Around line 1-15: The SVG Svelte component file lacks the required TypeScript
script block; add a top-level <script lang="ts"> block in
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (even if empty) to satisfy
the project's TypeScript guideline for Svelte components and make future prop
typing easier; place it before the SVG markup and include any future exports
(e.g., export let ...) there when needed.

---

Nitpick comments:
In `@app/desktop/studio_server/test_eval_api.py`:
- Line 1570: The test sets mock_task.filter_runs.return_value but doesn't assert
that the readonly flag is forwarded; update the test to assert that
runs_in_filter (the code under test) calls mock_task.filter_runs with
readonly=True by adding an assertion like
mock_task.filter_runs.assert_called_once_with(readonly=True) (or
assert_called_with(readonly=True) if multiple calls are expected) after the
invocation so the test verifies the call signature.

In `@app/web_ui/src/lib/ui/trace/trace.svelte`:
- Line 37: The component initializes messageExpanded once from trace but doesn't
resync when trace is replaced, causing stale expansion state; update the
component to recompute messageExpanded whenever the trace identity or its length
changes (e.g., add a reactive statement that sets messageExpanded = trace.map(()
=> false) when trace changes or when trace.length differs) so
bind:checked={messageExpanded[index]} stays in sync; reference the existing
messageExpanded and trace variables and ensure any user expansion state is
preserved or intentionally reset per the desired behavior.
- Around line 263-265: The UI hardcodes USD and 6 decimal places when showing
usage.cost (lines.push(`Cost: $${usage.cost.toFixed(6)}`)); extract this into a
shared formatter (e.g., formatCost) similar to formatLatency, and replace the
direct string construction with a call to formatCost(usage.cost, { currency?:
'USD' }) so currency and precision are centralized and locale-aware; update
trace.svelte to use formatCost for all cost displays and add tests/usages where
appropriate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte:
- Around line 478-500: The setTimeout(…, 100) in multiturn_toggle_raw_data is a
brittle race; replace it with Svelte's tick() to wait for DOM updates: import {
tick } from "svelte", then after setting multiturn_show_raw_data to true await
tick() and only then query document.getElementById("multiturn_raw_data") and
call scrollIntoView as currently done; keep the same scroll options and remove
the setTimeout wrapper.
- Around line 548-555: Add a one-line comment next to the reset block that
clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target
explaining that ancestors_loaded_for_run_id is intentionally NOT reset so the
reactive check (ancestors_loaded_for_run_id !== run_id) can trigger
load_ancestors(project_id, task_id, run_id) only when needed; mention that
clearing ancestors_loaded_for_run_id would cause unintended re-fetches even when
task.turn_mode !== "multiturn".

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts:
- Around line 38-65: The function compute_forkable_run_ids silently tolerates
ancestors.length > total_turns by letting offset be negative; add a warning when
this happens to surface the data inconsistency: inside compute_forkable_run_ids,
after computing const offset = total_turns - ancestors.length, check if offset <
0 and emit a console.warn (or use an existing logger) that includes values for
total_turns, ancestors.length, and maybe a brief context (e.g., project/run IDs
if available) so developers can detect when ancestors and user_trace_indices are
misaligned; keep the existing guard for user_trace_indices[offset + k] so
behavior is unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8909dd12-8a4b-4e3b-b18d-9c52e28e87b6

📥 Commits

Reviewing files that changed from the base of the PR and between ec15e20 and 7a9a7ae.

📒 Files selected for processing (48)

app/desktop/studio_server/agent_api.py
app/desktop/studio_server/eval_api.py
app/desktop/studio_server/finetune_api.py
app/desktop/studio_server/test_data_gen_api.py
app/desktop/studio_server/test_eval_api.py
app/desktop/studio_server/test_finetune_api.py
app/desktop/studio_server/test_repair_api.py
app/web_ui/src/lib/api_schema.d.ts
app/web_ui/src/lib/types.ts
app/web_ui/src/lib/ui/conversation/multiturn_composer.svelte
app/web_ui/src/lib/ui/conversation/multiturn_composer.test.ts
app/web_ui/src/lib/ui/icons/fork_icon.svelte
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte
app/web_ui/src/lib/ui/run_sidebar.svelte
app/web_ui/src/lib/ui/trace/trace.svelte
app/web_ui/src/lib/ui/trace/trace.test.ts
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.test.ts
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/multiturn_send.test.ts
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/multiturn_send.ts
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/+page.svelte
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
app/web_ui/src/routes/(app)/run/+page.svelte
app/web_ui/src/routes/(app)/run/run.svelte
app/web_ui/src/routes/(app)/run/run_input_form.svelte
app/web_ui/src/routes/(app)/settings/edit_task/[project_id]/[task_id]/load_task_editor.svelte
app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/[spec_id]/[eval_id]/create_eval_config/get_eval_steps.test.ts
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.test.ts
libs/core/kiln_ai/adapters/eval/eval_runner.py
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
libs/core/kiln_ai/adapters/fine_tune/test_dataset_formatter.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/adapters/prompt_builders.py
libs/core/kiln_ai/cli/commands/test_package_project.py
libs/core/kiln_ai/datamodel/datamodel_enums.py
libs/core/kiln_ai/datamodel/dataset_split.py
libs/core/kiln_ai/datamodel/task.py
libs/core/kiln_ai/datamodel/test_dataset_split.py
libs/core/kiln_ai/datamodel/test_example_models.py
libs/core/kiln_ai/datamodel/test_models.py
libs/core/kiln_ai/datamodel/test_task.py
libs/core/kiln_ai/utils/test_dataset_import.py
libs/server/kiln_server/run_api.py
libs/server/kiln_server/task_api.py
libs/server/kiln_server/test_run_api.py
libs/server/kiln_server/test_task_api.py

leonardmq · 2026-05-15T04:13:20Z

After merging #1415 - we can ditch filter_runs and just call runs directly

Add multiturn CSV import path with parsing, validation, and TaskRun chain materialization. `import_csv` dispatches on `task.turn_mode`. Introduces `ImportResult` and adds `imported_conversation_count` to the bulk upload response. Covered by unit and endpoint tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Make the dataset upload dialog turn-mode aware: multiturn tasks get a tailored title, help block describing the `trace`/`tags` columns with an inline example, an alternation/no-system-messages callout, and a link to a new `static/sample_multiturn.csv` asset. Adds Vitest coverage for the single-turn, multiturn, and null-task branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eat-multiturn-task

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/core/kiln_ai/datamodel/task.py`:
- Around line 283-286: The docstring for filter_runs is inconsistent with
runs(): update the docstring to state that runs() returns leaf-only runs by
default and only includes intermediate (non-leaf) turns when
include_intermediate_runs=True; reference the runs() behavior and the
include_intermediate_runs parameter in the filter_runs docstring so it
accurately describes filtering semantics for multiturn tasks using filter_runs
and runs().

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 85623d66-30e5-4884-b457-8c88c1e1776a

📥 Commits

Reviewing files that changed from the base of the PR and between ce9c09d and 42bb2f4.

📒 Files selected for processing (12)

app/desktop/studio_server/finetune_api.py
app/desktop/studio_server/test_finetune_api.py
app/web_ui/src/lib/api_schema.d.ts
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/datamodel/task.py
libs/core/kiln_ai/datamodel/test_models.py
libs/core/kiln_ai/datamodel/test_task.py
libs/core/kiln_ai/utils/test_dataset_import.py
libs/server/kiln_server/run_api.py
libs/server/kiln_server/test_run_api.py

🚧 Files skipped from review as they are similar to previous changes (8)

libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
app/desktop/studio_server/test_finetune_api.py
app/web_ui/src/lib/api_schema.d.ts
app/desktop/studio_server/finetune_api.py
libs/core/kiln_ai/utils/test_dataset_import.py
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
libs/server/kiln_server/run_api.py
libs/server/kiln_server/test_run_api.py

…-AI/Kiln into leonard/kil-658-multiturn-support-csv

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/server/kiln_server/run_api.py`:
- Around line 439-445: The batch delete endpoint currently calls run.delete()
directly and must be changed to use the same cascade behavior as the single-run
delete: use the same helper _collect_cascade_delete_runs (the code that builds
runs_to_delete) to expand each target run into its cascade set and then delete
the resulting runs in that set rather than deleting individual runs directly;
update the POST /runs/delete handler (the batch deletion logic that currently
iterates runs and calls run.delete()) to call task_and_run_from_id for each id,
call _collect_cascade_delete_runs(task, run) to gather deletions, deduplicate
the combined set, and then delete each run in that unified runs_to_delete
collection so both endpoints behave identically.
- Around line 633-641: After loading the parent via
TaskRun.from_id_and_parent_path, ensure you reject continuation requests when
the parent has no trace: check parent_task_run.trace (referenced as prior_trace
in this block) and if it is None raise an HTTPException (4xx) with a clear
message including request.parent_task_run_id rather than proceeding; place this
check immediately after the parent_task_run is validated and before any use of
prior_trace or calling the adapter so a child cannot be persisted as a
continuation of a parent with no trace.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4509f62f-81ff-43d2-b916-5d5f4c348454

📥 Commits

Reviewing files that changed from the base of the PR and between 42bb2f4 and 4a15bc0.

📒 Files selected for processing (16)

app/desktop/studio_server/finetune_api.py
app/desktop/studio_server/test_data_gen_api.py
app/desktop/studio_server/test_finetune_api.py
app/desktop/studio_server/test_repair_api.py
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
libs/core/kiln_ai/adapters/fine_tune/test_dataset_formatter.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/cli/commands/test_package_project.py
libs/core/kiln_ai/datamodel/dataset_split.py
libs/core/kiln_ai/datamodel/task.py
libs/core/kiln_ai/datamodel/test_example_models.py
libs/core/kiln_ai/datamodel/test_models.py
libs/core/kiln_ai/datamodel/test_task.py
libs/core/kiln_ai/utils/test_dataset_import.py
libs/server/kiln_server/run_api.py
libs/server/kiln_server/test_run_api.py

🚧 Files skipped from review as they are similar to previous changes (6)

libs/core/kiln_ai/datamodel/test_example_models.py
libs/core/kiln_ai/datamodel/test_models.py
app/desktop/studio_server/test_finetune_api.py
libs/core/kiln_ai/datamodel/dataset_split.py
app/desktop/studio_server/test_data_gen_api.py
app/desktop/studio_server/test_repair_api.py

…-AI/Kiln into leonard/kil-658-multiturn-support-csv

…ln-AI/Kiln into leonard/kil-658-multiturn-support-csv

…task-2 refactor: multiturn UI more like chat

…rt-csv feat: multiturn csv import

…eat-multiturn-task

…-AI/Kiln into leonard/kil-632-feat-multiturn-task

…eat-multiturn-task

- Hide the redundant "Plaintext Input" header in the composer and use a "Write a message…" placeholder (RunInputForm gains label/placeholder/ hide_label props). - Disable composer autofocus so loading a turn no longer yanks the viewport to the textbox. - Restyle the fork context strip as a status header (label + helper text) instead of a chat-bubble container. - Make the multiturn view a chat layout on xl+: scrollable transcript with the composer pinned to the bottom, Options column kept at natural height. "Show Raw Data" stays under the composer; transcript has no horizontal scroll and uses the Assistant scrollbar styling. - Scroll the transcript to the latest turn on load and after sending, re-pinning to the bottom across late markdown reflow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

On send, append the user's message to the transcript optimistically so it shows immediately, disable the composer input/Send while the run is in flight, then redirect to the new run. The run-load reactive clears the placeholder once the real trace arrives (and on_send_settled clears it on error), avoiding a duplicate bubble. Append-mode only; fork mode just locks the input and navigates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Tag multiturn turns as "manual_run" instead of "multiturn_run" so they aren't singled out from other manually-created runs. - In the multiturn sidebar, title the usage section "Final Turn Usage" and drop the "Total" prefix from the Cost/Tokens/Latency line items (it's the final turn's run, not a whole-conversation aggregate). Single-turn is unchanged. - Open Raw Data in a modal instead of expanding inline and reflowing the chat column. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lish - Move the per-message usage/fork actions out of the chat bubbles into a dedicated row below each message that fades in on hover/focus (Claude desktop style). Space is reserved so hovering never reflows the conversation, and bubbles stay clean (no always-empty line). - Label the actions ("Usage" with a new dollar icon component, "Fork"); usage is right-aligned to the bubble's edge. - Add a height prop to RunInputForm; the composer uses a shorter textarea. - Fix assistant bubbles (esp. collapsed thinking/tool blocks) shrinking to content width instead of filling their allocated 70%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Disable the Evals and Synthetic Data screens for multi-turn tasks (warning + no action button + no redirect into the flow), matching the fine-tuning behavior. - Remove a stray zero-width space in a comment that failed the web UI no-irregular-whitespace lint check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…send - Show the assistant ChatLoading indicator below the optimistic message while a turn is generating, kept up continuously across the redirect until the new run renders. - Stop nulling run / flipping loading on send so the transcript stays mounted during navigation instead of flashing the full-page spinner; keep the composer disabled (busy) until the new run loads to prevent a double-send against the stale parent. - Fix the cleared composer flagging "Required" after a successful send by keying the plaintext field so clear_input resets its validation state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When the current task is multi-turn, the /run page now renders a chat-style new-conversation layout (empty transcript + pinned MultiturnComposer + Options sidebar) mirroring the in-run multiturn page, instead of the Input/Run/Output layout. The first message starts a root run (optimistic message + loading indicator) then redirects to the dataset run page. send_multiturn gains an allow_root_turn flag so a parentless first turn is permitted from /run; the in-chat composer leaves it false. Single-turn tasks keep the existing layout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

leonardmq and others added 12 commits May 5, 2026 00:15

Merge branch 'main' of github.com:Kiln-AI/Kiln into leonard/kil-632-f…

1a38220

…eat-multiturn-task # Conflicts: # app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte

fix: ui error on next turn and clearing form input

020128c

feat: do not support finetuning for multiturn tasks

bcfff17

refactor: multiturn ui tweaks

6751617

refactor: exclude intermediate runs

4b4fb28

leonardmq added 3 commits May 14, 2026 16:33

fix: lint, schema

ffd0857

Merge branch 'main' of github.com:Kiln-AI/Kiln into leonard/kil-632-f…

fc4e050

…eat-multiturn-task

chore: remove specs

7a9a7ae

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread app/web_ui/src/lib/api_schema.d.ts Outdated

Comment thread app/web_ui/src/lib/ui/run_sidebar.svelte

Comment thread libs/server/kiln_server/run_api.py

leonardmq mentioned this pull request May 15, 2026

feat: model relationship configuration override #1415

Merged

2 tasks

leonardmq and others added 6 commits May 15, 2026 12:53

chore: annotations

b703874

fix: update playwright tests

ce9c09d

chore: delete spec

fa21f58

Merge branch 'main' of github.com:Kiln-AI/Kiln into leonard/kil-632-f…

42bb2f4

…eat-multiturn-task

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread libs/core/kiln_ai/datamodel/task.py Outdated

leonardmq added 2 commits May 19, 2026 04:40

Merge branch 'leonard/kil-632-feat-multiturn-task' of github.com:Kiln…

77044ec

…-AI/Kiln into leonard/kil-658-multiturn-support-csv

refactor: replace filter_runs with runs

efc525a

Merge branch 'leonard/kil-632-feat-multiturn-task' of github.com:Kiln…

1cf6bba

…-AI/Kiln into leonard/kil-658-multiturn-support-csv

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread libs/server/kiln_server/run_api.py

Comment thread libs/server/kiln_server/run_api.py

leonardmq and others added 28 commits May 19, 2026 20:44

fix tests

5710176

fix: columns

2093a93

feat: link to parent in UI

0481046

cr and finishing

39e7b1b

Merge branch 'leonard/kil-632-feat-multiturn-task' of github.com:Kiln…

98a07f5

…-AI/Kiln into leonard/kil-658-multiturn-support-csv

refactor: extra validation

9d8a5e7

cr

45daf03

cr

e7a8877

Merge branch 'leonard/kil-632-feat-multiturn-task' of github.com:Kiln…

9f9caf2

…-AI/Kiln into leonard/kil-658-multiturn-support-csv

Merge branch 'leonard/kil-658-multiturn-support-csv' of github.com:Ki…

b4e14dc

…ln-AI/Kiln into leonard/kil-658-multiturn-support-csv

refactor: chat-style UI for multiturn

507dec2

refactor: ui tweaks

c3120e7

refactor: pull out reasoning into own block

9a818a4

refactor: ui changes

0ad495f

Merge pull request #1426 from Kiln-AI/leonard/kil-632-feat-multiturn-…

6479c32

…task-2 refactor: multiturn UI more like chat

Merge pull request #1420 from Kiln-AI/leonard/kil-658-multiturn-suppo…

50f9c5c

…rt-csv feat: multiturn csv import

Merge branch 'main' of github.com:Kiln-AI/Kiln into leonard/kil-632-f…

70385f2

…eat-multiturn-task

Merge branch 'leonard/kil-632-feat-multiturn-task' of github.com:Kiln…

8436138

…-AI/Kiln into leonard/kil-632-feat-multiturn-task

Merge branch 'main' of github.com:Kiln-AI/Kiln into leonard/kil-632-f…

95080cd

…eat-multiturn-task

refactor: remove direct link to ancestor turn (cannot map properly)

9022428

Merge branch 'main' of github.com:Kiln-AI/Kiln into leonard/kil-632-f…

60942bb

…eat-multiturn-task

Conversation

leonardmq commented May 14, 2026

What does this PR do?

Checklists

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

leonardmq May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📊 Coverage Report

Diff: origin/main...HEAD

Summary

Line-by-line

libs/core/kiln_ai/utils/dataset_import.py

libs/server/kiln_server/run_api.py

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

leonardmq commented May 15, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading