Skip to content

feat: multiturn tasks#1409

Open
leonardmq wants to merge 53 commits into
mainfrom
leonard/kil-632-feat-multiturn-task
Open

feat: multiturn tasks#1409
leonardmq wants to merge 53 commits into
mainfrom
leonard/kil-632-feat-multiturn-task

Conversation

@leonardmq
Copy link
Copy Markdown
Collaborator

What does this PR do?

Add support for multiturn tasks:

  • datamodel:
    • feat: Task now has a turn_mode: 'single_turn' | 'multiturn'
    • refactor: migrate callers to use task.filter_runs() instead of task.runs() to filter out intermediate TaskRuns
  • UI:
    • task creation: can pick between multiturn and single turn
    • now supports multiturn conversations
    • allow forking conversations
    • dataset only shows leaf TaskRuns
    • finetuning: not supported for now

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

leonardmq and others added 12 commits May 5, 2026 00:15
Introduce the backbone for multiturn tasks at the data layer:

- Add TurnMode enum (single_turn, multiturn) in datamodel_enums.
- Add Task.turn_mode field defaulting to single_turn, with a
  model_validator rejecting multiturn combined with input_json_schema
  or output_json_schema.
- Enforce turn_mode immutability in task_api.update_task (400 on
  attempted change post-creation), with string-to-enum coercion for
  patch payloads.
- Add backend pytest coverage for defaults, JSON migration of legacy
  tasks without the field, forbidden schema combinations, and the
  update_task immutability guard.
- Add Phase 1 spec artifacts under specs/projects/multiturn_tasks/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build on the Phase 1 data model to plumb continuation runs through the
run endpoint and exclude interior nodes from list views and dataset
splits:

- Add RunTaskRequest.parent_task_run_id field on the run endpoint, so
  callers can mark a run as a continuation of an existing one.
- Plumb parent_task_run_id through run_task: load the parent TaskRun,
  pass its trace as prior_trace and the run itself as parent_task_run
  to the adapter; reject single-turn tasks with 400 and missing parent
  ids with 404.
- Filter get_runs_summary down to leaves only — drop runs whose ids
  appear as parent_task_run_id on any other run, so the runs table
  shows one row per conversation.
- Apply the same leaf-only filter in DatasetSplit.build_split_contents
  so dataset splits sample whole conversations rather than interior
  turns.
- Backend pytest coverage for the continuation flow, validation
  errors, both leaf filters, and regenerate api_schema.d.ts.
- Mark Phase 2 complete in the implementation plan and add the
  Phase 2 spec artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface the Phase 1 data model in the task creation/edit UI:

- Add a radio-style turn_mode picker (Single-turn / Multiturn) as a new
  "Part 2: Task Type" section in edit_task.svelte, with renumbered
  Input/Output/Requirements parts. Defaults to single_turn for new
  tasks and reflects task.turn_mode for existing ones.
- Conditionally hide the input + output schema editors when multiturn
  is selected, replacing them with short explanatory notes
  ("Multiturn tasks use plain-text input." / "Structured output is not
  supported for multiturn tasks yet."), and skip the schema fields in
  the create POST body so the backend doesn't get a stale string.
- Guard inputSchemaSection / outputSchemaSection access in has_edits
  with optional chaining so multiturn forms (where the bind:this
  targets don't mount) don't blow up.
- Add a read_only_turn_mode prop wired by load_task_editor.svelte so
  the edit page renders turn_mode as an immutable label with a hint,
  while clone mode still gets the editable picker (cloned task gets a
  fresh id, so a new turn_mode is allowed).
- Export TurnMode from $lib/types so the component can reference the
  literal-union type.
- New vitest suite edit_task.test.ts covering default render, toggle
  behavior, conditional schema sections, read-only mode, and POST
  payload contents for both turn modes.
- Mark Phase 3 complete in the implementation plan and add the Phase 3
  spec artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frontend-only phase that completes the multiturn-task UX end-to-end,
building on the Phase 1-3 data model, run plumbing, and task-edit UI:

- Branch /run send-success on multiturn tasks to goto the dataset
  run-detail page for the newly-created TaskRun, leaving single-turn
  flows untouched.
- Two-column multiturn layout on the dataset run-detail page: left
  column shows ConversationView + composer (RunInputForm wrapped in
  FormContainer with keyboard submit); right column shows the existing
  PropertyList plus a SavedRunConfigurationsDropdown +
  RunConfigComponent so the user can switch model/prompt before each
  Send. Single-turn run-detail rendering preserved unchanged.
- New conversation_view.svelte + message_block.svelte under
  lib/ui/conversation/: flat thread of role-based message blocks
  rendered from TaskRun.trace, with reasoning + tool calls collapsed
  inside the assistant block and a wrap fix so long messages do not
  blow out the column.
- New multiturn_send.ts helper that owns the run-config-component
  resolve/build, POST to /run with parent_task_run_id, and post-send
  cleanup; covered by multiturn_send.test.ts.
- URL replaceState on each Send so back-button stays sane while the
  conversation grows.
- Run-config picker is wrapped in {#key run.id} so it re-seeds from
  the new run's config after every Send instead of sticking to the
  initial selection.
- Frontend test coverage for conversation_view, message_block, and
  multiturn_send.
- Mark Phase 4 complete in the implementation plan and add the
  Phase 4 spec artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eat-multiturn-task

# Conflicts:
#	app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
Replace the multiturn ConversationView with the shared Trace component used
by single-turn runs so messages render in collapsible role-labeled blocks
(system, user, assistant, tool). Add a markdown_content flag to Trace that
renders non-tool message content (and reasoning) via ChatMarkdown, preserving
the markdown rendering the multiturn UI previously had. The plaintext input
form continues to render below the trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GET /api/projects/{project_id}/tasks/{task_id}/runs/{run_id}/ancestors
which walks parent_task_run_id from leaf to root and returns the ordered
chain with a chain_broken flag. Defensively handles missing parents,
cycles, runaway depth, and chains longer than the leaf trace's user-message
count. Includes pytest coverage and regenerated OpenAPI types.

This is Phase 1 of the multiturn trace forking project; the endpoint will
back the frontend fork affordance in Phase 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the multiturn-trace-forking project. Adds the user-facing
fork affordance on the run-detail page so users can branch a multiturn
conversation from any eligible past user turn.

- Extract the inline append composer into a reusable
  `MultiturnComposer` and add a fork mode (prefill, context strip,
  Cancel with dirty-confirm dialog, dirty-switch confirmation when
  changing fork target mid-edit).
- Add `ForkIcon.svelte` to match the existing icon conventions.
- Extend `trace.svelte` with optional `forkable_run_ids`,
  `truncate_at_trace_index`, and `on_fork` props; render the fork
  button inside the expanded user block body so it only appears once
  a turn is opened.
- Wire fork state in the run-detail `+page.svelte`: ancestors fetch,
  `compute_forkable_run_ids`, fork-target state, and non-blocking
  banners for `chain_broken` and ancestors load failures.
- Extract `RunSidebar` and mount it on both the single-turn and
  multiturn run pages so Rating, Tags, and Usage have parity.
- Add vitest suites for `fork_helpers`, `trace`, and
  `multiturn_composer`.
- Update `functional_spec.md` and `ui_design.md` to reflect the
  in-expanded-body fork-button placement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This pull request introduces multiturn task support: a turn_mode field controls whether tasks run single-turn or allow conversational continuations. The backend implements run ancestry tracking via parent_task_run_id, ancestor querying, and cascade deletion. The frontend provides task editor UI for turn mode selection, a new MultiturnComposer component for fork/append modes with discard confirmation, trace forking via ancestor navigation, and a delegated RunSidebar for rating/feedback management.

Changes

Multiturn task and run system

Layer / File(s) Summary
Core data model: TurnMode enum and task validation
libs/core/kiln_ai/datamodel/datamodel_enums.py, libs/core/kiln_ai/datamodel/task.py
Introduces TurnMode enum (single_turn, multiturn) and adds turn_mode field to Task with validation rejecting structured schemas for multiturn tasks.
API schema and type definitions
app/web_ui/src/lib/api_schema.d.ts, app/web_ui/src/lib/types.ts
Auto-generated OpenAPI schema adds TurnMode enum, TaskRunAncestor, TaskRunAncestorsResponse types, parent_task_run_id field to RunTaskRequest, and ancestors endpoint definition; TypeScript type aliases exported.
Server: run ancestry endpoint, parent-child execution, cascade deletion
libs/server/kiln_server/run_api.py, libs/server/kiln_server/task_api.py
Implements GET /ancestors endpoint with cycle detection and chain-broken tracking; adds parent_task_run_id support for run continuation passing prior_trace to adapter; implements cascade-delete for ancestor runs; validates task turn_mode immutability on PATCH.
Server tests: ancestry, continuation, cascade deletion
libs/server/kiln_server/test_run_api.py, libs/server/kiln_server/test_task_api.py
Tests multiturn run continuation, ancestors endpoint edge cases (missing parents, cycles, chain truncation), cascade deletion semantics, and task turn_mode change rejection.
Datamodel: Task.turn_mode and dataset splits
libs/core/kiln_ai/datamodel/task.py, libs/core/kiln_ai/datamodel/dataset_split.py, libs/core/kiln_ai/datamodel/test_dataset_split.py, libs/core/kiln_ai/datamodel/test_task.py
Adds turn_mode validation to Task; updates DatasetSplit methods to include intermediate multiturn runs when resolving references; adds test helpers and regression tests for leaf-only vs all-run split inclusion.
Backend: dataset formatter and test run lookups
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py, multiple test files across adapters/, cli/, utils/
Updates dataset formatter to resolve split entries using intermediate runs; sweeps test files to use include_intermediate_runs=True when fetching task runs.
Frontend task editor: turn_mode UI and validation
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte, app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.test.ts, app/web_ui/src/routes/(app)/settings/edit_task/[project_id]/[task_id]/load_task_editor.svelte
Adds "Part 2: Task Type" section with single-turn/multiturn radio controls (read-only when cloning); hides schema inputs for multiturn; includes tests for toggle behavior, read-only mode, and payload construction.
Frontend: fine-tune pages, single-turn routing, task loading
app/web_ui/src/routes/(app)/fine_tune/..., app/web_ui/src/routes/(app)/run/+page.svelte
Fine-tune pages load task and display unsupported warnings for multiturn; single-turn run page routes to dataset run page for multiturn; task loading store helpers added to routes.
Frontend: send_multiturn implementation and tests
app/web_ui/src/routes/(app)/dataset/[...]/run/multiturn_send.ts, app/web_ui/src/routes/(app)/dataset/[...]/run/multiturn_send.test.ts
Implements send_multiturn helper POSTing to run endpoint with parent_task_run_id, run config, and tags; tests cover request construction, success/error flows, model validation, and input preservation.
Frontend: MultiturnComposer component and tests
app/web_ui/src/lib/ui/conversation/multiturn_composer.svelte, app/web_ui/src/lib/ui/conversation/multiturn_composer.test.ts
Svelte component supporting append and fork modes with dirty-state tracking, discard confirmation dialog, is_dirty() and request_swap() exports; tests validate UI rendering and interaction across modes.
Frontend: fork helpers for ancestry navigation
app/web_ui/src/routes/(app)/dataset/[...]/run/fork_helpers.ts, app/web_ui/src/routes/(app)/dataset/[...]/run/fork_helpers.test.ts
Implements fork_target_from_user_block, compute_forkable_run_ids, and content_string_from_user_message helpers for trace forking; tests cover all scenarios including edge cases.
Frontend: trace component enhancements
app/web_ui/src/lib/ui/trace/trace.svelte, app/web_ui/src/lib/ui/trace/trace.test.ts
Adds optional Markdown rendering for non-tool messages, fork button on user turns, usage tooltips (latency/tokens/cost), and trace truncation; tests cover rendering rules, fork affordance conditions, and truncation.
Frontend: RunSidebar component for rating and feedback
app/web_ui/src/lib/ui/run_sidebar.svelte
New component providing task-rating selection, requirement ratings, feedback management (load/sort/submit/delete), and recursive subtask usage aggregation with async safety guards.
Frontend: run.svelte refactored to use RunSidebar
app/web_ui/src/routes/(app)/run/run.svelte, app/web_ui/src/routes/(app)/run/run_input_form.svelte
Simplifies run.svelte by delegating sidebar to RunSidebar component; removes local rating/feedback/usage logic; adds textarea seed/focus methods to run_input_form.
Frontend: dataset run page multiturn layout
app/web_ui/src/routes/(app)/dataset/[...]/run/+page.svelte
Adds conditional multiturn layout (trace, composer, raw-data, options) vs single-turn; includes ancestor loading, forkable-run-id computation, fork handling, and handle_send() navigation.
Frontend: UI icons and backend finetune rejection
app/web_ui/src/lib/ui/icons/fork_icon.svelte, app/web_ui/src/lib/ui/icons/info_circle_icon.svelte, app/desktop/studio_server/finetune_api.py
New fork and info-circle icons; desktop finetune API rejects multiturn tasks with HTTP 400.
Backend desktop tests and config updates
app/desktop/studio_server/test_*.py, app/web_ui/tests/e2e/..., libs/server/kiln_server/utils/agent_checks/annotations/...
Desktop tests updated for intermediate run handling; e2e test expectations updated for Part section headings; agent policy annotation added for ancestors endpoint.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Kiln-AI/Kiln#1415: Introduces ParentOfRelationship and include_intermediate_runs parameter to control leaf-vs-intermediate run handling, which this PR relies on throughout the multiturn API/UI.
  • Kiln-AI/Kiln#1176: Refactors backend data model to persist parent_task_run_id field that this PR builds upon for ancestry and continuation.
  • Kiln-AI/Kiln#601: Previous trace/run UI refactoring that the main PR extends with markdown rendering, fork affordances, and sidebar extraction.

Suggested reviewers

  • scosman
  • chiang-daniel

🐰 A multiturn maze of traces deep,
Where runs can fork and ancestors sleep,
With composers dancing, sidebars bright,
Conversations bloom—single turn takes flight!

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch leonard/kil-632-feat-multiturn-task

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements multiturn task support across the stack, introducing task turn modes, conversation-based run views, and a forking mechanism for trace branches. Key backend additions include ancestor chain traversal and cascade deletion logic, while the frontend sees new components for conversation management and a refactored run sidebar. Review feedback highlights opportunities for performance optimization in subtask usage calculations and disk I/O during deletions. It also identifies a UX bug regarding event propagation on the fork button, an architectural violation in component dependencies, and a missing reactive dependency in the ratings UI.

Comment thread app/web_ui/src/lib/ui/run_sidebar.svelte
Comment on lines +93 to +130
# the loader for each ancestor. We need the full chain view here.
all_runs = task.filter_runs(include_intermediate_runs=True, readonly=True)
children_by_parent: Dict[str, list[str]] = {}
for r in all_runs:
if r.parent_task_run_id and r.id is not None:
children_by_parent.setdefault(r.parent_task_run_id, []).append(str(r.id))

to_delete: list[TaskRun] = [target]
deleted_ids: set[str] = set()
if target.id is not None:
deleted_ids.add(str(target.id))

visited: set[str] = set(deleted_ids)
current = target
for _ in range(_MAX_ANCESTOR_DEPTH):
parent_id = current.parent_task_run_id
if parent_id is None:
break
if parent_id in visited:
# Cycle: stop here, but everything queued so far is still valid.
break
parent = TaskRun.from_id_and_parent_path(parent_id, task.path)
if parent is None:
# Chain broken: stop the cascade, don't 500.
break
live_children = [
cid
for cid in children_by_parent.get(parent_id, [])
if cid not in deleted_ids
]
if live_children:
# A sibling branch survives — keep this parent.
break
to_delete.append(parent)
if parent.id is not None:
deleted_ids.add(str(parent.id))
visited.add(str(parent.id))
current = parent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Redundant disk I/O in _collect_cascade_delete_runs. The function already loads all runs for the task into all_runs at line 93. Instead of calling TaskRun.from_id_and_parent_path in a loop (which hits the filesystem repeatedly), you should use a lookup dictionary built from the all_runs list already in memory.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — _collect_cascade_delete_runs now builds a runs_by_id dict alongside children_by_parent from the single all_runs load, and the ancestor walk uses runs_by_id.get(parent_id) instead of TaskRun.from_id_and_parent_path.

Comment thread app/web_ui/src/lib/ui/trace/trace.svelte
Comment thread app/web_ui/src/lib/ui/conversation/multiturn_composer.svelte Outdated
Comment thread app/web_ui/src/lib/ui/run_sidebar.svelte
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 14, 2026

📊 Coverage Report

Overall Coverage: 92%

Diff: origin/main...HEAD

  • app/desktop/studio_server/finetune_api.py (100%)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (100%)
  • libs/core/kiln_ai/datamodel/datamodel_enums.py (100%)
  • libs/core/kiln_ai/datamodel/dataset_split.py (100%)
  • libs/core/kiln_ai/datamodel/task.py (100%)
  • libs/core/kiln_ai/datamodel/task_run.py (100%)
  • libs/core/kiln_ai/utils/dataset_import.py (93.4%): Missing lines 360,396,451,563,578,585,597-599
  • libs/server/kiln_server/run_api.py (96.9%): Missing lines 77-78,151,711
  • libs/server/kiln_server/task_api.py (100%)

Summary

  • Total: 308 lines
  • Missing: 13 lines
  • Coverage: 95%

Line-by-line

View line-by-line diff coverage

libs/core/kiln_ai/utils/dataset_import.py

Lines 356-364

  356     (`tags -> 0: ...`) when constructing TaskRuns downstream.
  357     """
  358     for tag in tags:
  359         if not tag:
! 360             raise KilnInvalidImportFormat(
  361                 "Tags cannot be empty strings.",
  362                 row_number=row_number,
  363             )
  364         if " " in tag:

Lines 392-400

  392 
  393     messages: list[ValidatedMessage] = []
  394     for k, msg in enumerate(trace, start=1):
  395         if not isinstance(msg, dict):
! 396             raise KilnInvalidImportFormat(
  397                 f"message {k}: must be a JSON object.",
  398                 row_number=row_number,
  399             )

Lines 447-455

  447         if role == "assistant":
  448             rc = msg.get("reasoning_content")
  449             if rc is not None:
  450                 if not isinstance(rc, str):
! 451                     raise KilnInvalidImportFormat(
  452                         f"message {k}: 'reasoning_content' must be a string.",
  453                         row_number=row_number,
  454                     )
  455                 if not rc:

Lines 559-567

  559     with open(dataset_path, "r", newline="", encoding="utf-8") as csvfile:
  560         reader = csv.DictReader(csvfile)
  561 
  562         if not reader.fieldnames:
! 563             raise KilnInvalidImportFormat(
  564                 "CSV file appears to be empty or missing headers"
  565             )
  566 
  567         actual_headers = set(reader.fieldnames)

Lines 574-582

  574                 f"Got: {', '.join(sorted(actual_headers))}."
  575             )
  576         missing_headers = required_headers - actual_headers
  577         if missing_headers:
! 578             raise KilnInvalidImportFormat(
  579                 f"Missing required headers: {', '.join(missing_headers)}. "
  580                 f"Required headers are: {', '.join(required_headers)}"
  581             )

Lines 581-589

  581             )
  582 
  583         unknown_headers = actual_headers - (required_headers | optional_headers)
  584         if unknown_headers:
! 585             logger.warning(
  586                 f"Unknown headers in CSV file will be ignored: {', '.join(unknown_headers)}"
  587             )
  588 
  589         for row_number, row in enumerate(reader, start=2):

Lines 593-603

  593                         **row,
  594                         "tags": deserialize_tags(row.get("tags")),
  595                     }
  596                 )
! 597             except ValidationError as e:
! 598                 logger.warning(f"Invalid row {row_number}: {row}", exc_info=True)
! 599                 raise KilnInvalidImportFormat(
  600                     format_validation_error(e),
  601                     row_number=row_number,
  602                 ) from e

libs/server/kiln_server/run_api.py

Lines 73-82

  73         chain.append(parent)
  74         if parent.id is not None:
  75             visited.add(str(parent.id))
  76         current = parent
! 77     chain.reverse()
! 78     return chain, True
  79 
  80 
  81 def _collect_cascade_delete_runs(
  82     task: Task,

Lines 147-155

  147 
  148 
  149 def _count_user_messages(trace: list[Any] | None) -> int:
  150     if not trace:
! 151         return 0
  152     return sum(1 for m in trace if isinstance(m, dict) and m.get("role") == "user")
  153 
  154 
  155 def deep_update(

Lines 707-715

  707                     status_code=404,
  708                     detail=f"Parent run not found. ID: {request.parent_task_run_id}",
  709                 )
  710             if not parent_task_run.trace:
! 711                 raise HTTPException(
  712                     status_code=400,
  713                     detail="Parent run cannot be continued because it has no trace.",
  714                 )
  715             prior_trace = parent_task_run.trace


Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (1)

1-15: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add TypeScript script block per coding guidelines.

The component should include a <script lang="ts"> block for consistency with the project's TypeScript requirement, even if currently empty. This ensures the guideline is met and makes future prop additions straightforward. As per coding guidelines, all frontend code in app/web_ui/**/*.svelte should use TypeScript.

📝 Proposed addition
+<script lang="ts">
+  export let class: string | undefined = undefined;
+</script>
+
 <svg
-  class="w-full h-full"
+  class="w-full h-full {class}"
   fill="currentColor"
   viewBox="0 0 1024 1024"
   xmlns="http://www.w3.org/2000/svg"
   aria-hidden="true"
 >
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/web_ui/src/lib/ui/icons/info_circle_icon.svelte` around lines 1 - 15, The
SVG Svelte component file lacks the required TypeScript script block; add a
top-level <script lang="ts"> block in
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (even if empty) to satisfy
the project's TypeScript guideline for Svelte components and make future prop
typing easier; place it before the SVG markup and include any future exports
(e.g., export let ...) there when needed.
🧹 Nitpick comments (6)
app/desktop/studio_server/test_eval_api.py (1)

1570-1570: ⚡ Quick win

Assert readonly is forwarded to filter_runs.

This test now mocks filter_runs, but it doesn’t assert the expected call signature. Add an assertion to ensure runs_in_filter(..., readonly=True) actually calls mock_task.filter_runs(readonly=True).

Suggested patch
         result = runs_in_filter(mock_task, "tag::some_filter", readonly=True)

         # Verify the results
         assert len(result) == 2
         assert result[0].id == "run1"
         assert result[1].id == "run3"

+        mock_task.filter_runs.assert_called_once_with(readonly=True)
         # Verify the filter was called for each run
         assert mock_filter.call_count == 3
         mock_dataset_filter_from_id.assert_called_once_with("tag::some_filter")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/desktop/studio_server/test_eval_api.py` at line 1570, The test sets
mock_task.filter_runs.return_value but doesn't assert that the readonly flag is
forwarded; update the test to assert that runs_in_filter (the code under test)
calls mock_task.filter_runs with readonly=True by adding an assertion like
mock_task.filter_runs.assert_called_once_with(readonly=True) (or
assert_called_with(readonly=True) if multiple calls are expected) after the
invocation so the test verifies the call signature.
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts (1)

38-65: 💤 Low value

Consider logging when ancestor count exceeds user turns.

The suffix-alignment logic at lines 55–60 handles the edge case where ancestors.length > total_turns (which would be a data inconsistency) by allowing offset to be negative. When offset + k is negative for small k, user_trace_indices[offset + k] returns undefined, and line 60 guards against that. While this is safe, silently ignoring the mismatch could make debugging harder if the data model is violated. Adding a console.warn when offset < 0 would surface this edge case without breaking functionality.

📝 Optional improvement to log edge case
   const total_turns = user_trace_indices.length
   if (total_turns === 0 || ancestors.length === 0) {
     return result
   }
   // Suffix-align the ancestor list against the user messages: the last
   // ancestor (the leaf) corresponds to the last user message.
   const offset = total_turns - ancestors.length
+  if (offset < 0) {
+    console.warn(
+      "More ancestors than user turns in trace — data inconsistency",
+      { ancestors_count: ancestors.length, user_turns: total_turns },
+    )
+  }
   for (let k = 0; k < ancestors.length; k++) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts
around lines 38 - 65, The function compute_forkable_run_ids silently tolerates
ancestors.length > total_turns by letting offset be negative; add a warning when
this happens to surface the data inconsistency: inside compute_forkable_run_ids,
after computing const offset = total_turns - ancestors.length, check if offset <
0 and emit a console.warn (or use an existing logger) that includes values for
total_turns, ancestors.length, and maybe a brief context (e.g., project/run IDs
if available) so developers can detect when ancestors and user_trace_indices are
misaligned; keep the existing guard for user_trace_indices[offset + k] so
behavior is unchanged.
app/web_ui/src/lib/ui/trace/trace.svelte (2)

37-37: 💤 Low value

messageExpanded is initialized once and not re-synced with trace length.

If the trace prop is replaced with a different-length array (e.g., page-level state swaps the trace without unmounting Trace), messageExpanded keeps its old length, which can desync bind:checked={messageExpanded[index]} against the new messages and persist stale expansion state across runs. In the multiturn page today the parent does run = null before navigating, so this component is currently torn down and remounted — but that's a fragile contract for a reusable UI component.

Consider re-syncing reactively when the trace identity changes:

♻️ Suggested change
-  // Track collapsed state for each message (true = expanded, false = collapsed)
-  let messageExpanded: boolean[] = trace.map(() => false)
+  // Track collapsed state for each message (true = expanded, false = collapsed).
+  // Re-sync whenever the trace identity or length changes so we don't carry
+  // stale expansion state across run navigations.
+  let messageExpanded: boolean[] = trace.map(() => false)
+  $: messageExpanded =
+    messageExpanded.length === trace.length
+      ? messageExpanded
+      : trace.map(() => false)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/web_ui/src/lib/ui/trace/trace.svelte` at line 37, The component
initializes messageExpanded once from trace but doesn't resync when trace is
replaced, causing stale expansion state; update the component to recompute
messageExpanded whenever the trace identity or its length changes (e.g., add a
reactive statement that sets messageExpanded = trace.map(() => false) when trace
changes or when trace.length differs) so bind:checked={messageExpanded[index]}
stays in sync; reference the existing messageExpanded and trace variables and
ensure any user expansion state is preserved or intentionally reset per the
desired behavior.

263-265: 💤 Low value

Cost formatting hardcodes USD and 6 decimals.

$${usage.cost.toFixed(6)} will render values like $0.000000 for zero-cost runs and assumes USD. If cost is ever expressed in a different currency or accumulates significant whole-dollar values, this display becomes misleading. Consider routing through a shared formatter (similar to formatLatency) so currency/precision live in one place and can be locale-aware later.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/web_ui/src/lib/ui/trace/trace.svelte` around lines 263 - 265, The UI
hardcodes USD and 6 decimal places when showing usage.cost (lines.push(`Cost:
$${usage.cost.toFixed(6)}`)); extract this into a shared formatter (e.g.,
formatCost) similar to formatLatency, and replace the direct string construction
with a call to formatCost(usage.cost, { currency?: 'USD' }) so currency and
precision are centralized and locale-aware; update trace.svelte to use
formatCost for all cost displays and add tests/usages where appropriate.
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte (2)

478-500: 💤 Low value

Replace setTimeout(..., 100) with Svelte's tick() for the raw-data scroll.

The 100 ms delay is a magic number guessing how long Svelte takes to render the now-unhidden #multiturn_raw_data block. await tick() is the deterministic primitive for "wait until reactive DOM updates have flushed" and avoids both the flicker if rendering takes longer than 100 ms and the unnecessary wait if it takes less.

♻️ Suggested change
-  function multiturn_toggle_raw_data() {
-    multiturn_show_raw_data = !multiturn_show_raw_data
-    if (multiturn_show_raw_data) {
-      setTimeout(() => {
-        const rawDataElement = document.getElementById("multiturn_raw_data")
-        if (rawDataElement) {
-          rawDataElement.scrollIntoView({
-            behavior: "smooth",
-            block: "start",
-          })
-        }
-      }, 100)
-    }
-  }
+  async function multiturn_toggle_raw_data() {
+    multiturn_show_raw_data = !multiturn_show_raw_data
+    if (!multiturn_show_raw_data) return
+    await tick()
+    document.getElementById("multiturn_raw_data")?.scrollIntoView({
+      behavior: "smooth",
+      block: "start",
+    })
+  }

(Plus import { tick } from "svelte" at the top.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
around lines 478 - 500, The setTimeout(…, 100) in multiturn_toggle_raw_data is a
brittle race; replace it with Svelte's tick() to wait for DOM updates: import {
tick } from "svelte", then after setting multiturn_show_raw_data to true await
tick() and only then query document.getElementById("multiturn_raw_data") and
call scrollIntoView as currently done; keep the same scroll options and remove
the setTimeout wrapper.

548-555: 💤 Low value

ancestors_loaded_for_run_id is not reset alongside the other ancestor state.

The reset block at lines 541‑546 clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target on every run_id change, but intentionally leaves ancestors_loaded_for_run_id unchanged so the trigger condition ancestors_loaded_for_run_id !== run_id can fire the fetch. That works today, but it's worth a one‑line comment near line 535 explaining why this single piece of state is deliberately excluded from the reset — otherwise a future change that "completes" the reset by adding ancestors_loaded_for_run_id = null would silently re‑fetch on every load even when task.turn_mode !== "multiturn".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
around lines 548 - 555, Add a one-line comment next to the reset block that
clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target
explaining that ancestors_loaded_for_run_id is intentionally NOT reset so the
reactive check (ancestors_loaded_for_run_id !== run_id) can trigger
load_ancestors(project_id, task_id, run_id) only when needed; mention that
clearing ancestors_loaded_for_run_id would cause unintended re-fetches even when
task.turn_mode !== "multiturn".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/web_ui/src/lib/api_schema.d.ts`:
- Around line 9804-9811: The ancestors property description is ambiguous about
ordering when chain_broken=true (it currently says root→leaf but also “from the
leaf back”); update the OpenAPI schema so the canonical ordering is explicit
(e.g., always root-to-leaf, and when chain_broken=true the array contains only
the intact suffix ordered root-to-leaf for that suffix) and mention that
chain_broken indicates a truncated prefix was dropped; modify the
TaskRunAncestors/ancestors description accordingly and then regenerate the
client schema by running the app/web_ui/src/lib/generate_schema.sh script so
app/web_ui/src/lib/api_schema.d.ts reflects the clarified wording.

In `@app/web_ui/src/lib/ui/run_sidebar.svelte`:
- Around line 201-217: In extract_subtask_references, avoid destructuring the
result of message.kiln_task_tool_data.split(":::") directly; first call const
parts = message.kiln_task_tool_data.split(":::") and validate parts.length >= 4
and that parts[0], parts[2], and parts[3] are non-empty before assigning
project_id, task_id, run_id (or pushing into references). This ensures malformed
kiln_task_tool_data does not produce undefined values or invalid API calls and
only well-formed triples are added to the references array.

In `@libs/server/kiln_server/run_api.py`:
- Around line 617-632: When task.turn_mode == TurnMode.multiturn, require
request.parent_task_run_id to be present: if request.parent_task_run_id is None
then raise HTTPException(status_code=400, detail="parent_task_run_id is required
for multiturn tasks.") before calling TaskRun.from_id_and_parent_path; keep the
existing validation that parent_task_run_id is invalid for non-multiturn tasks
(the current check that raises when task.turn_mode != TurnMode.multiturn) and
preserve the subsequent lookup using TaskRun.from_id_and_parent_path and raising
404 if not found.

---

Outside diff comments:
In `@app/web_ui/src/lib/ui/icons/info_circle_icon.svelte`:
- Around line 1-15: The SVG Svelte component file lacks the required TypeScript
script block; add a top-level <script lang="ts"> block in
app/web_ui/src/lib/ui/icons/info_circle_icon.svelte (even if empty) to satisfy
the project's TypeScript guideline for Svelte components and make future prop
typing easier; place it before the SVG markup and include any future exports
(e.g., export let ...) there when needed.

---

Nitpick comments:
In `@app/desktop/studio_server/test_eval_api.py`:
- Line 1570: The test sets mock_task.filter_runs.return_value but doesn't assert
that the readonly flag is forwarded; update the test to assert that
runs_in_filter (the code under test) calls mock_task.filter_runs with
readonly=True by adding an assertion like
mock_task.filter_runs.assert_called_once_with(readonly=True) (or
assert_called_with(readonly=True) if multiple calls are expected) after the
invocation so the test verifies the call signature.

In `@app/web_ui/src/lib/ui/trace/trace.svelte`:
- Line 37: The component initializes messageExpanded once from trace but doesn't
resync when trace is replaced, causing stale expansion state; update the
component to recompute messageExpanded whenever the trace identity or its length
changes (e.g., add a reactive statement that sets messageExpanded = trace.map(()
=> false) when trace changes or when trace.length differs) so
bind:checked={messageExpanded[index]} stays in sync; reference the existing
messageExpanded and trace variables and ensure any user expansion state is
preserved or intentionally reset per the desired behavior.
- Around line 263-265: The UI hardcodes USD and 6 decimal places when showing
usage.cost (lines.push(`Cost: $${usage.cost.toFixed(6)}`)); extract this into a
shared formatter (e.g., formatCost) similar to formatLatency, and replace the
direct string construction with a call to formatCost(usage.cost, { currency?:
'USD' }) so currency and precision are centralized and locale-aware; update
trace.svelte to use formatCost for all cost displays and add tests/usages where
appropriate.

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte:
- Around line 478-500: The setTimeout(…, 100) in multiturn_toggle_raw_data is a
brittle race; replace it with Svelte's tick() to wait for DOM updates: import {
tick } from "svelte", then after setting multiturn_show_raw_data to true await
tick() and only then query document.getElementById("multiturn_raw_data") and
call scrollIntoView as currently done; keep the same scroll options and remove
the setTimeout wrapper.
- Around line 548-555: Add a one-line comment next to the reset block that
clears ancestors, ancestors_chain_broken, ancestors_load_failed and fork_target
explaining that ancestors_loaded_for_run_id is intentionally NOT reset so the
reactive check (ancestors_loaded_for_run_id !== run_id) can trigger
load_ancestors(project_id, task_id, run_id) only when needed; mention that
clearing ancestors_loaded_for_run_id would cause unintended re-fetches even when
task.turn_mode !== "multiturn".

In
`@app/web_ui/src/routes/`(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts:
- Around line 38-65: The function compute_forkable_run_ids silently tolerates
ancestors.length > total_turns by letting offset be negative; add a warning when
this happens to surface the data inconsistency: inside compute_forkable_run_ids,
after computing const offset = total_turns - ancestors.length, check if offset <
0 and emit a console.warn (or use an existing logger) that includes values for
total_turns, ancestors.length, and maybe a brief context (e.g., project/run IDs
if available) so developers can detect when ancestors and user_trace_indices are
misaligned; keep the existing guard for user_trace_indices[offset + k] so
behavior is unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8909dd12-8a4b-4e3b-b18d-9c52e28e87b6

📥 Commits

Reviewing files that changed from the base of the PR and between ec15e20 and 7a9a7ae.

📒 Files selected for processing (48)
  • app/desktop/studio_server/agent_api.py
  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/test_data_gen_api.py
  • app/desktop/studio_server/test_eval_api.py
  • app/desktop/studio_server/test_finetune_api.py
  • app/desktop/studio_server/test_repair_api.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/lib/types.ts
  • app/web_ui/src/lib/ui/conversation/multiturn_composer.svelte
  • app/web_ui/src/lib/ui/conversation/multiturn_composer.test.ts
  • app/web_ui/src/lib/ui/icons/fork_icon.svelte
  • app/web_ui/src/lib/ui/icons/info_circle_icon.svelte
  • app/web_ui/src/lib/ui/run_sidebar.svelte
  • app/web_ui/src/lib/ui/trace/trace.svelte
  • app/web_ui/src/lib/ui/trace/trace.test.ts
  • app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/+page.svelte
  • app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.test.ts
  • app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/fork_helpers.ts
  • app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/multiturn_send.test.ts
  • app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/[run_id]/run/multiturn_send.ts
  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/+page.svelte
  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
  • app/web_ui/src/routes/(app)/run/+page.svelte
  • app/web_ui/src/routes/(app)/run/run.svelte
  • app/web_ui/src/routes/(app)/run/run_input_form.svelte
  • app/web_ui/src/routes/(app)/settings/edit_task/[project_id]/[task_id]/load_task_editor.svelte
  • app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/[spec_id]/[eval_id]/create_eval_config/get_eval_steps.test.ts
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.test.ts
  • libs/core/kiln_ai/adapters/eval/eval_runner.py
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/adapters/fine_tune/test_dataset_formatter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/prompt_builders.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/datamodel/datamodel_enums.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
  • libs/core/kiln_ai/datamodel/task.py
  • libs/core/kiln_ai/datamodel/test_dataset_split.py
  • libs/core/kiln_ai/datamodel/test_example_models.py
  • libs/core/kiln_ai/datamodel/test_models.py
  • libs/core/kiln_ai/datamodel/test_task.py
  • libs/core/kiln_ai/utils/test_dataset_import.py
  • libs/server/kiln_server/run_api.py
  • libs/server/kiln_server/task_api.py
  • libs/server/kiln_server/test_run_api.py
  • libs/server/kiln_server/test_task_api.py

Comment thread app/web_ui/src/lib/api_schema.d.ts Outdated
Comment thread app/web_ui/src/lib/ui/run_sidebar.svelte
Comment thread libs/server/kiln_server/run_api.py
@leonardmq
Copy link
Copy Markdown
Collaborator Author

After merging #1415 - we can ditch filter_runs and just call runs directly

leonardmq and others added 6 commits May 15, 2026 12:53
Add multiturn CSV import path with parsing, validation, and TaskRun chain
materialization. `import_csv` dispatches on `task.turn_mode`. Introduces
`ImportResult` and adds `imported_conversation_count` to the bulk upload
response. Covered by unit and endpoint tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make the dataset upload dialog turn-mode aware: multiturn tasks get a
tailored title, help block describing the `trace`/`tags` columns with an
inline example, an alternation/no-system-messages callout, and a link to
a new `static/sample_multiturn.csv` asset. Adds Vitest coverage for the
single-turn, multiturn, and null-task branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/core/kiln_ai/datamodel/task.py`:
- Around line 283-286: The docstring for filter_runs is inconsistent with
runs(): update the docstring to state that runs() returns leaf-only runs by
default and only includes intermediate (non-leaf) turns when
include_intermediate_runs=True; reference the runs() behavior and the
include_intermediate_runs parameter in the filter_runs docstring so it
accurately describes filtering semantics for multiturn tasks using filter_runs
and runs().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 85623d66-30e5-4884-b457-8c88c1e1776a

📥 Commits

Reviewing files that changed from the base of the PR and between ce9c09d and 42bb2f4.

📒 Files selected for processing (12)
  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/test_finetune_api.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/datamodel/task.py
  • libs/core/kiln_ai/datamodel/test_models.py
  • libs/core/kiln_ai/datamodel/test_task.py
  • libs/core/kiln_ai/utils/test_dataset_import.py
  • libs/server/kiln_server/run_api.py
  • libs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • app/desktop/studio_server/test_finetune_api.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/desktop/studio_server/finetune_api.py
  • libs/core/kiln_ai/utils/test_dataset_import.py
  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
  • libs/server/kiln_server/run_api.py
  • libs/server/kiln_server/test_run_api.py

Comment thread libs/core/kiln_ai/datamodel/task.py Outdated
…-AI/Kiln into leonard/kil-658-multiturn-support-csv
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/server/kiln_server/run_api.py`:
- Around line 439-445: The batch delete endpoint currently calls run.delete()
directly and must be changed to use the same cascade behavior as the single-run
delete: use the same helper _collect_cascade_delete_runs (the code that builds
runs_to_delete) to expand each target run into its cascade set and then delete
the resulting runs in that set rather than deleting individual runs directly;
update the POST /runs/delete handler (the batch deletion logic that currently
iterates runs and calls run.delete()) to call task_and_run_from_id for each id,
call _collect_cascade_delete_runs(task, run) to gather deletions, deduplicate
the combined set, and then delete each run in that unified runs_to_delete
collection so both endpoints behave identically.
- Around line 633-641: After loading the parent via
TaskRun.from_id_and_parent_path, ensure you reject continuation requests when
the parent has no trace: check parent_task_run.trace (referenced as prior_trace
in this block) and if it is None raise an HTTPException (4xx) with a clear
message including request.parent_task_run_id rather than proceeding; place this
check immediately after the parent_task_run is validated and before any use of
prior_trace or calling the adapter so a child cannot be persisted as a
continuation of a parent with no trace.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4509f62f-81ff-43d2-b916-5d5f4c348454

📥 Commits

Reviewing files that changed from the base of the PR and between 42bb2f4 and 4a15bc0.

📒 Files selected for processing (16)
  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/test_data_gen_api.py
  • app/desktop/studio_server/test_finetune_api.py
  • app/desktop/studio_server/test_repair_api.py
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/adapters/fine_tune/test_dataset_formatter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
  • libs/core/kiln_ai/datamodel/task.py
  • libs/core/kiln_ai/datamodel/test_example_models.py
  • libs/core/kiln_ai/datamodel/test_models.py
  • libs/core/kiln_ai/datamodel/test_task.py
  • libs/core/kiln_ai/utils/test_dataset_import.py
  • libs/server/kiln_server/run_api.py
  • libs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • libs/core/kiln_ai/datamodel/test_example_models.py
  • libs/core/kiln_ai/datamodel/test_models.py
  • app/desktop/studio_server/test_finetune_api.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
  • app/desktop/studio_server/test_data_gen_api.py
  • app/desktop/studio_server/test_repair_api.py

Comment thread libs/server/kiln_server/run_api.py
Comment thread libs/server/kiln_server/run_api.py
leonardmq and others added 28 commits May 19, 2026 20:44
…-AI/Kiln into leonard/kil-658-multiturn-support-csv
…-AI/Kiln into leonard/kil-658-multiturn-support-csv
…ln-AI/Kiln into leonard/kil-658-multiturn-support-csv
…task-2

refactor: multiturn UI more like chat
…-AI/Kiln into leonard/kil-632-feat-multiturn-task
- Hide the redundant "Plaintext Input" header in the composer and use a
  "Write a message…" placeholder (RunInputForm gains label/placeholder/
  hide_label props).
- Disable composer autofocus so loading a turn no longer yanks the
  viewport to the textbox.
- Restyle the fork context strip as a status header (label + helper text)
  instead of a chat-bubble container.
- Make the multiturn view a chat layout on xl+: scrollable transcript
  with the composer pinned to the bottom, Options column kept at natural
  height. "Show Raw Data" stays under the composer; transcript has no
  horizontal scroll and uses the Assistant scrollbar styling.
- Scroll the transcript to the latest turn on load and after sending,
  re-pinning to the bottom across late markdown reflow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On send, append the user's message to the transcript optimistically so it
shows immediately, disable the composer input/Send while the run is in
flight, then redirect to the new run. The run-load reactive clears the
placeholder once the real trace arrives (and on_send_settled clears it on
error), avoiding a duplicate bubble. Append-mode only; fork mode just locks
the input and navigates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Tag multiturn turns as "manual_run" instead of "multiturn_run" so they
  aren't singled out from other manually-created runs.
- In the multiturn sidebar, title the usage section "Final Turn Usage" and
  drop the "Total" prefix from the Cost/Tokens/Latency line items (it's the
  final turn's run, not a whole-conversation aggregate). Single-turn is
  unchanged.
- Open Raw Data in a modal instead of expanding inline and reflowing the
  chat column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lish

- Move the per-message usage/fork actions out of the chat bubbles into a
  dedicated row below each message that fades in on hover/focus (Claude
  desktop style). Space is reserved so hovering never reflows the
  conversation, and bubbles stay clean (no always-empty line).
- Label the actions ("Usage" with a new dollar icon component, "Fork");
  usage is right-aligned to the bubble's edge.
- Add a height prop to RunInputForm; the composer uses a shorter textarea.
- Fix assistant bubbles (esp. collapsed thinking/tool blocks) shrinking
  to content width instead of filling their allocated 70%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Disable the Evals and Synthetic Data screens for multi-turn tasks
  (warning + no action button + no redirect into the flow), matching the
  fine-tuning behavior.
- Remove a stray zero-width space in a comment that failed the web UI
  no-irregular-whitespace lint check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…send

- Show the assistant ChatLoading indicator below the optimistic message
  while a turn is generating, kept up continuously across the redirect
  until the new run renders.
- Stop nulling run / flipping loading on send so the transcript stays
  mounted during navigation instead of flashing the full-page spinner;
  keep the composer disabled (busy) until the new run loads to prevent a
  double-send against the stale parent.
- Fix the cleared composer flagging "Required" after a successful send by
  keying the plaintext field so clear_input resets its validation state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the current task is multi-turn, the /run page now renders a chat-style
new-conversation layout (empty transcript + pinned MultiturnComposer +
Options sidebar) mirroring the in-run multiturn page, instead of the
Input/Run/Output layout. The first message starts a root run (optimistic
message + loading indicator) then redirects to the dataset run page.

send_multiturn gains an allow_root_turn flag so a parentless first turn is
permitted from /run; the in-chat composer leaves it false. Single-turn
tasks keep the existing layout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant