
qa trial: outer flow ignores 'Seed aborted on blueprint drift' — continues with empty DB #1077

@manwithacat


Summary

dazzle qa trial --fresh-db continues running the LLM scenario even when _seed_demo_data_for_trial hard-aborts on blueprint validation errors. The trial then explores an empty DB and the LLM persona's verdict becomes misleading ("cannot recommend, no data") rather than reflecting framework UX. #826 fixed the secondary 400-flood timeout but left the outer flow unaware that no rows were seeded.

Reproduction (cycle 135 of the autonomous improve loop)

In examples/ops_dashboard, the Alert entity's acknowledged field (bool) was renamed to status (3-state enum) in DSL #999, but dsl/seeds/demo_data/blueprint.json was not updated to match. Run:

cd examples/ops_dashboard
dazzle qa trial --scenario trend_spike_detection --fresh-db

Observed:

  1. _reset_db_for_trial truncates the DB (correct).
  2. _seed_demo_data_for_trial calls verify_blueprint → 1 error (Alert.acknowledged is unknown) → prints "Seed aborted:..." to stderr and returns from the seed helper.
  3. The outer trial loop has no awareness of this, so it continues to launch_interaction_server, authenticates the persona, and runs the LLM agent for 41 steps against a 0-row DB.
  4. Final verdict: "I'm disappointed that I cannot answer the VP's urgent question about alert spikes…" — a verdict about data emptiness, not framework UX.

dazzle db status post-trial confirms 0 rows in System/Alert/Integration/DeployHistory. So the trial spent ~145k tokens producing zero useful signal.

Why this matters

The trial loop is the most expensive cycle in the /improve autonomous loop (50–100k tokens). Each empty-data verdict pollutes the friction backlog with non-actionable rows (TR-22 and similar) and burns time on root-causing what is actually a stale-blueprint issue. The diagnostic chain is also long — the stderr "Seed aborted" message is buried among httpx DEBUG logs (~68KB of output for a trial), so the misleading verdict is usually noticed first.

Suggested fix

Promote the seed step from "best-effort + continue" to "preflight + abort the trial." Either:

  1. At qa command level: call verify_blueprint before launching the interaction server. On error, print the abort message and raise typer.Exit(code=3). This saves both the server launch and the LLM tokens.
  2. At seed-helper level: change _seed_demo_data_for_trial to return a typed result (SeedOutcome.aborted_on_drift vs succeeded) and have the outer trial flow inspect it. If the seed aborted, skip the LLM run and write a placeholder verdict.
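Option 2 could look roughly like the sketch below. Only the SeedOutcome member names come from this issue; seed_demo_data, run_trial, the run_agent callable, and the placeholder verdict text are hypothetical stand-ins, not the current dazzle code.

```python
from enum import Enum
from typing import Callable


class SeedOutcome(Enum):
    # Member names follow the wording proposed in this issue.
    succeeded = "succeeded"
    aborted_on_drift = "aborted_on_drift"


def seed_demo_data(blueprint_errors: list[str]) -> SeedOutcome:
    """Hypothetical stand-in for _seed_demo_data_for_trial."""
    if blueprint_errors:
        # Report the abort to the caller instead of only printing to stderr.
        return SeedOutcome.aborted_on_drift
    # ... rows would be inserted here ...
    return SeedOutcome.succeeded


def run_trial(blueprint_errors: list[str], run_agent: Callable[[], str]) -> str:
    """Hypothetical outer flow that inspects the seed result."""
    outcome = seed_demo_data(blueprint_errors)
    if outcome is SeedOutcome.aborted_on_drift:
        # Placeholder verdict: the LLM agent is never started.
        return "SKIPPED: seed aborted on blueprint drift"
    return run_agent()
```

With this shape the outer flow cannot silently ignore the abort, at the cost of touching both the helper and its caller.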

Option 1 is smaller and clearer. It also keeps the failure actionable: the abort message can point straight at dazzle demo verify, the canonical follow-up command.
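A minimal sketch of the Option 1 preflight, assuming verify_blueprint yields a list of error strings (its real return type is not shown in this issue). The helper names and message wording are illustrative. SystemExit stands in for typer.Exit(code=3) so the snippet runs with the standard library alone.

```python
import sys

EXIT_BLUEPRINT_DRIFT = 3  # exit code proposed in this issue
MAX_ERRORS_SHOWN = 5      # acceptance criterion: show the first 5 errors


def format_abort(errors: list[str]) -> str:
    """Build the abort message: a header, up to five errors, and a follow-up hint."""
    shown = errors[:MAX_ERRORS_SHOWN]
    lines = [f"Seed aborted on blueprint drift: {len(errors)} error(s)"]
    lines += [f"  - {err}" for err in shown]
    hidden = len(errors) - len(shown)
    if hidden > 0:
        lines.append(f"  ... and {hidden} more")
    lines.append("Run `dazzle demo verify` for the full report.")
    return "\n".join(lines)


def preflight_or_exit(errors: list[str]) -> None:
    """Hypothetical preflight: call before launching the interaction server."""
    if not errors:
        return
    print(format_abort(errors), file=sys.stderr)
    # In the real command this would be `raise typer.Exit(code=3)`.
    raise SystemExit(EXIT_BLUEPRINT_DRIFT)
```

Calling preflight_or_exit with the verify_blueprint errors as the first step of the qa trial command would stop the run before any server or agent starts.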

Acceptance

  • A stale-blueprint trial against any example app exits non-zero before the LLM agent runs, prints the abort message plus the first 5 errors, and completes in under 1s.
  • A regression test is added alongside tests/unit/test_qa_trial.py::TestSeedPreflightAndCircuitBreaker.
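The regression test could take roughly the following shape. This is a sketch under stated assumptions: run_trial here is a hypothetical stand-in for the outer flow, with verify_blueprint and the agent launcher injected as callables. The real test would exercise the actual qa trial entry point and would likely use the project's existing test framework rather than unittest.

```python
import unittest
from unittest import mock


def run_trial(verify_blueprint, launch_agent):
    """Hypothetical outer flow with the preflight in place."""
    errors = verify_blueprint()
    if errors:
        # Stand-in for typer.Exit(code=3).
        raise SystemExit(3)
    return launch_agent()


class TestStaleBlueprintAbortsTrial(unittest.TestCase):
    def test_agent_never_runs_on_drift(self):
        verify = mock.Mock(return_value=["Alert.acknowledged is unknown"])
        agent = mock.Mock()
        with self.assertRaises(SystemExit) as ctx:
            run_trial(verify, agent)
        self.assertEqual(ctx.exception.code, 3)
        # The key assertion: no LLM tokens are spent on an empty DB.
        agent.assert_not_called()

    def test_agent_runs_when_blueprint_is_clean(self):
        verify = mock.Mock(return_value=[])
        agent = mock.Mock(return_value="verdict")
        self.assertEqual(run_trial(verify, agent), "verdict")
        agent.assert_called_once()
```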
