Summary
dazzle qa trial --fresh-db continues running the LLM scenario even when _seed_demo_data_for_trial hard-aborts on blueprint validation errors. The trial then explores an empty DB and the LLM persona's verdict becomes misleading ("cannot recommend, no data") rather than reflecting framework UX. #826 fixed the secondary 400-flood timeout but left the outer flow unaware that no rows were seeded.
Reproduction (cycle 135 of the autonomous improve loop)
The examples/ops_dashboard Alert entity field was renamed acknowledged (bool) → status (3-state enum) in DSL #999, but dsl/seeds/demo_data/blueprint.json was not updated. Run:
```shell
cd examples/ops_dashboard
dazzle qa trial --scenario trend_spike_detection --fresh-db
```
Observed:
- _reset_db_for_trial truncates the DB (correct).
- _seed_demo_data_for_trial calls verify_blueprint → 1 error (Alert.acknowledged is unknown) → prints "Seed aborted:..." to stderr and returns from the seed helper.
- The outer trial loop has no awareness of this, so it continues to launch_interaction_server, authenticates the persona, and runs the LLM agent for 41 steps against a 0-row DB.
- Final verdict: "I'm disappointed that I cannot answer the VP's urgent question about alert spikes…" — a verdict about data emptiness, not framework UX.
dazzle db status post-trial confirms 0 rows in System/Alert/Integration/DeployHistory. So the trial spent ~145k tokens producing zero useful signal.
Why this matters
The trial loop is the most expensive cycle in the /improve autonomous loop (50–100k tokens). Each empty-data verdict pollutes the friction backlog with non-actionable rows (TR-22 and similar) and burns time on root-causing what is actually a stale-blueprint issue. The diagnostic chain is also long — the stderr "Seed aborted" message is buried among httpx DEBUG logs (~68KB of output for a trial), so the misleading verdict is usually noticed first.
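To illustrate how buried the signal is today, here is a minimal sketch of digging the abort message out of the DEBUG noise (the log path and contents are hypothetical stand-ins for a real trial transcript):

```shell
# Simulate a trial log: mostly httpx DEBUG chatter with the one-line abort buried inside.
printf 'DEBUG httpx: request GET /api/alerts\nSeed aborted: blueprint drift\nDEBUG httpx: response 200\n' > /tmp/trial.log

# The only way to spot the seed failure today is to grep for it explicitly.
grep -n "Seed aborted" /tmp/trial.log
```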
Suggested fix
Promote the seed step from "best-effort + continue" to "preflight + abort the trial." Either:
- At qa command level: call verify_blueprint before launching the interaction server. On error: print the abort message and raise typer.Exit(code=3). Saves the server-launch + LLM tokens.
- At seed-helper level: change _seed_demo_data_for_trial to return a typed result (SeedOutcome.aborted_on_drift vs succeeded) and have the outer trial flow inspect it. Skip the LLM run + write a placeholder verdict if seed aborted.
Option 1 is smaller and clearer. It also keeps the failure mode self-documenting (dazzle demo verify is the canonical follow-up command).
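A minimal sketch of Option 1, with stand-ins for the real dazzle internals (verify_blueprint here is a stub returning error strings; the actual signature may differ). The preflight runs before any server launch and exits with code 3 on drift:

```python
import sys

def verify_blueprint(blueprint_path):
    # Hypothetical stand-in for dazzle's blueprint verifier: returns a list
    # of human-readable error strings, empty when blueprint matches the DSL.
    return ["Alert.acknowledged is unknown (renamed to status in DSL)"]

def run_trial(scenario, blueprint_path):
    errors = verify_blueprint(blueprint_path)
    if errors:
        # Abort before the expensive server launch + LLM run.
        print("Seed aborted: blueprint drift detected", file=sys.stderr)
        for err in errors[:5]:      # print at most the first 5 errors
            print(f"  - {err}", file=sys.stderr)
        raise SystemExit(3)         # typer.Exit(code=3) in the real CLI
    # ... launch_interaction_server, authenticate persona, run LLM agent ...

try:
    run_trial("trend_spike_detection", "dsl/seeds/demo_data/blueprint.json")
except SystemExit as e:
    print(f"exit code: {e.code}")   # prints "exit code: 3"
```

The key property is ordering: the check costs well under a second and runs strictly before any token is spent.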
Acceptance
- Stale-blueprint trial against any example app exits non-zero before the LLM agent runs, prints the abort message + first 5 errors, and consumes <1s.
- Regression test alongside tests/unit/test_qa_trial.py::TestSeedPreflightAndCircuitBreaker.
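A regression test could look roughly like this (hypothetical SeedOutcome enum and seed helper, per Option 2; the real test would exercise the actual trial flow rather than these stubs):

```python
from enum import Enum, auto

class SeedOutcome(Enum):
    # Hypothetical typed result from the seed helper (Option 2).
    succeeded = auto()
    aborted_on_drift = auto()

def seed_demo_data(blueprint_errors):
    """Stand-in for _seed_demo_data_for_trial: abort instead of best-effort."""
    return SeedOutcome.aborted_on_drift if blueprint_errors else SeedOutcome.succeeded

def test_trial_skips_llm_when_seed_aborts():
    llm_steps = []
    outcome = seed_demo_data(blueprint_errors=["Alert.acknowledged is unknown"])
    if outcome is SeedOutcome.aborted_on_drift:
        verdict = "trial aborted: stale blueprint (run dazzle demo verify)"
    else:
        llm_steps.append("run agent")   # never reached in this case
        verdict = "real verdict"
    assert outcome is SeedOutcome.aborted_on_drift
    assert llm_steps == []              # no LLM tokens spent
    assert "stale blueprint" in verdict

test_trial_skips_llm_when_seed_aborts()
print("ok")
```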
Related