Stabilize preflight, runners, and ERF artifacts by jmsexton03 · Pull Request #55 · AMReX-Codes/amrex-agent

jmsexton03 · 2026-03-19T21:48:11Z

Summary

Missing or stale local ERF repo state caused users to hit run-time dead ends after launch. Startup preflight is now wired into app startup (src/main.py + src/first_run.py), so these issues surface early with guided remediation instead of failing deeper in the flow.
ERF commit/dependency mismatch handling previously forced brittle retry loops or opaque stop conditions. The mismatch path now offers interactive continue/rebuild choices and rerun guidance (TTY/non-TTY aware), so users can either proceed intentionally or repair deterministically.
Executable discovery/build behavior diverged between the local and superfacility runners, producing inconsistent solver selection and fallback behavior. Solver/build policy is now centralized, and both run_local and run_superfacility resolve executables through shared policy logic for consistent runner outcomes.
Preflight clone/setup diagnostics were hard to interpret during first-run setup. Startup preflight now emits progress-oriented messages and remediation hints (including the SSH fallback path) to reduce setup ambiguity.
Visualization behavior was ambiguous about run-directory precedence and intent translation, which could produce outputs from the wrong run context. Visualization now prefers the runner's run-directory resolution and routes through an explicit visualization-intent flow across the model/node/service layers.
ERF schema/index artifacts could drift from the expected repo state. The net changes on this branch refresh the ERF schema/FAISS artifacts into the ...erf5613ec3 lineage (including the renamed ERF schema family and updated provenance), so retrieval/schema assumptions align with the expected ERF revision.
The branch carries a broad stacked history, but its net user-facing behavior converges on three outcomes: startup readiness, deterministic execution policy, and visualization-intent correctness. This summary is organized by those outcomes rather than by file churn.
A post-merge repair closed regressions introduced during merge resolution: it restored missing test-contract imports, restored the missing re import in the input writer, and removed conflict markers from database/schemas/erf_complete_v1_amrexbac2457_erf5613ec3.json, returning the affected tests to green.
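
For reviewers who want a concrete picture of the TTY/non-TTY-aware mismatch handling described in this summary, here is a minimal sketch. Every name in it (`MismatchChoice`, `resolve_mismatch`, the prompt and hint wording) is hypothetical and is not taken from src/first_run.py; it only illustrates the idea of prompting when a terminal is attached and printing rerun guidance otherwise.

```python
import sys
from enum import Enum
from typing import Callable, Optional


class MismatchChoice(Enum):
    CONTINUE = "continue"
    REBUILD = "rebuild"
    ABORT = "abort"


def resolve_mismatch(
    expected: str,
    found: str,
    is_tty: Optional[bool] = None,
    ask: Callable[[str], str] = input,
) -> MismatchChoice:
    """Pick a remediation for an ERF commit mismatch (hypothetical helper)."""
    if is_tty is None:
        is_tty = sys.stdin.isatty()
    if not is_tty:
        # Non-interactive sessions cannot prompt, so print rerun guidance and stop.
        print(
            f"ERF commit mismatch: expected {expected}, found {found}. "
            "Rerun in a terminal to choose continue or rebuild."
        )
        return MismatchChoice.ABORT
    answer = ask(f"ERF is at {found}, expected {expected}. [c]ontinue / [r]ebuild / [a]bort? ")
    # Only the first letter matters; anything unrecognized is treated as abort.
    choices = {"c": MismatchChoice.CONTINUE, "r": MismatchChoice.REBUILD}
    return choices.get(answer.strip().lower()[:1], MismatchChoice.ABORT)
```

Passing `is_tty` and `ask` explicitly keeps the helper testable without a real terminal, which is one way the CLI integration tests could stub the prompt.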

Related or overlapping functionality / DRY guidance

Shared logic convergence:
- src/services/solver_build_policy.py is the central executable/build policy and is reused by both src/services/run_local.py and src/services/run_superfacility.py; avoid reintroducing duplicated fallback logic in runner-specific layers.
- Visualization intent/extraction behavior is coordinated across src/models/visualization_intent.py, src/nodes/visualization_intent_node.py, src/services/viz_param_extractor.py, and src/services/visualization.py; preserve this shared normalization path.
- Input-writer and plotfile behavior alignment is jointly validated by updated input-writer/visualization tests; keep writer-side and visualization-side run-dir/plotfile assumptions synchronized.
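
To illustrate what "shared policy logic" can mean for the two runners, the sketch below shows one pure function that both a local and a superfacility runner could call. It is not the contents of src/services/solver_build_policy.py: `ExecutableDecision`, `resolve_executable`, and the `discovered` mapping are assumed names, and the only behavior borrowed from this PR is that a case-derived solver is preferred over the default.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class ExecutableDecision:
    solver: str
    executable: Optional[str]  # None when nothing usable was discovered
    needs_build: bool


def resolve_executable(
    case_solver: Optional[str],
    default_solver: str,
    discovered: Mapping[str, Sequence[str]],
) -> ExecutableDecision:
    """Single decision point shared by every runner (hypothetical sketch)."""
    # Prefer the solver derived from the case over the configured default.
    solver = case_solver or default_solver
    # Sorting makes the pick deterministic regardless of discovery order.
    candidates = sorted(discovered.get(solver, ()))
    if candidates:
        return ExecutableDecision(solver, candidates[0], needs_build=False)
    return ExecutableDecision(solver, None, needs_build=True)
```

Because the function takes already-discovered paths and returns a plain value, runner-specific code only has to gather candidates and act on the decision, which keeps fallback rules out of the runner layers.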
Contract alignment statement:
- Graph/node updates in this branch were aligned against canonical state wiring in src/models/graph_state_canonical.py plus associated unit/contract-sensitive tests.
This embeds a significant architectural decision that needs an ADR.
- If checked, add an ADR under docs/adr/ (one short file describing context, decision, consequences).
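
The run-directory precedence that the visualization changes rely on can be sketched as a small resolver. The function name, argument names, and the fallback tier are assumptions for illustration; the PR only states that the runner's run directory is preferred.

```python
from typing import Optional, Tuple


def resolve_viz_run_dir(
    runner_run_dir: Optional[str],
    writer_run_dir: Optional[str],
    fallback_dir: str,
) -> Tuple[str, str]:
    """Return (run_dir, source), preferring the runner's directory (hypothetical)."""
    for source, candidate in (
        ("runner", runner_run_dir),
        ("input_writer", writer_run_dir),
    ):
        if candidate:  # skip None and empty strings
            return candidate, source
    return fallback_dir, "fallback"
```

Returning the source alongside the path makes it easy to log which tier won, which helps when output appears to come from the wrong run context.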

Impact checklist

fixes a bug or incorrect behavior
adds new capabilities
changes answers in the test suite to more than roundoff level
likely affects downstream users or results
includes docs updates (code/docs), if appropriate
none of the above

Tests run (CI runs: `pytest tests/unit`, `pytest tests/quality`, `pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"`)

tests/unit: pytest tests/unit
tests/quality: pytest tests/quality
integration ladder (CI): pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"
other (list): targeted merge-repair regression slice
Output/summary:
- pytest -q tests/unit -> 1733 passed, 31 skipped
- Targeted regression slice on this branch validated merge-repair paths:
  - tests/unit/test_component5f_standards.py
  - tests/unit/test_erf_cases.py
  - tests/unit/test_input_writer_apply_plan.py
  - tests/unit/test_input_writer_service_refactor.py
  - tests/unit/test_input_writer_viz_plotfile.py
- Interpretation: previously failing component5f/input-writer/ERF-case tests now pass after repair.
If tests require repos/schemas/indices or real services, note markers used.
requires_solver(...) implies repo + schema + default indices are available locally.
Use -k with one of pelec, erf, amrex, or warpx (for example, -k erf) to filter solver-specific tests.

Examples:

pytest tests/unit --tb=short -q
pytest tests/quality --tb=short -q
pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full" --tb=short -q
pytest -m "e2e and demo" tests/e2e/test_demo_smoke.py --tb=short -q

Tests not run in CI (required if any)

CI runs tests/unit, tests/quality, and tests/integration with integration_l1..l4 + integration_full markers via micromamba; list anything else not covered by CI here.
None
tests/e2e
other (list): no additional non-CI suites were run locally in this pass
Reason for skip:
- This pass focused on unit and targeted merge-repair validation; quality/integration ladders are CI-covered and were not re-run locally here.
Risk/mitigation:
- Risk: regressions that appear only in quality/integration ladders would not be detected by this local pass.
- Mitigation: rely on CI quality/integration matrix before merge completion.

API-key/manual tests (optional, include steps)

Env vars or credentials needed:
- None for local unit/targeted runs recorded in this summary.
Manual steps/commands:
- pytest -q tests/unit
Results or logs:
- Unit suite passed as reported above.

Integration/E2E markers (optional, manual, may require API key)

Note: integration ladder runs in CI; tests/e2e runs only when selected by path or -m e2e.
e2e demo (workflow_dispatch): pytest -m "e2e and demo" tests/e2e/test_demo_smoke.py
other (markers: use_real_services, requires_repos, requires_schema):

Notes (optional)

Manual output / logs (short):
- Net diff basis: amrex-codes/development...preflight_layer_fix_stack_main_erf_test_visuals (80ba696b...e91b1e61), 119 files changed, 27974 insertions(+), 7804 deletions(-).
- Large stacked-history branch: this summary intentionally reports merged net user-impacting behavior changes rather than full commit-by-commit narration.
- Post-merge repair commit restored test contracts/imports and cleaned ERF schema merge artifact; targeted regression tests and full unit run are green.
- Follow-on operational note: docs/audit cleanup cherry-pick (61067ce6) was identified separately and is not executed as part of this markdown write.
Known limitations:
- Quality/integration suites were not re-run locally in this pass; CI coverage is the primary downstream gate.

Labels (optional)

I will check auto-labels after submitting.
Labels are okay if slightly off.

Maintainability note (optional)

Prefer the most maintainable, DRY, and human-readable option; add duplication only with a clear reason.


- UnboundLocalError in _score_kb_relevance_batch (json name collision)
- LLM call enforcement violation in architect.py
- Build schema integration assertion (amr.n_cell)
- write_policy Mock TypeError in input_writer refactor
- 8 architect LLM planning test failures in create_plan_rag path
- 52 type hint violations (test_standards quality gate)

20 test_oracle_benchmarks.py failures are known and intentional: oracle scope is Pele-only, system scope is now four-solver. These are not regressions.
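
The "json name collision" item is a common Python pitfall. The sketch below reproduces one typical way it arises; it is not the actual body of `_score_kb_relevance_batch`, and the function names are hypothetical. Any binding of `json` inside a function (an assignment or a conditional import) makes the name local for the whole function, so the module-level import is hidden.

```python
import json


def score_batch_buggy(raw: str, use_fallback: bool = False) -> dict:
    # The conditional import binds `json` as a LOCAL name for the entire
    # function, even when the branch never executes.
    if use_fallback:
        import json  # noqa: F811
    # With use_fallback=False the local name is unbound here.
    return json.loads(raw)


def score_batch_fixed(raw: str, use_fallback: bool = False) -> dict:
    # Fix: rely on the module-level import and never rebind `json` locally.
    return json.loads(raw)
```

Calling `score_batch_buggy('{"a": 1}')` raises `UnboundLocalError`, while `score_batch_fixed` parses the same input normally.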

- Override fires when L0 confidence < 0.15 and L2 finds a high-confidence canonical case-name match in a different solver family
- Threshold configurable via config (level2_override_l0_threshold)
- Observability fields added to architect history: level0_solver, level0_confidence, level2_override_applied, level2_override_solver, level2_override_case, level2_override_confidence
- Default threshold 0.15 pending calibration via eval_level0_ab.py
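
The override rule above can be written as a small pure function. This is a sketch under stated assumptions: the 0.15 threshold and the history field names come from the commit message, but `Level2Match`, `apply_level2_override`, the `selected_solver` key, and the 0.8 cutoff for "high-confidence" are hypothetical, and solver names stand in for solver families.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Level2Match:
    solver: str
    case: str
    confidence: float


def apply_level2_override(
    level0_solver: str,
    level0_confidence: float,
    match: Optional[Level2Match],
    l0_threshold: float = 0.15,      # level2_override_l0_threshold in config
    l2_min_confidence: float = 0.8,  # assumed meaning of "high-confidence"
) -> Dict[str, Any]:
    """Decide whether an L2 case-name match overrides a weak L0 routing."""
    applied = (
        match is not None
        and level0_confidence < l0_threshold
        and match.confidence >= l2_min_confidence
        and match.solver != level0_solver
    )
    return {
        "level0_solver": level0_solver,
        "level0_confidence": level0_confidence,
        "level2_override_applied": applied,
        "level2_override_solver": match.solver if applied else None,
        "level2_override_case": match.case if applied else None,
        "level2_override_confidence": match.confidence if applied else None,
        "selected_solver": match.solver if applied else level0_solver,
    }
```

Recording both the L0 inputs and the override outcome in one record is what makes later threshold calibration possible without rerunning the router.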

- add JSONL normalization for validation, timing, convergence, and llm count fields
- derive iteration and reviewer retry values from graph-state output with unavailable-reason siblings
- extend benchmark runner unit tests for field presence, defaults, and compatibility
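
One way to read "unavailable-reason siblings" is that every derived field is always present in the JSONL record and is paired with a `<field>_unavailable_reason` key explaining a missing value. The sketch below follows that reading; the graph-state keys, output field names, and reason text are all hypothetical.

```python
import json
from typing import Any, Dict, Mapping

# Hypothetical mapping: output field -> key looked up in graph-state output.
DERIVED_FIELDS = {
    "iterations": "iteration_count",
    "reviewer_retries": "reviewer_retry_count",
}


def derive_counts(graph_state: Mapping[str, Any]) -> Dict[str, Any]:
    """Always emit each field plus a sibling that explains a missing value."""
    record: Dict[str, Any] = {}
    for field, state_key in DERIVED_FIELDS.items():
        value = graph_state.get(state_key)
        record[field] = value
        record[f"{field}_unavailable_reason"] = (
            None if value is not None else f"{state_key} missing from graph state"
        )
    return record


def to_jsonl_line(graph_state: Mapping[str, Any]) -> str:
    # sort_keys gives stable output, which keeps diffs of JSONL files readable.
    return json.dumps(derive_counts(graph_state), sort_keys=True)
```

Keeping the key set fixed means downstream consumers can distinguish "zero retries" from "retries not reported" without special-casing absent keys.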


- Extract visualization quantities/config from prompt at graph initialization.
- Apply plotfile vars in Input Writer with requested->baseline->omit priority and solver-specific param lookup.
- Add unit and integration coverage for extractor behavior and plotfile injection outcomes.
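
The requested->baseline->omit priority can be sketched as follows. The parameter-name table is an illustrative assumption rather than the repo's lookup, and `choose_plot_vars` is a hypothetical name; the point is only the three-tier fallback combined with a per-solver parameter key.

```python
from typing import Dict, Optional, Sequence

# Illustrative per-solver parameter names (assumed, not taken from the repo).
PLOT_VAR_PARAM: Dict[str, str] = {
    "erf": "erf.plot_vars_1",
    "remora": "remora.plot_vars_3d",
}


def choose_plot_vars(
    solver: str,
    requested: Sequence[str],
    baseline: Optional[Sequence[str]],
) -> Dict[str, str]:
    """Return the input-file entry to inject, or {} to omit it entirely."""
    param = PLOT_VAR_PARAM.get(solver)
    if param is None:
        return {}  # unknown solver: omit rather than guess a parameter name
    chosen = list(requested) or list(baseline or [])
    if not chosen:
        return {}  # nothing requested and no baseline: omit
    return {param: " ".join(chosen)}
```

Omitting the entry when nothing applies leaves the solver's own default plot variables in effect instead of writing an empty value.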

- Add new GraphState fields for intent extraction, clarification, reviewer routing, gate approvals, and sweep metadata
- Define canonical defaults map for additive fields with list/dict-safe initialization semantics
- Expand graph state initialization tests to functional style with defaults, serialization, and list isolation checks
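
"List/dict-safe initialization" usually means that each new state gets its own copies of mutable defaults. A minimal sketch of that idea, with hypothetical field names and a hypothetical `init_graph_state` helper rather than the contents of src/models/graph_state_canonical.py:

```python
import copy
from typing import Any, Dict

# Hypothetical canonical defaults for additive fields.
CANONICAL_DEFAULTS: Dict[str, Any] = {
    "extracted_intent": None,
    "clarification_questions": [],
    "gate_approvals": [],
    "sweep_metadata": {},
}


def init_graph_state(**overrides: Any) -> Dict[str, Any]:
    """Build a fresh state so that no two states share a list or dict."""
    # deepcopy prevents mutations on one state from leaking into the
    # defaults map or into other states.
    state = copy.deepcopy(CANONICAL_DEFAULTS)
    state.update(overrides)
    return state
```

A list isolation check then amounts to appending to a list on one state and asserting that a second state and the defaults map are unchanged.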

…_layer_fix_stack_main_erf_test_visuals

Conflicts:
- .dependencies.json
- database/configs/base_amrex_config.py
- database/configs/erf_config.py
- database/configs/remora_config.py
- database/faiss/cborg/build_session_manifest.json
- database/faiss/cborg/erf_case_details/index.faiss
- database/faiss/cborg/erf_case_details/index.pkl
- database/faiss/cborg/erf_case_names/index.faiss
- database/faiss/cborg/erf_case_names/index.pkl
- database/faiss/cborg/erf_case_structure/index.faiss
- database/faiss/cborg/erf_case_structure/index.pkl
- database/faiss/cborg/erf_input_templates/index.faiss
- database/faiss/cborg/erf_input_templates/index.pkl
- database/faiss/cborg/level0/code_lineage_metadata.json
- database/faiss/cborg/level0/cross_cutting_guidance_metadata.json
- database/faiss/cborg/level0/physics_regimes_metadata.json
- database/faiss/cborg/level0/solver_capabilities.faiss
- database/faiss/cborg/level0/solver_capabilities_metadata.json
- database/faiss/cborg/level2/erf_case_configuration_complexity.faiss
- database/faiss/cborg/level2/erf_case_configuration_complexity_metadata.json
- database/faiss/cborg/level2/erf_case_development_activity.faiss
- database/faiss/cborg/level2/erf_case_development_activity_metadata.json
- database/faiss/cborg/level2/erf_case_domain_models.faiss
- database/faiss/cborg/level2/erf_case_domain_models_metadata.json
- database/faiss/cborg/level2/erf_case_grid_specifications.faiss
- database/faiss/cborg/level2/erf_case_grid_specifications_metadata.json
- database/faiss/cborg/level2/erf_case_path_hierarchy.faiss
- database/faiss/cborg/level2/erf_case_path_hierarchy_metadata.json
- database/faiss/cborg/level2/erf_case_physics_parameters.faiss
- database/faiss/cborg/level2/erf_case_physics_parameters_metadata.json
- database/faiss/cborg/level2/erf_case_resource_requirements.faiss
- database/faiss/cborg/level2/erf_case_resource_requirements_metadata.json
- database/faiss/cborg/level2/erf_faiss_provenance.json
- database/schemas/amrex_schema_bac24575.json
- database/schemas/erf_complete_current.json
- database/schemas/erf_complete_v1_amrexbac2457_erf1df4817.json
- database/schemas/erf_complete_v1_amrexbac2457_erf5613ec3.json
- database/schemas/erf_complete_v1_amrexbac2457_erfeb27171.json
- src/main.py
- src/models/graph_state_canonical.py
- src/nodes/clarification_handler_node.py
- src/nodes/input_writer_node.py
- src/nodes/visualization_node.py
- src/policy/gate_policy.py
- src/services/input_writer.py
- src/services/run_local.py
- src/services/run_superfacility.py
- src/services/viz_param_extractor.py
- tests/unit/test_component5f_standards.py
- tests/unit/test_graph_state_init.py
- tests/unit/test_input_writer_viz_plotfile.py
- tests/unit/test_run_local.py
- tests/unit/test_run_superfacility.py
- tests/unit/test_visualization_axis_defaults.py
- tests/unit/test_viz_param_extractor.py

jmsexton03 added 30 commits February 24, 2026 13:35

Add Academy step actions and optional response/rationale envelope

af5b93d

Adjust Academy smoke query to grid refinement and mechanism guidance

f4c571e

Add workflow architecture paper figure generator utility

e104acb

Update gitignore for local venv and Word document artifacts

72a8597

Add AISAC markdown responses and contract unit tests

ac50e90

Unify interactive invoke path with critical gating and skill runtime

710a45c

Fix MCP caller_action trust and Academy error wrapping

0758f45

Refactor level0 routing to config-driven metadata with deterministic prompt-suite tests

c82dcd0

Add REMORA priority-case baseline boost and handoff

df33bd0

Rename Level-2 note and link from integration guide

dd3f703

Fix FAISS embedding wrapper to implement Embeddings interface

9c88737

Use ERF composed schema pattern for input-writer model resolution

5330639

Forgot to force database files

b7f3b86

Merge remote-tracking branch 'remotes/amrex-codes/add_benchmark_metrics' into consolidate_all

08a5773

Merge remote-tracking branch 'remotes/amrex-codes/extend_metrics_gating' into consolidate_all

03fc473

Merge remote-tracking branch 'remotes/amrex-codes/maintainability_interactive_interfaces' into consolidate_all

2b10c44

Improve oracle routing coverage

193392e

- Strengthen remora_04 wording with ocean-specific REMORA terminology.
- Add integration assertions for solver-family and expected directory routing.
- Keep squall-line and full oracle regression checks green in one gate.

Merge branch 'b3_1_benchmark_runner_fields' into consolidate_all

b3bd273

Merge branch 'fix/remora-erf-level0-discrimination' into consolidate_all

97ac0f7

Merge branch 'fix/input-writer-plotfile-vars' into consolidate_all

510d00e

Add sweep schema models and validation tests

89f4d5a

Add intent extraction node with config flag and tests

70de973

Wire gate approval records into state

3794368

Merge branch 'wave2/gate-approval-wiring' into consolidate_all

30cd49e

Merge branch 'wave2/intent-extraction-node' into consolidate_all

5e19758

jmsexton03 added 15 commits March 18, 2026 09:45

Improve preflight mismatch remediation guidance and non-TTY hint

0053f42

Improve interactive preflight UX for ERF mismatch and repo selection

417e194

Refactor executable discovery policy and runner fallbacks

65d4337

Prefer case-derived solver over default in runners

c158109

Refactor solver compile flow to use config policy

f2298df

Stream and log preflight clone progress with SSH fallback

8e05375

Resolve local submit executable via solver build policy

69f2faa

Use solver policy for superfacility executable selection

45f1793

Add interactive development+rebuild path for ERF commit mismatch

8a45cf0

Propagate interactive repo selection and rerun preflight checks

a701e67

Add tracked ERF first-run auto-setup matrix script

86b5092

Add continue+rebuild option and progress logs for ERF mismatch

9e6ab1d

Make rebuild check optional and avoid repeated interactive rebuild loops

5517f94

Waive same-run ERF mismatch after successful rebuild

24dcbdc

Prefer runner run directory in visualization node

a7aac16

github-actions bot added the docs, tests, infra, demo, code, and data/indices labels on Mar 19, 2026

jmsexton03 added 3 commits March 26, 2026 13:12

restore post-merge test contracts and clean ERF schema artifact

e91b1e6

Remove scripts and local outputs

5c3d3f1

jmsexton03 changed the title ~~Prefer runner run directory in visualization node~~ Stabilize preflight, runners, and ERF artifacts Mar 26, 2026

jmsexton03 added 4 commits March 26, 2026 13:43

Added additional skips for spurious tests

9b0d7e0

Added preflight stubs to CLI integration tests

f40a532

Implement code-gated viz mapping with clarification disambiguation

6e70c7d

Gate solver-dependent viz tests and assert catalog hard-fail

c33f5b2

jmsexton03 merged commit a1bc802 into development Mar 26, 2026
15 checks passed

jmsexton03 merged 299 commits into development from preflight_layer_fix_stack_main_erf_test_visuals
