Stabilize preflight, runners, and ERF artifacts#55

Merged
jmsexton03 merged 299 commits into development from preflight_layer_fix_stack_main_erf_test_visuals on Mar 26, 2026

@jmsexton03 commented on Mar 19, 2026

Collaborator

Summary

  • Startup readiness failures around missing/stale local ERF repo state were causing users to hit run-time dead-ends after launch; startup preflight is now wired into app startup (src/main.py + src/first_run.py) so issues are surfaced early with guided remediation instead of failing deeper in the flow.
  • ERF commit/dependency mismatch handling previously forced brittle retry loops or opaque stop conditions; the mismatch path now includes interactive continue/rebuild choices and rerun guidance (TTY/non-TTY aware), so users can either proceed intentionally or repair deterministically.
  • Executable discovery/build behavior diverged between local and superfacility runners, producing inconsistent solver selection and fallback behavior; solver/build policy was centralized and both run_local and run_superfacility now resolve executables through shared policy logic for consistent runner outcomes.
  • Preflight clone/setup diagnostics were previously hard to interpret during first-run setup; startup preflight now emits progress-oriented messaging and remediation hints (including SSH fallback pathing) to reduce setup ambiguity.
  • Visualization behavior had ambiguity around run-directory precedence and intent translation, which could produce outputs from the wrong run context; visualization now prefers runner run-directory resolution and routes through explicit visualization-intent flow across model/node/service layers.
  • ERF schema/index artifacts could drift from the expected repo state; the branch's net changes refresh the ERF schema/FAISS artifacts to the ...erf5613ec3 lineage (including the renamed ERF schema family and updated provenance) so that retrieval and schema assumptions align with the expected ERF revision.
  • The branch includes a broad stacked history, but user-impacting net behavior converges on startup readiness + deterministic execution policy + visualization intent correctness; this summary is intentionally organized by those outcomes rather than by file churn.
  • Post-merge repair closed regressions introduced during merge resolution: restored missing test-contract imports, restored missing re import in input writer, and cleaned conflict-marker contamination in database/schemas/erf_complete_v1_amrexbac2457_erf5613ec3.json, which returned affected tests to green.
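
As an illustration of the TTY/non-TTY-aware mismatch path described above, here is a minimal sketch. It is not the code in src/first_run.py: the function name resolve_commit_mismatch, its parameters, and the three return values are hypothetical, and the commit identifiers passed in are placeholders.

```python
import sys
from typing import Callable, Optional

# Hypothetical names throughout; this is not the project's actual API.
_CHOICES = {"c": "continue", "r": "rebuild", "a": "abort"}


def resolve_commit_mismatch(
    expected: str,
    found: str,
    *,
    is_tty: Optional[bool] = None,
    ask: Callable[[str], str] = input,
) -> str:
    """Return "continue", "rebuild", or "abort" for an ERF commit mismatch."""
    if is_tty is None:
        # Detect interactivity only when the caller did not decide for us.
        is_tty = sys.stdin is not None and sys.stdin.isatty()

    if not is_tty:
        # Non-interactive session: prompting would hang, so print rerun
        # guidance and stop deterministically.
        print(
            f"ERF commit mismatch (expected {expected}, found {found}). "
            "Rerun in a terminal to choose continue/rebuild, or rebuild first.",
            file=sys.stderr,
        )
        return "abort"

    prompt = (
        f"ERF commit mismatch (expected {expected}, found {found}). "
        "[c]ontinue, [r]ebuild, or [a]bort? "
    )
    while True:
        answer = ask(prompt).strip().lower()
        if answer[:1] in _CHOICES:
            return _CHOICES[answer[:1]]
        # Unrecognized input: ask again instead of guessing.
```

Injecting is_tty and ask keeps both branches unit-testable without a real terminal, which is one way to cover the interactive and batch paths in the same test module.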

Related or overlapping functionality / DRY guidance

  • Shared logic convergence:
    • src/services/solver_build_policy.py is the central executable/build policy and is reused by both src/services/run_local.py and src/services/run_superfacility.py; avoid reintroducing duplicated fallback logic in runner-specific layers.
    • Visualization intent/extraction behavior is coordinated across src/models/visualization_intent.py, src/nodes/visualization_intent_node.py, src/services/viz_param_extractor.py, and src/services/visualization.py; preserve this shared normalization path.
    • Input-writer and plotfile behavior alignment is jointly validated by updated input-writer/visualization tests; keep writer-side and visualization-side run-dir/plotfile assumptions synchronized.
  • Contract alignment statement:
    • Graph/node updates in this branch were aligned against canonical state wiring in src/models/graph_state_canonical.py plus associated unit/contract-sensitive tests.
  • This embeds a significant architectural decision that needs an ADR.
    • If checked, add an ADR under docs/adr/ (one short file describing context, decision, consequences).
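
The shared-policy guidance above (one executable/build policy reused by both runners) could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of src/services/solver_build_policy.py: ExecutableChoice, resolve_executable, plan_local_run, and plan_superfacility_run are hypothetical names, and the real policy may also handle builds, which this sketch omits.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class ExecutableChoice:
    path: Optional[str]
    source: str  # "explicit", "discovered", or "missing"


def resolve_executable(
    candidates: Sequence[str],
    explicit: Optional[str] = None,
    exists: Callable[[str], bool] = os.path.isfile,
) -> ExecutableChoice:
    """Single place that decides which solver executable a runner uses."""
    # An explicitly configured executable wins when it is actually present.
    if explicit is not None and exists(explicit):
        return ExecutableChoice(explicit, "explicit")
    # Otherwise take the first candidate that exists, in priority order.
    for candidate in candidates:
        if exists(candidate):
            return ExecutableChoice(candidate, "discovered")
    return ExecutableChoice(None, "missing")


def plan_local_run(candidates, explicit=None, exists=os.path.isfile) -> Dict[str, Optional[str]]:
    # Runner-specific layer: no fallback logic of its own, only delegation.
    choice = resolve_executable(candidates, explicit, exists)
    return {"runner": "local", "executable": choice.path, "source": choice.source}


def plan_superfacility_run(candidates, explicit=None, exists=os.path.isfile) -> Dict[str, Optional[str]]:
    # Same policy call as the local runner, so both select the same executable.
    choice = resolve_executable(candidates, explicit, exists)
    return {"runner": "superfacility", "executable": choice.path, "source": choice.source}
```

Because neither runner function contains its own search or fallback, any change to selection order happens in one function and both runners pick it up.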

Impact checklist

  • fixes a bug or incorrect behavior
  • adds new capabilities
  • changes answers in the test suite to more than roundoff level
  • likely affects downstream users or results
  • includes docs updates (code/docs), if appropriate
  • none of the above

Tests run (CI runs: pytest tests/unit, pytest tests/quality, pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full")

  • tests/unit: pytest tests/unit
  • tests/quality: pytest tests/quality
  • integration ladder (CI): pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"
  • other (list): targeted merge-repair regression slice
  • Output/summary:
    • pytest -q tests/unit -> 1733 passed, 31 skipped
    • Targeted regression slice on this branch validated merge-repair paths:
      • tests/unit/test_component5f_standards.py
      • tests/unit/test_erf_cases.py
      • tests/unit/test_input_writer_apply_plan.py
      • tests/unit/test_input_writer_service_refactor.py
      • tests/unit/test_input_writer_viz_plotfile.py
    • Interpretation: previously failing component5f/input-writer/ERF-case tests now pass after repair.
  • If tests require repos/schemas/indices or real services, note markers used.
  • requires_solver(...) implies repo + schema + default indices are available locally.
  • Use -k with a solver name (pelec, erf, amrex, or warpx), e.g. -k erf, to filter solver-specific tests.

Examples:

pytest tests/unit --tb=short -q
pytest tests/quality --tb=short -q
pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full" --tb=short -q
pytest -m "e2e and demo" tests/e2e/test_demo_smoke.py --tb=short -q

Tests not run in CI (required if any)

  • CI runs tests/unit, tests/quality, and tests/integration with integration_l1..l4 + integration_full markers via micromamba; list anything else not covered by CI here.
  • None
  • tests/e2e
  • other (list): no additional non-CI suites were run locally in this pass
  • Reason for skip:
    • This pass focused on unit and targeted merge-repair validation; quality/integration ladders are CI-covered and were not re-run locally here.
  • Risk/mitigation:
    • Risk: regressions that appear only in quality/integration ladders would not be detected by this local pass.
    • Mitigation: rely on CI quality/integration matrix before merge completion.

API-key/manual tests (optional, include steps)

  • Env vars or credentials needed:
    • None for local unit/targeted runs recorded in this summary.
  • Manual steps/commands:
    • pytest -q tests/unit
  • Results or logs:
    • Unit suite passed as reported above.

Integration/E2E markers (optional, manual, may require API key)

  • Note: integration ladder runs in CI; tests/e2e runs only when selected by path or -m e2e.
  • e2e demo (workflow_dispatch): pytest -m "e2e and demo" tests/e2e/test_demo_smoke.py
  • other (markers: use_real_services, requires_repos, requires_schema):

Notes (optional)

  • Manual output / logs (short):
    • Net diff basis: amrex-codes/development...preflight_layer_fix_stack_main_erf_test_visuals (80ba696b...e91b1e61), 119 files changed, 27974 insertions(+), 7804 deletions(-).
    • Large stacked-history branch: this summary intentionally reports merged net user-impacting behavior changes rather than full commit-by-commit narration.
    • Post-merge repair commit restored test contracts/imports and cleaned ERF schema merge artifact; targeted regression tests and full unit run are green.
    • Follow-on operational note: docs/audit cleanup cherry-pick (61067ce6) was identified separately and is not executed as part of this markdown write.
  • Known limitations:
    • Quality/integration suites were not re-run locally in this pass; CI coverage is the primary downstream gate.

Labels (optional)

  • I will check auto-labels after submitting.
  • Labels are okay if slightly off.

Maintainability note (optional)

  • Prefer the most maintainable, DRY, and human-readable option; add duplication only with a clear reason.

- UnboundLocalError in _score_kb_relevance_batch (json name collision)
- LLM call enforcement violation in architect.py
- Build schema integration assertion (amr.n_cell)
- write_policy Mock TypeError in input_writer refactor
- 8 architect LLM planning test failures in create_plan_rag path
- 52 type hint violations (test_standards quality gate)

20 test_oracle_benchmarks.py failures are known and intentional:
oracle scope is Pele-only, system scope is now four-solver.
These are not regressions.
- Override fires when L0 confidence < 0.15 and L2 finds a
  high-confidence canonical case-name match in a different
  solver family
- Threshold configurable via config (level2_override_l0_threshold)
- Observability fields added to architect history:
  level0_solver, level0_confidence, level2_override_applied,
  level2_override_solver, level2_override_case,
  level2_override_confidence
- Default threshold 0.15 pending calibration via eval_level0_ab.py
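
A minimal sketch of the override rule described in the bullets above, under stated assumptions: the function name apply_level2_override, the shape of the L2 match dict, and the l2_min_confidence cutoff of 0.8 are hypothetical (the commit message says only "high-confidence"), and "different solver family" is approximated by comparing solver names. The six history field names and the 0.15 default are taken from the commit message.

```python
from typing import Any, Dict, Optional, Tuple


def apply_level2_override(
    l0_solver: str,
    l0_confidence: float,
    l2_match: Optional[Dict[str, Any]],
    *,
    l0_threshold: float = 0.15,      # level2_override_l0_threshold in config
    l2_min_confidence: float = 0.8,  # assumed cutoff for "high-confidence"
) -> Tuple[str, Dict[str, Any]]:
    """Return the selected solver plus the observability record."""
    record: Dict[str, Any] = {
        "level0_solver": l0_solver,
        "level0_confidence": l0_confidence,
        "level2_override_applied": False,
        "level2_override_solver": None,
        "level2_override_case": None,
        "level2_override_confidence": None,
    }
    fires = (
        l2_match is not None
        and l0_confidence < l0_threshold                  # L0 is unsure
        and l2_match["confidence"] >= l2_min_confidence   # L2 is sure
        and l2_match["solver"] != l0_solver               # and disagrees
    )
    if not fires:
        return l0_solver, record

    record.update(
        level2_override_applied=True,
        level2_override_solver=l2_match["solver"],
        level2_override_case=l2_match["case"],
        level2_override_confidence=l2_match["confidence"],
    )
    return l2_match["solver"], record
```

Recording the L0 values even when no override fires keeps the history comparable across runs, which should help when calibrating the threshold.
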
- add JSONL normalization for validation, timing, convergence, and llm count fields

- derive iteration and reviewer retry values from graph-state output with unavailable-reason siblings

- extend benchmark runner unit tests for field presence, defaults, and compatibility
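
One way to picture the normalization and the "unavailable-reason siblings" mentioned above is the sketch below. All field names here (validation_passed, wall_time_s, converged, llm_call_count, iteration_count, reviewer_retry_count) and the reason string are hypothetical placeholders, not the benchmark runner's real schema.

```python
import json
from typing import Any, Dict, Iterable, Optional

# Hypothetical field names, for illustration only.
NORMALIZED_DEFAULTS: Dict[str, Any] = {
    "validation_passed": None,
    "wall_time_s": None,
    "converged": None,
    "llm_call_count": None,
}
DERIVED_FIELDS = ("iteration_count", "reviewer_retry_count")


def normalize_record(
    raw: Dict[str, Any], graph_state: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a copy of raw with every expected field present."""
    record = dict(raw)  # keep unknown keys for backward compatibility
    for name, default in NORMALIZED_DEFAULTS.items():
        record.setdefault(name, default)

    state = graph_state or {}
    for name in DERIVED_FIELDS:
        value = state.get(name)
        record[name] = value
        # Sibling field explains why a value is absent instead of leaving
        # readers to guess whether None means "zero" or "unknown".
        record[f"{name}_unavailable_reason"] = (
            None if value is not None else "missing from graph-state output"
        )
    return record


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```
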
- Strengthen remora_04 wording with ocean-specific REMORA terminology.

- Add integration assertions for solver-family and expected directory routing.

- Keep squall-line and full oracle regression checks green in one gate.
- Extract visualization quantities/config from prompt at graph initialization.

- Apply plotfile vars in Input Writer with requested->baseline->omit priority and solver-specific param lookup.

- Add unit and integration coverage for extractor behavior and plotfile injection outcomes.
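
The requested->baseline->omit priority with solver-specific parameter lookup can be sketched as follows. This is a hedged illustration rather than the Input Writer's code: select_plot_vars and apply_plot_vars are hypothetical names, and the solver-to-parameter mapping is passed in by the caller because the real parameter names are not stated here.

```python
from typing import Dict, List, Mapping, Optional, Sequence


def select_plot_vars(
    requested: Optional[Sequence[str]], baseline: Optional[Sequence[str]]
) -> Optional[List[str]]:
    """Pick plotfile variables: requested first, then baseline, else None."""
    if requested:
        return list(requested)
    if baseline:
        return list(baseline)
    return None  # "omit": the writer leaves the parameter out entirely


def apply_plot_vars(
    inputs: Mapping[str, str],
    solver: str,
    requested: Optional[Sequence[str]],
    baseline: Optional[Sequence[str]],
    param_by_solver: Mapping[str, str],
) -> Dict[str, str]:
    """Return a copy of inputs with the solver's plot-vars parameter set."""
    updated = dict(inputs)
    param = param_by_solver.get(solver)  # solver-specific parameter name
    chosen = select_plot_vars(requested, baseline)
    if param is not None and chosen is not None:
        updated[param] = " ".join(chosen)
    return updated
```

For example, with a placeholder mapping {"solver_a": "solver_a.plot_vars"}, a request for ["temperature"] overrides a baseline of ["density"], an empty request falls back to the baseline, and with neither present the inputs are returned unchanged.
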
- Add new GraphState fields for intent extraction, clarification, reviewer routing, gate approvals, and sweep metadata

- Define canonical defaults map for additive fields with list/dict-safe initialization semantics

- Expand graph state initialization tests to functional style with defaults, serialization, and list isolation checks
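
The "list/dict-safe initialization" point above is about not sharing mutable defaults between states. A minimal sketch, assuming a plain-dict state and hypothetical field names (the actual fields live in src/models/graph_state_canonical.py and are not reproduced here):

```python
import copy
import json
from typing import Any, Dict, Optional

# Hypothetical additive fields; the real names may differ.
CANONICAL_DEFAULTS: Dict[str, Any] = {
    "extracted_intent": None,
    "clarification_questions": [],
    "reviewer_route": None,
    "gate_approvals": {},
    "sweep_metadata": {},
}


def init_graph_state(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a fresh state whose list/dict defaults are never shared."""
    # deepcopy gives each state its own list and dict objects, so appending
    # to one state cannot leak into another or into the defaults map.
    state = {key: copy.deepcopy(value) for key, value in CANONICAL_DEFAULTS.items()}
    if overrides:
        state.update(overrides)
    return state


first = init_graph_state()
second = init_graph_state()
first["clarification_questions"].append("Which solver?")

# Isolation check: neither the second state nor the defaults map changed.
assert second["clarification_questions"] == []
assert CANONICAL_DEFAULTS["clarification_questions"] == []

# Serialization check: a default state round-trips through JSON.
assert json.loads(json.dumps(second)) == second
```
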
…_layer_fix_stack_main_erf_test_visuals

# Conflicts:
#	.dependencies.json
#	database/configs/base_amrex_config.py
#	database/configs/erf_config.py
#	database/configs/remora_config.py
#	database/faiss/cborg/build_session_manifest.json
#	database/faiss/cborg/erf_case_details/index.faiss
#	database/faiss/cborg/erf_case_details/index.pkl
#	database/faiss/cborg/erf_case_names/index.faiss
#	database/faiss/cborg/erf_case_names/index.pkl
#	database/faiss/cborg/erf_case_structure/index.faiss
#	database/faiss/cborg/erf_case_structure/index.pkl
#	database/faiss/cborg/erf_input_templates/index.faiss
#	database/faiss/cborg/erf_input_templates/index.pkl
#	database/faiss/cborg/level0/code_lineage_metadata.json
#	database/faiss/cborg/level0/cross_cutting_guidance_metadata.json
#	database/faiss/cborg/level0/physics_regimes_metadata.json
#	database/faiss/cborg/level0/solver_capabilities.faiss
#	database/faiss/cborg/level0/solver_capabilities_metadata.json
#	database/faiss/cborg/level2/erf_case_configuration_complexity.faiss
#	database/faiss/cborg/level2/erf_case_configuration_complexity_metadata.json
#	database/faiss/cborg/level2/erf_case_development_activity.faiss
#	database/faiss/cborg/level2/erf_case_development_activity_metadata.json
#	database/faiss/cborg/level2/erf_case_domain_models.faiss
#	database/faiss/cborg/level2/erf_case_domain_models_metadata.json
#	database/faiss/cborg/level2/erf_case_grid_specifications.faiss
#	database/faiss/cborg/level2/erf_case_grid_specifications_metadata.json
#	database/faiss/cborg/level2/erf_case_path_hierarchy.faiss
#	database/faiss/cborg/level2/erf_case_path_hierarchy_metadata.json
#	database/faiss/cborg/level2/erf_case_physics_parameters.faiss
#	database/faiss/cborg/level2/erf_case_physics_parameters_metadata.json
#	database/faiss/cborg/level2/erf_case_resource_requirements.faiss
#	database/faiss/cborg/level2/erf_case_resource_requirements_metadata.json
#	database/faiss/cborg/level2/erf_faiss_provenance.json
#	database/schemas/amrex_schema_bac24575.json
#	database/schemas/erf_complete_current.json
#	database/schemas/erf_complete_v1_amrexbac2457_erf1df4817.json
#	database/schemas/erf_complete_v1_amrexbac2457_erf5613ec3.json
#	database/schemas/erf_complete_v1_amrexbac2457_erfeb27171.json
#	src/main.py
#	src/models/graph_state_canonical.py
#	src/nodes/clarification_handler_node.py
#	src/nodes/input_writer_node.py
#	src/nodes/visualization_node.py
#	src/policy/gate_policy.py
#	src/services/input_writer.py
#	src/services/run_local.py
#	src/services/run_superfacility.py
#	src/services/viz_param_extractor.py
#	tests/unit/test_component5f_standards.py
#	tests/unit/test_graph_state_init.py
#	tests/unit/test_input_writer_viz_plotfile.py
#	tests/unit/test_run_local.py
#	tests/unit/test_run_superfacility.py
#	tests/unit/test_visualization_axis_defaults.py
#	tests/unit/test_viz_param_extractor.py
@jmsexton03 changed the title from "Prefer runner run directory in visualization node" to "Stabilize preflight, runners, and ERF artifacts" on Mar 26, 2026
@jmsexton03 merged commit a1bc802 into development on Mar 26, 2026
15 checks passed