Stabilize preflight, runners, and ERF artifacts #55
Merged
jmsexton03 merged 299 commits on Mar 26, 2026
…prompt-suite tests
…cs' into consolidate_all
…ng' into consolidate_all
…eractive_interfaces' into consolidate_all
- UnboundLocalError in _score_kb_relevance_batch (json name collision)
- LLM call enforcement violation in architect.py
- Build schema integration assertion (amr.n_cell)
- write_policy Mock TypeError in input_writer refactor
- 8 architect LLM planning test failures in create_plan_rag path
- 52 type hint violations (test_standards quality gate)

20 test_oracle_benchmarks.py failures are known and intentional: oracle scope is Pele-only, system scope is now four-solver. These are not regressions.
- Override fires when L0 confidence < 0.15 and L2 finds a high-confidence canonical case-name match in a different solver family
- Threshold configurable via config (level2_override_l0_threshold)
- Observability fields added to architect history: level0_solver, level0_confidence, level2_override_applied, level2_override_solver, level2_override_case, level2_override_confidence
- Default threshold 0.15 pending calibration via eval_level0_ab.py
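The override rule in this commit can be sketched as below. This is a minimal illustration, not the code in architect.py: the `Level2Match` type, the `apply_level2_override` function, the `selected_solver` key, and the 0.8 cutoff for a "high-confidence" L2 match are all assumptions, and solver family is approximated by comparing solver names. Only the config key and the observability field names come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Default named in the commit message (pending calibration).
DEFAULT_L0_THRESHOLD = 0.15
# ASSUMPTION: the commit does not say what counts as "high-confidence" for L2.
ASSUMED_L2_MIN_CONFIDENCE = 0.8


@dataclass
class Level2Match:
    """Hypothetical container for a canonical case-name match from L2."""
    solver: str
    case: str
    confidence: float


def apply_level2_override(level0_solver: str,
                          level0_confidence: float,
                          match: Optional[Level2Match],
                          config: Optional[dict] = None) -> dict:
    """Return the observability fields plus the solver that would be used."""
    config = config or {}
    threshold = config.get("level2_override_l0_threshold", DEFAULT_L0_THRESHOLD)
    fires = (
        match is not None
        and level0_confidence < threshold               # L0 is unsure
        and match.solver != level0_solver               # different solver (family)
        and match.confidence >= ASSUMED_L2_MIN_CONFIDENCE
    )
    return {
        "level0_solver": level0_solver,
        "level0_confidence": level0_confidence,
        "level2_override_applied": fires,
        "level2_override_solver": match.solver if fires else None,
        "level2_override_case": match.case if fires else None,
        "level2_override_confidence": match.confidence if fires else None,
        # Hypothetical convenience key, not listed in the commit.
        "selected_solver": match.solver if fires else level0_solver,
    }
```

With these assumptions, an L0 confidence of 0.10 and a 0.92 L2 match in another solver triggers the override, while an L0 confidence of 0.40 leaves the L0 choice in place.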
- add JSONL normalization for validation, timing, convergence, and llm count fields
- derive iteration and reviewer retry values from graph-state output with unavailable-reason siblings
- extend benchmark runner unit tests for field presence, defaults, and compatibility
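One way to read the "unavailable-reason siblings" idea is that every derived field gets a companion field explaining why it is missing. The sketch below illustrates that pattern only. All field names, defaults, and the reason string are hypothetical and are not taken from the benchmark runner.

```python
import json

# ASSUMPTION: illustrative field names and defaults, not the runner's real schema.
ASSUMED_DEFAULTS = {
    "validation_passed": None,
    "timing_seconds": None,
    "converged": None,
    "llm_call_count": 0,
}
ASSUMED_DERIVED_FIELDS = ("iteration_count", "reviewer_retry_count")


def normalize_record(raw: dict, graph_state: dict = None) -> dict:
    """Fill missing fields with defaults and derive values from graph state."""
    graph_state = graph_state or {}
    record = {key: raw.get(key, default) for key, default in ASSUMED_DEFAULTS.items()}
    for field in ASSUMED_DERIVED_FIELDS:
        reason_key = f"{field}_unavailable_reason"
        if field in graph_state:
            record[field] = graph_state[field]
            record[reason_key] = None
        else:
            # The sibling records why the value could not be derived.
            record[field] = None
            record[reason_key] = "not present in graph-state output"
    return record


def to_jsonl(records) -> str:
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

Keeping the reason next to the value lets downstream consumers distinguish "zero retries" from "retry count unknown" without a separate lookup.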
- Strengthen remora_04 wording with ocean-specific REMORA terminology.
- Add integration assertions for solver-family and expected directory routing.
- Keep squall-line and full oracle regression checks green in one gate.
- Extract visualization quantities/config from prompt at graph initialization.
- Apply plotfile vars in Input Writer with requested->baseline->omit priority and solver-specific param lookup.
- Add unit and integration coverage for extractor behavior and plotfile injection outcomes.
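The requested->baseline->omit priority can be sketched as a small resolver. This is an illustration under assumptions: the function name, the solver keys, and the parameter names in `ASSUMED_PLOT_PARAM` are placeholders and do not reflect the real input-file parameters of any solver or the Input Writer's actual API.

```python
from typing import Optional, Sequence, Tuple

# ASSUMPTION: placeholder solver keys and parameter names for illustration only.
ASSUMED_PLOT_PARAM = {
    "solver_a": "solver_a.plot_vars",
    "solver_b": "solver_b.derive_plot_vars",
}


def resolve_plot_vars(solver: str,
                      requested: Optional[Sequence[str]],
                      baseline_inputs: dict) -> Optional[Tuple[str, list]]:
    """Return (param_name, variables) to write, or None to omit the setting."""
    param = ASSUMED_PLOT_PARAM.get(solver)   # solver-specific param lookup
    if param is None:
        return None                          # unknown solver: omit
    if requested:
        return param, list(requested)        # 1. user-requested quantities win
    if param in baseline_inputs:
        return param, list(baseline_inputs[param])  # 2. fall back to baseline
    return None                              # 3. nothing to write: omit
```

Returning None for the omit case keeps the writer from emitting an empty plotfile variable list, which would differ from leaving the solver default untouched.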
- Add new GraphState fields for intent extraction, clarification, reviewer routing, gate approvals, and sweep metadata
- Define canonical defaults map for additive fields with list/dict-safe initialization semantics
- Expand graph state initialization tests to functional style with defaults, serialization, and list isolation checks
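"List/dict-safe initialization" usually means that mutable defaults are copied per state so that two states never share the same list or dict. A minimal sketch of that pattern follows. The field names and the `init_graph_state` helper are hypothetical and are not the contents of graph_state_canonical.py.

```python
import copy

# ASSUMPTION: illustrative additive fields; the real canonical map is not shown in the PR.
ASSUMED_ADDITIVE_DEFAULTS = {
    "intent": None,
    "clarification_questions": [],
    "reviewer_route": None,
    "gate_approvals": {},
    "sweep_metadata": {},
}


def init_graph_state(**overrides) -> dict:
    """Build a fresh state; deep-copying keeps list/dict defaults isolated."""
    state = copy.deepcopy(ASSUMED_ADDITIVE_DEFAULTS)
    state.update(overrides)
    return state


# List isolation check: mutating one state must not leak into another
# or into the shared defaults map.
first = init_graph_state()
second = init_graph_state()
first["clarification_questions"].append("Which solver family?")
assert second["clarification_questions"] == []
assert ASSUMED_ADDITIVE_DEFAULTS["clarification_questions"] == []
```

A shallow `dict(ASSUMED_ADDITIVE_DEFAULTS)` copy would fail the isolation check above, because both states would still point at the same list object.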
…_layer_fix_stack_main_erf_test_visuals

# Conflicts:
# .dependencies.json
# database/configs/base_amrex_config.py
# database/configs/erf_config.py
# database/configs/remora_config.py
# database/faiss/cborg/build_session_manifest.json
# database/faiss/cborg/erf_case_details/index.faiss
# database/faiss/cborg/erf_case_details/index.pkl
# database/faiss/cborg/erf_case_names/index.faiss
# database/faiss/cborg/erf_case_names/index.pkl
# database/faiss/cborg/erf_case_structure/index.faiss
# database/faiss/cborg/erf_case_structure/index.pkl
# database/faiss/cborg/erf_input_templates/index.faiss
# database/faiss/cborg/erf_input_templates/index.pkl
# database/faiss/cborg/level0/code_lineage_metadata.json
# database/faiss/cborg/level0/cross_cutting_guidance_metadata.json
# database/faiss/cborg/level0/physics_regimes_metadata.json
# database/faiss/cborg/level0/solver_capabilities.faiss
# database/faiss/cborg/level0/solver_capabilities_metadata.json
# database/faiss/cborg/level2/erf_case_configuration_complexity.faiss
# database/faiss/cborg/level2/erf_case_configuration_complexity_metadata.json
# database/faiss/cborg/level2/erf_case_development_activity.faiss
# database/faiss/cborg/level2/erf_case_development_activity_metadata.json
# database/faiss/cborg/level2/erf_case_domain_models.faiss
# database/faiss/cborg/level2/erf_case_domain_models_metadata.json
# database/faiss/cborg/level2/erf_case_grid_specifications.faiss
# database/faiss/cborg/level2/erf_case_grid_specifications_metadata.json
# database/faiss/cborg/level2/erf_case_path_hierarchy.faiss
# database/faiss/cborg/level2/erf_case_path_hierarchy_metadata.json
# database/faiss/cborg/level2/erf_case_physics_parameters.faiss
# database/faiss/cborg/level2/erf_case_physics_parameters_metadata.json
# database/faiss/cborg/level2/erf_case_resource_requirements.faiss
# database/faiss/cborg/level2/erf_case_resource_requirements_metadata.json
# database/faiss/cborg/level2/erf_faiss_provenance.json
# database/schemas/amrex_schema_bac24575.json
# database/schemas/erf_complete_current.json
# database/schemas/erf_complete_v1_amrexbac2457_erf1df4817.json
# database/schemas/erf_complete_v1_amrexbac2457_erf5613ec3.json
# database/schemas/erf_complete_v1_amrexbac2457_erfeb27171.json
# src/main.py
# src/models/graph_state_canonical.py
# src/nodes/clarification_handler_node.py
# src/nodes/input_writer_node.py
# src/nodes/visualization_node.py
# src/policy/gate_policy.py
# src/services/input_writer.py
# src/services/run_local.py
# src/services/run_superfacility.py
# src/services/viz_param_extractor.py
# tests/unit/test_component5f_standards.py
# tests/unit/test_graph_state_init.py
# tests/unit/test_input_writer_viz_plotfile.py
# tests/unit/test_run_local.py
# tests/unit/test_run_superfacility.py
# tests/unit/test_visualization_axis_defaults.py
# tests/unit/test_viz_param_extractor.py
Summary
- … (src/main.py + src/first_run.py) so issues are surfaced early with guided remediation instead of failing deeper in the flow.
- run_local and run_superfacility now resolve executables through shared policy logic for consistent runner outcomes.
- ...erf5613ec3 lineage (including renamed ERF schema family and updated provenance) so retrieval/schema assumptions align with expected ERF revisioning.
- … re import in input writer, and cleaned conflict-marker contamination in database/schemas/erf_complete_v1_amrexbac2457_erf5613ec3.json, which returned affected tests to green.
Related or overlapping functionality / DRY guidance
- src/services/solver_build_policy.py is the central executable/build policy and is reused by both src/services/run_local.py and src/services/run_superfacility.py; avoid reintroducing duplicated fallback logic in runner-specific layers.
- … src/models/visualization_intent.py, src/nodes/visualization_intent_node.py, src/services/viz_param_extractor.py, and src/services/visualization.py; preserve this shared normalization path.
- … src/models/graph_state_canonical.py plus associated unit/contract-sensitive tests.
- … docs/adr/ (one short file describing context, decision, consequences).
Impact checklist
Tests run (CI runs: pytest tests/unit, pytest tests/quality, pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full")
- pytest tests/unit
- pytest tests/quality
- pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"
- pytest -q tests/unit -> 1733 passed, 31 skipped
- tests/unit/test_component5f_standards.py
- tests/unit/test_erf_cases.py
- tests/unit/test_input_writer_apply_plan.py
- tests/unit/test_input_writer_service_refactor.py
- tests/unit/test_input_writer_viz_plotfile.py
- requires_solver(...) implies repo + schema + default indices are available locally.
- … -k pelec|erf|amrex|warpx to filter solver-specific tests.
Examples:
Tests not run in CI (required if any)
- … tests/unit, tests/quality, and tests/integration with integration_l1..l4 + integration_full markers via micromamba; list anything else not covered by CI here.
API-key/manual tests (optional, include steps)
- pytest -q tests/unit
Integration/E2E markers (optional, manual, may require API key)
- tests/e2e runs only when selected by path or -m e2e.
- pytest -m "e2e and demo" tests/e2e/test_demo_smoke.py
Notes (optional)
- amrex-codes/development...preflight_layer_fix_stack_main_erf_test_visuals (80ba696b...e91b1e61), 119 files changed, 27974 insertions(+), 7804 deletions(-).
- … 61067ce6) was identified separately and is not executed as part of this markdown write.
Labels (optional)
Maintainability note (optional)