Catchup 19: Fix tooling per solver #53

Merged
jmsexton03 merged 244 commits into development from catchup_19_fix_tooling_per_solver
Mar 26, 2026

@jmsexton03

Collaborator

Summary

  • Catchup context: slice 19 on branch catchup_19_fix_tooling_per_solver.
  • Ordered split wave objective: preserve parity/paper cutoff lineage by landing slices in ascending order.
  • What was added/changed in this slice:
    • derive wave per-solver summary labels from the target case path
    • align hierarchical baseline and FAISS scoring expectations with tuned behavior
    • revert "fix: align hierarchical baseline candidate pool and FAISS semantic scoring"
    • align hierarchical baseline candidate pool and FAISS semantic scoring
    • add --strategy flag to the benchmark runner and update the wave script to run one strategy per process for isolation
    • add wave orchestration script with an explicit human tuning checkpoint
    • document non-ERF sanity placeholder behavior in the runner
    • isolate benchmark strategy output roots and nest input-writer runs
  • Workstreams (topic-level):
    • benchmark workflow/retry behavior updates
    • paper/reproduction workflow scripting
    • node orchestration flow updates
    • service contract/behavior updates
    • unit regression coverage updates
    • retry/timeout handling adjustments
    • FAISS/manifest compatibility handling
    • paper/wave workflow-path handling
    • ERF execution/fallback behavior handling
  • Slice metadata:
    • Commit range: 2c2d60017707..1dad54f399ff (source apply_stack_slice_119 -> canonical fix_stack_main)
    • Findings profile (P0/P1/P2/P3): 1/2/0/2 (total 5)
  • Fix implementation note: findings are reconciled/resolved on canonical stacked branch fix_stack_main at 73f37cf9e86d.
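
A minimal sketch of the single-strategy-per-process isolation mentioned in the Summary, assuming a Python runner that takes --strategy and writes under a per-strategy output root. The strategy names, the --output-root option, and all function names here are hypothetical; only the --strategy flag itself comes from this PR.

```python
import argparse
import subprocess
import sys
from pathlib import Path

# Hypothetical strategy names; the PR does not list the real ones.
STRATEGIES = ["hierarchical_baseline", "faiss_semantic"]


def parse_args(argv=None):
    """Parse runner arguments; exactly one strategy is selected per process."""
    parser = argparse.ArgumentParser(description="Benchmark runner (illustrative)")
    parser.add_argument("--strategy", choices=STRATEGIES, required=True)
    # --output-root is an assumed option, not confirmed by the PR.
    parser.add_argument("--output-root", type=Path, default=Path("benchmark_out"))
    return parser.parse_args(argv)


def strategy_output_dir(output_root, strategy):
    """Give each strategy its own directory so runs cannot overwrite each other."""
    return Path(output_root) / strategy


def build_wave_commands(runner, output_root, strategies=STRATEGIES):
    """Build one runner command per strategy, each meant for a separate process."""
    return [
        [sys.executable, str(runner), "--strategy", name, "--output-root", str(output_root)]
        for name in strategies
    ]


def run_wave(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Launching a fresh interpreter per strategy keeps module-level state and cached indices from one strategy out of the next, at the cost of repeated startup time.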

Related or overlapping functionality / DRY guidance

  • Overlap is expected with stacked fix lineage (fix_stack_main); avoid duplicating logic that is already hardened in shared services/nodes.
  • Keep node/state contract compatibility aligned with src/models/graph_state_canonical.py and tests/contracts/* when touching shared flows.
  • Evidence artifacts for cross-slice decisions: artifacts/integration/findings_reconciliation.json and artifacts/integration/fix_branch_remap_impact.md.
  • This embeds a significant architectural decision that needs an ADR.
    • If checked, add an ADR under docs/adr/ (one short file describing context, decision, consequences).

Impact checklist

  • fixes a bug or incorrect behavior
  • adds new capabilities
  • changes answers in the test suite to more than roundoff level
  • likely affects downstream users or results
  • includes docs updates (code/docs), if appropriate
  • none of the above

Tests run (CI runs: pytest tests/unit, pytest tests/quality, pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full")

  • tests/unit: pytest tests/unit
  • tests/quality: pytest tests/quality
  • integration ladder (CI): pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"
  • other (list): final closure validation on canonical fix_stack_main
  • Output/summary:
    • per-slice branch-head run in this phase: not executed
    • canonical closure branch used for validation: fix_stack_main (73f37cf9e86d)
    • canonical unit: 1663 passed, 31 skipped, 3 warnings (coverage 56.82%)
    • canonical full: 1813 passed, 78 skipped, 10 xfailed, 11 warnings (coverage 58.63%)
    • canonical quality: 20 passed, 1 skipped, 5 warnings
    • canonical integration ladder: 47 passed, 36 skipped, 92 deselected, 1 xfailed, 7 warnings
    • canonical junit evidence: artifacts/integration/reports/fix_stack_main_20260318_034847/unit.junit.xml, artifacts/integration/reports/fix_stack_main_20260318_034847/full.junit.xml
  • If tests require repos/schemas/indices or real services, note markers used.
  • requires_solver(...) implies repo + schema + default indices are available locally.
  • Use -k with pelec, erf, amrex, or warpx to filter solver-specific tests.
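
For reference, the integration ladder invocation and the solver filter can be assembled as below. This is a sketch only: the helper name and constants are hypothetical, while the marker names, the test path, and the solver keywords are taken from the checklist above. pytest's -k option takes a keyword expression, so several solvers are joined with "or".

```python
# Marker names as listed in the CI command above.
LADDER_MARKERS = [
    "integration_l1",
    "integration_l2",
    "integration_l3",
    "integration_l4",
    "integration_full",
]
# Solver keywords as listed in the checklist above.
SOLVERS = ("pelec", "erf", "amrex", "warpx")


def ladder_command(solvers=()):
    """Return the pytest argv for the integration ladder, optionally filtered by solver."""
    unknown = [name for name in solvers if name not in SOLVERS]
    if unknown:
        raise ValueError(f"unknown solver(s): {unknown}")
    cmd = ["pytest", "tests/integration", "-m", " or ".join(LADDER_MARKERS)]
    if solvers:
        # -k accepts a boolean keyword expression such as "erf or warpx".
        cmd += ["-k", " or ".join(solvers)]
    return cmd
```

Passing the list to subprocess.run avoids shell quoting problems with the marker expression.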

Tests not run in CI (required if any)

  • CI runs tests/unit, tests/quality, and tests/integration with integration_l1..l4 + integration_full markers via micromamba; list anything else not covered by CI here.
  • None
  • tests/e2e
  • other (list): per-slice branch-local test reruns
  • Reason for skip: this phase validated closure on canonical stacked branch (fix_stack_main) rather than re-running each catchup branch independently.
  • Risk/mitigation: parity/open-PR coverage gate rerun is explicitly queued in the handoff for network-enabled execution.

Notes (optional)

  • Manual output / logs (short):
    • Validation bundle: artifacts/integration/reports/fix_stack_main_20260318_034847
    • Reconciliation totals: total_findings=109, addressed=109
    • Remap artifact: artifacts/integration/fix_branch_remap_impact.md
  • Known limitations:
    • Catchup PRs are split for ordering/parity traceability; final integrated evidence remains anchored on fix_stack_main artifacts.

Labels (optional)

  • Not applicable for these ordered catchup PRs; label hygiene is deferred to maintainer-side triage.

- Track 2: add scripts/paper/run_track2_waves.sh with Wave 1 stop, --wave2 path, and Wave 3 elapsed-time gate

- Track 2: add canonical benchmark split inputs train/holdout/paraphrase under benchmark/erf_llm_compare

- Track 2: add fixed reproducible train_subsample_s42.jsonl (100 rows, seed=42)
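
A fixed subsample like train_subsample_s42.jsonl (100 rows, seed=42) could be produced along these lines. This is a sketch under assumptions: the PR does not say how the file was generated, so the function names and the use of Python's random module are illustrative, and this code would not necessarily reproduce the committed file.

```python
import json
import random


def subsample_rows(rows, n=100, seed=42):
    """Draw n distinct rows with a dedicated seeded RNG so reruns match."""
    if n > len(rows):
        raise ValueError(f"requested {n} rows but only {len(rows)} available")
    rng = random.Random(seed)  # local RNG; does not touch global random state
    return rng.sample(rows, n)


def read_jsonl(path):
    """Load one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def write_jsonl(rows, path):
    """Write one JSON object per line with sorted keys for stable diffs."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, sort_keys=True) + "\n")
```

Committing the resulting file, as this PR does, is the more robust choice: a stored subsample stays fixed even if the input ordering or the sampling code changes later.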

Collaborator Author

Note: these choices reflect an important shift in src.

@jmsexton03 jmsexton03 marked this pull request as ready for review March 26, 2026 18:21
@jmsexton03 jmsexton03 merged commit 42573e0 into development Mar 26, 2026
11 of 15 checks passed
@jmsexton03 jmsexton03 deleted the catchup_19_fix_tooling_per_solver branch March 31, 2026 17:17