Catchup 10: Merge changes sync#44

Merged
jmsexton03 merged 138 commits into development from catchup_10_merge_changes_sync on Mar 26, 2026

Conversation

@jmsexton03

Collaborator

Summary

  • Catchup context: slice 10 on branch catchup_10_merge_changes_sync.
  • Ordered split wave objective: preserve parity/paper cutoff lineage by landing slices in ascending order.
  • What was added/changed in this slice:
    • merge wave 1 changes from wt-3
    • consolidate canonical test coverage and rename metrics helper
    • expand MCP and retrieval regression test coverage
    • tighten physics validator behavior and regression coverage
    • improve benchmark and metrics audit reporting paths
    • harden workflow state contracts across graph and session flows
    • add versioned FAISS manifest metadata and compatibility checks
    • use codex exec for non-interactive wave worktree runs
  • Workstreams (topic-level):
    • index/schema artifact and metadata evolution
    • runtime graph routing/wiring updates
    • node orchestration flow updates
    • service contract/behavior updates
    • unit regression coverage updates
    • clarification flow behavior adjustments
    • sweep/orchestration behavior adjustments
    • MCP safety/contract handling adjustments
    • retry/timeout handling adjustments
    • FAISS/manifest compatibility handling
  • Slice metadata:
    • Commit range: 8c1a41efb924..c18845b58a7b (source apply_stack_slice_110 -> canonical fix_stack_main)
    • Findings profile (P0/P1/P2/P3): 1/2/2/0 (total 5)
  • Fix implementation note: findings are reconciled/resolved on canonical stacked branch fix_stack_main at 73f37cf9e86d.

Related or overlapping functionality / DRY guidance

  • Overlap is expected with stacked fix lineage (fix_stack_main); avoid duplicating logic that is already hardened in shared services/nodes.
  • Keep node/state contract compatibility aligned with src/models/graph_state_canonical.py and tests/contracts/* when touching shared flows.
  • Evidence artifacts for cross-slice decisions: artifacts/integration/findings_reconciliation.json and artifacts/integration/fix_branch_remap_impact.md.
  • This embeds a significant architectural decision that needs an ADR.
    • If checked, add an ADR under docs/adr/ (one short file describing context, decision, consequences).

Impact checklist

  • fixes a bug or incorrect behavior
  • adds new capabilities
  • changes answers in the test suite to more than roundoff level
  • likely affects downstream users or results
  • includes docs updates (code/docs), if appropriate
  • none of the above

Tests run (CI runs: pytest tests/unit, pytest tests/quality, pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full")

  • tests/unit: pytest tests/unit
  • tests/quality: pytest tests/quality
  • integration ladder (CI): pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"
  • other (list): final closure validation on canonical fix_stack_main
  • Output/summary:
    • per-slice branch-head run in this phase: not executed
    • canonical closure branch used for validation: fix_stack_main (73f37cf9e86d)
    • canonical unit: 1663 passed, 31 skipped, 3 warnings (coverage 56.82%)
    • canonical full: 1813 passed, 78 skipped, 10 xfailed, 11 warnings (coverage 58.63%)
    • canonical quality: 20 passed, 1 skipped, 5 warnings
    • canonical integration ladder: 47 passed, 36 skipped, 92 deselected, 1 xfailed, 7 warnings
    • canonical junit evidence: artifacts/integration/reports/fix_stack_main_20260318_034847/unit.junit.xml, artifacts/integration/reports/fix_stack_main_20260318_034847/full.junit.xml
  • If tests require repos/schemas/indices or real services, note markers used.
  • requires_solver(...) implies repo + schema + default indices are available locally.
  • Use -k with one of pelec, erf, amrex, or warpx to filter solver-specific tests (combine several with or, e.g. -k "pelec or erf").

Tests not run in CI (required if any)

  • CI runs tests/unit, tests/quality, and tests/integration with integration_l1..l4 + integration_full markers via micromamba; list anything else not covered by CI here.
  • None
  • tests/e2e
  • other (list): per-slice branch-local test reruns
  • Reason for skip: this phase validated closure on canonical stacked branch (fix_stack_main) rather than re-running each catchup branch independently.
  • Risk/mitigation: parity/open-PR coverage gate rerun is explicitly queued in the handoff for network-enabled execution.

Notes (optional)

  • Manual output / logs (short):
    • Validation bundle: artifacts/integration/reports/fix_stack_main_20260318_034847
    • Reconciliation totals: total_findings=109, addressed=109
    • Remap artifact: artifacts/integration/fix_branch_remap_impact.md
  • Known limitations:
    • Catchup PRs are split for ordering/parity traceability; final integrated evidence remains anchored on fix_stack_main artifacts.

Labels (optional)

  • Not applicable for these ordered catchup PRs; label hygiene is deferred to maintainer-side triage.

Generated from summary.csv view1-4.
Ordered: NONE before LOW, mock drift first,
DRY refactors before dependent sessions,
integration tests after unit sessions.
Includes dependency DAG and CI merge gate.
summary.csv: 398 criteria with confidence scores
view1: 131 critical gaps
view2: 16 mock drift items
view3: 38 missing integration tests
view4: 36 DRY violations
- validate manifest metadata with backward-compatible handling for legacy manifests
- enforce embedding_model requirements for newer manifest versions and check configured model compatibility
- update FAISS manifest build/download tooling and docs to carry embedding metadata

- strengthen manifest and sequence checks in graph, plan, and session orchestration paths
- improve workflow store and reviewer-node handling for state consistency and traceability
- expand contract and unit coverage for schema alignment and history/workflow behavior

- extend benchmark runner outputs and handling for clearer audit-oriented reporting
- enhance aggregate metrics generation and metrics utility consistency
- add unit coverage for benchmark execution, metrics collection, and result aggregation

- improve validator handling for edge cases and consistency of validation outcomes
- add focused unit tests to guard expected validation behavior

- add broader MCP tool-path tests for contract and behavior stability
- strengthen level0 index tests to protect retrieval regressions

- migrate useful scratch-session coverage into canonical plan, benchmark, and metrics test modules
- remove session-specific naming from token-field normalization via normalize_average_token_fields
- add migration-plan benchmark assertions and new plan factory/checklist behavior tests
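
For reviewers, the manifest rules described in the commit notes above (legacy manifests stay loadable, newer manifest versions must declare embedding_model, and the declared model must match the configured one) can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the manifest_version field name, the version threshold, and the check_manifest and ManifestCompatibilityError names are assumptions.

```python
from typing import Any, Mapping

# Hypothetical threshold: first manifest version that must declare a model.
MIN_VERSION_REQUIRING_EMBEDDING_MODEL = 2


class ManifestCompatibilityError(ValueError):
    """Raised when a manifest cannot be used with the configured model."""


def check_manifest(manifest: Mapping[str, Any], configured_model: str) -> str:
    """Return "legacy" or "ok", or raise ManifestCompatibilityError."""
    # Assumption: legacy manifests have no version field; treat them as v1.
    version = int(manifest.get("manifest_version", 1))
    model = manifest.get("embedding_model")

    if model is None:
        if version >= MIN_VERSION_REQUIRING_EMBEDDING_MODEL:
            raise ManifestCompatibilityError(
                f"manifest v{version} must declare embedding_model"
            )
        # Backward-compatible path: nothing to compare against.
        return "legacy"

    if model != configured_model:
        raise ManifestCompatibilityError(
            f"index built with {model!r}, but {configured_model!r} is configured"
        )
    return "ok"
```

Under these assumptions, an empty legacy manifest is accepted as "legacy", while a version 2 manifest without embedding_model is rejected before any index is loaded.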
Comment thread src/services/plan.py
Comment on lines +29 to +35
_IMPLEMENTATION_LOCATION_KEYS = (
    "implementation_locations",
    "implementation_location",
    "locations",
    "files",
    "services",
)

Collaborator Author

This may be too hacky / global.
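
One possible way to address the "global" concern, sketched under assumptions: keep the tuple as a default but let callers pass their own alias list, so the lookup order is explicit at each call site. The extract_locations helper and the shape of the step mapping are hypothetical and do not come from src/services/plan.py; only the tuple is copied from the diff.

```python
from typing import Any, Mapping, Sequence

# Copied from the diff above; the order defines lookup priority.
_IMPLEMENTATION_LOCATION_KEYS = (
    "implementation_locations",
    "implementation_location",
    "locations",
    "files",
    "services",
)


def extract_locations(
    step: Mapping[str, Any],
    keys: Sequence[str] = _IMPLEMENTATION_LOCATION_KEYS,
) -> list[str]:
    """Return locations from the first alias key holding a usable value."""
    for key in keys:
        value = step.get(key)
        if isinstance(value, str) and value.strip():
            # A single location given as a bare string.
            return [value]
        if isinstance(value, (list, tuple)) and value:
            return [str(item) for item in value]
    # No alias matched, or every match was empty.
    return []
```

With this shape, a caller that only trusts one alias can write extract_locations(step, keys=("files",)) without touching module-level state.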

Comment thread src/benchmark_runner.py
Comment on lines +614 to +621
def _deterministic_env_overrides(controls: dict[str, Any]) -> dict[str, str]:
    seed = str(controls["seed"])
    return {
        "PYTHONHASHSEED": seed,
        "AMREX_AGENT_BENCHMARK_SEED": seed,
        "AMREX_AGENT_DETERMINISTIC_REPLAY": "1",
        "AMREX_AGENT_REPLAY_FINGERPRINT": str(controls["replay_fingerprint"]),
    }

Collaborator Author

This may not be secure, and it is not clear how it should work.
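
To make the intended use easier to discuss, here is a minimal sketch of how such overrides might be applied to a child process. It assumes the overrides are merged over a copy of the parent environment and passed to subprocess.run; the build_child_env and run_child helpers are hypothetical and not part of src/benchmark_runner.py. The seed range check reflects CPython's documented rule that PYTHONHASHSEED must be "random" or an integer in [0, 4294967295].

```python
import os
import subprocess
from typing import Any


def _deterministic_env_overrides(controls: dict[str, Any]) -> dict[str, str]:
    # Copied from the diff above.
    seed = str(controls["seed"])
    return {
        "PYTHONHASHSEED": seed,
        "AMREX_AGENT_BENCHMARK_SEED": seed,
        "AMREX_AGENT_DETERMINISTIC_REPLAY": "1",
        "AMREX_AGENT_REPLAY_FINGERPRINT": str(controls["replay_fingerprint"]),
    }


def build_child_env(controls: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper: parent environment plus validated overrides."""
    seed = int(controls["seed"])
    # CPython refuses to start with an out-of-range PYTHONHASHSEED.
    if not 0 <= seed <= 4294967295:
        raise ValueError(f"seed {seed} is outside the PYTHONHASHSEED range")
    # Copy rather than mutate os.environ so the parent process is unaffected.
    return {**os.environ, **_deterministic_env_overrides(controls)}


def run_child(cmd: list[str], controls: dict[str, Any]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run cmd with the deterministic environment."""
    return subprocess.run(
        cmd, env=build_child_env(controls), capture_output=True, text=True, check=True
    )
```

Passing the environment explicitly keeps the overrides scoped to the child process, which may also narrow the security question to what the child is allowed to inherit from the parent.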

@jmsexton03 jmsexton03 marked this pull request as ready for review March 26, 2026 15:19
@jmsexton03 jmsexton03 merged commit 627ce00 into development Mar 26, 2026
11 of 15 checks passed
@jmsexton03 jmsexton03 deleted the catchup_10_merge_changes_sync branch March 31, 2026 17:17