Catchup 13: Merge agent wave1 step by jmsexton03 · Pull Request #47 · AMReX-Codes/amrex-agent

jmsexton03 · 2026-03-18T12:44:22Z

Summary

Catchup context: slice 13 on branch catchup_13_merge_agent_wave1_step.
Ordered split wave objective: preserve parity/paper cutoff lineage by landing slices in ascending order.
What was added/changed in this slice:
- merge wave1-wt-4 and stabilize step-1 unit suite
- add new unnumbered regression tests for metrics, routing, and monitoring
- stabilize unit contracts and restore legacy compatibility paths
- fix unit-test collection imports for knowledge and architect services
- fix benchmark seed=0 handling and replay manifest privacy
- refine architect retrieval/orchestration and update sweep normalization
- add non-blocking tier logging and config safeguards to the schema builder
- harden MCP session context persistence and add server concurrency guardrails
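The notes above don't say what went wrong with `seed=0`. A frequent cause of this class of bug in Python is a truthiness fallback that treats an explicit `0` as "no seed given". The following is a minimal sketch of that pitfall and its fix, assuming that pattern; `DEFAULT_SEED`, `resolve_seed_buggy`, and `resolve_seed` are hypothetical names, not the benchmark runner's actual API.

```python
from typing import Optional

DEFAULT_SEED = 42  # hypothetical default, for illustration only


def resolve_seed_buggy(seed: Optional[int]) -> int:
    # Truthiness fallback: 0 is falsy, so an explicit seed=0 is silently replaced.
    return seed or DEFAULT_SEED


def resolve_seed(seed: Optional[int]) -> int:
    # Fall back only when no seed was supplied at all; 0 is a valid seed.
    return DEFAULT_SEED if seed is None else seed
```

Checking `is None` rather than truthiness keeps every integer seed, including `0`, reproducible.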
Workstreams (topic-level):
- index/schema artifact and metadata evolution
- MCP boundary and payload-handling updates
- runtime graph routing/wiring updates
- graph/model state-contract updates
- service contract/behavior updates
- unit regression coverage updates
- clarification flow behavior adjustments
- sweep/orchestration behavior adjustments
- MCP safety/contract handling adjustments
- FAISS/manifest compatibility handling
Slice metadata:
- Commit range: f4206ae0fcbd..523aba32ad20 (source apply_stack_slice_113 -> canonical fix_stack_main)
- Findings profile (P0/P1/P2/P3): 3/0/2/0 (total 5)
Fix implementation note: findings are reconciled/resolved on canonical stacked branch fix_stack_main at 73f37cf9e86d.

Related or overlapping functionality / DRY guidance

Overlap is expected with stacked fix lineage (fix_stack_main); avoid duplicating logic that is already hardened in shared services/nodes.
Keep node/state contract compatibility aligned with src/models/graph_state_canonical.py and tests/contracts/* when touching shared flows.
Evidence artifacts for cross-slice decisions: artifacts/integration/findings_reconciliation.json and artifacts/integration/fix_branch_remap_impact.md.
This embeds a significant architectural decision that needs an ADR.
- If checked, add an ADR under docs/adr/ (one short file describing context, decision, consequences).

Impact checklist

fixes a bug or incorrect behavior
adds new capabilities
changes answers in the test suite to more than roundoff level
likely affects downstream users or results
includes docs updates (code/docs), if appropriate
none of the above

Tests run (CI runs: `pytest tests/unit`, `pytest tests/quality`, `pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"`)

tests/unit: pytest tests/unit
tests/quality: pytest tests/quality
integration ladder (CI): pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"
other (list): final closure validation on canonical fix_stack_main
Output/summary:
- per-slice branch-head run in this phase: not executed
- canonical closure branch used for validation: fix_stack_main (73f37cf9e86d)
- canonical unit: 1663 passed, 31 skipped, 3 warnings (coverage 56.82%)
- canonical full: 1813 passed, 78 skipped, 10 xfailed, 11 warnings (coverage 58.63%)
- canonical quality: 20 passed, 1 skipped, 5 warnings
- canonical integration ladder: 47 passed, 36 skipped, 92 deselected, 1 xfailed, 7 warnings
- canonical junit evidence: artifacts/integration/reports/fix_stack_main_20260318_034847/unit.junit.xml, artifacts/integration/reports/fix_stack_main_20260318_034847/full.junit.xml
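To re-derive outcome counts from the JUnit evidence files, a sketch using only the standard library could look like this. It assumes pytest's `--junitxml` layout, in which xfailed tests are recorded as `<skipped type="pytest.xfail">`; `junit_totals` is a hypothetical helper and not part of the repository.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def junit_totals(xml_text: str) -> Counter:
    """Count testcase outcomes in a pytest-style JUnit XML document."""
    root = ET.fromstring(xml_text)
    totals: Counter = Counter()
    for case in root.iter("testcase"):  # works whether the root is <testsuites> or <testsuite>
        skipped = case.find("skipped")
        if case.find("failure") is not None:
            totals["failed"] += 1
        elif case.find("error") is not None:
            totals["error"] += 1
        elif skipped is not None:
            # pytest marks xfailed tests as skipped with type="pytest.xfail".
            key = "xfailed" if skipped.get("type") == "pytest.xfail" else "skipped"
            totals[key] += 1
        else:
            totals["passed"] += 1
    return totals
```

For example, `junit_totals(Path(".../unit.junit.xml").read_text())` (with `from pathlib import Path`) could be compared against the unit summary line above.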
If tests require repos/schemas/indices or real services, note markers used.
requires_solver(...) implies repo + schema + default indices are available locally.
Use `-k` to filter solver-specific tests, e.g. `pytest -k "pelec or erf or amrex or warpx"`.

Tests not run in CI (required if any)

CI runs tests/unit, tests/quality, and tests/integration with integration_l1..l4 + integration_full markers via micromamba; list anything else not covered by CI here.
None
tests/e2e
other (list): per-slice branch-local test reruns
Reason for skip: this phase validated closure on canonical stacked branch (fix_stack_main) rather than re-running each catchup branch independently.
Risk/mitigation: parity/open-PR coverage gate rerun is explicitly queued in the handoff for network-enabled execution.

Notes (optional)

Manual output / logs (short):
- Validation bundle: artifacts/integration/reports/fix_stack_main_20260318_034847
- Reconciliation totals: total_findings=109, addressed=109
- Remap artifact: artifacts/integration/fix_branch_remap_impact.md
Known limitations:
- Catchup PRs are split for ordering/parity traceability; final integrated evidence remains anchored on fix_stack_main artifacts.

Labels (optional)

Not applicable for these ordered catchup PRs; label hygiene is deferred to maintainer-side triage.

- Enhances aggregate-metrics extraction for model/provider and strategy fields.
- Adds grouped summary outputs and publication table generation paths.
- Covers CLI/report outputs for directory and file input workflows.
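Grouped summaries by model/provider and strategy can be illustrated with a small sketch. The record fields (`provider`, `model`, `strategy`, `score`), the `"unknown"` bucket for missing fields, and the helpers `group_summary` and `to_table` are all assumptions made for illustration; this is not `scripts/aggregate_metrics.py`.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

KEYS = ("provider", "model", "strategy")  # hypothetical grouping fields


def group_summary(records: Iterable[dict], keys=KEYS, metric="score") -> Dict[Tuple, dict]:
    """Group raw benchmark records by `keys` and summarize one numeric metric."""
    groups = defaultdict(list)
    for rec in records:
        # Missing grouping fields land in an "unknown" bucket instead of being dropped.
        group_key = tuple(str(rec.get(k, "unknown")) for k in keys)
        value = rec.get(metric)
        if value is not None:
            groups[group_key].append(value)
    return {k: {"n": len(v), "mean": mean(v)} for k, v in sorted(groups.items())}


def to_table(summary: Dict[Tuple, dict], keys=KEYS) -> str:
    """Render the grouped summary as a Markdown table for a publication draft."""
    header = "| " + " | ".join(tuple(keys) + ("n", "mean")) + " |"
    sep = "|" + "---|" * (len(keys) + 2)
    rows = []
    for group_key, stats in summary.items():
        cells = " | ".join(group_key)
        rows.append(f"| {cells} | {stats['n']} | {stats['mean']:.3f} |")
    return "\n".join([header, sep, *rows])
```

Keeping grouping and rendering separate lets the same summary feed both the CLI/report output and a table generator.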

- Hardens MCP session context merge/persist behavior with parent inheritance.
- Adds server-side concurrent session handling safeguards.
- Verifies workflow-store and MCP concurrency behavior end-to-end in tests.
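Parent inheritance plus concurrency guardrails can be sketched as an in-memory store. Child sessions see their ancestors' context, own keys override inherited ones, and a single lock serializes read-modify-write merges. `SessionContextStore` and its methods are hypothetical, persistence is omitted, and this is not the project's MCP server code.

```python
import threading
from typing import Any, Dict, List, Optional


class SessionContextStore:
    """Hypothetical thread-safe session context store (sketch only)."""

    def __init__(self) -> None:
        self._contexts: Dict[str, Dict[str, Any]] = {}
        self._parents: Dict[str, Optional[str]] = {}
        self._lock = threading.Lock()  # one lock guards all reads and writes

    def create(self, session_id: str, parent_id: Optional[str] = None) -> None:
        with self._lock:
            self._contexts.setdefault(session_id, {})
            self._parents[session_id] = parent_id

    def merge(self, session_id: str, updates: Dict[str, Any]) -> None:
        # Read-modify-write under the lock so concurrent merges do not lose keys.
        with self._lock:
            self._contexts.setdefault(session_id, {}).update(updates)

    def resolve(self, session_id: str) -> Dict[str, Any]:
        """Return the session's context layered over its ancestors' contexts."""
        chain: List[Dict[str, Any]] = []
        seen = set()
        with self._lock:
            current: Optional[str] = session_id
            while current is not None and current not in seen:  # guard against parent cycles
                seen.add(current)
                chain.append(dict(self._contexts.get(current, {})))
                current = self._parents.get(current)
        merged: Dict[str, Any] = {}
        for ctx in reversed(chain):  # oldest ancestor first, so children override
            merged.update(ctx)
        return merged
```

A single coarse lock is the simplest correct choice for a sketch. Per-session locks would reduce contention but make cross-session reads in `resolve` harder to keep consistent.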


- Adds non-blocking Tier 3/4 logging behavior to schema build workflow.
- Improves config/schema integration guardrails for edge-case handling.
- Covers builder/config boundary behavior with dedicated unit tests.
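"Non-blocking" tier logging usually means that a failure in an optional stage is logged and the build continues, while a failure in a required stage still aborts. The sketch below assumes that reading. It also assumes that tiers 1 and 2 are blocking (`BLOCKING_TIERS`). `run_tiers` and the logger name are hypothetical and not the schema builder's API.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("schema_builder_sketch")  # hypothetical logger name

BLOCKING_TIERS = {1, 2}  # assumption: tiers 1-2 must succeed; tiers 3-4 are best-effort


def run_tiers(tiers: Dict[int, Callable[[], None]]) -> List[int]:
    """Run build tiers in ascending order; return the non-blocking tiers that failed."""
    failed_nonblocking = []
    for tier in sorted(tiers):
        try:
            tiers[tier]()
        except Exception:
            if tier in BLOCKING_TIERS:
                raise  # a required tier failing still aborts the build
            # Non-blocking tier: log with traceback and keep building.
            logger.warning("Tier %d failed; continuing schema build", tier, exc_info=True)
            failed_nonblocking.append(tier)
    return failed_nonblocking
```

Returning the failed tiers lets a caller report partial builds without treating them as fatal.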

- Refines architect retrieval and solver-selection orchestration behavior.
- Updates knowledge-service normalization and related orchestration tests.
- Keeps sweep/orchestrator test expectations aligned with service behavior.

# Conflicts:
# scripts/aggregate_metrics.py
# src/config.py
# src/graph.py
# src/nodes/paper_validator_node.py
# src/services/architect.py
# src/services/plan.py
# src/services/rules/base.py
# src/session_manager.py
# src/utils/metrics.py
# tests/quality/test_standards.py
# tests/unit/test_architect_node_history.py
# tests/unit/test_benchmark_runner.py
# tests/unit/test_level0_index.py
# tests/unit/test_metrics_collector.py


jmsexton03 · 2026-03-26T16:10:14Z

+MAX_NEW_INTEGRATION_LOC = 300
+LEGACY_LOC_BUDGET_EXEMPT_CODES = {
+    "AMReX",
+    "PeleC",
+    "PeleLMeX",
+    "Incflo",
+    "WarpX",
+    "ERF",
+    "REMORA",
+}
+


too specific, maybe this doesn't live here
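One way to act on this comment is to keep the constants as defaults but pass the limits in as parameters, so the values could come from config instead of being hard-coded at the call site. The constant values below come from the diff, with the set swapped for a `frozenset` so it cannot be mutated by accident. `within_loc_budget` is a hypothetical helper, because the diff does not show how the constants are consumed.

```python
MAX_NEW_INTEGRATION_LOC = 300
LEGACY_LOC_BUDGET_EXEMPT_CODES = frozenset(
    {"AMReX", "PeleC", "PeleLMeX", "Incflo", "WarpX", "ERF", "REMORA"}
)


def within_loc_budget(
    code_name: str,
    new_loc: int,
    max_loc: int = MAX_NEW_INTEGRATION_LOC,
    exempt: frozenset = LEGACY_LOC_BUDGET_EXEMPT_CODES,
) -> bool:
    """Return True if a new integration fits the LOC budget (hypothetical check)."""
    # Legacy codes bypass the budget entirely; names are matched case-sensitively.
    if code_name in exempt:
        return True
    return new_loc <= max_loc
```

Callers that load limits from configuration can override `max_loc` and `exempt` without editing this module.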

jmsexton03

Needs cleaner test names

jmsexton03 added 30 commits February 13, 2026 07:10

Add superfacility staging fallbacks

18bf25a

Add superfacility staging tests

e45bf8d

Clarify SFAPI auth errors for remote executable lookup

1e86156

Use sfapi_client key paths for auth

c8d1008

Fix sfapi_client file listing for executables

581b272

Prefer remote_output_dir on Perlmutter

5f33c68

Drop local output_dir from remote config

91e6490

Add benchmark case grid and runner scaffold

88239bd

Add benchmark case grid and runner scaffold

f57dc65

Add metrics collection and JSONL summaries

8318355

Add metrics collection and JSONL summaries

6dd0d5d

Add benchmark runner outputs and compare_models test

9aceb6e

Add benchmark runner outputs and compare_models test

5590477

Add metrics aggregation adapter for raw benchmark records

3dd3d06

Add metrics aggregation adapter for raw benchmark records

f9f7eca

Refactor benchmark runner into shared case and model modules

9be57d1

Refactor benchmark runner into shared case and model modules

86b04d0

Add difficulty and novelty metadata to benchmark cases

429ccde

Add difficulty and novelty metadata to benchmark cases

e898b12

Add benchmark metrics context and CSV aggregation

6b797bf

Add benchmark metrics context and CSV aggregation

43b4d6f

Add difficulty and novelty aggregates to metrics output

417e053

Add difficulty and novelty aggregates to metrics output

2d2073a

Remove env-var metrics context and refine concept density

2374c39

Remove env-var metrics context and refine concept density

5cf117e

Use benchmark context sidecar for metrics

142e241

Use benchmark context sidecar for metrics

72cd9bb

Skip REST mkdir when SFAPI creds are available

28cd4fe

Fix LLM client unwrapping and SFAPI test

188500c

Document instructor usage map

2d9fe1c

jmsexton03 added 17 commits March 11, 2026 03:52

Aggregate metrics reporting and publication table generation

c225b2b


MCP session context persistence and server concurrency guardrails

5c123e1


Merge wave 1 changes from wt-2 through b04973f

4afb0b7

Clean session/PRD-coded naming in new work

3f74640

Rename numbered session/PRD test and helper names after merge-base cleanup

5fba19a

Finalize wt-2 cleanup conflict resolution follow-ups

68888a3

Schema builder non-blocking tier logging and config safeguards

34dc827


Fix benchmark seed=0 handling and replay manifest privacy

4523e5e

Step 1 compatibility: restore graph gate API and metrics/plan helpers

9e1694b

Fix dependency routing default path and taxonomy reason-code allowlist

920125d

Fix unit-test collection imports for knowledge and architect services

511a86f

Harden solver selection compatibility for legacy architect call sites

f316e66

Stabilize unit contracts and restore legacy compatibility paths

3c5ee42

Add new unnumbered regression tests for metrics, routing, and monitoring

ccd6b41

Merge wave1-wt-4 and stabilize step-1 unit suite

523aba3

github-actions Bot added docs tests infra demo code data/indices labels Mar 18, 2026

jmsexton03 added 2 commits March 26, 2026 09:09

Merge remote-tracking branch 'amrex-codes/development' into catchup_13_merge_agent_wave1_step

1f5fc86

Remove scripts and local outputs

e58ee18

jmsexton03 commented Mar 26, 2026


jmsexton03 marked this pull request as ready for review March 26, 2026 16:12

jmsexton03 merged commit 05a5a18 into development Mar 26, 2026
11 of 15 checks passed

jmsexton03 deleted the catchup_13_merge_agent_wave1_step branch March 31, 2026 17:17

jmsexton03 merged 186 commits into development from catchup_13_merge_agent_wave1_step
