Catchup 18: Add agent paper reproduction #52

Merged
jmsexton03 merged 235 commits into development from catchup_18_add_agent_paper_reproduction
Mar 26, 2026

@jmsexton03

Collaborator

Summary

  • Catchup context: slice 18 on branch catchup_18_add_agent_paper_reproduction.
  • Ordered split wave objective: preserve parity/paper cutoff lineage by landing slices in ascending order.
  • What was added/changed in this slice:
    • added paper reproduction and repo-validation snapshot scripts with configs
    • added Level0 flat-score disambiguation for solver routing
    • unified requested viz vars across inputs and slices with solver-driven defaults
    • tuned simple baseline scoring for weak KB signal and case-hint promotion
    • tracked erf_benchmark lib modules and unignored their path
    • passed the user prompt into llm_compare inputs selection
    • improved ERF executable fallback and central-build compile targeting
    • added benchmark explainability pipeline and sanity-set tooling
  • Workstreams (topic-level):
    • index/schema artifact and metadata evolution
    • benchmark workflow/retry behavior updates
    • paper/reproduction workflow scripting
    • node orchestration flow updates
    • service contract/behavior updates
    • unit regression coverage updates
    • FAISS/manifest compatibility handling
    • visualization config/plot behavior adjustments
    • paper/wave workflow-path handling
    • ERF execution/fallback behavior handling
  • Slice metadata:
    • Commit range: c6bf31ae0ca6..2c2d60017707 (source apply_stack_slice_118 -> canonical fix_stack_main)
    • Findings profile (P0/P1/P2/P3): 1/2/2/0 (total 5)
  • Fix implementation note: findings are reconciled/resolved on canonical stacked branch fix_stack_main at 73f37cf9e86d.

Related or overlapping functionality / DRY guidance

  • Overlap is expected with stacked fix lineage (fix_stack_main); avoid duplicating logic that is already hardened in shared services/nodes.
  • Keep node/state contract compatibility aligned with src/models/graph_state_canonical.py and tests/contracts/* when touching shared flows.
  • Evidence artifacts for cross-slice decisions: artifacts/integration/findings_reconciliation.json and artifacts/integration/fix_branch_remap_impact.md.
  • This embeds a significant architectural decision that needs an ADR.
    • If checked, add an ADR under docs/adr/ (one short file describing context, decision, consequences).

Impact checklist

  • fixes a bug or incorrect behavior
  • adds new capabilities
  • changes answers in the test suite to more than roundoff level
  • likely affects downstream users or results
  • includes docs updates (code/docs), if appropriate
  • none of the above

Tests run (CI runs: pytest tests/unit, pytest tests/quality, pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full")

  • tests/unit: pytest tests/unit
  • tests/quality: pytest tests/quality
  • integration ladder (CI): pytest tests/integration -m "integration_l1 or integration_l2 or integration_l3 or integration_l4 or integration_full"
  • other (list): final closure validation on canonical fix_stack_main
  • Output/summary:
    • per-slice branch-head run in this phase: not executed
    • canonical closure branch used for validation: fix_stack_main (73f37cf9e86d)
    • canonical unit: 1663 passed, 31 skipped, 3 warnings (coverage 56.82%)
    • canonical full: 1813 passed, 78 skipped, 10 xfailed, 11 warnings (coverage 58.63%)
    • canonical quality: 20 passed, 1 skipped, 5 warnings
    • canonical integration ladder: 47 passed, 36 skipped, 92 deselected, 1 xfailed, 7 warnings
    • canonical junit evidence: artifacts/integration/reports/fix_stack_main_20260318_034847/unit.junit.xml, artifacts/integration/reports/fix_stack_main_20260318_034847/full.junit.xml
  • If tests require repos/schemas/indices or real services, note markers used.
  • requires_solver(...) implies repo + schema + default indices are available locally.
  • Use -k with a solver name (pelec, erf, amrex, or warpx) to filter solver-specific tests, e.g. -k "erf or pelec".
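
As a rough illustration of how the JUnit evidence files listed above could be inspected, the sketch below totals the counters on each testsuite element. It is not part of this PR: the function name summarize_junit and the sample XML are hypothetical, and it assumes the usual pytest JUnit layout (a testsuites root wrapping one testsuite element with tests, failures, errors, and skipped attributes). Note that the skipped counter in such files may also include xfailed tests, so the "passed" figure here need not match the console summary exactly.

```python
import xml.etree.ElementTree as ET

# Invented sample standing in for a file such as unit.junit.xml.
SAMPLE_JUNIT = """<testsuites>
  <testsuite name="pytest" tests="5" failures="1" errors="0" skipped="1">
    <testcase classname="tests.unit.test_a" name="test_ok_1"/>
    <testcase classname="tests.unit.test_a" name="test_ok_2"/>
    <testcase classname="tests.unit.test_a" name="test_ok_3"/>
    <testcase classname="tests.unit.test_a" name="test_bad">
      <failure message="assert 1 == 2"/>
    </testcase>
    <testcase classname="tests.unit.test_a" name="test_skip">
      <skipped message="requires solver"/>
    </testcase>
  </testsuite>
</testsuites>"""


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Sum the per-suite counters and derive a passed count (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() also yields the root itself when the root is a <testsuite>.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    totals["passed"] = (
        totals["tests"] - totals["failures"] - totals["errors"] - totals["skipped"]
    )
    return totals


summary = summarize_junit(SAMPLE_JUNIT)
print(summary)
```

To apply this to a real report, read the file contents with Path(...).read_text() and pass the text to summarize_junit.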

Tests not run in CI (required if any)

  • CI runs tests/unit, tests/quality, and tests/integration with integration_l1..l4 + integration_full markers via micromamba; list anything else not covered by CI here.
  • None
  • tests/e2e
  • other (list): per-slice branch-local test reruns
  • Reason for skip: this phase validated closure on canonical stacked branch (fix_stack_main) rather than re-running each catchup branch independently.
  • Risk/mitigation: parity/open-PR coverage gate rerun is explicitly queued in the handoff for network-enabled execution.

Notes (optional)

  • Manual output / logs (short):
    • Validation bundle: artifacts/integration/reports/fix_stack_main_20260318_034847
    • Reconciliation totals: total_findings=109, addressed=109
    • Remap artifact: artifacts/integration/fix_branch_remap_impact.md
  • Known limitations:
    • Catchup PRs are split for ordering/parity traceability; final integrated evidence remains anchored on fix_stack_main artifacts.

Labels (optional)

  • Not applicable for these ordered catchup PRs; label hygiene is deferred to maintainer-side triage.

Comment on lines +156 to +220
    @classmethod
    def get_viz_variable_catalog(cls, repo_root: Path | None = None) -> list[dict[str, Any]]:
        """
        Build ERF visualization variable catalog from live ERF source files.
        """
        if repo_root:
            root = Path(repo_root)
        else:
            root = Path(__file__).resolve().parents[2].parent / "ERF"

        header = root / "Source" / "ERF.H"
        if not header.exists():
            return []

        try:
            text = header.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            return []

        vector_pattern = re.compile(
            r'const\s+amrex::Vector<std::string>\s+(cons_names|derived_names|derived_names_2d)\s*\{(.*?)\};',
            re.DOTALL,
        )
        names: list[str] = []
        for _, body in vector_pattern.findall(text):
            names.extend(re.findall(r'"([^"]+)"', body))

        units = {
            "density": "kg/m^3",
            "temp": "K",
            "pressure": "Pa",
            "qv": "kg/kg",
            "qc": "kg/kg",
            "qi": "kg/kg",
            "qrain": "kg/kg",
            "qsnow": "kg/kg",
            "qgraup": "kg/kg",
            "qt": "kg/kg",
        }
        aliases = {
            "temp": ["temperature"],
            "magvel": ["velocity", "speed"],
            "vorticity_z": ["vorticity", "vertical vorticity"],
            "qc": ["cloud water", "cloud_water", "liquid water", "cloud liquid"],
            "qv": ["water vapor", "vapor mixing ratio", "humidity"],
        }

        catalog: list[dict[str, Any]] = []
        seen: set[str] = set()
        source_ref = str(header)
        for name in names:
            if name in seen:
                continue
            seen.add(name)
            catalog.append(
                {
                    "name": name,
                    "aliases": aliases.get(name, []),
                    "units": units.get(name),
                    "description": None,
                    "source": source_ref,
                }
            )
        return catalog

Collaborator Author

Maybe too specific; you could get this from the LLM instead of adding more catalog listings.

Collaborator Author

maybe a better place for this to live or way to be set up?

Collaborator Author

maybe too hard-coded

Collaborator Author

This hardcodes which direction is interesting, which may not be well documented even though it's tested.

@jmsexton03 jmsexton03 marked this pull request as ready for review March 26, 2026 18:11
@jmsexton03 jmsexton03 merged commit 34388c1 into development Mar 26, 2026
11 of 15 checks passed
@jmsexton03 jmsexton03 deleted the catchup_18_add_agent_paper_reproduction branch March 31, 2026 17:17