Skip to content

Agent health check#623

Draft
jmelburg wants to merge 20 commits into
masterfrom
agent-health-check
Draft

Agent health check#623
jmelburg wants to merge 20 commits into
masterfrom
agent-health-check

Conversation

@jmelburg

@jmelburg jmelburg commented Apr 2, 2026

Copy link
Copy Markdown
Collaborator

Agent-ready markdown version of the health check, to be added to the MCP

@codecov

codecov Bot commented Apr 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.89977% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.76%. Comparing base (4ad3d47) to head (c27e8e3).

Files with missing lines Patch % Lines
python/pdstools/adm/Analysis.py 95.87% 18 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #623      +/-   ##
==========================================
+ Coverage   91.57%   91.76%   +0.18%     
==========================================
  Files         110      111       +1     
  Lines        9636    10075     +439     
==========================================
+ Hits         8824     9245     +421     
- Misses        812      830      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jmelburg jmelburg marked this pull request as draft April 2, 2026 11:22
@operdeck

operdeck commented Apr 8, 2026

Copy link
Copy Markdown
Collaborator

@jmelburg are we still using the create_metric_itable method? I had introduced it to behave similar to the create_metric_gttable that we use in the health checks.

@jmelburg

jmelburg commented Apr 8, 2026

Copy link
Copy Markdown
Collaborator Author

@operdeck We use it for DSI, I customized it a bit when upgrading DSI but it should still have the same formatting and thresholds as health check

@StijnKas StijnKas force-pushed the agent-health-check branch from 42c4697 to 1523d07 Compare April 23, 2026 08:51
StijnKas and others added 14 commits April 23, 2026 14:56
- Reports.health_check_agent: PathLike (was undefined os.PathLike)
- test_Analysis: zip(..., strict=True) for B905
- test_Reports.py: trailing newline cleanup from end-of-file-fixer

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Analysis is an ADMDatamart namespace, so tests/adm/ is the right home
(matches tests/adm/test_Aggregates.py for the sibling namespace).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per AGENTS.md: `verbose` is reserved for genuine user-facing progress
(tqdm bars, multi-stage CLI). Internal path echoes are diagnostic detail
and belong on the logger. `Reports.health_check` was already using
`logger.info` on its failure path; this aligns `health_check_agent`
with that pattern.

Side effects:
- Drops two latent run_quarto kwargs that don't exist on the function
  (`verbose=`, `size_reduction_method=`). Tests are Quarto-gated and
  marked slow, so the call would have crashed at first real invocation.
- CHANGELOG entry under [5.0.0] Changed documenting the breaking removal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per AGENTS.md "exact-value assertions over structural checks". Six
tests in test_Analysis.py used `> 0` / `len(...) > 0` to assert the
shape of finding output. All replaced with exact counts derived from
the fixture data (cdh_sample or _make_dm), with comments noting the
re-pin trigger if upstream sample data changes.

- TestFindings::test_findings_have_nonempty_title_and_detail: also
  asserts at least one finding exists (previously vacuously passed
  on an empty list).
- TestModelMaturityChecks: 3 maturity findings on cdh_sample.
- TestPredictorChecks: 2 predictor findings on cdh_sample.
- TestFullAnalysis: 10 total findings, exact category set
  {model, taxonomy, predictor}.
- TestCheckTaxonomyEdgeCases (RED/AMBER): 4 findings each (one per
  metric: actions, treatments, issues, channels) plus title check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three plan files capturing follow-ups identified in the PR #623
architecture-review pass that were too invasive for a single review-pass
PR (per the scope-bail rule in the review brief):

- adm/analysis-namespace-facade.md (P3): `Analysis` has 9 private
  helpers but only 1 public method. AGENTS.md namespace-facade trigger
  (>20 public methods) isn't met. Split would be ~4h of mechanical
  work + 73 test-mock-path rewrites for pure dotted-name benefit.
- adm/analysis-aggregates-overlap-audit.md (P3): per-method audit
  result. Most checks already delegate to dm.aggregates; the only
  borderline duplicate (_check_trends per-snapshot per-channel
  weighted avg) has no existing aggregate to delegate to. No code
  changes recommended at this time.
- health-check/agent-report-qmd-dedup.md (P3): HealthCheck.qmd vs
  HealthCheckAgent.qmd share substantial preamble. Quarto restructure
  is risky (slow Quarto-gated tests; subtle include semantics) — split
  into a focused PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@StijnKas StijnKas force-pushed the agent-health-check branch from 1523d07 to 38ccb9a Compare April 23, 2026 13:11
@StijnKas StijnKas marked this pull request as ready for review April 23, 2026 14:21
StijnKas and others added 6 commits April 23, 2026 17:09
Validated against 8 production CDH datasets (Rabobank, Wells Fargo,
Achmea, Bank of Ireland, Verizon, Standard Chartered, T-Mobile US,
Singtel). Output went from 32-39 findings down to 18-25 with the
following changes:

- Collapse N x 'Channel X has zero responses' into one aggregate
  finding per direction with the channel list truncated at 5
  (Verizon: 16 dead-channel lines -> 2; T-Mobile: 14 -> 2).
- Add a 'data_quality' category and _check_data_quality helper that
  flags malformed Channel/Direction values (empty, '/', '/Inbound',
  'Web/'). Invalid channels are excluded from the dead-channel
  aggregate so the counts stay clean.
- Drop the per-dead-channel 'has N model configurations' info loop
  in _check_channels - it added zero actionable detail and only ever
  fired for already-dead channels.
- Add a 'summary' category with a �� stethoscope icon and a single
  headline finding at index 0 derived from the underlying check
  output (% never used, dead-channel count, weighted average AUC).
  Verdict thresholds: NEEDS ATTENTION (>25% never-used / >2 dead
  channels with mature low-perf / AUC<55), MIXED, HEALTHY.
- Tier the overall-AUC finding: <58 critical, 58-62 warning, 62-70
  info, ≥70 success (new severity with ✅ icon).
- Add a 'mature_low_perf' percentage bucket so the five maturity
  bins (never_used / responses_no_positives / immature /
  mature_low_perf / mature_decent) are mutually exclusive and sum
  to exactly 100% on every real dataset. Use bp_min ModelPerformance
  (0.55) as the cutoff so 'healthy' and 'low' agree.
- Surface explicit thresholds and direction in taxonomy findings:
  'Total treatments (1) is below minimum (2)',
  'Total actions (7,059) is above typical range (>2500)' instead of
  the generic 'outside recommended limits'.
- Append the dominant Channel/Direction to 'Configuration X has no
  positives' findings so triagers know which channel to investigate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added exact-value tests in test_Analysis.py for:

- _check_data_quality detection of empty / '/' / '/Inbound' values
  and exact model count (3 invalid out of 4 rows)
- Collapsed dead-channel findings with singular/plural noun and
  '+N more' truncation past 5 channels
- Headline summary finding (HEALTHY / MIXED / NEEDS ATTENTION)
  with 🩺 icon
- Tiered AUC: critical < 58, warning 58-62, info 62-70, success ≥ 70
- Maturity buckets summing exactly to total model count
- Taxonomy 'below minimum' / 'above typical range' direction in title
- Configuration channel hint '(Push/Outbound)' on no-positives finding

Updated existing tests for:

- New 'success' severity in valid_severities
- New 'data_quality' / 'summary' categories in valid_categories
- _check_model_performance now returns (findings, avg_auc) tuple
- _check_channels no longer emits 'has N model configurations'
- Pinned counts: maturity 3 -> 4, full findings 10 -> 12
- Taxonomy title format check uses new direction-aware strings

Deferred to plan files:

- docs/plans/adm/findings-severity-ranking.md (issue #10)
- docs/plans/adm/findings-recommendations.md (issue #8)

89 tests pass; ruff and pre-commit clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The visible maturity buckets are intentionally a 'visible cohorts' view,
not a 100 % decomposition. Mature low-performance models are surfaced
through dedicated count findings ('N mature models have low performance',
'N stuck at AUC=50') rather than a percentage bucket — surfacing them
twice produced duplicate signal in the agent output.

Restores the 'decent' cutoff to min_perf (0.52) so the bucket boundary
matches the AUC threshold used elsewhere in the report.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The architecture pass moved test_Reports.py into python/tests/healthcheck/
but the workflow step still pointed at the old top-level path, so the
agent-report job collected zero tests and failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jmelburg jmelburg marked this pull request as draft June 11, 2026 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants