Agent health check by jmelburg · Pull Request #623 · pegasystems/pega-datascientist-tools

jmelburg · 2026-04-02T07:22:47Z

Agent-ready markdown version of the health check, to be added to the MCP

codecov · 2026-04-02T07:28:18Z

Codecov Report

❌ Patch coverage is 95.89977% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.76%. Comparing base (4ad3d47) to head (c27e8e3).

Files with missing lines	Patch %	Lines
python/pdstools/adm/Analysis.py	95.87%	18 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #623      +/-   ##
==========================================
+ Coverage   91.57%   91.76%   +0.18%     
==========================================
  Files         110      111       +1     
  Lines        9636    10075     +439     
==========================================
+ Hits         8824     9245     +421     
- Misses        812      830      +18

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

operdeck · 2026-04-08T10:04:09Z

@jmelburg are we still using the create_metric_itable method? I had introduced it to behave similar to the create_metric_gttable that we use in the health checks.

jmelburg · 2026-04-08T13:12:25Z

@operdeck We use it for DSI, I customized it a bit when upgrading DSI but it should still have the same formatting and thresholds as health check

- Reports.health_check_agent: PathLike (was undefined os.PathLike) - test_Analysis: zip(..., strict=True) for B905 - test_Reports.py: trailing newline cleanup from end-of-file-fixer Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Analysis is an ADMDatamart namespace, so tests/adm/ is the right home (matches tests/adm/test_Aggregates.py for the sibling namespace). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Per AGENTS.md: `verbose` is reserved for genuine user-facing progress (tqdm bars, multi-stage CLI). Internal path echoes are diagnostic detail and belong on the logger. `Reports.health_check` was already using `logger.info` on its failure path; this aligns `health_check_agent` with that pattern. Side effects: - Drops two latent run_quarto kwargs that don't exist on the function (`verbose=`, `size_reduction_method=`). Tests are Quarto-gated and marked slow, so the call would have crashed at first real invocation. - CHANGELOG entry under [5.0.0] Changed documenting the breaking removal. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Per AGENTS.md "exact-value assertions over structural checks". Six tests in test_Analysis.py used `> 0` / `len(...) > 0` to assert the shape of finding output. All replaced with exact counts derived from the fixture data (cdh_sample or _make_dm), with comments noting the re-pin trigger if upstream sample data changes. - TestFindings::test_findings_have_nonempty_title_and_detail: also asserts at least one finding exists (previously vacuously passed on an empty list). - TestModelMaturityChecks: 3 maturity findings on cdh_sample. - TestPredictorChecks: 2 predictor findings on cdh_sample. - TestFullAnalysis: 10 total findings, exact category set {model, taxonomy, predictor}. - TestCheckTaxonomyEdgeCases (RED/AMBER): 4 findings each (one per metric: actions, treatments, issues, channels) plus title check. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Three plan files capturing follow-ups identified in the PR #623 architecture-review pass that were too invasive for a single review-pass PR (per the scope-bail rule in the review brief): - adm/analysis-namespace-facade.md (P3): `Analysis` has 9 private helpers but only 1 public method. AGENTS.md namespace-facade trigger (>20 public methods) isn't met. Split would be ~4h of mechanical work + 73 test-mock-path rewrites for pure dotted-name benefit. - adm/analysis-aggregates-overlap-audit.md (P3): per-method audit result. Most checks already delegate to dm.aggregates; the only borderline duplicate (_check_trends per-snapshot per-channel weighted avg) has no existing aggregate to delegate to. No code changes recommended at this time. - health-check/agent-report-qmd-dedup.md (P3): HealthCheck.qmd vs HealthCheckAgent.qmd share substantial preamble. Quarto restructure is risky (slow Quarto-gated tests; subtle include semantics) — split into a focused PR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Validated against 8 production CDH datasets (Rabobank, Wells Fargo, Achmea, Bank of Ireland, Verizon, Standard Chartered, T-Mobile US, Singtel). Output went from 32-39 findings down to 18-25 with the following changes: - Collapse N x 'Channel X has zero responses' into one aggregate finding per direction with the channel list truncated at 5 (Verizon: 16 dead-channel lines -> 2; T-Mobile: 14 -> 2). - Add a 'data_quality' category and _check_data_quality helper that flags malformed Channel/Direction values (empty, '/', '/Inbound', 'Web/'). Invalid channels are excluded from the dead-channel aggregate so the counts stay clean. - Drop the per-dead-channel 'has N model configurations' info loop in _check_channels - it added zero actionable detail and only ever fired for already-dead channels. - Add a 'summary' category with a �� stethoscope icon and a single headline finding at index 0 derived from the underlying check output (% never used, dead-channel count, weighted average AUC). Verdict thresholds: NEEDS ATTENTION (>25% never-used / >2 dead channels with mature low-perf / AUC<55), MIXED, HEALTHY. - Tier the overall-AUC finding: <58 critical, 58-62 warning, 62-70 info, ≥70 success (new severity with ✅ icon). - Add a 'mature_low_perf' percentage bucket so the five maturity bins (never_used / responses_no_positives / immature / mature_low_perf / mature_decent) are mutually exclusive and sum to exactly 100% on every real dataset. Use bp_min ModelPerformance (0.55) as the cutoff so 'healthy' and 'low' agree. - Surface explicit thresholds and direction in taxonomy findings: 'Total treatments (1) is below minimum (2)', 'Total actions (7,059) is above typical range (>2500)' instead of the generic 'outside recommended limits'. - Append the dominant Channel/Direction to 'Configuration X has no positives' findings so triagers know which channel to investigate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Added exact-value tests in test_Analysis.py for: - _check_data_quality detection of empty / '/' / '/Inbound' values and exact model count (3 invalid out of 4 rows) - Collapsed dead-channel findings with singular/plural noun and '+N more' truncation past 5 channels - Headline summary finding (HEALTHY / MIXED / NEEDS ATTENTION) with 🩺 icon - Tiered AUC: critical < 58, warning 58-62, info 62-70, success ≥ 70 - Maturity buckets summing exactly to total model count - Taxonomy 'below minimum' / 'above typical range' direction in title - Configuration channel hint '(Push/Outbound)' on no-positives finding Updated existing tests for: - New 'success' severity in valid_severities - New 'data_quality' / 'summary' categories in valid_categories - _check_model_performance now returns (findings, avg_auc) tuple - _check_channels no longer emits 'has N model configurations' - Pinned counts: maturity 3 -> 4, full findings 10 -> 12 - Taxonomy title format check uses new direction-aware strings Deferred to plan files: - docs/plans/adm/findings-severity-ranking.md (issue #10) - docs/plans/adm/findings-recommendations.md (issue #8) 89 tests pass; ruff and pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The visible maturity buckets are intentionally a 'visible cohorts' view, not a 100 % decomposition. Mature low-performance models are surfaced through dedicated count findings ('N mature models have low performance', 'N stuck at AUC=50') rather than a percentage bucket — surfacing them twice produced duplicate signal in the agent output. Restores the 'decent' cutoff to min_perf (0.52) so the bucket boundary matches the AUC threshold used elsewhere in the report. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The architecture pass moved test_Reports.py into python/tests/healthcheck/ but the workflow step still pointed at the old top-level path, so the agent-report job collected zero tests and failed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jmelburg marked this pull request as draft April 2, 2026 11:22

StijnKas force-pushed the agent-health-check branch from 42c4697 to 1523d07 Compare April 23, 2026 08:51

StijnKas and others added 14 commits April 23, 2026 14:56

Initial commit

ec2f23a

add analysis to ADMDatamart

ca661ae

health_check_agent report

8a92f1e

reduce overlap with maturity report

9908524

Major content and token-efficiency improvements

2789ebd

more guidance, remove lift

2db4a6c

Add markdown health check tests to workflow

db23bb7

coverage for analysis tests

98945b0

python tests for markdown health check

ddf9536

test: relocate test_Analysis.py to tests/adm/

8a78f27

Analysis is an ADMDatamart namespace, so tests/adm/ is the right home (matches tests/adm/test_Aggregates.py for the sibling namespace). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

StijnKas force-pushed the agent-health-check branch from 1523d07 to 38ccb9a Compare April 23, 2026 13:11

StijnKas marked this pull request as ready for review April 23, 2026 14:21

StijnKas and others added 6 commits April 23, 2026 17:09

fix: source Analysis cutoffs from MetricLimits, not magic numbers

a146752

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: plan to move remaining bespoke thresholds into MetricLimits

ec9d2f6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jmelburg marked this pull request as draft June 11, 2026 07:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agent health check#623

Agent health check#623
jmelburg wants to merge 20 commits into
masterfrom
agent-health-check

jmelburg commented Apr 2, 2026

Uh oh!

codecov Bot commented Apr 2, 2026 •

edited

Loading

Uh oh!

operdeck commented Apr 8, 2026

Uh oh!

jmelburg commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jmelburg commented Apr 2, 2026

Uh oh!

codecov Bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

operdeck commented Apr 8, 2026

Uh oh!

jmelburg commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented Apr 2, 2026 •

edited

Loading