Add monitor tuning meta-guide for run-volume and false-positive avoidance by samgutentag · Pull Request #53 · trunk-io/docs2

samgutentag · 2026-05-20T22:59:23Z

Summary

New page: flaky-tests/detection/tuning-monitors.mdx
Adds a docs.json nav entry under the Flaky test detection group, slotted after the three monitor-type pages
Ties together run-volume → monitor-type recommendations, single-failure-flap avoidance, branch coverage, recovery vs activation, monitor states (active / inactive / disabled), and a pre-auto-quarantine checklist

Why

Sourced from customer feedback mining (cluster monitor-tuning-thresholds, verdict partial + first-class IA candidate, 15 pairs across 7 customers). The individual monitor pages already document each monitor type. Customers consistently ask the same set of system-level tuning questions — when to use failure-count vs failure-rate, how to avoid single-failure flips, why a monitor scoped to main misses queue-branch failures, what "inactive" means in the UI, what to check before turning on auto-quarantine.

Items flagged for review

Page location. Slotted under flaky-tests/detection/ rather than flaky-tests/management/ because the page is about tuning detection behavior, not managing already-detected tests. The cluster suggestion mentioned either location; this felt cleaner since every link inside the page points at detection pages. Confirm or move.
Auto-quarantine recommended window: "1-3 days." Lifted from the cluster Q&A (Caseware thread). Confirm this still matches current eng guidance.
Pass-on-Retry default recovery = 7 days, range 1-15. Pulled from pass-on-retry-monitor.mdx and matches the cluster Gusto thread.
Branch patterns table (Trunk Merge Queue / GitHub Merge Queue / Graphite Merge Queue) mirrors the table in failure-rate-monitor.mdx. GitLab Merge Trains intentionally omitted since the cluster didn't surface a question about them — failure-rate-monitor.mdx notes they run on the target branch directly.
The "gap" section explicitly calls out that there's no way to distinguish "flakes detected in MQ" from "bad PR in MQ" at the monitor level, and proposes a >=2 failures in 1h failure-count threshold on queue branches as a proxy. This came directly from the Gusto thread reply ("Higher-threshold failure count monitor that marks broken is the right pattern... No good way to distinguish flakes-detected-in-MQ from actual-bad-PRs-in-MQ today."). Confirm the proxy guidance is still accurate.
"Inactive" state definition. Cluster note said "Copy will be improved" in the UI — the doc currently defines it as "previously triggered, no longer triggered, still enabled." Confirm this matches the latest UI state and whether the copy change has shipped.
Pre-auto-quarantine cross-link points at ../agents/autofix-flaky-tests. That page exists but its content is more about the auto-investigation/PR flow than the auto-quarantine toggle. If there's a better target page for the auto-quarantine setting itself, swap it.
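
The failure-count proxy from the "gap" item above (>=2 failures in 1h on queue branches) can be sketched as a sliding-window count. This is a minimal illustration under stated assumptions, not Trunk's implementation: `FailureEvent`, `exceedsFailureCount`, the prefix-based branch match, and the example branch names are all hypothetical and chosen only for this sketch.

```typescript
// Hypothetical shape for a recorded test failure; not a Trunk API type.
interface FailureEvent {
  branch: string;
  timestampMs: number;
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Returns true when at least `threshold` failures on branches starting with
// `branchPrefix` fall inside the window that ends at `nowMs`.
// Prefix matching stands in for the real branch-pattern logic (assumption).
function exceedsFailureCount(
  events: FailureEvent[],
  branchPrefix: string,
  nowMs: number,
  threshold = 2,
  windowMs = ONE_HOUR_MS,
): boolean {
  const recent = events.filter(
    (e) =>
      e.branch.startsWith(branchPrefix) &&
      e.timestampMs > nowMs - windowMs &&
      e.timestampMs <= nowMs,
  );
  return recent.length >= threshold;
}

// Example data with made-up branch names.
const now = Date.UTC(2026, 4, 20, 12, 0, 0);
const singleFailure: FailureEvent[] = [
  { branch: "trunk-merge/example-1", timestampMs: now - 10 * 60 * 1000 },
];
const repeatedFailures: FailureEvent[] = [
  ...singleFailure,
  { branch: "trunk-merge/example-2", timestampMs: now - 5 * 60 * 1000 },
];
const staleFailures: FailureEvent[] = [
  ...singleFailure,
  { branch: "trunk-merge/example-3", timestampMs: now - 2 * ONE_HOUR_MS },
];
```

With a threshold of 2, `singleFailure` and `staleFailures` do not trip the check, while `repeatedFailures` does. As the item notes, this is only a proxy: the count alone cannot tell a flaky test from a genuinely bad PR in the queue.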

Customer signal

Cluster: monitor-tuning-thresholds (verdict: partial, 15 pairs / 7 customers, first-class IA candidate)
Channels: trunk-gusto, trunk-retool, trunk-descript, trunk-chainlink, trunk-healthie, trunk-caseware
Source threads (full list in findings/clusters/monitor-tuning-thresholds.json):

mintlify · 2026-05-20T23:03:10Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
trunk	🟢 Ready	View Preview	May 20, 2026, 11:05 PM


samgutentag · 2026-05-26T18:38:05Z

Verification status (2026-05-29): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

Flag state: LaunchDarkly not consulted (no eng PR, no flag to read). The LaunchDarkly MCP was reachable this sweep.
Eng PR: none referenced in PR body
Flag: n/a (no eng work to verify)
Signals checked:
- No trunk-io/<repo>#NNN or PR URL references in PR body
- No TRUNK-XXX Linear ticket linked
- PR scope is content (meta-guide for tuning existing flaky test monitors), not a flag-gated new feature

This is a content-accuracy PR documenting existing behavior. The code-verification comment found one contradiction: Pass-on-Retry recovery range documented as "1-15" but code enforces 1-14. Correct before merge. No rollout dependency to wait on.

samgutentag · 2026-05-26T18:48:47Z

Code verification (2026-06-01): 1 contradicted (unresolved from prior sweep)

Claim	Verdict	Source
Pass-on-Retry recovery days range "1-15"	`contradicted`	`flake-detection.ts`

The earlier code-verification findings still hold. Re-checked this sweep: the recovery-days range is still stated as "1-15" in the diff, but the source still enforces a maximum of 14. The frontend validation throws "Recovery days must be between 1 and 14" when recoveryDays > 14. Correct the doc to "range 1-14" before merge.

All other claims from the prior sweep remain confirmed (recovery_days default 7, trunk-merge/* pattern, resolution-vs-activation threshold separation, failure-count resolve_after_minutes, stale timeout, auto-quarantine excluding broken tests, active/inactive monitor states). The two graphite-merge/* and minimum-sample-size items remain unverifiable against Trunk source (Graphite-defined / behavioral), not contradictions.

Source #1 — Pass-on-Retry recovery days max is 14, not 15 (contradicted)

File: trunk-io/trunk2/ts/apps/frontend/src/lib/services/flake-detection.ts

if (recoveryDays < 1 || recoveryDays > 14) {
  throw new Error("Recovery days must be between 1 and 14");
}

Reasoning: The validation rejects any value above 14, so the valid range is 1-14, not 1-15. The doc's "range 1–15" is off by one at the top of the range.
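
A boundary check makes the off-by-one concrete. The sketch below wraps the condition quoted above in a hypothetical `validateRecoveryDays` function (the real function name and signature in `flake-detection.ts` are not shown in this thread, so both are assumptions) and probes values at and around each end of the range.

```typescript
// Hypothetical wrapper around the condition quoted from flake-detection.ts;
// the actual function name and signature are assumptions.
function validateRecoveryDays(recoveryDays: number): void {
  if (recoveryDays < 1 || recoveryDays > 14) {
    throw new Error("Recovery days must be between 1 and 14");
  }
}

// Returns true when the value passes validation, false when it throws.
function isAccepted(recoveryDays: number): boolean {
  try {
    validateRecoveryDays(recoveryDays);
    return true;
  } catch {
    return false;
  }
}

// Probe the documented default (7) and both boundaries of the range.
const probes = [0, 1, 7, 14, 15].map((days) => ({
  days,
  accepted: isAccepted(days),
}));
```

Under this condition, 1, 7, and 14 are accepted while 0 and 15 are rejected, which is why the doc should say "range 1-14".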


samgutentag · 2026-05-31T14:18:16Z

Verification status (May 31, 2026): unknown

Editorial monitor-tuning content, but internal content contradictions prevent confident auto-verification.

Flag state: not consulted
Eng PR: none found
Flag: none
Signals: content appears to describe existing behavior, but review flagged contradictions requiring human check before publish

Next action: author or reviewer to resolve content contradictions and confirm accuracy, then merge.

Generated by Claude Code

samgutentag · 2026-06-01T14:21:25Z

Verification status (2026-06-01): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

Flag state: LaunchDarkly not consulted (no eng PR, no flag to read).
Eng PR: none referenced in PR body.
Flag: n/a (no eng work to verify).
Signals checked:
- No trunk-io/<repo>#NNN or PR URL references in PR body.
- No TRUNK-XXX Linear ticket linked.
- Scope is a content meta-guide tying together existing monitor-tuning behavior, not a flag-gated new feature.

This is a content-accuracy PR. The publish gate is the open content contradiction tracked in the verify-docs-against-code comment (recovery-days range). PR is currently CONFLICTING (needs rebase); does not affect this verdict.

Unchanged from prior sweep: still unknown.

Generated by Claude Code

samgutentag added the "needs review" label (PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge) on May 20, 2026

mintlify Bot deployed to staging May 20, 2026 23:05 View deployment

samgutentag marked this pull request as draft May 26, 2026 18:39

samgutentag added the "needs eng review" label (verify-docs-against-code: at least one claim contradicts source) on May 26, 2026

Commit 2542725: add monitor tuning meta-guide: matching monitors to run volume, single-failure-flap avoidance, branch coverage

samgutentag force-pushed the sam-gutentag/monitor-tuning branch from 46dc9a5 to 2542725 on May 26, 2026 18:58

mintlify Bot deployed to staging May 26, 2026 19:16 View deployment

samgutentag wants to merge 1 commit into main from sam-gutentag/monitor-tuning
