Add monitor tuning meta-guide for run-volume and false-positive avoidance#53

Draft
samgutentag wants to merge 1 commit into main from sam-gutentag/monitor-tuning

@samgutentag
Member

Summary

  • New page: flaky-tests/detection/tuning-monitors.mdx
  • Adds a docs.json nav entry under the Flaky test detection group, slotted after the three monitor-type pages
  • Ties together run-volume → monitor-type recommendations, single-failure-flap avoidance, branch coverage, recovery vs activation, monitor states (active / inactive / disabled), and a pre-auto-quarantine checklist

Why

Sourced from customer feedback mining (cluster monitor-tuning-thresholds, verdict partial + first-class IA candidate, 15 pairs across 7 customers). The individual monitor pages already document each monitor type. Customers consistently ask the same set of system-level tuning questions — when to use failure-count vs failure-rate, how to avoid single-failure flips, why a monitor scoped to main misses queue-branch failures, what "inactive" means in the UI, what to check before turning on auto-quarantine.

Items flagged for review

  • Page location. Slotted under flaky-tests/detection/ rather than flaky-tests/management/ because the page is about tuning detection behavior, not managing already-detected tests. The cluster suggestion mentioned either location; this felt cleaner since every link inside the page points at detection pages. Confirm or move.
  • Auto-quarantine recommended window: "1-3 days." Lifted from the cluster Q&A (Caseware thread). Confirm this still matches current eng guidance.
  • Pass-on-Retry default recovery = 7 days, range 1-15. Pulled from pass-on-retry-monitor.mdx and matches the cluster Gusto thread.
  • Branch patterns table (Trunk Merge Queue / GitHub Merge Queue / Graphite Merge Queue) mirrors the table in failure-rate-monitor.mdx. GitLab Merge Trains intentionally omitted since the cluster didn't surface a question about them — failure-rate-monitor.mdx notes they run on the target branch directly.
  • The "gap" section explicitly calls out that there's no way to distinguish "flakes detected in MQ" from "bad PR in MQ" at the monitor level, and proposes a >=2 failures in 1h failure-count threshold on queue branches as a proxy. This came directly from the Gusto thread reply ("Higher-threshold failure count monitor that marks broken is the right pattern... No good way to distinguish flakes-detected-in-MQ from actual-bad-PRs-in-MQ today."). Confirm the proxy guidance is still accurate.
  • "Inactive" state definition. Cluster note said "Copy will be improved" in the UI — the doc currently defines it as "previously triggered, no longer triggered, still enabled." Confirm this matches the latest UI state and whether the copy change has shipped.
  • Pre-auto-quarantine cross-link points at ../agents/autofix-flaky-tests. That page exists but its content is more about the auto-investigation/PR flow than the auto-quarantine toggle. If there's a better target page for the auto-quarantine setting itself, swap it.
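The ">=2 failures in 1h" proxy from the "gap" item above can be sketched as a trailing-window count. This is a minimal illustration, not Trunk's implementation: the function name `exceedsFailureCount`, its parameters, and the millisecond timestamps are all assumptions made for the example, and the input is assumed to contain only failures already scoped to queue branches.

```typescript
// Hypothetical helper (not a Trunk API): returns true when at least
// `threshold` failures fall inside the trailing window ending at `nowMs`.
function exceedsFailureCount(
  failureTimesMs: number[],
  nowMs: number,
  windowMs: number = 60 * 60 * 1000, // 1h window, per the proposed proxy
  threshold: number = 2, // >=2 failures, per the proposed proxy
): boolean {
  const windowStart = nowMs - windowMs;
  // Count only failures strictly inside the trailing window.
  const recent = failureTimesMs.filter(
    (t) => t > windowStart && t <= nowMs,
  );
  return recent.length >= threshold;
}
```

With these defaults a single failure never trips the check, which is the single-failure-flap avoidance the guide is after; two failures more than an hour apart also stay below the threshold.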

Customer signal

samgutentag added the "needs review" label (PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge) May 20, 2026
@mintlify
Contributor

mintlify Bot commented May 20, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| trunk | 🟢 Ready | View Preview | May 20, 2026, 11:05 PM |

💡 Tip: Enable Workflows to automatically generate PRs for you.

@samgutentag
Member Author

samgutentag commented May 26, 2026

Verification status (2026-05-29): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

  • Flag state: LaunchDarkly not consulted per flag (no eng PR, no flag to read). LD MCP was reachable this sweep.
  • Eng PR: none referenced in PR body
  • Flag: n/a (no eng work to verify)
  • Signals checked:
    • No trunk-io/<repo>#NNN or PR URL references in PR body
    • No TRUNK-XXX Linear ticket linked
    • PR scope is content (meta-guide for tuning existing flaky test monitors), not a flag-gated new feature

This is a content-accuracy PR documenting existing behavior. The code-verification comment found one contradiction: Pass-on-Retry recovery range documented as "1-15" but code enforces 1-14. Correct before merge. No rollout dependency to wait on.

samgutentag marked this pull request as draft May 26, 2026 18:39
@samgutentag
Member Author

samgutentag commented May 26, 2026

Code verification (2026-06-01): 1 contradicted (unresolved from prior sweep)

| Claim | Verdict | Source |
| --- | --- | --- |
| Pass-on-Retry recovery days range "1-15" | contradicted | flake-detection.ts |

The earlier code-verification findings still hold. Re-checked this sweep: the diff still states the recovery-days range as "1-15", but source still enforces a maximum of 14. The frontend validation throws "Recovery days must be between 1 and 14" when recoveryDays > 14. Correct the doc to "range 1-14" before merge.

All other claims from the prior sweep remain confirmed (recovery_days default 7, trunk-merge/* pattern, resolution-vs-activation threshold separation, failure-count resolve_after_minutes, stale timeout, auto-quarantine excluding broken tests, active/inactive monitor states). The two graphite-merge/* and minimum-sample-size items remain unverifiable against Trunk source (Graphite-defined / behavioral), not contradictions.
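To illustrate why the trunk-merge/* pattern matters for branch coverage, here is a rough sketch of glob-style branch matching. It assumes `*` matches any run of characters and everything else is literal. Trunk's actual matching rules are not shown in this thread, so `matchesBranchPattern` and the branch name `trunk-merge/example` are hypothetical.

```typescript
// Hypothetical helper (not Trunk source): minimal glob matching where
// `*` matches any run of characters and all other characters are literal.
function matchesBranchPattern(pattern: string, branch: string): boolean {
  const escaped = pattern
    .split("*")
    // Escape regex metacharacters in the literal segments.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`).test(branch);
}

// A monitor scoped to "main" does not see a queue-branch run...
const seenByMainMonitor = matchesBranchPattern("main", "trunk-merge/example");
// ...while one scoped to "trunk-merge/*" does.
const seenByQueueMonitor = matchesBranchPattern("trunk-merge/*", "trunk-merge/example");
```

Under these assumptions `seenByMainMonitor` is false and `seenByQueueMonitor` is true, which is the coverage gap customers hit when a monitor is scoped only to main.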


Source #1 — Pass-on-Retry recovery days max is 14, not 15 (contradicted)

File: trunk-io/trunk2/ts/apps/frontend/src/lib/services/flake-detection.ts

if (recoveryDays < 1 || recoveryDays > 14) {
  throw new Error("Recovery days must be between 1 and 14");
}

Reasoning: The validation rejects any value above 14, so the valid range is 1-14, not 1-15. The doc's "range 1–15" is off by one at the top of the range.
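The boundary behavior can be checked with a small sketch that wraps the quoted condition. The names `validateRecoveryDays` and `isAccepted` are hypothetical wrappers introduced for illustration; only the `if` condition and the error message come from the source file quoted above.

```typescript
// Hypothetical wrapper around the check quoted from flake-detection.ts.
function validateRecoveryDays(recoveryDays: number): void {
  if (recoveryDays < 1 || recoveryDays > 14) {
    throw new Error("Recovery days must be between 1 and 14");
  }
}

// Hypothetical helper: true when the value passes validation.
function isAccepted(recoveryDays: number): boolean {
  try {
    validateRecoveryDays(recoveryDays);
    return true;
  } catch {
    return false;
  }
}

// Probe both edges of the range plus the documented default of 7.
const accepted = [0, 1, 7, 14, 15].filter(isAccepted);
```

Here `accepted` ends up containing 1, 7, and 14: the documented default of 7 passes, while 15 is rejected along with 0, so "range 1-14" is the wording that matches the check.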

samgutentag added the "needs eng review" label (verify-docs-against-code: at least one claim contradicts source.) May 26, 2026
Member Author

Verification status (May 31, 2026): unknown

Editorial monitor-tuning content, but internal content contradictions prevent confident auto-verification.

  • Flag state: not consulted
  • Eng PR: none found
  • Flag: none
  • Signals: content appears to describe existing behavior, but review flagged contradictions requiring human check before publish

Next action: author or reviewer to resolve content contradictions and confirm accuracy, then merge.


Generated by Claude Code

Member Author

samgutentag commented Jun 1, 2026

Verification status (2026-06-01): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

  • Flag state: LaunchDarkly not consulted (no eng PR, no flag to read).
  • Eng PR: none referenced in PR body.
  • Flag: n/a (no eng work to verify).
  • Signals checked:
    • No trunk-io/<repo>#NNN or PR URL references in PR body.
    • No TRUNK-XXX Linear ticket linked.
    • Scope is a content meta-guide tying together existing monitor-tuning behavior, not a flag-gated new feature.

This is a content-accuracy PR. The publish gate is the open content contradiction tracked in the verify-docs-against-code comment (recovery-days range). PR is currently CONFLICTING (needs rebase); does not affect this verdict.

Unchanged from prior sweep: still unknown.


Generated by Claude Code
