Add monitor tuning meta-guide for run-volume and false-positive avoidance #53
samgutentag wants to merge 1 commit into
Verification status (2026-05-29): Could not determine rollout state from available signals. Chaining to
This is a content-accuracy PR documenting existing behavior. The code-verification comment found one contradiction: Pass-on-Retry recovery range documented as "1-15" but code enforces 1-14. Correct before merge. No rollout dependency to wait on.
Code verification (2026-06-01): 1 contradicted (unresolved from prior sweep)
The earlier code-verification findings still hold. Re-checked this sweep: the recovery-days range remains stated as "1-15" in the diff, but source still enforces a maximum of 14. The frontend validation throws
All other claims from the prior sweep remain confirmed (
Source #1: Pass-on-Retry recovery days max is 14, not 15 (contradicted)
File:
    if (recoveryDays < 1 || recoveryDays > 14) {
      throw new Error("Recovery days must be between 1 and 14");
    }
Reasoning: The validation rejects any value above 14, so the valid range is 1-14, not 1-15. The doc's "range 1–15" is off by one at the top of the range.
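The off-by-one can be checked directly at the boundaries. The following is a minimal sketch, assuming the quoted check is wrapped in a standalone function; the names `validateRecoveryDays` and `isAccepted` are hypothetical and do not come from the source, only the range condition and error message do.

```typescript
// Hypothetical wrapper: only the condition and the message are taken
// from the quoted validation; the function name is an assumption.
function validateRecoveryDays(recoveryDays: number): number {
  if (recoveryDays < 1 || recoveryDays > 14) {
    throw new Error("Recovery days must be between 1 and 14");
  }
  return recoveryDays;
}

// Hypothetical helper: reports whether a value passes the check.
function isAccepted(value: number): boolean {
  try {
    validateRecoveryDays(value);
    return true;
  } catch {
    return false;
  }
}

// Probe both edges of the range: 0 and 15 are rejected, 1 and 14 pass.
for (const value of [0, 1, 14, 15]) {
  console.log(`${value}: ${isAccepted(value) ? "accepted" : "rejected"}`);
}
```

Because 15 is rejected, a doc that states "1-15" would send readers to a value the form refuses.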
…e-failure-flap avoidance, branch coverage
46dc9a5 to 2542725
Verification status (May 31, 2026): Editorial monitor-tuning content, but internal content contradictions prevent confident auto-verification.
Next action: author or reviewer to resolve content contradictions and confirm accuracy, then merge.
Generated by Claude Code
Verification status (2026-06-01): Could not determine rollout state from available signals. Chaining to
This is a content-accuracy PR. The publish gate is the open content contradiction tracked in the
Unchanged from prior sweep: still
Generated by Claude Code
Summary
- `flaky-tests/detection/tuning-monitors.mdx`
- `docs.json` nav entry under the Flaky test detection group, slotted after the three monitor-type pages
Why
Sourced from customer feedback mining (cluster `monitor-tuning-thresholds`, verdict `partial` + first-class IA candidate, 15 pairs across 7 customers). The individual monitor pages already document each monitor type. Customers consistently ask the same set of system-level tuning questions: when to use failure-count vs failure-rate, how to avoid single-failure flips, why a monitor scoped to `main` misses queue-branch failures, what "inactive" means in the UI, and what to check before turning on auto-quarantine.
Items flagged for review
- `flaky-tests/detection/` rather than `flaky-tests/management/` because the page is about tuning detection behavior, not managing already-detected tests. The cluster suggestion mentioned either location; this felt cleaner since every link inside the page points at detection pages. Confirm or move.
- `pass-on-retry-monitor.mdx` and matches the cluster Gusto thread.
- `failure-rate-monitor.mdx`. GitLab Merge Trains intentionally omitted since the cluster didn't surface a question about them; `failure-rate-monitor.mdx` notes they run on the target branch directly.
- `>=2 failures in 1h` failure-count threshold on queue branches as a proxy. This came directly from the Gusto thread reply ("Higher-threshold failure count monitor that marks broken is the right pattern... No good way to distinguish flakes-detected-in-MQ from actual-bad-PRs-in-MQ today."). Confirm the proxy guidance is still accurate.
- `../agents/autofix-flaky-tests`. That page exists but its content is more about the auto-investigation/PR flow than the auto-quarantine toggle. If there's a better target page for the auto-quarantine setting itself, swap it.
Customer signal
- `monitor-tuning-thresholds` (verdict: partial, 15 pairs / 7 customers, first-class IA candidate)
- `findings/clusters/monitor-tuning-thresholds.json`):