feat(seer): Suppress re-triage of skipped issues in night shift#114915

Open
trevor-e wants to merge 4 commits into master from telkins/night-shift-skip-cache

@trevor-e trevor-e commented May 5, 2026

Persist SKIP verdicts from night-shift triage to Redis with a 3.5-day TTL, then exclude those group ids from candidate selection on subsequent nightly runs. This stops the agent from repeatedly re-evaluating issues it has already classified as not worth fixing. The TTL exists because new information may arrive within a few days (better tag distribution, a new recommended event, etc.), so we do eventually want to re-run triage against those issues.

The TTL is padded past 3 days so nightly-run jitter cannot expire a key right at the boundary, guaranteeing suppression for the next 3 runs.
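The padding argument can be checked with a little arithmetic. The sketch below is illustrative only: the 24-hour run interval and the 6-hour worst-case drift between two runs are assumptions, not values taken from this PR, and `guaranteed_suppressed_runs` is a hypothetical helper.

```python
from datetime import timedelta

SKIP_TTL = timedelta(days=3, hours=12)  # the 3.5-day TTL from the description
RUN_INTERVAL = timedelta(days=1)        # assumption: one run per night
MAX_DRIFT = timedelta(hours=6)          # assumption: worst-case lateness between two runs


def guaranteed_suppressed_runs(ttl, interval=RUN_INTERVAL, drift=MAX_DRIFT):
    """Count the later runs that still see the key even if each runs `drift` late."""
    runs = 0
    # The (runs + 1)-th later run happens at most (runs + 1) * interval + drift
    # after the key was written; it is suppressed only if that is inside the TTL.
    while (runs + 1) * interval + drift < ttl:
        runs += 1
    return runs


padded = guaranteed_suppressed_runs(SKIP_TTL)
unpadded = guaranteed_suppressed_runs(timedelta(days=3))
```

Under these assumptions `padded` is 3 and `unpadded` is 2: with a TTL of exactly 3 days, a third run that starts slightly late would find the key already expired.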

Persist SKIP verdicts to a Redis cache keyed by group id with a 3.5-day
TTL, then exclude those ids from candidate selection on subsequent
nightly runs. Stops the agent from repeatedly re-evaluating the same
issues it already classified as not worth fixing, saving compute and
quota.

The TTL is padded past 3 days so nightly-run jitter cannot expire a
key right at the boundary; this guarantees the next 3 runs suppress
the issue.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) May 5, 2026
trevor-e and others added 3 commits May 5, 2026 17:07
The old name suggested filtering out recently-skipped ids, but the
function actually returns the subset that ARE recently skipped.
Rename so the name matches the return value.
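A minimal sketch of the semantics this commit describes, assuming the renamed helper is the `recently_skipped` mentioned elsewhere in this PR. The in-memory `FakeTTLStore`, the key format, and `select_candidates` are hypothetical stand-ins for illustration; the real code reads and writes Redis.

```python
import time

SKIP_TTL_SECONDS = int(3.5 * 24 * 60 * 60)  # 3.5 days


class FakeTTLStore:
    """In-memory stand-in for a key-value store with per-key expiry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._expires_at = {}
        self._clock = clock

    def set(self, key, ttl_seconds):
        self._expires_at[key] = self._clock() + ttl_seconds

    def exists(self, key):
        expires_at = self._expires_at.get(key)
        return expires_at is not None and self._clock() < expires_at


_default_store = FakeTTLStore()


def _key(group_id):
    return f"night-shift:skipped:{group_id}"  # hypothetical key format


def mark_skipped(group_id, store=_default_store):
    """Record a SKIP verdict for the group with the padded TTL."""
    store.set(_key(group_id), SKIP_TTL_SECONDS)


def recently_skipped(group_ids, store=_default_store):
    """Return the subset of group_ids that ARE recently skipped (not the remainder)."""
    return {gid for gid in group_ids if store.exists(_key(gid))}


def select_candidates(group_ids, store=_default_store):
    """The caller does the filtering: drop whatever recently_skipped returns."""
    skipped = recently_skipped(group_ids, store)
    return [gid for gid in group_ids if gid not in skipped]
```

Keeping the helper as "return the skipped subset" leaves the exclusion step explicit at the call site, which is the distinction the rename is meant to make visible.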

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Mark the group via mark_skipped() before the run so the test exercises
the real read path through Redis instead of stubbing recently_skipped.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
@trevor-e trevor-e marked this pull request as ready for review May 5, 2026 21:23
@trevor-e trevor-e requested a review from a team as a code owner May 5, 2026 21:23
Comment on lines +195 to +197
for v in triage_response.verdicts:
    if v.group_id in groups_by_id and v.action == TriageAction.SKIP:
        mark_skipped(v.group_id)

Bug: A Redis connection failure in mark_skipped after the main agent logic will cause an unhandled exception, discarding all previously computed triage results.
Severity: MEDIUM

Suggested Fix

Wrap the mark_skipped call in its own try/except block to catch potential Redis connection errors. Log the error for observability but do not re-raise it, allowing the function to return the successfully computed triage results. This ensures that failures in the caching optimization do not cause the loss of primary results.
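One way the suggested fix could look, as a sketch rather than the actual patch. `persist_skips`, the `Verdict` dataclass, the local `TriageAction` enum, and the log message are hypothetical; `mark_skipped` is passed in so the example runs without Redis. Catching bare `Exception` is a simplification, and the real change might catch only Redis client errors.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger(__name__)


class TriageAction(Enum):  # stand-in for the real enum
    SKIP = "skip"
    AUTOFIX = "autofix"


@dataclass
class Verdict:  # stand-in for the real verdict type
    group_id: int
    action: TriageAction


def persist_skips(verdicts, groups_by_id, mark_skipped):
    """Best-effort caching of SKIP verdicts; returns the ids that failed to persist."""
    failed = []
    for v in verdicts:
        if v.group_id in groups_by_id and v.action == TriageAction.SKIP:
            try:
                mark_skipped(v.group_id)
            except Exception:
                # Log and continue: a cache failure must not discard triage results.
                logger.exception(
                    "night_shift.mark_skipped_failed", extra={"group_id": v.group_id}
                )
                failed.append(v.group_id)
    return failed
```

Placing the `try` inside the loop means one failed write does not prevent the remaining SKIP verdicts from being recorded.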

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/tasks/seer/night_shift/agentic_triage.py#L195-L197

Potential issue: The `mark_skipped` function is called outside the `try/except` block
that wraps the expensive Seer agent interactions. If a Redis connection error occurs
during this call, the exception is not handled locally. It propagates up to the
`run_night_shift_execution` function, which then marks the entire run as failed and
discards all the triage results (e.g., `AUTOFIX`, `ROOT_CAUSE_ONLY`) that were
successfully generated by the agent. This wastes significant LLM computation due to a
failure in a non-critical optimization step.



@chromy chromy left a comment

Lgtm
