Skip to content

ref(test-alerts): Adds exception filtering for IntegrationErrors to be shown in the UI: #78653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

GabeVillalobos
Copy link
Member

@GabeVillalobos GabeVillalobos commented Oct 4, 2024

  • Adds a new ApiInvalidRequestError type, to better differentiate between 400 and unhandled non-400 responses from downstream APIs.
  • Adds filtering to our ticket creation utils which only propagates IntegrationFormErrors when a known 400 status response is received by requests to an integration's API
  • Adds sentry error IDs to unhandled test notification errors for easier customer triage.

When an unhandled error type is provided, we will show it + the event ID associated with it for easier triage:
image

Note: This does not fix the brunt of the IntegrationFormError sentry issues we're getting for Jira, which I'll address in a follow-up PR, but this does give me more confidence to GA the UI error reporting for test notifications.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 4, 2024
@@ -110,7 +111,10 @@ def execute_future_on_test_event(
logger.warning(
"%s.test_alert.unexpected_exception", callback_name, exc_info=True
)
break
error_id = sentry_sdk.capture_exception(exc)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this will add a bit more noise to our sentry issues in the short term, so I may need to split this out into a separate exception type, or perhaps just an info level log? Either way, we probably want to add custom handling for these exceptions to reduce customer confusion when they see this response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL this function has a return value :O

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! We use this in our base API exception handler and surface error IDs there as well, which is the only reason I know about this 😅

event_id = capture_exception(err, scope=scope)

Copy link

codecov bot commented Oct 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #78653       +/-   ##
===========================================
+ Coverage   64.67%   78.24%   +13.56%     
===========================================
  Files        7100     7105        +5     
  Lines      312820   313067      +247     
  Branches    51092    51121       +29     
===========================================
+ Hits       202331   244960    +42629     
+ Misses     103837    61740    -42097     
+ Partials     6652     6367      -285     

@@ -110,7 +111,10 @@ def execute_future_on_test_event(
logger.warning(
"%s.test_alert.unexpected_exception", callback_name, exc_info=True
)
break
error_id = sentry_sdk.capture_exception(exc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL this function has a return value :O

@@ -110,7 +111,10 @@ def execute_future_on_test_event(
logger.warning(
"%s.test_alert.unexpected_exception", callback_name, exc_info=True
)
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we break early so we don't have to stack error messages in the UI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do. When I was testing this with multiple integrations, it already breaks on the first error and halts execution, but it wouldn't hurt to do this just in case.

break
error_id = sentry_sdk.capture_exception(exc)
action_exceptions.append(
f"An unexpected error occurred. Error ID: '{error_id}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We kind of know at this point that this error is related to integration. Can we say something like Jira returned an unexpected error? Basically sending people's attention to their Jira instance, and maybe they can figure out the issue without filing support.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we encounter an error here, it may not necessarily be integration related. I've also noticed that having multiple integrations in the same alert rule also makes things confusing, so I'll try to filter this down a bit more in a follow-up PR that includes better error messaging for both of these cases.

@GabeVillalobos GabeVillalobos force-pushed the gv/better-differentiate-between-api-error-types branch from 1464eaa to 9289202 Compare October 8, 2024 00:27
@GabeVillalobos GabeVillalobos merged commit f1b30f8 into master Oct 8, 2024
50 checks passed
@GabeVillalobos GabeVillalobos deleted the gv/better-differentiate-between-api-error-types branch October 8, 2024 16:24
@github-actions github-actions bot locked and limited conversation to collaborators Oct 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants