fix(grouping): Only run grouping calculation once #78630
    except Exception as err:
        sentry_sdk.capture_exception(err)

    return secondary_hashes
    # Return an empty variants dictionary because we need the signature of this function to match
why does the signature have to match?
Okay, so that's half a lie, and half true. The two pairs of functions in question are `_calculate_primary_hashes`/`_calculate_secondary_hashes` and `run_primary_grouping`/`maybe_run_secondary_grouping`. Technically, the signatures of the first pair don't have to match, because they're each only used in the corresponding member of the second pair. The signatures of the second pair really do have to match, though, because they're both passed to `get_hashes_and_grouphashes` as `hash_calculation_function`.
sentry/src/sentry/event_manager.py
Lines 1357 to 1364 in 779d6a2
    def get_hashes_and_grouphashes(
        job: Job,
        hash_calculation_function: Callable[
            [Project, Job, MutableTags],
            tuple[GroupingConfig, list[str]],
        ],
        metric_tags: MutableTags,
    ) -> GroupHashInfo:
It sort of seemed easier to just keep the structures parallel, but you're right, I could be more accurate in how I describe what's happening.
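To make the constraint concrete, here is a minimal sketch of two functions passed to one shared caller through a `Callable` parameter. Everything in it is hypothetical and simplified for illustration (`run_primary`, `run_secondary`, `get_hashes_via`, and the dict-based event are not the real Sentry code); it only shows why both functions need the same return shape once a common caller unpacks the result.

```python
from typing import Callable

# Hypothetical stand-ins for the real Sentry types, used only for illustration.
Hashes = list[str]
Variants = dict[str, object]
HashCalculation = Callable[[dict], tuple[Hashes, Variants]]


def run_primary(event: dict) -> tuple[Hashes, Variants]:
    # The primary path returns real variants alongside the hashes.
    variants = {"default": event["message"]}
    return [f"primary-{event['message']}"], variants


def run_secondary(event: dict) -> tuple[Hashes, Variants]:
    # The secondary path never needs its variants, so it returns an empty
    # dictionary purely to keep the same return shape as the primary path.
    return [f"secondary-{event['message']}"], {}


def get_hashes_via(calculate: HashCalculation, event: dict) -> Hashes:
    # The shared caller unpacks the same two-element tuple either way.
    hashes, _variants = calculate(event)
    return hashes


event = {"message": "boom"}
primary_hashes = get_hashes_via(run_primary, event)
secondary_hashes = get_hashes_via(run_secondary, event)
```

If `run_secondary` returned only a list of hashes, the unpacking inside `get_hashes_via` would break for that one caller, which is the situation the empty dictionary avoids.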
Yep, that makes sense. Only Q then: is it a bit more explicit to use Python's unpacking syntax instead of grabbing the 0th element? So it'd be
secondary_hashes, _ = _calculate_event_grouping(
    project, event_copy, secondary_grouping_config
)
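For comparison, the two styles side by side, as a small sketch. `calculate_event_grouping` here is a hypothetical stand-in that is assumed to return a `(hashes, variants)` tuple; it is not the real `_calculate_event_grouping`.

```python
def calculate_event_grouping() -> tuple[list[str], dict[str, str]]:
    # Hypothetical stand-in returning (hashes, variants).
    return ["abc123"], {"default": "variant"}


# Indexing: the reader has to know that element 0 holds the hashes.
hashes_by_index = calculate_event_grouping()[0]

# Unpacking: both elements are accounted for, and the discarded one is explicit.
hashes_by_unpacking, _ = calculate_event_grouping()
```

Both forms produce the same value. Unpacking also fails loudly if the function ever starts returning a different number of elements, whereas indexing would keep working silently.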
IDK, do you think it is? It kinda seems six-of-one-half-a-dozen-of-the-other to me. Happy to change it if you think the other is better, since I'm already going back to clean up the comment.
I like it because it's more explicit. As I was reviewing the PR, I went to go make sure "is that the hashes being returned, that's the interface right, ah yep okay". But it's totally a small nit, and feel free to keep it the way that it is if you like it better.
Ah, I see what you're saying. Okay, sure, I'll switch it.
Everything makes sense and looks well factored. Only q. is around the interface and calling of `_calculate_secondary_hashes`.
During grouping, the slowest part of the process is calculating the variants. Before Seer and before grouphash metadata, we only needed them once, but now we use them in two places in the Seer flow and are about to use them in another place for grouphash metadata. To avoid now calculating them potentially up to four times, this PR refactors things so that they're passed from the place where they're initially calculated (in `event.get_hashes`) through the various intermediate functions to the spots in the Seer flow where they're currently used. Along this path is the place where we'll also need them for grouphash metadata.

Notes:

- In order to not have to change unrelated uses of `get_hashes`, instead of changing its return value I extracted most of its inner logic into a separate `get_hashes_and_variants` method. Now `get_hashes` calls `get_hashes_and_variants` (and just ignores the variants), and in the spot in ingest where we used to call `get_hashes`, we now call `get_hashes_and_variants`.

- We have a few pairs of helpers, for calculating primary and secondary hashes, respectively, which need to have matching signatures, meaning if the primary-hash version returns variants, the secondary-hash version must, too. That said, we don't ever actually want to use the secondary variants, so rather than having the secondary version of each helper return the real variants, I instead chose to return an empty dictionary. Since we ignore that part of the result it doesn't really matter, but I figured debugging-wise, it's easier to keep track of "this one I want, this one I don't" if one has real data and one is empty.
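A rough sketch of the shape of the first note, under heavy simplification. The `Event` class, `_calculate_variants`, the `variant_calculations` counter, and `seer_step` are all hypothetical names made up for illustration; only `get_hashes` and `get_hashes_and_variants` come from the description above, and their real bodies are not shown here.

```python
class Event:
    # Hypothetical, heavily simplified stand-in for the real event class.
    def __init__(self, message: str) -> None:
        self.message = message
        self.variant_calculations = 0

    def _calculate_variants(self) -> dict[str, str]:
        # Stands in for the expensive step; the counter tracks how often it runs.
        self.variant_calculations += 1
        return {"default": self.message.lower()}

    def get_hashes_and_variants(self) -> tuple[list[str], dict[str, str]]:
        variants = self._calculate_variants()
        hashes = [f"hash-{value}" for value in variants.values()]
        return hashes, variants

    def get_hashes(self) -> list[str]:
        # Existing callers keep the old return type; the variants are dropped.
        hashes, _ = self.get_hashes_and_variants()
        return hashes


def seer_step(variants: dict[str, str]) -> list[str]:
    # Hypothetical downstream consumer that receives the variants as an
    # argument instead of recalculating them.
    return sorted(variants)


event = Event("Boom")
hashes, variants = event.get_hashes_and_variants()
variant_names = seer_step(variants)
```

The point of the sketch is only that the expensive calculation runs once and its result is handed along, while callers of `get_hashes` see no change in return type.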