fix(grouping): Only run grouping calculation once #78630
    except Exception as err:
        sentry_sdk.capture_exception(err)

    return secondary_hashes
    # Return an empty variants dictionary because we need the signature of this function to match
why does the signature have to match?
Okay, so that's half a lie, and half true. The two pairs of functions in question are _calculate_primary_hashes/_calculate_secondary_hashes and run_primary_grouping/maybe_run_secondary_grouping. Technically, the signatures of the first pair don't have to match, because they're each only used in the corresponding member of the second pair. The signatures of the second pair really do have to match, though, because they're both passed to get_hashes_and_grouphashes as hash_calculation_function.
sentry/src/sentry/event_manager.py
Lines 1357 to 1364 in 779d6a2
def get_hashes_and_grouphashes(
    job: Job,
    hash_calculation_function: Callable[
        [Project, Job, MutableTags],
        tuple[GroupingConfig, list[str]],
    ],
    metric_tags: MutableTags,
) -> GroupHashInfo:
It sort of seemed easier to just keep the structures parallel, but you're right, I could be more accurate in how I describe what's happening.
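For anyone skimming the thread, here is a minimal sketch of why the second pair has to match. Every name and type in it (run_primary, run_secondary, collect_hashes, the dict-based Job) is a simplified stand-in invented for illustration, not the real code in event_manager.py, and the (hashes, variants) return shape is an assumption about where this PR ends up:

```python
from typing import Callable

# Simplified stand-ins for illustration only; the real Project, Job, and
# GroupingConfig types in event_manager.py are richer than this.
Job = dict
Variants = dict[str, object]
HashResult = tuple[list[str], Variants]
HashCalculation = Callable[[Job], HashResult]


def run_primary(job: Job) -> HashResult:
    # Primary grouping produces both the hashes and the variants behind them.
    variants: Variants = {"default": job["message"]}
    return [f"primary:{job['message']}"], variants


def run_secondary(job: Job) -> HashResult:
    # Secondary grouping has no use for variants, but returns an empty dict so
    # that it can be passed anywhere run_primary can.
    return [f"secondary:{job['message']}"], {}


def collect_hashes(job: Job, hash_calculation_function: HashCalculation) -> HashResult:
    # The caller unpacks the same two-element shape no matter which function
    # it was handed, which is what forces the signatures to match.
    hashes, variants = hash_calculation_function(job)
    return hashes, variants
```

Because collect_hashes only knows about the shared HashCalculation type, either helper can be passed in without the caller branching on which one it got.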
yep that makes sense. only Q then: is it a bit more explicit to use Python's unpacking syntax instead of grabbing the 0th element? so it'd be
secondary_hashes, _ = _calculate_event_grouping(
    project, event_copy, secondary_grouping_config
)
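To make the comparison concrete, here is a small sketch of the two styles side by side. calculate_event_grouping below is a hypothetical stand-in that takes no arguments and returns (hashes, variants); it is not the real _calculate_event_grouping:

```python
def calculate_event_grouping() -> tuple[list[str], dict[str, str]]:
    # Hypothetical stand-in returning (hashes, variants).
    return ["abc123"], {"default": "variant data"}


# Indexing: the reader has to know what sits at position 0.
hashes_by_index = calculate_event_grouping()[0]

# Unpacking: both elements are named at the call site, and a change in the
# number of returned elements raises ValueError instead of passing silently.
secondary_hashes, _ = calculate_event_grouping()
```

Both forms produce the same list; the difference is only in how much of the return shape is visible to someone reading the call.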
IDK, do you think it is? It kinda seems six-of-one-half-a-dozen-of-the-other to me. Happy to change it if you think the other is better, since I'm already going back to clean up the comment.
i like it because it's more explicit: as i was reviewing the PR i went to go make sure "is that the hashes being returned, that's the interface right, ah yep okay". but totally a small nit and feel free to keep it the way that it is if you like it better.
Ah, I see what you're saying. Okay, sure, I'll switch it.
UPDATE: In the end, I decided the most explicit thing would be to leave _calculate_secondary_hashes as is (not have it return variants, and not say it's doing so in the name, the way _calculate_primary_hashes - now called _calculate_primary_hashes_and_variants - does). I did still make maybe_run_secondary_grouping return an empty dictionary for the variants (because the matching-signature constraint there is real), but now the inner helpers actually tell the truth about what they do/don't do.
(As you suggested, I did still switch the [0] to be unpacking in _calculate_secondary_hashes, though.)
everything makes sense and looks well factored. only q. around the interface and calling of _calculate_secondary_hashes
During grouping, the slowest part of the process is calculating the variants. Before Seer and before grouphash metadata, we only needed them once, but now we use them in two places in the Seer flow and are about to use them in another place for grouphash metadata. To avoid calculating them potentially up to four times, this PR refactors things so that they're passed from the place where they're initially calculated (in event.get_hashes) through the various intermediate functions to the spots in the Seer flow where they're currently used. Along this path is the place where we'll also need them for grouphash metadata.

Notes:

In order to not have to change unrelated uses of get_hashes, rather than changing its return value I extracted most of its inner logic into a separate get_hashes_and_variants method. Now get_hashes calls get_hashes_and_variants (and just ignores the variants), and in the spot in ingest where we used to call get_hashes, we now call get_hashes_and_variants.

We have a pair of helpers, run_primary_grouping and maybe_run_secondary_grouping, for calculating primary and secondary hashes, respectively, which need to have matching signatures, because they're both passed to get_hashes_and_grouphashes as the hash_calculation_function parameter. We don't ever need (or want) variants from the secondary hash calculation, so in place of real variants data maybe_run_secondary_grouping just passes an empty dictionary.
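A rough sketch of the get_hashes / get_hashes_and_variants split described in the first note, under simplifying assumptions: the Event class, its _calculate_variants helper, the variant_calculations counter, and describe_for_seer are all invented here to illustrate the pattern and do not mirror the real event model or Seer code.

```python
class Event:
    def __init__(self, message: str) -> None:
        self.message = message
        # Counts how often the expensive step runs; for illustration only.
        self.variant_calculations = 0

    def _calculate_variants(self) -> dict[str, str]:
        # Stand-in for the slow part of grouping.
        self.variant_calculations += 1
        return {"default": self.message.lower()}

    def get_hashes_and_variants(self) -> tuple[list[str], dict[str, str]]:
        variants = self._calculate_variants()
        hashes = [f"hash:{value}" for value in variants.values()]
        return hashes, variants

    def get_hashes(self) -> list[str]:
        # Existing callers keep the old return type; the variants are dropped.
        hashes, _ = self.get_hashes_and_variants()
        return hashes


def describe_for_seer(variants: dict[str, str]) -> list[str]:
    # Hypothetical downstream consumer: it receives the variants that were
    # already calculated instead of asking the event to compute them again.
    return sorted(variants)


event = Event("Boom")
hashes, variants = event.get_hashes_and_variants()
seer_input = describe_for_seer(variants)
```

In this sketch the counter stays at 1 after both the hashes and the downstream consumer have been served, which is the point of threading the variants through rather than recalculating them.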