fix: improve member deduplication by parsing GitHub noreply emails (CM-958)#3929
Pull request overview
This PR improves member deduplication by extracting GitHub usernames from @users.noreply.github.com emails and using that derived username for both ingestion-time member resolution and merge-suggestion matching.
Changes:
- Added parseGitHubNoreplyEmail utility to derive GitHub usernames from noreply email formats.
- Updated merge-suggestions similarity scoring and OpenSearch queries to consider noreply-email-to-username matches.
- Updated activity ingestion to look up existing members by derived GitHub username and to attach an unverified GitHub username identity to new members when a noreply email is present.
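For illustration, a minimal sketch of what such a parser might look like, covering the two address formats named in the PR summary (ID+USERNAME@users.noreply.github.com and USERNAME@users.noreply.github.com). This is not the code from services/libs/common/src/email.ts; the lowercasing, the username character check, and the null return for non-matching input are assumptions made for this example.

```typescript
// Hypothetical sketch; the actual helper in the PR may differ.
const NOREPLY_SUFFIX = '@users.noreply.github.com'

function parseGitHubNoreplyEmail(email: string): string | null {
  // Assumption: matching is case-insensitive, so normalize first.
  const normalized = email.trim().toLowerCase()
  if (!normalized.endsWith(NOREPLY_SUFFIX)) return null

  const local = normalized.slice(0, -NOREPLY_SUFFIX.length)

  // Optional numeric "ID+" prefix, then a username made of alphanumerics
  // and inner hyphens. Assumption: names with other characters
  // (for example bot accounts ending in "[bot]") are rejected.
  const match = /^(?:\d+\+)?([a-z0-9](?:[a-z0-9-]*[a-z0-9])?)$/.exec(local)
  return match ? match[1] : null
}
```

Under these assumptions, 63534555+skwowet@users.noreply.github.com and skwowet@users.noreply.github.com both yield skwowet, while an address on any other domain yields null.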
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| services/libs/common/src/email.ts | Adds GitHub noreply email parsing helper. |
| services/apps/merge_suggestions_worker/src/memberSimilarityCalculator.ts | Incorporates noreply-email-derived username matching into similarity scoring and adjusts clashing identity logic. |
| services/apps/merge_suggestions_worker/src/activities/memberMergeSuggestions.ts | Adds OpenSearch clause to match derived GitHub usernames from noreply emails. |
| services/apps/data_sink_worker/src/service/activity.service.ts | Resolves members during ingestion via derived GitHub usernames and appends unverified GitHub username identities for new members. |
Comments suppressed due to low confidence (1)
services/apps/merge_suggestions_worker/src/memberSimilarityCalculator.ts:205
hasClashingMemberIdentities compares usernames using a case-sensitive inequality (i.keyword_value !== identity.value). Since identity matching elsewhere (e.g., SQL lookups) is case-insensitive, this can incorrectly treat values that differ only by casing as a clash and force a low similarity score. Consider normalizing both sides (e.g., lowercase) before comparing, and guard against missing values.
similarMember.nested_identities.some(
(i) =>
i.keyword_type === MemberIdentityType.USERNAME &&
i.string_platform === identity.platform &&
i.keyword_value !== identity.value,
)
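The normalization suggested in this comment could be sketched roughly as follows. The types and the function name hasClashingUsername are hypothetical stand-ins for the worker's real definitions, and the string literals 'username' and 'email' are assumed values for MemberIdentityType; only the field names keyword_type, string_platform, and keyword_value come from the snippet above.

```typescript
// Hypothetical minimal shapes; the real types in the worker are richer.
type IdentityType = 'username' | 'email'

interface NestedIdentity {
  keyword_type: IdentityType
  string_platform: string
  keyword_value?: string | null
}

interface Identity {
  type: IdentityType
  platform: string
  value?: string | null
}

// Lowercase and trim; missing values become the empty string.
const normalize = (v?: string | null): string => (v ?? '').trim().toLowerCase()

function hasClashingUsername(identity: Identity, others: NestedIdentity[]): boolean {
  const value = normalize(identity.value)
  // Only usernames with a usable value can clash.
  if (identity.type !== 'username' || value === '') return false

  return others.some((i) => {
    const other = normalize(i.keyword_value)
    return (
      i.keyword_type === 'username' &&
      i.string_platform === identity.platform &&
      other !== '' && // guard against missing values
      other !== value // case-insensitive inequality
    )
  })
}
```

With this shape, Skwowet and skwowet on the same platform are not reported as a clash, and an EMAIL identity on that platform is ignored entirely.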
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>

Summary
- Adds parseGitHubNoreplyEmail utility to extract GitHub usernames from ID+USERNAME@users.noreply.github.com and USERNAME@users.noreply.github.com email formats
- Narrows hasClashingMemberIdentities to only compare USERNAME identities, preventing false clashes when EMAIL identities exist on the same platform

Note
Medium Risk
Touches member-resolution and merge-suggestion logic, which can change how profiles are deduplicated and merged; mistakes could cause incorrect member attribution or missed merges, but the change is scoped to GitHub noreply email parsing/matching.
Overview
Improves member deduplication by adding parseGitHubNoreplyEmail and using it to treat GitHub @users.noreply.github.com emails as a source of the underlying GitHub username.

During activity ingestion (activity.service.ts), verified noreply emails are parsed and used to look up existing members by verified GitHub usernames before creating new members (applies to both member and objectMember).

In merge suggestions, OpenSearch queries now include noreply-email-derived username matches, similarity scoring treats noreply-email↔username matches as high confidence, and hasClashingMemberIdentities is narrowed to compare only USERNAME identities to avoid false clashes.

Written by Cursor Bugbot for commit 1c760dd.
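As a rough illustration of the merge-suggestion side, the sketch below builds one OpenSearch nested clause per username derived from a member's noreply emails. It is not the query from memberMergeSuggestions.ts: the function names, the nested_identities.* field paths, and the stored values 'username' and 'github' are assumptions inferred from the identity fields quoted in the review.

```typescript
// Hypothetical clause shape using the standard OpenSearch nested/bool/term DSL.
interface NestedClause {
  nested: {
    path: string
    query: { bool: { must: Array<{ term: Record<string, string> }> } }
  }
}

// Hypothetical inline parser so the sketch is self-contained.
function githubUsernameFromNoreply(email: string): string | null {
  const m = /^(?:\d+\+)?([a-z0-9](?:[a-z0-9-]*[a-z0-9])?)@users\.noreply\.github\.com$/i.exec(
    email.trim(),
  )
  return m ? m[1].toLowerCase() : null
}

function buildNoreplyUsernameClauses(emails: string[]): NestedClause[] {
  const usernames = emails
    .map(githubUsernameFromNoreply)
    .filter((u): u is string => u !== null)
    // Drop duplicates, e.g. when both address formats are present.
    .filter((u, idx, all) => all.indexOf(u) === idx)

  return usernames.map((username) => ({
    nested: {
      path: 'nested_identities',
      query: {
        bool: {
          must: [
            // Assumed field paths and stored values.
            { term: { 'nested_identities.keyword_type': 'username' } },
            { term: { 'nested_identities.string_platform': 'github' } },
            { term: { 'nested_identities.keyword_value': username } },
          ],
        },
      },
    },
  }))
}
```

Clauses like these would typically be added to the should section of the surrounding bool query, so a username match widens the candidate set without being required.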