fix: improve member deduplication by parsing GitHub noreply emails (CM-958) #3929

Merged: skwowet merged 8 commits into main from improve/CM-958 on Mar 19, 2026

Conversation

@skwowet
Collaborator

@skwowet skwowet commented Mar 18, 2026

Summary

  • Add parseGitHubNoreplyEmail utility to extract GitHub usernames from ID+USERNAME@users.noreply.github.com and USERNAME@users.noreply.github.com email formats
  • Look up existing members by derived GitHub username during activity ingestion, preventing duplicate profile creation
  • Add noreply email matching to merge suggestions OpenSearch queries and similarity scoring for better deduplication of existing profiles
  • Fix hasClashingMemberIdentities to only compare USERNAME identities, preventing false clashes when EMAIL identities exist on the same platform
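
To illustrate the two email formats named in the first bullet, here is a minimal sketch of what such a parser could look like. It is not the code from this PR: the regular expression, the lowercasing of the result, and the `string | null` return type are assumptions, and the real `parseGitHubNoreplyEmail` in services/libs/common/src/email.ts may handle edge cases (for example bot accounts) differently.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Accepts "ID+USERNAME@users.noreply.github.com" and
// "USERNAME@users.noreply.github.com"; the username is assumed to consist of
// alphanumerics and inner hyphens only.
const GITHUB_NOREPLY_PATTERN =
  /^(?:\d+\+)?([a-z0-9](?:[a-z0-9-]*[a-z0-9])?)@users\.noreply\.github\.com$/i;

function parseGitHubNoreplyEmail(email: string): string | null {
  const match = GITHUB_NOREPLY_PATTERN.exec(email.trim());
  // Lowercasing is an assumption made so that later comparisons are
  // case-insensitive.
  return match ? match[1].toLowerCase() : null;
}
```

Under these assumptions, `63534555+skwowet@users.noreply.github.com` yields `skwowet`, and an address on any other domain yields `null`.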

Note

Medium Risk
Touches member-resolution and merge-suggestion logic, which can change how profiles are deduplicated and merged; mistakes could cause incorrect member attribution or missed merges, but the change is scoped to GitHub noreply email parsing/matching.

Overview
Improves member deduplication by adding parseGitHubNoreplyEmail and using it to treat GitHub @users.noreply.github.com emails as a source of the underlying GitHub username.

During activity ingestion (activity.service.ts), verified noreply emails are parsed and used to look up existing members by verified GitHub usernames before creating new members (applies to both member and objectMember).
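
The ingestion-time lookup described above can be sketched roughly as follows. Everything here is illustrative: `IncomingIdentity`, `FindMemberByUsername`, `deriveGitHubUsername`, and `findExistingMemberId` are hypothetical names, the lookup is synchronous for brevity (the real service presumably queries a database), and the `"github"` platform string is an assumption.

```typescript
// Hypothetical shapes; the real types in activity.service.ts may differ.
interface IncomingIdentity {
  platform: string;
  type: "username" | "email";
  value: string;
  verified: boolean;
}

// Assumed lookup: returns the id of a member that owns a verified username
// on the given platform, or null when there is none.
type FindMemberByUsername = (platform: string, username: string) => string | null;

function deriveGitHubUsername(email: string): string | null {
  const match = /^(?:\d+\+)?([a-z0-9-]+)@users\.noreply\.github\.com$/i.exec(email.trim());
  return match ? match[1].toLowerCase() : null;
}

// Only verified email identities are considered, matching the description
// above; the first existing member found wins.
function findExistingMemberId(
  identities: IncomingIdentity[],
  findMember: FindMemberByUsername,
): string | null {
  for (const identity of identities) {
    if (identity.type !== "email" || !identity.verified) continue;
    const username = deriveGitHubUsername(identity.value);
    if (username === null) continue;
    const memberId = findMember("github", username);
    if (memberId !== null) return memberId;
  }
  return null;
}
```

A caller would fall back to creating a new member only when this returns `null`.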

In merge suggestions, OpenSearch queries now include noreply-email-derived username matches, similarity scoring treats noreply-email↔username matches as high confidence, and hasClashingMemberIdentities is narrowed to compare only USERNAME identities to avoid false clashes.
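
As a rough idea of what a noreply-derived clause might look like, the sketch below builds an OpenSearch `nested` query over member identities. The field names are taken from the snippet quoted later in this conversation (`nested_identities`, `keyword_type`, `string_platform`, `keyword_value`); the nested mapping, the `"github"` and `"username"` literal values, and the helper names are assumptions rather than the PR's actual query.

```typescript
// Hypothetical helper; the real query in memberMergeSuggestions.ts may be
// structured differently.
function extractNoreplyUsername(email: string): string | null {
  const match = /^(?:\d+\+)?([a-z0-9-]+)@users\.noreply\.github\.com$/i.exec(email.trim());
  return match ? match[1].toLowerCase() : null;
}

function buildNoreplyUsernameClause(emails: string[]): Record<string, unknown> | null {
  // Collect unique usernames derived from the member's noreply emails.
  const usernames: string[] = [];
  for (const email of emails) {
    const username = extractNoreplyUsername(email);
    if (username !== null && usernames.indexOf(username) === -1) {
      usernames.push(username);
    }
  }
  if (usernames.length === 0) return null;

  // Match other members that hold one of these usernames on GitHub.
  return {
    nested: {
      path: "nested_identities",
      query: {
        bool: {
          must: [
            { term: { "nested_identities.string_platform": "github" } },
            { term: { "nested_identities.keyword_type": "username" } },
            { terms: { "nested_identities.keyword_value": usernames } },
          ],
        },
      },
    },
  };
}
```

The returned clause would typically be appended to the `should` list of the surrounding `bool` query, and skipped when the helper returns `null`.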

Written by Cursor Bugbot for commit 1c760dd.

@skwowet skwowet self-assigned this Mar 18, 2026
@skwowet skwowet force-pushed the improve/CM-958 branch 2 times, most recently from cb55819 to 1b14396 Compare March 18, 2026 15:25
@skwowet skwowet marked this pull request as ready for review March 18, 2026 16:11
Copilot AI review requested due to automatic review settings March 18, 2026 16:11
Contributor

Copilot AI left a comment

Pull request overview

This PR improves member deduplication by extracting GitHub usernames from @users.noreply.github.com emails and using that derived username for both ingestion-time member resolution and merge-suggestion matching.

Changes:

  • Added parseGitHubNoreplyEmail utility to derive GitHub usernames from noreply email formats.
  • Updated merge-suggestions similarity scoring and OpenSearch queries to consider noreply-email-to-username matches.
  • Updated activity ingestion to look up existing members by derived GitHub username and to attach an unverified GitHub username identity to new members when a noreply email is present.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| services/libs/common/src/email.ts | Adds GitHub noreply email parsing helper. |
| services/apps/merge_suggestions_worker/src/memberSimilarityCalculator.ts | Incorporates noreply-email-derived username matching into similarity scoring and adjusts clashing identity logic. |
| services/apps/merge_suggestions_worker/src/activities/memberMergeSuggestions.ts | Adds OpenSearch clause to match derived GitHub usernames from noreply emails. |
| services/apps/data_sink_worker/src/service/activity.service.ts | Resolves members during ingestion via derived GitHub usernames and appends unverified GitHub username identities for new members. |

Comments suppressed due to low confidence (1)

services/apps/merge_suggestions_worker/src/memberSimilarityCalculator.ts:205

  • hasClashingMemberIdentities compares usernames using a case-sensitive inequality (i.keyword_value !== identity.value). Since identity matching elsewhere (e.g., SQL lookups) is case-insensitive, this can incorrectly treat values that differ only by casing as a clash and force a low similarity score. Consider normalizing both sides (e.g., lowercase) before comparing, and guard against missing values.
          similarMember.nested_identities.some(
            (i) =>
              i.keyword_type === MemberIdentityType.USERNAME &&
              i.string_platform === identity.platform &&
              i.keyword_value !== identity.value,
          )
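
One way the normalization suggested in this comment could look is sketched below. It mirrors only the quoted predicate, not the full `hasClashingMemberIdentities` function; the type names, the `"username"` literal standing in for `MemberIdentityType.USERNAME`, and the function name are assumptions.

```typescript
// Hypothetical shapes modeled on the fields used in the quoted snippet.
interface PrimaryIdentity {
  platform: string;
  type: string;
  value?: string | null;
}

interface IndexedIdentity {
  string_platform: string;
  keyword_type: string;
  keyword_value?: string | null;
}

// Assumed string value of MemberIdentityType.USERNAME.
const USERNAME_TYPE = "username";

// Returns true when both members have a USERNAME identity on the same
// platform whose values differ after lowercasing. Missing values are
// ignored instead of being treated as a clash.
function hasClashingUsername(
  primary: PrimaryIdentity[],
  similar: IndexedIdentity[],
): boolean {
  return primary.some((identity) => {
    if (identity.type !== USERNAME_TYPE || !identity.value) return false;
    const value = identity.value.toLowerCase();
    return similar.some(
      (i) =>
        i.keyword_type === USERNAME_TYPE &&
        i.string_platform === identity.platform &&
        !!i.keyword_value &&
        i.keyword_value.toLowerCase() !== value,
    );
  });
}
```

With this variant, `Skwowet` and `skwowet` on the same platform no longer count as a clash, while two different usernames on that platform still do.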

@skwowet skwowet force-pushed the improve/CM-958 branch 2 times, most recently from ac839a5 to 9dedd19 Compare March 18, 2026 17:36
@CLAassistant

CLAassistant commented Mar 18, 2026

CLA assistant check
All committers have signed the CLA.

@skwowet skwowet force-pushed the improve/CM-958 branch 4 times, most recently from 8575348 to d99a94d Compare March 18, 2026 18:17
@skwowet skwowet requested a review from themarolt March 18, 2026 18:17

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

themarolt previously approved these changes Mar 18, 2026
skwowet added 7 commits March 19, 2026 15:14
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
…lication

@skwowet skwowet merged commit 6fa8cb2 into main Mar 19, 2026
10 checks passed
@skwowet skwowet deleted the improve/CM-958 branch March 19, 2026 11:14
4 participants