Skip to content

fix: redact pending primary email before retirement deletion#38426

Open
ktyagiapphelix2u wants to merge 14 commits into
openedx:masterfrom
ktyagiapphelix2u:ktyagi/primaryemail
Open

fix: redact pending primary email before retirement deletion#38426
ktyagiapphelix2u wants to merge 14 commits into
openedx:masterfrom
ktyagiapphelix2u:ktyagi/primaryemail

Conversation

@ktyagiapphelix2u
Copy link
Copy Markdown
Contributor

@ktyagiapphelix2u ktyagiapphelix2u commented Apr 23, 2026

Summary

This change addresses a privacy issue in the retirement flow for users who have a pending primary email change.

Ticket & Reference

https://2u-internal.atlassian.net/browse/BOMS-498

@ktyagiapphelix2u ktyagiapphelix2u marked this pull request as ready for review April 23, 2026 11:29
@ktyagiapphelix2u ktyagiapphelix2u requested a review from a team as a code owner April 23, 2026 11:29
original_new_email = self.email_change.new_email
original_activation_key = self.email_change.activation_key
record_was_redacted = PendingEmailChange.redact_pending_email_by_user_value(self.user2, field='user')
assert not record_was_redacted
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this test. Should it just have lines 617 and 618, where you ask to redact on a user that isn't in the table, and it returns that it didn't redact?

All the other details about the user 1 email change seem irrelevant and confusing. If you think it is important, I'd need better comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test now:

  1. Has a clear docstring explaining it verifies redacting a user with no pending email change returns False
  2. Only tests the relevant behavior - calling redact on user2 (who has no email change record) returns False
  3. Removes the confusing assertions about user1's email change data remaining unchanged

Comment on lines +918 to +921
Redact pending email change fields for records matching ``field=value``.
This method is intended for retirement flows where downstream systems
may keep soft-deleted snapshots of these rows.
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Docstrings should have a one line summary, and an optional blank line and longer description. Like a comment message.
  • Also, maybe add something like:
        Returns True if redacted, and False if no matching records found.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Condensed to a one-line summary + the Returns line

Comment thread common/djangoapps/student/tests/test_models.py Outdated
Comment on lines +930 to +931
record.new_email = get_retired_email_by_email(record.new_email)
record.save(update_fields=['new_email'])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description explicitly identifies activation_key as sensitive data that can still persist indirectly in logs, backups, or downstream systems. However, the implementation only redacts new_email . activation_key is left as-is. If downstream systems snapshot these rows, the activation key still leaks. It should be cleared before deletion, e.g.:

Suggested change
record.new_email = get_retired_email_by_email(record.new_email)
record.save(update_fields=['new_email'])
record.new_email = get_retired_email_by_email(record.new_email)
record.activation_key = '' # or a redacted value
record.save(update_fields=['new_email', 'activation_key'])

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Akanshu-2u The activation_key field has a unique=True database constraint. If we attempt to redact it to a fixed value (empty string or redacted placeholder), we'll violate this constraint when processing multiple users, causing database integrity errors.

Additionally, the activation key is a random UUID with no PII - it's just a token

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktyagiapphelix2u: The PR description mentions the activation key. I agree that this PR should not touch it, but can you update the PR description to remove any mention of it? Thanks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I have updated the PR Description

assert record_was_redacted
self.email_change.refresh_from_db()
assert self.email_change.new_email == expected_retired_email
assert self.email_change.activation_key == original_activation_key
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If activation_key redaction is added, this assertion must be updated to verify the key is also cleared/replaced. As-is, this test will need to change regardless once above issue is fixed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test assertion is correct as-is and does not need to change.

As explained in the previous comment thread, we are not adding activation_key redaction

Comment on lines +926 to +929
records_matching_user_value = cls.objects.filter(**filter_kwargs)
if not records_matching_user_value.exists():
return False
for record in records_matching_user_value:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both queries fetch the same data. Since the queryset is lazy, .exists() and the for loop each trigger a separate SQL query. Change to:

Suggested change
records_matching_user_value = cls.objects.filter(**filter_kwargs)
if not records_matching_user_value.exists():
return False
for record in records_matching_user_value:
records = list(cls.objects.filter(**filter_kwargs))
if not records:
return False
for record in records:

This is a single DB hit, which matters more if the field filter ever doesn't use the OneToOneField on user.

Note: The change is optional.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since PendingEmailChange has a OneToOneField on user, this will only ever return 0 or 1 records, so the memory impact is negligible while reducing database round-trips.

Thanks Updated

Comment on lines +930 to +931
record.new_email = get_retired_email_by_email(record.new_email)
record.save(update_fields=['new_email'])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktyagiapphelix2u: The PR description mentions the activation key. I agree that this PR should not touch it, but can you update the PR description to remove any mention of it? Thanks.

Comment on lines 1157 to 1158
PendingEmailChange.redact_pending_email_by_user_value(user, field="user")
PendingEmailChange.delete_by_user_value(user, field="user")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we just want to redact in delete_by_user_value before the delete happens? We'd use the same simple value from other PRs, like redact-before-delete@redacted.com. And this would resolve for any flow that is deleting this record, including retirement flow or completion of pending email changes, etc.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I am alluding to two bugs in my above comment. Are these accurate? The PR description does not yet mention both of these. Also, the ticket has an AC that mentions three bugs. Is there another, or is that a copy/paste issue in the ticket?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are two bugs this addresses:

Retirement flow - PendingEmailChange.delete_by_user_value() was called during account retirement without first redacting new_email, leaving the PII briefly visible (and in any soft-delete/audit snapshot taken at delete time).

Email confirmation flow - confirm_email_change in management.py also calls delete_by_user_value() when a pending email change is confirmed or cancelled, and it had the same gap. Moving the redaction inside the method fixes both flows simultaneously.

@ktyagiapphelix2u ktyagiapphelix2u force-pushed the ktyagi/primaryemail branch 2 times, most recently from 0d5d5e6 to 2593ac2 Compare May 4, 2026 08:49
@vgulati-apphelix
Copy link
Copy Markdown
Contributor

This change addresses a privacy issue in the retirement flow for users who have a pending primary email change.

Please elaborate the problem statement and solution provided.

Comment on lines +918 to +921
Redact pending email change fields for records matching ``field=value``.
This method is intended for retirement flows where downstream systems
may keep soft-deleted snapshots of these rows.
"""
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Condensed to a one-line summary + the Returns line

Comment on lines +930 to +931
record.new_email = get_retired_email_by_email(record.new_email)
record.save(update_fields=['new_email'])
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I have updated the PR Description

Comment on lines 1157 to 1158
PendingEmailChange.redact_pending_email_by_user_value(user, field="user")
PendingEmailChange.delete_by_user_value(user, field="user")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are two bugs this addresses:

Retirement flow - PendingEmailChange.delete_by_user_value() was called during account retirement without first redacting new_email, leaving the PII briefly visible (and in any soft-delete/audit snapshot taken at delete time).

Email confirmation flow - confirm_email_change in management.py also calls delete_by_user_value() when a pending email change is confirmed or cancelled, and it had the same gap. Moving the redaction inside the method fixes both flows simultaneously.

activation_key = models.CharField(('activation key'), max_length=32, unique=True, db_index=True)

@classmethod
def delete_by_user_value(cls, value, field):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robrap Redact at Call Sites (What you're suggesting)

In management.py:

# Email confirmation flow
user.email = pec.new_email
user.save()

# Redact inline before delete
pec.new_email = 'redacted@retired.invalid'
pec.save()
pec.delete()

In accounts/views.py:

# User retirement flow
# Redact inline before delete
PendingEmailChange.objects.filter(user=user).update(new_email='redacted@retired.invalid')
PendingEmailChange.objects.filter(user=user).delete()

What the current approch is

@classmethod
def delete_by_user_value(cls, value, field):
    records = cls.objects.filter(**{field: value})
    records.update(new_email='redacted@retired.invalid')
    records.delete()
    return True

In both call sites:
PendingEmailChange.delete_by_user_value(user, field="user")

# management.py - must remember to redact
pec.new_email = 'redacted@retired.invalid'
pec.save()
pec.delete()

# accounts/views.py - must remember again
records.update(new_email='redacted@retired.invalid')
records.delete()

# Future code in some other file 
pec.delete()  # ←if they forget to redact

So current implementation is correct

# Change in ONE place:
@classmethod
def delete_by_user_value(cls, value, field):
    records.update(new_email='redacted@retired.invalid')  #Only here
    # All callers automatically get the new value!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robrap
PendingEmailChange overrides delete_by_user_value from DeletableByUserValue (in model_mixins.py) because the parent's version only deletes it has no knowledge of PII fields. The issue was that Fivetran's soft-delete ETL preserves deleted rows, so if new_email isn't redacted before deletion, the PII leaks downstream. The override works via Python's MRO: when any caller invokes PendingEmailChange.delete_by_user_value(...), Python finds the method on PendingEmailChange first and uses it automatically getting redaction without any change to the callers. We can't put the redaction logic in model_mixins.py itself because it's a generic mixin used by many models (UserOrgTag, CourseEnrollmentAllowed, etc.) that don't have a new_email field adding it there would either crash other models or force callers to pass field names in, which breaks loose coupling by making callers responsible for knowing a model's internal PII fields. The override keeps the design clean: callers stay unchanged, and each model owns its own redaction logic.

@skip_unless_lms
@patch('common.djangoapps.student.signals.receivers.EmailChangeMiddleware.register_email_change')
@patch('common.djangoapps.student.views.management.ace')
def test_successful_email_change_redacts_pending_email_before_delete(self, ace_mail, mock_register): # pylint: disable=unused-argument
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_register is silently unused, comment doesn't explain why it's patched.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I changed:

Removed the unused-argument suppression from the test.
Added an explicit assertion to use the patched mock:

mock_register.assert_called_once_with(self.request, expected_new_email)

)

assert response.status_code == 200
assert mock_render_to_response('email_change_successful.html', {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is verifying both (a) that redaction ordering is correct AND (b) that the email change itself succeeded end-to-end. The existing test_successful_email_change already covers the success path. This test should only assert the redaction ordering plus the record being gone — the response content, user email update, and ace_mail.send.call_count assertions are noise here and make it harder to understand what the test is actually for.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I narrowed test_successful_email_change_redacts_pending_email_before_delete so it now only checks:

SQL ordering: UPDATE on student_pendingemailchange occurs before DELETE.
The pending email change record is removed (PendingEmailChange.objects.count() == 0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants