Skip to content

fix: Redact SSO PII before deletion#38425

Open
ktyagiapphelix2u wants to merge 26 commits into
openedx:masterfrom
ktyagiapphelix2u:ktyagi/SSOPII
Open

fix: Redact SSO PII before deletion#38425
ktyagiapphelix2u wants to merge 26 commits into
openedx:masterfrom
ktyagiapphelix2u:ktyagi/SSOPII

Conversation

@ktyagiapphelix2u
Copy link
Copy Markdown
Contributor

@ktyagiapphelix2u ktyagiapphelix2u commented Apr 23, 2026

Description

Implements automatic PII redaction for UserSocialAuth records before deletion to prevent personally identifiable information from persisting after records are removed.

Jira Ticket

https://2u-internal.atlassian.net/browse/BOMS-514

@ktyagiapphelix2u ktyagiapphelix2u marked this pull request as ready for review April 23, 2026 11:29
@ktyagiapphelix2u ktyagiapphelix2u requested a review from a team as a code owner April 23, 2026 11:29
Comment thread openedx/core/djangoapps/user_api/accounts/tests/test_utils.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/tests/test_utils.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/utils.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated
Comment thread openedx/core/djangoapps/user_api/management/tests/test_retire_user.py Outdated
Comment thread openedx/core/djangoapps/user_api/management/commands/retire_user.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated
@ktyagiapphelix2u
Copy link
Copy Markdown
Contributor Author

@robrap We’re dealing with multiple ways SSO records can get deleted through Django admin, user actions like unlinking accounts, bulk retirement scripts. The challenge is that we don’t control all of these paths, so we can’t reliably add PII redaction directly into each one.

Instead, we’ve set up a two-layer approach.

The first layer is a Django signal that runs automatically right before any SSO record is deleted. This acts as a safety net. No matter how the deletion is triggered whether it’s from admin, user action, the signal ensures sensitive fields like the UID and extra data are redacted. It’s centralized, consistent, so it won’t cause issues if it runs more than once.

The second layer is used only in cases we fully control, like user retirement flows. There, we proactively run a bulk redaction step before deleting records. This is much faster because it uses efficient database operations. When the delete happens afterward, the signal still fires, but it detects that the data is already redacted and simply exits without doing extra work.

Together, these two layers cover both safety and performance. The signal guarantees we never miss redaction, even in code we don’t control, while the explicit bulk step keeps large-scale operations efficient.

Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated
Comment thread openedx/core/djangoapps/user_api/management/commands/retire_user.py Outdated
Comment thread openedx/core/djangoapps/user_api/management/commands/retire_user.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/utils.py Outdated
Comment thread openedx/core/djangoapps/user_api/management/commands/retire_user.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated
Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to look at tests, but some minor comments. Looking good so far.

Comment thread openedx/core/djangoapps/user_api/accounts/utils.py Outdated
Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly test clean-up comments at this point.

Comment thread openedx/core/djangoapps/user_api/accounts/tests/test_utils.py Outdated
Comment on lines +187 to +195
captured_states = []

def capture_state_before_delete(sender, instance, **kwargs): # pylint: disable=unused-argument
instance.refresh_from_db()
captured_states.append({
'id': instance.id,
'uid': instance.uid,
'extra_data': dict(instance.extra_data) if instance.extra_data else {},
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a pre_delete' signal for testing a pre_deletesignal makes this confusing. Is that what is being done here? How do you know what order thepre_delete` signals will get called? I'd rather it wasn't confusing in this way, and you used some other mechanism to test, like checking that there is an appropriate UPDATE query before the DELETE query, as we did in the earlier PR. You can retain the not exists assertion at the end.

Also, If this were needed, you've got a lot of code redundancy. You could use setUpClass or setUp and tearDownClass or tearDown, or helper functions to keep things DRY (Don't Repeat Yourself).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to use CaptureQueriesContext to assert UPDATE precedes DELETE, and moved the signal tests to a new test_signals.py.

Safety-net signal handler that redacts PII on any UserSocialAuth before deletion.

Records deleted via ``redact_and_delete_social_auth`` will already be redacted;
this handler is a fallback for any other deletion path.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this handler is a fallback for any other deletion path.
this handler is a fallback for any missed deletion path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Redaction happens before deletion so that any observers see only sanitised data.
Downstream copies of data may use soft-deletes, and redacting before deleting
ensures PII for retired users (or future retirements) is not retained.
The uid format matches ``get_redacted_social_auth_uid()``.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this below...

Suggested change
The uid format matches ``get_redacted_social_auth_uid()``.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"""
social_auth_queryset = UserSocialAuth.objects.filter(user_id=user_id)
social_auth_queryset.update(
uid=Concat(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved (and edited) comment:

Suggested change
uid=Concat(
# Important: this redacted uid must match the format used by ``get_redacted_social_auth_uid()``.
uid=Concat(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment

"""
Redact PII from all UserSocialAuth records for the given user, then delete them.

Redaction happens before deletion so that any observers see only sanitised data.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider dropping this comment. The comment about soft-deletes is probably enough.

Suggested change
Redaction happens before deletion so that any observers see only sanitised data.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done



@skip_unless_lms
class RedactUserSocialAuthPIITest(TestCase):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The signal tests belong in a test_signals.py file with an appropriate class name. Some reasonable signal tests:
  • Does the signal warn and redact if not already redacted?
  • Does the signal skip warning (and redaction) if already redacted?
  • Optional: Using mock, confirm redact_and_delete_social_auth is called with skip_delete=True.
  1. For utils tests of direct calls to redact_and_delete_social_auth, you can cover any items you didn't cover in signals (like maybe test_delete_redacts_multiple_sso_providers), and this shouldn't require signal setup and teardown.

Note: You have much of what you need, so hopefully this is minor refactoring and clean-up.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signal tests moved to test_signals.py using mock.patch, and utils tests now call redact_and_delete_social_auth directly without any signal setup/teardown.


captured_states = []

def capture_state_before_delete(sender, instance, **kwargs): # pylint: disable=unused-argument
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You may want to use the same UPDATE/DELETE query assertion you set up for the other test. See other comment for details.
  2. You'll also want to ensure that the real receiver you set up is not interfering with this test. For example, if you deleted the redaction from retire_user.py, would this test still pass because the signal is taking care of the redaction for you? One way to to fix this would be to disconnect that signal in setUpClass (with an appropriate comment) and to re-connect it in tearDownClass. An alternative is to mock logging and ensure that there is no log.warn from the signal (about redacting). You can test that these assertions work by temporarily removing the redaction you are testing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to CaptureQueriesContext for UPDATE-before-DELETE assertions, and disconnected the safety-net pre_delete signal handler around both tests so they'd fail if retire_user itself stopped redacting.

@ktyagiapphelix2u ktyagiapphelix2u force-pushed the ktyagi/SSOPII branch 3 times, most recently from 3f3977a to 667de73 Compare May 11, 2026 07:46
@ktyagiapphelix2u ktyagiapphelix2u force-pushed the ktyagi/SSOPII branch 3 times, most recently from 7fc7ec0 to ebb2f96 Compare May 11, 2026 10:06


@skip_unless_lms
class RedactAndDeleteSocialAuthTest(TestCase):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this test module already uses ddt, consider parameterizing the multiple SSO provider scenarios in test_redact_and_delete_redacts_multiple_sso_providers using @ddt.data / @ddt.unpack to reduce repetitive setup and make it easier to extend with additional providers later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment thread openedx/core/djangoapps/user_api/accounts/utils.py
Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More test fun.

"""
social_auth = self._create_social_auth()

with patch(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] I often (but not always) prefer the @patch decorator, like in this example, because I think it makes the test more readable. The example I provided also happens to be a logging example.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made it more redable and update with the things we carry on with genrally.

extra_data=extra_data,
)

@ddt.data(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like test_retire_user_redacts_sso_pii_before_deletion better where you are testing a user with a single SSO and one with multiple. This is only testing a user with a single SSO, and they just happen to be different SSO, and why would that matter as much?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted back to my original code

delete_indices = [i for i, sql in enumerate(sql_list) if 'DELETE' in sql.upper() and table_key in sql.upper()]
assert update_indices, f'Expected at least one UPDATE on {table}'
assert delete_indices, f'Expected at least one DELETE on {table}'
assert any(update_idx < delete_idx for update_idx in update_indices for delete_idx in delete_indices), (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If _assert_update_before_delete were a class method named assert_update_before_delete, I think it would be reasonable to use it in test_retire_user.py, rather than duplicating it, since they both call the utils.py method.
  • If you were confirming multiple updates/deletes (as was done in the original), this assertion wouldn't work for the following queries:
    • DELETE id 1;
    • UPDATE id 1; # This is an UPDATE before DELETE, but for the wrong id.
    • DELETE id 2;
    • UPDATE id 2;
  1. Here are some possible changes:
  • Add an argument for num_redact_delete_pairs (or something like that) defaulted to 1 (but would be 2 for certain tests).
  • Filter to expected_sql_list which contains all UPDATE and DELETE queries.
  • Assert that the length is num_redact_delete_pairs*2.
  • Loop through pairs, and assert that expected_sql_list[0] and expected_sql_list[1] (first iteration) has UPDATE and then DELETE against the same id.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I moved the SQL assertion into a shared helper in openedx/core/djangoapps/user_api/accounts/tests/test_utils.py, tightened it to validate consecutive UPDATE/DELETE pairs, and reused it from openedx/core/djangoapps/user_api/management/tests/test_retire_user.py.

This verifies that redaction happens in the helper itself, not via the
fallback pre_delete signal.
"""
table_key = table.upper()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the table argument ever changed, or should this just be part of the method? The docstring could then more clearly target social auth, which is all we are using this for.

Suggested change
table_key = table.upper()
table_key = `SOCIAL_AUTH_USERSOCIALAUTH`

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines +175 to +177

This verifies that redaction happens in the helper itself, not via the
fallback pre_delete signal.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion method here doesn't actually know if this is true. I'd leave this out.

Suggested change
This verifies that redaction happens in the helper itself, not via the
fallback pre_delete signal.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

pre_delete.connect(redact_social_auth_pii_before_deletion, sender=UserSocialAuth)


def assert_update_before_delete(sql_list, table='social_auth_usersocialauth'):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you drop this and just call the test_utils helper instead?



@contextmanager
def disconnected_social_auth_redaction_signal():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange to use one pattern here and one in test_utils.py for the same thing. Pick one and be consistent. Unless there is a strong reason to do it differently in each that I am missing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I changed:

In openedx/core/djangoapps/user_api/accounts/tests/test_utils.py, replaced class-wide signal disconnect/reconnect (setUpClass/tearDownClass) with a local context manager:
disconnected_social_auth_redaction_signal()
Updated both social-auth utility tests to run inside:
with disconnected_social_auth_redaction_signal(), CaptureQueriesContext(...)

Why this is better:

Narrower scope (disconnect applies only during the specific test block)
Consistent with openedx/core/djangoapps/user_api/management/tests/test_retire_user.py

@ttak-apphelix
Copy link
Copy Markdown
Contributor

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants