[AAP-70854] Optimize JWT claims reconciliation performance by john-westcott-iv · Pull Request #979 · ansible/django-ansible-base

john-westcott-iv · 2026-04-13T17:40:09Z

Summary

Wrap save_user_claims() in no_reverse_sync() to suppress redundant HTTP calls back to Gateway during stub Org/Team creation triggered by get_or_create_resource() post_save signals
Wrap in transaction.atomic() so partial failures roll back cleanly instead of leaving users in an inconsistent permission state (especially important for services without ATOMIC_REQUESTS)
Add defer_role_evaluation() context manager in triggers.py that collects all affected ObjectRole instances and processes compute_object_role_permissions() in a single batch on context exit, reducing N individual compute+write cycles to one
Add timing and grant/removal counters to save_user_claims() and process_rbac_permissions() for production observability of claims reconciliation duration

Problem

When a user with many role assignments (e.g., superusers with hundreds of explicit assignments) authenticates via JWT to a service, save_user_claims() executes N individual give_permission()/remove_permission() calls. Each call triggers:

An individual RoleEvaluation cache recompute (compute_object_role_permissions())
Potential post_save signals from stub Org/Team creation that fire HTTP reverse sync calls back to Gateway

This caused timeouts during authentication and occasional permission loss when reconciliation didn't complete.

Changes

`ansible_base/rbac/triggers.py`

New _DeferredEvaluationState class and defer_role_evaluation() context manager
Modified update_after_assignment() to defer when evaluation is active
Supports nesting by saving/restoring previous state

`ansible_base/rbac/claims.py`

save_user_claims() now wrapped in no_reverse_sync(), transaction.atomic(), and defer_role_evaluation()
Added grant/removal counters and timing log

`ansible_base/jwt_consumer/common/auth.py`

Added timing breakdown in process_rbac_permissions(): fetch time, save time, total time

New Tests

test_app/tests/rbac/test_defer_role_evaluation.py — 14 tests covering context manager state, nesting, exception handling, deferred batching, and integration with give/remove permission
test_app/tests/rbac/test_save_user_claims_optimizations.py — 6 tests verifying atomicity, reverse sync suppression, deferred evaluation, and logging
test_app/tests/jwt_consumer/common/test_auth_timing.py — 4 tests verifying timing logs in process_rbac_permissions

Test plan

All 184 targeted tests pass (ran with py312-sqlite)
Existing test_claims.py, test_triggers.py, and test_auth.py tests continue to pass
CI pipeline passes
Verify timing logs appear in service logs during JWT authentication

Note: This PR was developed with assistance from Claude AI assistant.

Made with Cursor

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved robustness of role-based access control (RBAC) operations by detecting and handling stale data conditions in concurrent scenarios.
- Enhanced transaction atomicity for permission management operations.
Performance
- Added timing instrumentation to JWT claims reconciliation and permission synchronization for diagnostic insights.
Tests
- Comprehensive test coverage added for RBAC reliability, caching behavior, and optimization features.

…er concurrency compute_team_member_roles() collects ObjectRole IDs in a read phase then writes them to the M2M provides_teams table. A concurrent committed transaction (e.g. org deletion) can delete those ObjectRoles between the two phases, causing a foreign key violation on dab_rbac_objectrole_provides_teams. Wrap the M2M add() and RoleEvaluation bulk_create() calls in savepoints so IntegrityError from stale FK references is caught without aborting the outer transaction. On failure, re-filter to only IDs that still exist and retry. Co-authored-by: Claude <[email protected]> Made-with: Cursor

save_user_claims() previously executed N individual DB writes and RoleEvaluation cache recomputes when reconciling permissions from Gateway. For users with many role assignments this caused timeouts and occasional permission loss. Changes: - Wrap save_user_claims() in no_reverse_sync() to suppress redundant HTTP calls back to Gateway during stub Org/Team creation - Wrap in transaction.atomic() so partial failures roll back cleanly - Add defer_role_evaluation() context manager that batches all RoleEvaluation cache updates into a single pass on context exit - Add timing and grant/removal counters to save_user_claims() and process_rbac_permissions() for production observability Co-authored-by: Claude <[email protected]> Made-with: Cursor

coderabbitai · 2026-04-13T17:40:26Z

📝 Walkthrough

Walkthrough

The changes introduce timing instrumentation for JWT claim reconciliation, add stale object-role foreign-key error recovery with retry logic in RBAC caching, wrap claims synchronization in transactions with grant/removal tracking, and implement a deferred role-evaluation context manager to batch cache updates.

Changes

Cohort / File(s)	Summary
Timing Instrumentation `ansible_base/jwt_consumer/common/auth.py`	Added `time` module import and monotonic timing tracking (fetch, save, total elapsed) for JWT claims-hash mismatch reconciliation, logged in a single info message after persistence.
RBAC Stale FK Recovery `ansible_base/rbac/caching.py`	Added `_is_stale_objectrole_fk()` detector for PostgreSQL FK constraint violations, plus `_safe_m2m_add()` and `_safe_bulk_create_evaluations()` helpers that wrap operations in transactions, catch stale-FK errors, filter invalid references, and retry; updated `compute_team_member_roles()` and `compute_object_role_permissions()` to use these wrappers.
Transaction-Wrapped Claims Sync `ansible_base/rbac/claims.py`	Wrapped entire `save_user_claims()` RBAC flow in `transaction.atomic()` with context managers `no_reverse_sync()` and `defer_role_evaluation()`; added grant/removal counters and summary logging with elapsed time.
Deferred Role Evaluation `ansible_base/rbac/triggers.py`	Added `defer_role_evaluation()` context manager with internal `_DeferredEvaluationState` singleton; extended `update_after_assignment()` to accumulate object-role updates and team-recompute flags when deferral is active, then batch-execute on context exit.
Auth Timing Tests `test_app/tests/jwt_consumer/common/test_auth_timing.py`	New pytest module validating timing log output from `process_rbac_permissions()` during successful reconciliation; asserts "Claims reconciliation" message includes user ID and timing components; verifies log is absent on cache hits, early returns, or gateway fetch exceptions.
Caching Error Handling Tests `test_app/tests/rbac/test_caching.py`	New module testing stale-FK detection and recovery: validates `_is_stale_objectrole_fk()` for PostgreSQL FK errors, exercises `_safe_m2m_add()` retry-after-filtering and persistent-failure logging, and tests `_safe_bulk_create_evaluations()` selective retry of only valid evaluations.
Deferred Evaluation Tests `test_app/tests/rbac/test_defer_role_evaluation.py`	New module verifying `defer_role_evaluation()` context manager enables/disables deferral, supports nesting, batches `update_after_assignment()` calls, and correctly batch-invokes `compute_team_member_roles()` and `compute_object_role_permissions()` on exit; includes integration tests for `give_permission()` and `remove_permission()` under deferred batching.
Save Claims Optimization Tests `test_app/tests/rbac/test_save_user_claims_optimizations.py`	New module asserting `save_user_claims()` transactional atomicity (no partial writes on exception), verifying `reverse_sync_enabled` and `_deferred_evaluation.enabled` flags are correctly set during execution, and logging exactly one summary message per call with grant/removal counts and elapsed time.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 56.34% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly summarizes the main change: JWT claims reconciliation performance optimization through batching and transaction improvements.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-13T17:40:54Z

DVCS PR Check Results:

PR appears valid (JIRA key(s) found)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ansible_base/rbac/triggers.py`:
- Around line 29-78: The current _deferred_evaluation is process-global and
flushes work on every context exit, causing cross-request races and incorrect
nested behavior; change defer_role_evaluation() to use request-scoped
(thread-local or contextvar) state with a nesting counter (e.g., add a depth
attribute on the state), have inner contexts merge their object_roles and
needs_team_recompute into the parent state instead of running computes, and only
when the outermost context exits successfully (no exception) run
compute_team_member_roles() and
compute_object_role_permissions(object_roles=...) and then clear the state;
update references to _deferred_evaluation in defer_role_evaluation() (and any
callers like update_after_assignment) to use the new thread-local/context-local
state and ensure computes are skipped if an exception occurred in the context.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ff1120e3-8eeb-4c24-b29b-4a6d2d371467

📥 Commits

Reviewing files that changed from the base of the PR and between 20369e1 and 3c26729.

📒 Files selected for processing (8)

ansible_base/jwt_consumer/common/auth.py
ansible_base/rbac/caching.py
ansible_base/rbac/claims.py
ansible_base/rbac/triggers.py
test_app/tests/jwt_consumer/common/test_auth_timing.py
test_app/tests/rbac/test_caching.py
test_app/tests/rbac/test_defer_role_evaluation.py
test_app/tests/rbac/test_save_user_claims_optimizations.py

coderabbitai · 2026-04-13T17:47:53Z

+class _DeferredEvaluationState:
+    """Tracks whether RoleEvaluation cache updates should be deferred.
+
+    Used by defer_role_evaluation() to collect object_roles that need
+    recomputation and process them in a single batch on context exit.
+    """
+
+    def __init__(self):
+        self.enabled = False
+        self.object_roles = set()
+        self.needs_team_recompute = False
+
+
+_deferred_evaluation = _DeferredEvaluationState()
+
+
+@contextmanager
+def defer_role_evaluation():
+    """Defer RoleEvaluation cache updates until the context exits.
+
+    During bulk operations like save_user_claims(), each give_permission() /
+    remove_permission() call triggers compute_object_role_permissions()
+    individually. This context manager collects all affected object_roles
+    and processes them in a single batch when the context exits, reducing
+    N individual compute+write cycles to one.
+
+    Supports nesting by saving and restoring previous state.
+    """
+    previous_enabled = _deferred_evaluation.enabled
+    previous_roles = _deferred_evaluation.object_roles
+    previous_teams = _deferred_evaluation.needs_team_recompute
+
+    _deferred_evaluation.enabled = True
+    _deferred_evaluation.object_roles = set()
+    _deferred_evaluation.needs_team_recompute = False
+    try:
+        yield
+    finally:
+        deferred_roles = _deferred_evaluation.object_roles
+        deferred_teams = _deferred_evaluation.needs_team_recompute
+
+        _deferred_evaluation.enabled = previous_enabled
+        _deferred_evaluation.object_roles = previous_roles
+        _deferred_evaluation.needs_team_recompute = previous_teams
+
+        if deferred_teams:
+            compute_team_member_roles()
+        if deferred_roles:
+            compute_object_role_permissions(object_roles=deferred_roles)
+


⚠️ Potential issue | 🔴 Critical

Make deferred evaluation state request-scoped and flush only on the outermost successful exit.

_deferred_evaluation is process-global, so one thread can flip enabled while another thread is in update_after_assignment(), causing cross-request batching and skipped recomputes. The current nesting logic also computes on inner exit instead of merging the inner batch back into the outer context, so the advertised nested behavior is broken and exception paths still do work that the surrounding transaction may roll back. Use thread-local/context-local state here and only run compute_* when the outermost context exits without an error.

Suggested fix

+import threading from contextlib import contextmanager from typing import Union from uuid import UUID @@ -class _DeferredEvaluationState: +class _DeferredEvaluationState(threading.local): """Tracks whether RoleEvaluation cache updates should be deferred. @@ `@contextmanager` def defer_role_evaluation(): @@ previous_enabled = _deferred_evaluation.enabled previous_roles = _deferred_evaluation.object_roles previous_teams = _deferred_evaluation.needs_team_recompute @@ + completed = False try: yield + completed = True finally: deferred_roles = _deferred_evaluation.object_roles deferred_teams = _deferred_evaluation.needs_team_recompute _deferred_evaluation.enabled = previous_enabled _deferred_evaluation.object_roles = previous_roles _deferred_evaluation.needs_team_recompute = previous_teams - if deferred_teams: - compute_team_member_roles() - if deferred_roles: - compute_object_role_permissions(object_roles=deferred_roles) + if previous_enabled: + previous_roles.update(deferred_roles) + _deferred_evaluation.needs_team_recompute = previous_teams or deferred_teams + elif completed: + if deferred_teams: + compute_team_member_roles() + if deferred_roles: + compute_object_role_permissions(object_roles=deferred_roles)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ansible_base/rbac/triggers.py` around lines 29 - 78, The current _deferred_evaluation is process-global and flushes work on every context exit, causing cross-request races and incorrect nested behavior; change defer_role_evaluation() to use request-scoped (thread-local or contextvar) state with a nesting counter (e.g., add a depth attribute on the state), have inner contexts merge their object_roles and needs_team_recompute into the parent state instead of running computes, and only when the outermost context exits successfully (no exception) run compute_team_member_roles() and compute_object_role_permissions(object_roles=...) and then clear the state; update references to _deferred_evaluation in defer_role_evaluation() (and any callers like update_after_assignment) to use the new thread-local/context-local state and ensure computes are skipped if an exception occurred in the context.

sonarqubecloud · 2026-04-13T17:54:20Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
97.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

huffmanca

Nice work on this — the batching approach and TOCTOU guards are solid. One thing I noticed:

`defer_role_evaluation()` runs `compute_*` in two cases where it doesn't need to

The finally block in defer_role_evaluation() unconditionally calls compute_team_member_roles() / compute_object_role_permissions():

On inner context exit when nested — if previous_enabled is True, the inner context flushes its roles immediately instead of merging them into the parent's set. This defeats the batching optimization (though nesting isn't used in any current call site, so it's theoretical today).
On exception — since save_user_claims() wraps everything in transaction.atomic(), an exception rolls back the DB changes, but compute_* still runs against that rolled-back state. The results get corrected on next recompute, but it's throwaway work.

Neither is a correctness issue — compute_* is idempotent and the system self-corrects. But both are straightforward to fix by checking whether to merge vs. flush:

completed = False
try:
    yield
    completed = True
finally:
    deferred_roles = _deferred_evaluation.object_roles
    deferred_teams = _deferred_evaluation.needs_team_recompute
    _deferred_evaluation.enabled = previous_enabled
    _deferred_evaluation.object_roles = previous_roles
    _deferred_evaluation.needs_team_recompute = previous_teams

    if previous_enabled:
        previous_roles.update(deferred_roles)
        _deferred_evaluation.needs_team_recompute |= deferred_teams
    elif completed:
        if deferred_teams:
            compute_team_member_roles()
        if deferred_roles:
            compute_object_role_permissions(object_roles=deferred_roles)

AlanCoding · 2026-04-14T00:27:49Z

Wrap save_user_claims() in no_reverse_sync() to suppress redundant HTTP calls back to Gateway during stub Org/Team creation triggered by get_or_create_resource() post_save signals

This is bananas. I can hardly believe this was actually happening. This is gargantuan red flag territory.

Maybe it would be better to cut out a smaller portion of this patch doing this stuff. Because the rest of it goes really deep, and slapping on no_reverse_sync is super easy to get moving, and would be good to measure improvement for independent of the rest. In terms of priority of the patch, there's an extremely clear cliff here, so splitting it up is often a good idea in those cases.

AlanCoding · 2026-04-14T17:03:10Z

This also seems to directly overlap with my attempt at bulk re-computation here.

#840

You did defer_role_evaluation(), compare to bulk_rbac_caching() there. These are mostly trying to do the same thing, but there are 1 or 2 extremely critical logical differences. The other one throws up its hands and does the global compute_object_role_permissions() if any team roles change, whereas yours doesn't. That, and other things, need more thought about what gets parity with what already exists as the incremental update.

john-westcott-iv and others added 2 commits March 30, 2026 13:46

coderabbitai Bot reviewed Apr 13, 2026

View reviewed changes

huffmanca reviewed Apr 13, 2026

View reviewed changes

AlanCoding mentioned this pull request Apr 15, 2026

AAP-72039 Introduce partial-recomputation of team permissions #980

Draft

20 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AAP-70854] Optimize JWT claims reconciliation performance#979

[AAP-70854] Optimize JWT claims reconciliation performance#979
john-westcott-iv wants to merge 2 commits intoansible:develfrom
john-westcott-iv:AAP-70854-jwt-claims

john-westcott-iv commented Apr 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 13, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Apr 13, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 13, 2026

Uh oh!

sonarqubecloud Bot commented Apr 13, 2026

Uh oh!

huffmanca left a comment

Uh oh!

AlanCoding commented Apr 14, 2026

Uh oh!

AlanCoding commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

john-westcott-iv commented Apr 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Changes

ansible_base/rbac/triggers.py

ansible_base/rbac/claims.py

ansible_base/jwt_consumer/common/auth.py

New Tests

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Apr 13, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Apr 13, 2026

Quality Gate passed

Uh oh!

huffmanca left a comment

Choose a reason for hiding this comment

defer_role_evaluation() runs compute_* in two cases where it doesn't need to

Uh oh!

AlanCoding commented Apr 14, 2026

Uh oh!

AlanCoding commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

john-westcott-iv commented Apr 13, 2026 •

edited by coderabbitai Bot

Loading

`ansible_base/rbac/triggers.py`

`ansible_base/rbac/claims.py`

`ansible_base/jwt_consumer/common/auth.py`

coderabbitai Bot commented Apr 13, 2026 •

edited

Loading

`defer_role_evaluation()` runs `compute_*` in two cases where it doesn't need to