[AAP-70854] Optimize JWT claims reconciliation performance #979
john-westcott-iv wants to merge 2 commits into ansible:devel
…er concurrency

compute_team_member_roles() collects ObjectRole IDs in a read phase then writes them to the M2M provides_teams table. A concurrent committed transaction (e.g. org deletion) can delete those ObjectRoles between the two phases, causing a foreign key violation on dab_rbac_objectrole_provides_teams.

Wrap the M2M add() and RoleEvaluation bulk_create() calls in savepoints so IntegrityError from stale FK references is caught without aborting the outer transaction. On failure, re-filter to only IDs that still exist and retry.

Co-authored-by: Claude <[email protected]>
Made-with: Cursor
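The savepoint-and-retry pattern described in this commit message can be sketched outside Django. The following is a minimal illustration using the standard-library `sqlite3` module, not the PR's actual code: the table names `object_role` and `provides_teams` and the helper `add_links_with_retry` are hypothetical stand-ins, and the concurrent deletion is simulated by deleting a row between the read phase and the write phase.

```python
import sqlite3

INSERT_LINK = "INSERT INTO provides_teams (role_id, team_id) VALUES (?, ?)"


def add_links_with_retry(conn, team_id, role_ids):
    """Link roles to a team; on a stale FK, keep only live role IDs and retry once."""
    conn.execute("SAVEPOINT add_links")
    try:
        conn.executemany(INSERT_LINK, [(rid, team_id) for rid in role_ids])
        conn.execute("RELEASE SAVEPOINT add_links")
        return list(role_ids)
    except sqlite3.IntegrityError:
        # Undo only the failed batch; the outer transaction stays usable.
        conn.execute("ROLLBACK TO SAVEPOINT add_links")
        conn.execute("RELEASE SAVEPOINT add_links")
    # Re-filter to the IDs that still exist, then retry with those only.
    placeholders = ",".join("?" for _ in role_ids)
    query = f"SELECT id FROM object_role WHERE id IN ({placeholders}) ORDER BY id"
    live = [row[0] for row in conn.execute(query, list(role_ids))]
    conn.executemany(INSERT_LINK, [(rid, team_id) for rid in live])
    return live


# isolation_level=None means autocommit, so BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE object_role (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE provides_teams ("
    "role_id INTEGER NOT NULL REFERENCES object_role(id), "
    "team_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO object_role (id) VALUES (?)", [(1,), (2,), (3,)])

collected = [1, 2, 3]  # read phase: IDs gathered before the write
conn.execute("DELETE FROM object_role WHERE id = 2")  # simulated concurrent delete

conn.execute("BEGIN")
conn.execute("INSERT INTO object_role (id) VALUES (10)")  # unrelated outer work
written = add_links_with_retry(conn, 7, collected)
conn.execute("COMMIT")

links = conn.execute(
    "SELECT role_id, team_id FROM provides_teams ORDER BY role_id"
).fetchall()
outer_row_kept = conn.execute(
    "SELECT COUNT(*) FROM object_role WHERE id = 10"
).fetchone()[0]
```

In Django, a nested `transaction.atomic()` block plays the role of the explicit `SAVEPOINT` statements here; the point of the sketch is only that the failed batch is undone while the outer transaction's earlier work survives.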
save_user_claims() previously executed N individual DB writes and RoleEvaluation cache recomputes when reconciling permissions from Gateway. For users with many role assignments this caused timeouts and occasional permission loss.

Changes:
- Wrap save_user_claims() in no_reverse_sync() to suppress redundant HTTP calls back to Gateway during stub Org/Team creation
- Wrap in transaction.atomic() so partial failures roll back cleanly
- Add defer_role_evaluation() context manager that batches all RoleEvaluation cache updates into a single pass on context exit
- Add timing and grant/removal counters to save_user_claims() and process_rbac_permissions() for production observability

Co-authored-by: Claude <[email protected]>
Made-with: Cursor
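As an illustration of the timing and grant/removal counters mentioned in this commit message, here is a minimal sketch. It is not the PR's implementation: the `timed` helper, the `reconcile` function, the logger name, and the role names are all hypothetical, and permissions are modelled as plain sets of strings.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("claims_timing_sketch")  # hypothetical logger name


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def reconcile(current, desired):
    """Return (grants, removals) needed to turn `current` into `desired`."""
    return desired - current, current - desired


timings = {}
with timed("total", timings):
    with timed("fetch", timings):
        # Stand-in for the claims fetched from Gateway.
        desired = {"org-admin", "team-member"}
    with timed("save", timings):
        grants, removals = reconcile({"team-member", "auditor"}, desired)

logger.info(
    "claims reconciled: grants=%d removals=%d fetch=%.3fs save=%.3fs total=%.3fs",
    len(grants),
    len(removals),
    timings["fetch"],
    timings["save"],
    timings["total"],
)
```

Recording the duration in a `finally` clause means a timing is still captured when the wrapped step raises, which is usually the case one most wants to observe in production.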
📝 Walkthrough
The changes introduce timing instrumentation for JWT claim reconciliation, add stale object-role foreign-key error recovery with retry logic in RBAC caching, wrap claims synchronization in transactions with grant/removal tracking, and implement a deferred role-evaluation context manager to batch cache updates.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
DVCS PR Check Results: PR appears valid (JIRA key(s) found)
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ff1120e3-8eeb-4c24-b29b-4a6d2d371467
📒 Files selected for processing (8)
- ansible_base/jwt_consumer/common/auth.py
- ansible_base/rbac/caching.py
- ansible_base/rbac/claims.py
- ansible_base/rbac/triggers.py
- test_app/tests/jwt_consumer/common/test_auth_timing.py
- test_app/tests/rbac/test_caching.py
- test_app/tests/rbac/test_defer_role_evaluation.py
- test_app/tests/rbac/test_save_user_claims_optimizations.py
class _DeferredEvaluationState:
    """Tracks whether RoleEvaluation cache updates should be deferred.

    Used by defer_role_evaluation() to collect object_roles that need
    recomputation and process them in a single batch on context exit.
    """

    def __init__(self):
        self.enabled = False
        self.object_roles = set()
        self.needs_team_recompute = False


_deferred_evaluation = _DeferredEvaluationState()


@contextmanager
def defer_role_evaluation():
    """Defer RoleEvaluation cache updates until the context exits.

    During bulk operations like save_user_claims(), each give_permission() /
    remove_permission() call triggers compute_object_role_permissions()
    individually. This context manager collects all affected object_roles
    and processes them in a single batch when the context exits, reducing
    N individual compute+write cycles to one.

    Supports nesting by saving and restoring previous state.
    """
    previous_enabled = _deferred_evaluation.enabled
    previous_roles = _deferred_evaluation.object_roles
    previous_teams = _deferred_evaluation.needs_team_recompute

    _deferred_evaluation.enabled = True
    _deferred_evaluation.object_roles = set()
    _deferred_evaluation.needs_team_recompute = False
    try:
        yield
    finally:
        deferred_roles = _deferred_evaluation.object_roles
        deferred_teams = _deferred_evaluation.needs_team_recompute

        _deferred_evaluation.enabled = previous_enabled
        _deferred_evaluation.object_roles = previous_roles
        _deferred_evaluation.needs_team_recompute = previous_teams

        if deferred_teams:
            compute_team_member_roles()
        if deferred_roles:
            compute_object_role_permissions(object_roles=deferred_roles)
Make deferred evaluation state request-scoped and flush only on the outermost successful exit.
_deferred_evaluation is process-global, so one thread can flip enabled while another thread is in update_after_assignment(), causing cross-request batching and skipped recomputes. The current nesting logic also computes on inner exit instead of merging the inner batch back into the outer context, so the advertised nested behavior is broken and exception paths still do work that the surrounding transaction may roll back. Use thread-local/context-local state here and only run compute_* when the outermost context exits without an error.
Suggested fix
+import threading
 from contextlib import contextmanager
 from typing import Union
 from uuid import UUID
@@
-class _DeferredEvaluationState:
+class _DeferredEvaluationState(threading.local):
     """Tracks whether RoleEvaluation cache updates should be deferred.
@@
 @contextmanager
 def defer_role_evaluation():
@@
     previous_enabled = _deferred_evaluation.enabled
     previous_roles = _deferred_evaluation.object_roles
     previous_teams = _deferred_evaluation.needs_team_recompute
@@
+    completed = False
     try:
         yield
+        completed = True
     finally:
         deferred_roles = _deferred_evaluation.object_roles
         deferred_teams = _deferred_evaluation.needs_team_recompute
         _deferred_evaluation.enabled = previous_enabled
         _deferred_evaluation.object_roles = previous_roles
         _deferred_evaluation.needs_team_recompute = previous_teams
-        if deferred_teams:
-            compute_team_member_roles()
-        if deferred_roles:
-            compute_object_role_permissions(object_roles=deferred_roles)
+        if previous_enabled:
+            previous_roles.update(deferred_roles)
+            _deferred_evaluation.needs_team_recompute = previous_teams or deferred_teams
+        elif completed:
+            if deferred_teams:
+                compute_team_member_roles()
+            if deferred_roles:
+                compute_object_role_permissions(object_roles=deferred_roles)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ansible_base/rbac/triggers.py` around lines 29 - 78, The current
_deferred_evaluation is process-global and flushes work on every context exit,
causing cross-request races and incorrect nested behavior; change
defer_role_evaluation() to use request-scoped (thread-local or contextvar) state
with a nesting counter (e.g., add a depth attribute on the state), have inner
contexts merge their object_roles and needs_team_recompute into the parent state
instead of running computes, and only when the outermost context exits
successfully (no exception) run compute_team_member_roles() and
compute_object_role_permissions(object_roles=...) and then clear the state;
update references to _deferred_evaluation in defer_role_evaluation() (and any
callers like update_after_assignment) to use the new thread-local/context-local
state and ensure computes are skipped if an exception occurred in the context.
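The cross-thread concern raised in this comment can be demonstrated in isolation. The sketch below is illustrative only: `GlobalState`, `ThreadLocalState`, and `observe_from_other_thread` are hypothetical names, and the single `enabled` flag stands in for the full deferred-evaluation state. It shows that a plain module-level object exposes one thread's flag to another thread, while a `threading.local` subclass gives each thread its own copy.

```python
import threading


class GlobalState:
    """Plain object: every thread shares the same attributes."""

    def __init__(self):
        self.enabled = False


class ThreadLocalState(threading.local):
    """threading.local subclass: __init__ runs once per thread on first use."""

    def __init__(self):
        self.enabled = False


def observe_from_other_thread(state):
    """Enable the flag in this thread, then report what a second thread sees."""
    state.enabled = True
    seen = []
    worker = threading.Thread(target=lambda: seen.append(state.enabled))
    worker.start()
    worker.join()
    state.enabled = False
    return seen[0]


leaks_with_global = observe_from_other_thread(GlobalState())
leaks_with_local = observe_from_other_thread(ThreadLocalState())
```

Thread-local storage covers threaded request handling; for async code, `contextvars.ContextVar` would be the analogous tool, as the comment's mention of context-local state suggests.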
huffmanca left a comment
Nice work on this — the batching approach and TOCTOU guards are solid. One thing I noticed:
defer_role_evaluation() runs compute_* in two cases where it doesn't need to
The finally block in defer_role_evaluation() unconditionally calls compute_team_member_roles() / compute_object_role_permissions():
- On inner context exit when nested: if `previous_enabled` is `True`, the inner context flushes its roles immediately instead of merging them into the parent's set. This defeats the batching optimization (though nesting isn't used in any current call site, so it's theoretical today).
- On exception: since `save_user_claims()` wraps everything in `transaction.atomic()`, an exception rolls back the DB changes, but `compute_*` still runs against that rolled-back state. The results get corrected on next recompute, but it's throwaway work.
Neither is a correctness issue — compute_* is idempotent and the system self-corrects. But both are straightforward to fix by checking whether to merge vs. flush:
    completed = False
    try:
        yield
        completed = True
    finally:
        deferred_roles = _deferred_evaluation.object_roles
        deferred_teams = _deferred_evaluation.needs_team_recompute
        _deferred_evaluation.enabled = previous_enabled
        _deferred_evaluation.object_roles = previous_roles
        _deferred_evaluation.needs_team_recompute = previous_teams
        if previous_enabled:
            previous_roles.update(deferred_roles)
            _deferred_evaluation.needs_team_recompute |= deferred_teams
        elif completed:
            if deferred_teams:
                compute_team_member_roles()
            if deferred_roles:
                compute_object_role_permissions(object_roles=deferred_roles)
This is bananas. I can hardly believe this was actually happening. This is gargantuan red flag territory. Maybe it would be better to cut out a smaller portion of this patch doing this stuff. Because the rest of it goes really deep, and slapping on |
|
This also seems to directly overlap with my attempt at bulk re-computation here. You did |



Summary
- Wrap `save_user_claims()` in `no_reverse_sync()` to suppress redundant HTTP calls back to Gateway during stub Org/Team creation triggered by `get_or_create_resource()` post_save signals
- Wrap in `transaction.atomic()` so partial failures roll back cleanly instead of leaving users in an inconsistent permission state (especially important for services without `ATOMIC_REQUESTS`)
- Add `defer_role_evaluation()` context manager in `triggers.py` that collects all affected `ObjectRole` instances and processes `compute_object_role_permissions()` in a single batch on context exit, reducing N individual compute+write cycles to one
- Add timing instrumentation to `save_user_claims()` and `process_rbac_permissions()` for production observability of claims reconciliation duration
Problem
When a user with many role assignments (e.g., superusers with hundreds of explicit assignments) authenticates via JWT to a service, `save_user_claims()` executes N individual `give_permission()` / `remove_permission()` calls. Each call triggers:
- `RoleEvaluation` cache recompute (`compute_object_role_permissions()`)
- `post_save` signals from stub Org/Team creation that fire HTTP reverse sync calls back to Gateway
This caused timeouts during authentication and occasional permission loss when reconciliation didn't complete.
Changes
ansible_base/rbac/triggers.py
- Add `_DeferredEvaluationState` class and `defer_role_evaluation()` context manager
- Update `update_after_assignment()` to defer when evaluation is active
ansible_base/rbac/claims.py
- `save_user_claims()` now wrapped in `no_reverse_sync()`, `transaction.atomic()`, and `defer_role_evaluation()`
ansible_base/jwt_consumer/common/auth.py
- Timing logs in `process_rbac_permissions()`: fetch time, save time, total time
New Tests
- `test_app/tests/rbac/test_defer_role_evaluation.py`: 14 tests covering context manager state, nesting, exception handling, deferred batching, and integration with give/remove permission
- `test_app/tests/rbac/test_save_user_claims_optimizations.py`: 6 tests verifying atomicity, reverse sync suppression, deferred evaluation, and logging
- `test_app/tests/jwt_consumer/common/test_auth_timing.py`: 4 tests verifying timing logs in `process_rbac_permissions()`
Test plan
- …py312-sqlite)
- `test_claims.py`, `test_triggers.py`, and `test_auth.py` tests continue to pass
Note: This PR was developed with assistance from Claude AI assistant.
Made with Cursor
Summary by CodeRabbit
Release Notes
Bug Fixes
Performance
Tests