[#10545] improvement(auth): Batch-load securable objects to eliminate N+1 in loadRolePrivilege #10546

Draft
yuqi1129 wants to merge 1 commit into apache:main from yuqi1129:fix-10545-batch-load-securable-objects
@yuqi1129
Contributor

What changes were proposed in this pull request?

This PR eliminates the N+1 query pattern in JcasbinAuthorizer.loadRolePrivilege() and fixes a DBCP2 connection pool misconfiguration.

Auth N+1 fix:

  • Added listSecurableObjectsByRoleIds(List<Long>) to SecurableObjectMapper and SecurableObjectBaseSQLProvider, which fetches securable objects for multiple roles in a single WHERE role_id IN (...) query.
  • Added RoleMetaService.batchListSecurableObjectsForRoles(List<Long>) that issues this single batch query and groups results by role ID. Extracted buildSecurableObjectsFromPOs helper reused by both the single and batch paths.
  • Refactored JcasbinAuthorizer.loadRolePrivilege() to collect all unloaded role IDs, call the batch method once, then load policies serially. Removed the per-role CompletableFuture + entityStore.get() pattern.
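
The single `WHERE role_id IN (...)` query described above can be pictured with a minimal, dependency-free sketch. It is not the PR's MyBatis provider: the class name, table name, and selected columns are hypothetical and chosen only for illustration, and the statement uses plain JDBC-style `?` placeholders.

```java
import java.util.Collections;
import java.util.List;

final class BatchQuerySketch {
  // Hypothetical table name, for illustration only.
  static final String TABLE = "securable_object_table";

  /** Builds one parameterized statement covering every role ID in the list. */
  static String buildListByRoleIdsSql(List<Long> roleIds) {
    if (roleIds.isEmpty()) {
      // An empty IN () list is not valid SQL; callers should short-circuit first.
      throw new IllegalArgumentException("roleIds must not be empty");
    }
    // One "?" placeholder per role ID, so values are bound rather than concatenated.
    String placeholders = String.join(", ", Collections.nCopies(roleIds.size(), "?"));
    return "SELECT role_id, metadata_object_id, type FROM "
        + TABLE
        + " WHERE role_id IN ("
        + placeholders
        + ")";
  }
}
```

The point of the sketch is only that the number of round trips stays at one regardless of how many role IDs are passed; very long ID lists may still need chunking, depending on the database's parameter limits.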

DBCP2 connection pool fix:

  • minEvictableIdleTimeMillis: 1000ms → 30000ms — prevents connections from being destroyed after 1 second idle, eliminating constant reconnect churn.
  • minIdle: 0 → 5 — keeps a pool of warm connections ready.
  • maxIdle: 5 → 10 — allows more idle connections to be retained.
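
As a rough illustration of the three new values, the sketch below collects them into a `java.util.Properties` object. The keys are assumed to match the property names DBCP2's `BasicDataSourceFactory` reads; the class and method names are hypothetical, and the PR itself sets these values in SqlSessionFactoryHelper rather than through a properties object.

```java
import java.util.Properties;

final class PoolSettingsSketch {
  /** Returns only the three pool settings changed by this PR; URL and credentials are omitted. */
  static Properties poolProperties() {
    Properties props = new Properties();
    // Keep at least five warm connections instead of letting the pool drain to zero.
    props.setProperty("minIdle", "5");
    // Allow up to ten idle connections to be retained.
    props.setProperty("maxIdle", "10");
    // A connection must be idle for 30 seconds (previously 1 second) before it can be evicted.
    props.setProperty("minEvictableIdleTimeMillis", "30000");
    return props;
  }
}
```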

Query count improvement (cold path):

| | Before | After |
|---|---|---|
| Role stubs | 1 query | 1 query |
| Role metadata | N queries | 0 (already in stubs) |
| Securable objects | N queries | 1 query |
| Name resolution | T queries | T queries |
| Total | 2 + 2N + T | 2 + 1 + T |
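
To make the Total row concrete, here is a small sketch that evaluates both formulas exactly as written in the table. The class and method names are hypothetical.

```java
final class QueryCountSketch {
  /** Cold-path query count before the change: 2 + 2N + T. */
  static int before(int roles, int objectTypes) {
    return 2 + 2 * roles + objectTypes;
  }

  /** Cold-path query count after the change: 2 + 1 + T, independent of the number of roles. */
  static int after(int objectTypes) {
    return 2 + 1 + objectTypes;
  }
}
```

With N = 5 roles and T = 3 object types, the formulas evaluate to 15 queries before and 6 after; the "after" count no longer grows with N.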

Why are the changes needed?

Fix: #10545

Each call to loadRolePrivilege issued 2N DB queries (one per role for role metadata, one per role for securable objects) in addition to the fixed and name-resolution queries. With N=5 roles and T=3 object types, the 2 + 2N + T total comes to 15 queries per authorization check. This is a bottleneck under high concurrency or when the cache is cold (e.g., after a restart or TTL expiry in HA deployments).

The DBCP2 misconfiguration caused connection destroy-then-reconnect on nearly every request (1s idle eviction with minIdle=0), adding ~5–20ms latency per request.

Does this PR introduce any user-facing change?

No API changes. The connection pool defaults change slightly (minIdle, maxIdle, minEvictableIdleTimeMillis) which improves performance transparently.

How was this patch tested?

  • TestRoleMetaService — all 33 tests pass (H2 and PostgreSQL backends).
  • TestJcasbinAuthorizer — all 7 tests pass. Updated test to mock RoleMetaService.batchListSecurableObjectsForRoles instead of per-role entityStore.get().

…minate N+1 in loadRolePrivilege

- Add listSecurableObjectsByRoleIds(List<Long>) to SecurableObjectMapper and
  SecurableObjectBaseSQLProvider: fetches securable objects for multiple roles
  in a single WHERE role_id IN (...) query.
- Add RoleMetaService.batchListSecurableObjectsForRoles(List<Long>): issues the
  batch query and groups results by role ID; extracted buildSecurableObjectsFromPOs
  helper reused by both single and batch paths.
- Rewrite JcasbinAuthorizer.loadRolePrivilege(): collect all unloaded role IDs,
  call the batch method once, load policies serially. Removes the per-role
  CompletableFuture + entityStore.get() pattern. Cold-path query count drops
  from 2+2N+T to 2+1+T (where N = roles, T = distinct object types).
- Fix DBCP2 pool: minEvictableIdleTimeMillis 1000ms -> 30000ms, minIdle 0 -> 5,
  maxIdle 5 -> 10. Prevents connection churn that was adding 5-20ms per request.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 25, 2026 12:13
@yuqi1129 yuqi1129 marked this pull request as draft March 25, 2026 12:13
Contributor

Copilot AI left a comment

Pull request overview

This PR targets authorization-path performance by removing an N+1 pattern when loading role privileges in JcasbinAuthorizer.loadRolePrivilege(), and improves JDBC connection pool behavior by adjusting DBCP2 idle/eviction defaults.

Changes:

  • Batch-load securable objects for multiple roles via a new listSecurableObjectsByRoleIds(...) mapper/provider path and RoleMetaService.batchListSecurableObjectsForRoles(...).
  • Refactor JcasbinAuthorizer.loadRolePrivilege() to load securable objects in one batch call and then load policies serially.
  • Tune DBCP2 pool settings (minIdle, maxIdle, minEvictableIdleTimeMillis) to reduce reconnect churn.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| server-common/src/test/java/org/apache/gravitino/server/authorization/jcasbin/TestJcasbinAuthorizer.java | Updates tests to mock the new batch-loading path for role securable objects. |
| server-common/src/main/java/org/apache/gravitino/server/authorization/jcasbin/JcasbinAuthorizer.java | Replaces per-role loads with a single batch securable-object fetch and constructs full RoleEntity instances for policy loading. |
| core/src/main/java/org/apache/gravitino/storage/relational/session/SqlSessionFactoryHelper.java | Adjusts DBCP2 idle/eviction parameters to keep warm connections and avoid aggressive eviction. |
| core/src/main/java/org/apache/gravitino/storage/relational/service/RoleMetaService.java | Introduces batch API to load securable objects for multiple role IDs and refactors securable-object building into a shared helper. |
| core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/SecurableObjectBaseSQLProvider.java | Adds SQL for WHERE role_id IN (...) batch lookup. |
| core/src/main/java/org/apache/gravitino/storage/relational/mapper/SecurableObjectSQLProviderFactory.java | Exposes the new batch SQL provider method. |
| core/src/main/java/org/apache/gravitino/storage/relational/mapper/SecurableObjectMapper.java | Adds the MyBatis mapper method for batch securable-object listing. |

Comments suppressed due to low confidence (1)

core/src/main/java/org/apache/gravitino/storage/relational/service/RoleMetaService.java:447

  • buildSecurableObjectsFromPOs(...) now runs over the combined securable-object set for multiple roles. As written, objectIds can contain many duplicates when different roles refer to the same metadata object, which can bloat the subsequent name-resolution IN (...) queries (and may hit parameter limits). Consider de-duplicating objectIds (while preserving type grouping), e.g., by collecting into a set/distinct() before invoking the lookup function.
    securableObjectPOs.stream()
        .collect(Collectors.groupingBy(SecurableObjectPO::getType))
        .forEach(
            (type, objects) -> {
              List<Long> objectIds =
                  objects.stream()
                      .map(SecurableObjectPO::getMetadataObjectId)
                      .collect(Collectors.toList());
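
One way to apply the de-duplication suggested in this comment is to add `distinct()` before collecting the IDs for each type. The sketch below is a self-contained approximation: `Po` is a simplified stand-in for SecurableObjectPO, the type is modeled as a plain string, and `distinctIdsByType` is a hypothetical helper, not code from the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DedupSketch {
  /** Simplified stand-in for SecurableObjectPO with only the two fields used here. */
  static final class Po {
    private final String type;
    private final long metadataObjectId;

    Po(String type, long metadataObjectId) {
      this.type = type;
      this.metadataObjectId = metadataObjectId;
    }

    String getType() {
      return type;
    }

    Long getMetadataObjectId() {
      return metadataObjectId;
    }
  }

  /** Groups object IDs by type and drops duplicate IDs within each type. */
  static Map<String, List<Long>> distinctIdsByType(List<Po> pos) {
    Map<String, List<Long>> result = new HashMap<>();
    pos.stream()
        .collect(Collectors.groupingBy(Po::getType))
        .forEach(
            (type, objects) ->
                result.put(
                    type,
                    objects.stream()
                        .map(Po::getMetadataObjectId)
                        .distinct() // two roles pointing at the same object yield one ID
                        .collect(Collectors.toList())));
    return result;
  }
}
```

Each type's ID list then contains every metadata object at most once, so the follow-up name-resolution `IN (...)` query does not grow with the number of roles that share an object.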

Comment on lines +528 to +529
          Map<Long, List<SecurableObject>> secObjsByRoleId =
              RoleMetaService.batchListSecurableObjectsForRoles(unloadedRoleIds);
Copilot AI Mar 25, 2026

The new batch call to RoleMetaService.batchListSecurableObjectsForRoles(...) can now fail as a single operation, but any exception will be wrapped by AuthorizationRequestContext.loadRole() as RuntimeException("Failed to load role: ", e) without indicating which role(s) were being loaded. Previously the per-role entityStore.get(...) path added role-name context in the thrown exception. Consider catching failures around the batch call and rethrowing with actionable context (e.g., include metalake, username, and the list of unloadedRoleIds).

Suggested change
-         Map<Long, List<SecurableObject>> secObjsByRoleId =
-             RoleMetaService.batchListSecurableObjectsForRoles(unloadedRoleIds);
+         Map<Long, List<SecurableObject>> secObjsByRoleId;
+         try {
+           secObjsByRoleId = RoleMetaService.batchListSecurableObjectsForRoles(unloadedRoleIds);
+         } catch (RuntimeException e) {
+           throw new RuntimeException(
+               "Failed to batch-load securable objects for roles " + unloadedRoleIds
+                   + " of userId " + userId,
+               e);
+         }

Comment on lines +342 to +371
  /**
   * Batch-loads securable objects for multiple roles in a single SQL query and returns a map from
   * role ID to the resolved {@link SecurableObject} list. This eliminates the N+1 query pattern
   * that occurs when loading securable objects for each role individually.
   *
   * @param roleIds the list of role IDs to load
   * @return a map from role ID to its list of resolved securable objects
   */
  @Monitored(
      metricsSource = GRAVITINO_RELATIONAL_STORE_METRIC_NAME,
      baseMetricName = "batchListSecurableObjectsForRoles")
  public static Map<Long, List<SecurableObject>> batchListSecurableObjectsForRoles(
      List<Long> roleIds) {
    if (roleIds.isEmpty()) {
      return ImmutableMap.of();
    }
    List<SecurableObjectPO> allPOs =
        SessionUtils.getWithoutCommit(
            SecurableObjectMapper.class, mapper -> mapper.listSecurableObjectsByRoleIds(roleIds));

    Map<Long, List<SecurableObjectPO>> byRoleId =
        allPOs.stream().collect(Collectors.groupingBy(SecurableObjectPO::getRoleId));

    ImmutableMap.Builder<Long, List<SecurableObject>> builder = ImmutableMap.builder();
    for (Long roleId : roleIds) {
      List<SecurableObjectPO> pos = byRoleId.getOrDefault(roleId, Collections.emptyList());
      builder.put(roleId, buildSecurableObjectsFromPOs(pos));
    }
    return builder.build();
  }
Copilot AI Mar 25, 2026

New behavior (batchListSecurableObjectsForRoles) + new mapper/provider path (listSecurableObjectsByRoleIds) is not covered by any unit/integration test in this repo (no tests reference the new method). Per project guidelines, please add coverage in TestRoleMetaService to validate correctness (multiple role IDs, missing roles returning empty lists) and that it works across backends (H2/PostgreSQL).
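
The cases this comment asks for (multiple role IDs, and roles with no securable objects mapping to empty lists) can be sketched against a simplified, in-memory version of the grouping step. Everything here is a stand-in: `Row` approximates a query result row, `groupByRole` is a hypothetical helper, and a real test would go through TestRoleMetaService against the H2 and PostgreSQL backends.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchGroupingSketch {
  /** Simplified stand-in for one row returned by the batch query. */
  static final class Row {
    final long roleId;
    final String objectName;

    Row(long roleId, String objectName) {
      this.roleId = roleId;
      this.objectName = objectName;
    }
  }

  /** Groups rows by role ID; every requested role gets an entry, even if it has no rows. */
  static Map<Long, List<String>> groupByRole(List<Long> roleIds, List<Row> rows) {
    Map<Long, List<String>> byRole = new LinkedHashMap<>();
    for (Long roleId : roleIds) {
      // putIfAbsent tolerates duplicate IDs in the request list.
      byRole.putIfAbsent(roleId, new ArrayList<>());
    }
    for (Row row : rows) {
      List<String> bucket = byRole.get(row.roleId);
      if (bucket != null) {
        bucket.add(row.objectName);
      }
    }
    return byRole;
  }
}
```

For example, requesting roles 1, 2, and 3 when only roles 1 and 3 have rows should yield an empty list for role 2 rather than a missing key. One design note: the sketch uses a `LinkedHashMap` with `putIfAbsent`, whereas Guava's `ImmutableMap.Builder` rejects duplicate keys at `build()`, so a test with a repeated role ID may also be worth adding.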

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +137 to +139
    com.google.common.collect.ImmutableMap.Builder<Long, List<SecurableObject>> result =
        com.google.common.collect.ImmutableMap.builder();
    for (Long id : ids) {
Copilot AI Mar 25, 2026

Avoid using a fully-qualified class name inside the method body (com.google.common.collect.ImmutableMap.Builder). Please add an import for ImmutableMap and reference it directly to match the repo's Java import convention.

Comment on lines 490 to +545
  private void loadRolePrivilege(
      String metalake, String username, Long userId, AuthorizationRequestContext requestContext) {
    requestContext.loadRole(
        () -> {
          EntityStore entityStore = GravitinoEnv.getInstance().entityStore();
          NameIdentifier userNameIdentifier = NameIdentifierUtil.ofUser(metalake, username);
-         List<RoleEntity> entities;
+         List<RoleEntity> roleStubs;
          try {
-           entities =
+           roleStubs =
                entityStore
                    .relationOperations()
                    .listEntitiesByRelation(
                        SupportsRelationOperations.Type.ROLE_USER_REL,
                        userNameIdentifier,
                        Entity.EntityType.USER);
-           List<CompletableFuture<Void>> loadRoleFutures = new ArrayList<>();
-           for (RoleEntity role : entities) {
-             Long roleId = role.id();
-             allowEnforcer.addRoleForUser(String.valueOf(userId), String.valueOf(roleId));
-             denyEnforcer.addRoleForUser(String.valueOf(userId), String.valueOf(roleId));
-             if (loadedRoles.getIfPresent(roleId) != null) {
-               continue;
-             }
-             CompletableFuture<Void> loadRoleFuture =
-                 CompletableFuture.supplyAsync(
-                         () -> {
-                           try {
-                             return entityStore.get(
-                                 NameIdentifierUtil.ofRole(metalake, role.name()),
-                                 Entity.EntityType.ROLE,
-                                 RoleEntity.class);
-                           } catch (Exception e) {
-                             throw new RuntimeException("Failed to load role: " + role.name(), e);
-                           }
-                         },
-                         executor)
-                     .thenAcceptAsync(
-                         roleEntity -> {
-                           loadPolicyByRoleEntity(roleEntity);
-                           loadedRoles.put(roleId, true);
-                         },
-                         executor);
-             loadRoleFutures.add(loadRoleFuture);
-           }
-           CompletableFuture.allOf(loadRoleFutures.toArray(new CompletableFuture[0])).join();
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
+
+         // Register user-role associations in enforcers for all roles.
+         for (RoleEntity role : roleStubs) {
+           allowEnforcer.addRoleForUser(String.valueOf(userId), String.valueOf(role.id()));
+           denyEnforcer.addRoleForUser(String.valueOf(userId), String.valueOf(role.id()));
+         }
+
+         // Collect stubs for roles whose policies have not yet been loaded into the enforcer.
+         List<RoleEntity> unloadedRoleStubs =
+             roleStubs.stream()
+                 .filter(role -> loadedRoles.getIfPresent(role.id()) == null)
+                 .collect(Collectors.toList());
+         if (unloadedRoleStubs.isEmpty()) {
+           return;
+         }
+
+         // Batch-fetch securable objects for all unloaded roles in a single query,
+         // eliminating the N+1 pattern of per-role entityStore.get() calls.
+         List<Long> unloadedRoleIds =
+             unloadedRoleStubs.stream().map(RoleEntity::id).collect(Collectors.toList());
+         Map<Long, List<SecurableObject>> secObjsByRoleId =
+             RoleMetaService.batchListSecurableObjectsForRoles(unloadedRoleIds);
+
+         for (RoleEntity stub : unloadedRoleStubs) {
+           List<SecurableObject> securableObjects =
+               secObjsByRoleId.getOrDefault(stub.id(), Collections.emptyList());
+           RoleEntity fullRole =
+               RoleEntity.builder()
+                   .withId(stub.id())
+                   .withName(stub.name())
+                   .withNamespace(stub.namespace())
+                   .withProperties(stub.properties())
+                   .withAuditInfo(stub.auditInfo())
+                   .withSecurableObjects(securableObjects)
+                   .build();
+           loadPolicyByRoleEntity(fullRole);
+           loadedRoles.put(stub.id(), true);
+         }
Copilot AI Mar 25, 2026

loadRolePrivilege() no longer uses the executor thread pool (the previous CompletableFuture path was removed), but initialize() still creates a fixed thread pool (default 100 threads) and close() shuts it down. This now allocates idle threads for no benefit and increases memory/CPU footprint. Consider removing the executor field + thread pool initialization entirely (and any related test reflection), or reintroduce an async use of it if still needed.

@github-actions

Code Coverage Report

Overall Project 64.87% +0.1% 🟢
Files changed 80.0% 🟢

Module Coverage
aliyun 1.73% 🔴
api 47.14% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 10.0% 🔴
catalog-fileset 80.02% 🟢
catalog-hive 80.98% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 42.89% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.15% 🟢
catalog-lakehouse-paimon 77.71% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.83% 🟢
common 49.42% 🟢
core 80.95% +0.11% 🟢
filesystem-hadoop3 76.97% 🟢
flink 38.86% 🔴
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 45.82% 🟢
iceberg-common 50.21% 🟢
iceberg-rest-server 66.62% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.87% 🟢
optimizer-api 21.95% 🔴
server 85.62% 🟢
server-common 69.67% +0.11% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 31.62% 🔴
Files

| Module | File | Coverage |
|---|---|---|
| core | SqlSessionFactoryHelper.java | 95.08% 🟢 |
| core | RoleMetaService.java | 92.24% 🟢 |
| core | SecurableObjectBaseSQLProvider.java | 90.91% 🟢 |
| core | SecurableObjectSQLProviderFactory.java | 90.48% 🟢 |
| core | SecurableObjectMapper.java | 0.0% 🔴 |
| server-common | JcasbinAuthorizer.java | 70.29% 🟢 |

Development

Successfully merging this pull request may close these issues.

[Improvement] Optimize the performance of loadRolePrivilege