fix(setup-report): avoid query-gas-limit by querying grants per cold->warm pair #1307

Open
redstartechno wants to merge 1 commit into gonka-ai:upgrade-v0.2.14 from redstartechno:bugfix/setup-report-gas-limit

@redstartechno


What problem does this solve?

The admin endpoint GET /admin/v1/setup/report reports permissions_granted as UNAVAILABLE on a populated chain. Its checkPermissions step queries the node with authz.GranteeGrants(grantee = warm key), which the Cosmos SDK implements as a scan of the entire authz grant store (grants are keyed granter-first), filtering by grantee in the pagination callback. As the total number of grants on the network grows, that query exceeds the node's query-gas-limit and fails with out-of-gas, so the warm-key permission check can no longer complete. Only the setup report is affected; the node and chain are unaffected (this is a read-only query, not a transaction).

How do you know this is a real problem?

  • decentralized-api/internal/server/admin/setup_report.go calls GranteeGrants(Grantee: warmKeyAddr) and then filters the result in memory with if grant.Granter == coldKeyAddr.
  • In cosmos-sdk v0.53.3 (the pinned version), the grant key layout is granter-first (0x01|granter|grantee|msgType). x/authz/keeper/grpc_query.go implements GranteeGrants by paginating over prefix.NewStore(store, GrantKey), the prefix shared by all grants: it iterates and unmarshals every grant in the store and keeps only those whose grantee matches. Cost scales with the total number of grants on chain, not with the grants relevant to this account.
  • The repo's own upgrade handlers note the scale (~2,000 grants at ~100 hosts in app/upgrades/v0_2_12), which is enough to exceed the default query-gas-limit (10,000,000, set in inference-chain/scripts/init-docker.sh).

How does this solve the problem?

It replaces the grantee-wide scan with a pair-scoped query of the exact cold→warm grant:

authzQueryClient.Grants(ctx, &authztypes.QueryGrantsRequest{
    Granter: coldKeyAddr,
    Grantee: warmKeyAddr,
})

Grants(granter, grantee) prefix-scans only the granter|grantee sub-tree (the ~20 grants in InferenceOperationKeyPerms), so query gas no longer grows with the size of the chain. Because the query already constrains both sides, the in-memory grant.Granter == coldKeyAddr filter is removed. This mirrors the keeper's existing HasWarmKeyGrant (inference-chain/x/inference/keeper/devshard_settlement.go), and the chain's own AuthzKeeper interface already exposes only GranterGrants and Grants, never GranteeGrants.
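The difference between the two access patterns can be modeled with a sorted in-memory key list. This is a minimal sketch, not the SDK's implementation: the string keys, the `cold…`/`warm…` names, and the helpers `buildStore`, `scanByGrantee`, and `scanByPair` are all hypothetical, and the real store keys are binary. The sizes (100 hosts, 20 grants per pair) follow the rough figures quoted in this description.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// grantKey models the granter-first layout as a plain string.
// The real SDK key is binary; this is only an illustration.
func grantKey(granter, grantee, msgType string) string {
	return granter + "|" + grantee + "|" + msgType
}

// buildStore returns a sorted key list standing in for an ordered KV store.
func buildStore(hosts, grantsPerPair int) []string {
	keys := make([]string, 0, hosts*grantsPerPair)
	for h := 0; h < hosts; h++ {
		granter := fmt.Sprintf("cold%03d", h)
		grantee := fmt.Sprintf("warm%03d", h)
		for m := 0; m < grantsPerPair; m++ {
			keys = append(keys, grantKey(granter, grantee, fmt.Sprintf("msg%02d", m)))
		}
	}
	sort.Strings(keys)
	return keys
}

// scanByGrantee must visit every key, because the grantee is not a key prefix.
func scanByGrantee(keys []string, grantee string) (matches, visited int) {
	for _, k := range keys {
		visited++
		if strings.Split(k, "|")[1] == grantee {
			matches++
		}
	}
	return matches, visited
}

// scanByPair seeks to the granter|grantee prefix and stops where the prefix ends.
func scanByPair(keys []string, granter, grantee string) (matches, visited int) {
	prefix := granter + "|" + grantee + "|"
	start := sort.SearchStrings(keys, prefix)
	for _, k := range keys[start:] {
		if !strings.HasPrefix(k, prefix) {
			break
		}
		visited++
		matches++
	}
	return matches, visited
}

func main() {
	keys := buildStore(100, 20)
	m, v := scanByGrantee(keys, "warm042")
	fmt.Printf("grantee scan: matches=%d visited=%d\n", m, v)
	m, v = scanByPair(keys, "cold042", "warm042")
	fmt.Printf("pair scan:    matches=%d visited=%d\n", m, v)
}
```

Both scans return the same 20 grants in this model, but the grantee scan touches all 2,000 keys while the pair scan touches only the 20 under its prefix.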

What risks does this introduce? How can we mitigate them?

Low. The check's result is unchanged — the same permissions_granted PASS/FAIL/expiring-soon logic runs over the same set of cold→warm grants; only the underlying query is narrowed. No pagination is needed: a single cold→warm pair holds at most ~20 grants, below the SDK's default page size of 100. No other code path is touched (checkFeegrant already uses a precise point query and is unchanged).

How do you know this PR fixes the problem?

The query changes from an O(all grants on chain) full-store scan to an O(grants for one pair) prefix scan, so it stays well under query-gas-limit regardless of network size, and the out-of-gas condition that produced UNAVAILABLE no longer occurs. The expected check result is identical to what GranteeGrants produced before it began hitting the gas limit.

Which components are affected?

decentralized-api only — internal/server/admin/setup_report.go, function checkPermissions. No chain/state-machine code, no other endpoints.

Testing & evidence

  • gofmt clean.
  • Local toolchain note: this repo's full build requires cgo (the blst BLS dependency) and a C compiler, which I do not have set up locally, so I could not run the full go build/go test here. The change is a single, type-level substitution (GranteeGrants → Grants, with Granter added and the now-redundant granter filter dropped), verified against the cosmos-sdk v0.53.3 authz types and matching the in-repo HasWarmKeyGrant usage. CI/maintainer build will exercise the full compile and tests.
  • Maintainer-verifiable end-to-end: on a chain with many grants, call GET /admin/v1/setup/report and confirm permissions_granted returns PASS/FAIL rather than UNAVAILABLE with an out-of-gas message.

Diff: 1 file, +13 / -10.

@patimen patimen added this to the v0.2.14 milestone Jun 8, 2026
@patimen patimen changed the base branch from main to upgrade-v0.2.14 June 8, 2026 22:49
@patimen

patimen commented Jun 11, 2026

Collaborator

/run-integration

@redstartechno

Author

Thanks for running integration. Looked into the 3 failing suites — they appear unrelated to this change, which only touches the /admin/v1/setup/report authz query:

Happy to dig further if you see a plausible path from this change to any of those suites.

@patimen

patimen commented Jun 12, 2026

Collaborator

Thanks for working on this. I can't really review this in its current shape because the branch/commit history still appears to include unrelated roadmap and devshard gateway scope alongside the setup-report gas-limit fix.

Could you please clean this up or recut it so the PR contains only the setup-report gas-limit change? Once the diff/history is focused, I'll take another look.

@redstartechno redstartechno force-pushed the bugfix/setup-report-gas-limit branch from 65a5181 to d9a4cc6 Compare June 12, 2026 18:25
@redstartechno

Author

Done — recut the branch onto upgrade-v0.2.14 with only the setup-report gas-limit commit. The unrelated roadmap/devshard scope appeared because the branch was originally cut from main (after #1266 and #1284 merged there) and the PR base later moved to upgrade-v0.2.14, which doesn't include those yet. The diff is now a single commit touching only decentralized-api/internal/server/admin/setup_report.go.

checkPermissions queried authz GranteeGrants(grantee=warmKey), which the
Cosmos SDK implements as a full scan of the entire grant store (grants are
keyed granter-first), filtering by grantee in the callback. On a populated
chain that exceeds the node query-gas-limit and the permissions check fails
with out-of-gas, so /admin/v1/setup/report reports it as UNAVAILABLE. Only
the report is affected; this is a read-only query, not a tx.

Query the exact cold->warm pair with Grants(granter,grantee) instead, which
prefix-scans only that pair (~20 grants) and no longer grows with chain size.
The pair query already constrains both sides, so the in-memory granter filter
is dropped. Mirrors the keeper's HasWarmKeyGrant.