fix(setup-report): avoid query-gas-limit by querying grants per cold->warm pair#1307
Conversation
|
/run-integration |
|
Thanks for running integration. Looked into the 3 failing suites — they appear unrelated to this change, which only touches the
Happy to dig further if you see a plausible path from this change to any of those suites. |
|
Thanks for working on this. I can't really review this in its current shape because the branch/commit history still appears to include unrelated roadmap and devshard gateway scope alongside the setup-report gas-limit fix. Could you please clean this up or recut it so the PR contains only the setup-report gas-limit change? Once the diff/history is focused, I'll take another look. |
65a5181 to
d9a4cc6
Compare
|
Done — recut the branch onto |
checkPermissions queried authz GranteeGrants(grantee=warmKey), which the Cosmos SDK implements as a full scan of the entire grant store (grants are keyed granter-first), filtering by grantee in the callback. On a populated chain that exceeds the node query-gas-limit and the permissions check fails with out-of-gas, so /admin/v1/setup/report reports it as UNAVAILABLE. Only the report is affected; this is a read-only query, not a tx. Query the exact cold->warm pair with Grants(granter,grantee) instead, which prefix-scans only that pair (~20 grants) and no longer grows with chain size. The pair query already constrains both sides, so the in-memory granter filter is dropped. Mirrors the keeper's HasWarmKeyGrant.
What problem does this solve?
The admin endpoint
GET /admin/v1/setup/reportreportspermissions_grantedasUNAVAILABLEon a populated chain. ItscheckPermissionsstep queries the node withauthz.GranteeGrants(grantee = warm key), which the Cosmos SDK implements as a scan of the entire authz grant store (grants are keyed granter-first), filtering by grantee in the pagination callback. As the total number of grants on the network grows, that query exceeds the node'squery-gas-limitand fails with out-of-gas, so the warm-key permission check can no longer complete. Only the setup report is affected; the node and chain are unaffected (this is a read-only query, not a transaction).How do you know this is a real problem?
decentralized-api/internal/server/admin/setup_report.gocallsGranteeGrants(Grantee: warmKeyAddr)and then filters the result in memory withif grant.Granter == coldKeyAddr.0x01|granter|grantee|msgType).x/authz/keeper/grpc_query.goimplementsGranteeGrantsasprefix.NewStore(store, GrantKey)— i.e. it iterates and unmarshals every grant in the store and keeps only those whose grantee matches. Cost scales with the total number of grants on chain, not with the grants relevant to this account.app/upgrades/v0_2_12), which is enough to exceed the defaultquery-gas-limit(10,000,000, set ininference-chain/scripts/init-docker.sh).How does this solve the problem?
It replaces the grantee-wide scan with a pair-scoped query of the exact cold→warm grant:
Grants(granter, grantee)prefix-scans only thegranter|granteesub-tree (the ~20 grants inInferenceOperationKeyPerms), so query gas no longer grows with the size of the chain. Because the query already constrains both sides, the in-memorygrant.Granter == coldKeyAddrfilter is removed. This mirrors the keeper's existingHasWarmKeyGrant(inference-chain/x/inference/keeper/devshard_settlement.go), and the chain's ownAuthzKeeperinterface already exposes onlyGranterGrantsandGrants, neverGranteeGrants.What risks does this introduce? How can we mitigate them?
Low. The check's result is unchanged — the same
permissions_grantedPASS/FAIL/expiring-soon logic runs over the same set of cold→warm grants; only the underlying query is narrowed. No pagination is needed: a single cold→warm pair holds at most ~20 grants, below the SDK's default page size of 100. No other code path is touched (checkFeegrantalready uses a precise point query and is unchanged).How do you know this PR fixes the problem?
The query changes from an O(all grants on chain) full-store scan to an O(grants for one pair) prefix scan, so it stays well under
query-gas-limitregardless of network size, and the out-of-gas condition that producedUNAVAILABLEno longer occurs. The expected check result is identical to whatGranteeGrantsproduced before it began hitting the gas limit.Which components are affected?
decentralized-apionly —internal/server/admin/setup_report.go, functioncheckPermissions. No chain/state-machine code, no other endpoints.Testing & evidence
gofmtclean.blstBLS dependency) and a C compiler, which I do not have set up locally, so I could not run the fullgo build/go testhere. The change is a single, type-level substitution (GranteeGrants→Grants, withGranteradded and the now-redundant granter filter dropped), verified against the cosmos-sdk v0.53.3authztypes and matching the in-repoHasWarmKeyGrantusage. CI/maintainer build will exercise the full compile and tests.GET /admin/v1/setup/reportand confirmpermissions_grantedreturns PASS/FAIL rather thanUNAVAILABLEwith an out-of-gas message.Diff: 1 file, +13 / -10.