feat(state): support multi-endpoint reads for resilience #4856

@walldiss

Summary

The TxClient already supports multiple core endpoints for transaction submission via WithAdditionalCoreEndpoints. However, all read operations (balance, delegation, ABCI queries) are hardcoded to the primary endpoint (coreConns[0]). If that endpoint goes down, all reads fail even if additional endpoints are configured and healthy.

Current Behavior

In state/core_access.go, all query clients are initialized on the primary connection only:

ca.stakingCli = stakingtypes.NewQueryClient(ca.coreConns[0])
ca.distributionCli = distributiontypes.NewQueryClient(ca.coreConns[0])
ca.feeGrantCli = feegrant.NewQueryClient(ca.coreConns[0])
ca.abciQueryCli = tmservice.NewServiceClient(ca.coreConns[0])

The infrastructure for additional endpoints already exists (AdditionalCoreEndpoints in the config, with connections created in the constructors); it is simply not used for reads.

Possible Approaches

  1. Failover with timeout — use primary by default, retry on next endpoint after failure/timeout
  2. Parallel reads (race) — issue to all endpoints, take first success
  3. Health-check routing — background health checks, route to healthy endpoints
  4. gRPC-level load balancing — use gRPC's built-in round-robin resolver

See also: #4857 for the equivalent gap in the api/client library (DA reads).
