feat(signer): add health monitoring and automatic failover for signer pool#399
raushan728 wants to merge 3 commits into solana-foundation:main from
Greptile Summary
This PR adds per-signer health tracking to …
Key observations:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller
participant versioned_transaction
participant SignerPool
participant SignerWithMetadata
participant Signer
Caller->>versioned_transaction: sign_transaction(signer)
versioned_transaction->>Signer: sign_message(message_bytes)
alt Signing succeeds
Signer-->>versioned_transaction: Ok(signature)
versioned_transaction->>SignerPool: record_signing_success(signer)
SignerPool->>SignerWithMetadata: record_success()
Note over SignerWithMetadata: consecutive_failures = 0\nis_healthy = true
versioned_transaction-->>Caller: Ok(transaction, encoded)
else Signing fails
Signer-->>versioned_transaction: Err(e)
versioned_transaction->>SignerPool: record_signing_failure(signer)
SignerPool->>SignerWithMetadata: record_failure()
Note over SignerWithMetadata: consecutive_failures += 1\nif >= 3: is_healthy = false
versioned_transaction-->>Caller: Err(SigningError)
end
Caller->>SignerPool: get_next_signer()
SignerPool->>SignerPool: healthy_signers()
Note over SignerPool: Filters out is_healthy=false\nFallback to full pool if all unhealthy
SignerPool-->>Caller: Arc<Signer>
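The health transitions in the diagram can be sketched in Rust roughly as follows. This is a minimal, hypothetical sketch rather than the PR's actual code. The names SignerWithMetadata, record_success, record_failure, record_signing_success, record_signing_failure, healthy_signers and get_next_signer, the consecutive_failures/is_healthy fields, the threshold of 3 and the all-unhealthy fallback come from the diagram. The stand-in Signer struct, the atomic fields, round-robin selection and matching signers by name are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicUsize, Ordering};
use std::sync::Arc;

// Consecutive failures before a signer is marked unhealthy (3, per the diagram).
const FAILURE_THRESHOLD: u32 = 3;

// Hypothetical stand-in for the real remote signer type.
#[derive(Debug)]
pub struct Signer {
    pub name: String,
}

// Per-signer health state; atomics let shared references update it (an assumption).
pub struct SignerWithMetadata {
    pub signer: Arc<Signer>,
    consecutive_failures: AtomicU32,
    is_healthy: AtomicBool,
}

impl SignerWithMetadata {
    pub fn new(name: &str) -> Self {
        Self {
            signer: Arc::new(Signer { name: name.to_string() }),
            consecutive_failures: AtomicU32::new(0),
            is_healthy: AtomicBool::new(true),
        }
    }

    // Success: reset the failure streak and mark the signer healthy again.
    pub fn record_success(&self) {
        self.consecutive_failures.store(0, Ordering::Relaxed);
        self.is_healthy.store(true, Ordering::Relaxed);
    }

    // Failure: extend the streak; at the threshold, mark the signer unhealthy.
    pub fn record_failure(&self) {
        let failures = self.consecutive_failures.fetch_add(1, Ordering::Relaxed) + 1;
        if failures >= FAILURE_THRESHOLD {
            self.is_healthy.store(false, Ordering::Relaxed);
        }
    }

    pub fn is_healthy(&self) -> bool {
        self.is_healthy.load(Ordering::Relaxed)
    }
}

pub struct SignerPool {
    signers: Vec<SignerWithMetadata>,
    next: AtomicUsize, // round-robin cursor (selection strategy is an assumption)
}

impl SignerPool {
    pub fn new(signers: Vec<SignerWithMetadata>) -> Self {
        Self { signers, next: AtomicUsize::new(0) }
    }

    // Route a signing outcome to the matching signer; matching by name is an assumption.
    pub fn record_signing_success(&self, name: &str) {
        if let Some(s) = self.signers.iter().find(|s| s.signer.name == name) {
            s.record_success();
        }
    }

    pub fn record_signing_failure(&self, name: &str) {
        if let Some(s) = self.signers.iter().find(|s| s.signer.name == name) {
            s.record_failure();
        }
    }

    // Healthy signers only, falling back to the full pool if none are healthy.
    fn healthy_signers(&self) -> Vec<&SignerWithMetadata> {
        let healthy: Vec<&SignerWithMetadata> =
            self.signers.iter().filter(|s| s.is_healthy()).collect();
        if healthy.is_empty() {
            self.signers.iter().collect()
        } else {
            healthy
        }
    }

    // Returns None only when the pool itself is empty.
    pub fn get_next_signer(&self) -> Option<Arc<Signer>> {
        let candidates = self.healthy_signers();
        if candidates.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % candidates.len();
        Some(Arc::clone(&candidates[i].signer))
    }
}
```

Because any success resets the streak, a signer that fails only intermittently (never three times in a row) stays in rotation; only a sustained run of failures removes it. The all-unhealthy fallback keeps the pool handing out signers instead of returning nothing.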
Remote signers communicate over HTTP and can fail intermittently. Previously, SignerPool had no mechanism to detect degraded signers: it would keep routing requests to a failing signer indefinitely.
This PR adds per-signer health tracking with automatic failover:
… get_signer_by_pubkey) now also respect the recovery probe (they fail only if the cooldown has not yet elapsed)
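The recovery-probe rule for direct lookups can be sketched as below. This is again a hypothetical sketch. Only get_signer_by_pubkey, the threshold of 3 and the "fail only if the cooldown has not elapsed" behavior come from the PR text. SignerHealth, LookupError, string pubkeys, the explicit now parameter and measuring the cooldown from the most recent failure are all assumptions.

```rust
use std::time::{Duration, Instant};

// Consecutive failures before a signer is considered unhealthy (matches the diagram).
const FAILURE_THRESHOLD: u32 = 3;

// Hypothetical per-signer health record; field names are illustrative.
#[derive(Debug)]
struct SignerHealth {
    pubkey: String,
    consecutive_failures: u32,
    is_healthy: bool,
    last_failure: Option<Instant>,
}

impl SignerHealth {
    fn new(pubkey: &str) -> Self {
        Self {
            pubkey: pubkey.to_string(),
            consecutive_failures: 0,
            is_healthy: true,
            last_failure: None,
        }
    }

    // Each failure restarts the cooldown window (assumed: measured from the last failure).
    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        self.last_failure = Some(now);
        if self.consecutive_failures >= FAILURE_THRESHOLD {
            self.is_healthy = false;
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.is_healthy = true;
        self.last_failure = None;
    }

    // Healthy signers are always usable; an unhealthy one becomes eligible for a
    // recovery probe once `cooldown` has passed since its last failure.
    fn is_usable(&self, now: Instant, cooldown: Duration) -> bool {
        self.is_healthy
            || self
                .last_failure
                .map_or(true, |t| now.saturating_duration_since(t) >= cooldown)
    }
}

#[derive(Debug, PartialEq)]
enum LookupError {
    NotFound,
    InCooldown,
}

// Direct lookup by pubkey: fails for an unhealthy signer only while its cooldown
// has not elapsed; afterwards the caller receives the signer as a recovery probe.
fn get_signer_by_pubkey<'a>(
    signers: &'a [SignerHealth],
    pubkey: &str,
    now: Instant,
    cooldown: Duration,
) -> Result<&'a SignerHealth, LookupError> {
    let signer = signers
        .iter()
        .find(|s| s.pubkey == pubkey)
        .ok_or(LookupError::NotFound)?;
    if signer.is_usable(now, cooldown) {
        Ok(signer)
    } else {
        Err(LookupError::InCooldown)
    }
}
```

Passing now explicitly keeps the cooldown check deterministic and testable without sleeping. If the probe succeeds, record_success restores the signer to full health; if it fails, last_failure moves forward and the signer waits out another cooldown.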