Commit 513ce9b

fix(tools/talis): fibre-experiment race fixes & loadgen tooling (#3305)
* fix(tools/talis): wait-for-chain + atomic keyring + one-command driver

Three race conditions surfaced repeatedly on a fresh AWS bring-up of the Fibre throughput experiment. Each one had the same shape: a talis subcommand "succeeded" at the CLI level (or returned the txhash with --yes) before the chain had actually applied the work, leaving downstream steps to fail in confusing ways. This commit makes each step verify *outcome*, not just *invocation*, so the experiment can go from a fresh `talis up` to a running loadgen without manual intervention.

• setup-fibre script (fibre_setup.go) now:
  - polls `celestia-appd status` for `latest_block_height>0` before submitting any tx — fixes the silent-noop where set-host + 100× deposit-to-escrow all bounced with "celestia-app is not ready; please wait for first block";
  - retries `set-host` in a loop until the validator's host shows up in `query valaddr providers` — fixes the case where --yes returns the txhash before block inclusion and the tx silently lands in the mempool but never confirms;
  - verifies fibre-0's escrow account is funded on-chain before the tmux session exits — same silent-failure mode as set-host, but on the deposit side.
  The talis-CLI step also now cross-checks all validators are registered from a single vantage point before returning, so a concurrent set-host race surfaces as an error instead of a half-empty provider list start-fibre would cache forever.

• fibre-bootstrap-evnode (fibre_bootstrap_evnode.go) now stages the keyring scp into a tmp directory and `mv`s it atomically into place. The previous direct `scp -r` to /root/keyring-fibre/keyring-test created the directory before transferring its contents — the evnode init script's `[ -d keyring-test ]` poll passed mid-transfer, the daemon launched with no fibre-0.info, and crashed with `keyring entry "fibre-0" not found`.

• evnode_init.sh (genesis.go) now waits for the specific keyring-test/fibre-0.info file rather than just the keyring-test directory. Belt-and-braces: the bootstrap mv is already atomic on the same filesystem, but the file-level guard means a hand-pushed keyring (not via talis) can't trip the same race.

• New `talis fibre-experiment` umbrella command runs up → genesis → deploy → setup-fibre → start-fibre → fibre-bootstrap-evnode in order. Each step uses the same binary as a subprocess; failures in any step abort the chain. Operator goes from a prepared root dir to a running loadgen with one command, instead of remembering the sequence.

Verified by 5-min sustained loadgen against julien/fiber HEAD with PR #3287 (concurrent submitter) merged: 47.65 MB/s @ 99.999 % ok, up from the prior 24.57 MB/s baseline (the gap is PR #3287's overlapping uploads — these talis fixes just stop the deploy from silently breaking before throughput matters).

* fix(tools/talis): finalize fibre setup race fixes

Three follow-up bugs surfaced from the PR #3303 follow-up verification run on a 3-validator AWS Fibre cluster:

- aws.go: CreateAWSInstances exited 0 even when individual instance launches failed, so `talis up` lied about success and downstream steps proceeded against a partial cluster. Returns a joined error now so failure cascades stop early.
- download.go: sshExec used cmd.CombinedOutput, mixing SSH warnings (the "Warning: Permanently added '...'..." chatter on stderr) into bytes the caller hands to fmt.Sscanf("%d"). The CLI-side providers cross-check parsed those warnings as 0 and looped until its 5-min deadline even though a direct SSH query showed all 3 providers registered. Switch to cmd.Output() (stdout only) and add `-q -o LogLevel=ERROR` to silence the chatter for any caller that does combine streams.
- fibre_setup.go: the per-validator escrow verification used `celestia-appd query fibre escrow` which doesn't exist — the actual subcommand is `escrow-account`. The query errored on every retry, the grep for "amount" never matched, and the script wedged on the 3-min deadline reporting `FATAL: fibre-0 escrow not present`. Switch to `escrow-account` and key on `"found":true` (the explicit existence flag in the response).

Also wrap the fibre-0 deposit-to-escrow itself in a retry loop matching set-host — same `--yes`-returns-before-inclusion silent-failure mode bit it. fibre-1..N stay best-effort.

* feat(evnode-txsim): keep-alive conn pool + pprof endpoint

Two diagnostic improvements for the load generator:

1. http.Transport.MaxIdleConnsPerHost defaults to 2 in stdlib. With --concurrency=8 (or higher), 6+ goroutines per cycle had to open fresh TCP+TLS sockets per request because the pool couldn't hold their idle conns between requests. Bump MaxIdleConns / MaxIdleConnsPerHost / MaxConnsPerHost to 2*concurrency so every active sender has a reusable keep-alive socket, eliminating handshake churn from the hot path.

2. Always-on net/http/pprof on 127.0.0.1:6060. evnode-txsim is a load tester, not a production daemon, so cost of always serving profiling is acceptable; the payoff is being able to grab CPU profiles under live load without re-deploying the binary:

     ssh -L 6060:127.0.0.1:6060 root@loadgen \
       go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

   A profile captured this way under c=8 traced the per-request hot path: 25.5% in kernel write(2), 25% in net/http body marshaling. That diagnostic surfaced that the c6in.2xlarge loadgen was the binding constraint for the experiment at ~22 MB/s, not evnode or DA — a finding we'd have spent another debug round chasing without the in-process profiler.
1 parent d5f981c commit 513ce9b

8 files changed

Lines changed: 297 additions & 16 deletions

tools/talis/aws.go

Lines changed: 13 additions & 1 deletion
@@ -350,16 +350,28 @@ func CreateAWSInstances(ctx context.Context, insts []Instance, sshKey, keyName s
 		close(results)
 	}()
 
-	var created []Instance
+	var (
+		created  []Instance
+		failures []string
+	)
 	for res := range results {
 		if res.err != nil {
 			fmt.Printf("❌ %s failed after %v %v\n", res.inst.Name, res.timeRequired, res.err)
+			failures = append(failures, fmt.Sprintf("%s: %v", res.inst.Name, res.err))
 		} else {
 			created = append(created, res.inst)
 			fmt.Printf("✅ %s is up (public=%s) in %v\n", res.inst.Name, res.inst.PublicIP, res.timeRequired)
 		}
 		fmt.Printf("---- Progress: %d/%d\n", len(created), total)
 	}
+	if len(failures) > 0 {
+		// Surface partial-failure as an error so `talis up` exits
+		// non-zero; without this, downstream genesis runs against a
+		// half-provisioned config and fails much later with confusing
+		// "X has no public IP yet" messages.
+		return created, fmt.Errorf("%d/%d instance(s) failed to launch: %s",
+			len(failures), total, strings.Join(failures, "; "))
+	}
 	return created, nil
 }

tools/talis/cmd/evnode-txsim/main.go

Lines changed: 22 additions & 1 deletion
@@ -26,6 +26,7 @@ import (
 	"fmt"
 	"io"
 	"net/http"
+	_ "net/http/pprof"
 	"os"
 	"os/signal"
 	"sort"
@@ -113,7 +114,27 @@ func run(cli cliFlags) error {
 		return fmt.Errorf("seed random pool: %w", err)
 	}
 
-	httpClient := &http.Client{Timeout: cli.timeout}
+	// Bump per-host idle connections so concurrent goroutines reuse
+	// keep-alive sockets instead of churning TCP+TLS handshakes —
+	// stdlib default MaxIdleConnsPerHost=2 caps in-flight requests
+	// to 2 keep-alive sockets per target, which serializes any
+	// concurrency>2 onto fresh connections each request.
+	transport := http.DefaultTransport.(*http.Transport).Clone()
+	transport.MaxIdleConns = 2 * cli.concurrency
+	transport.MaxIdleConnsPerHost = 2 * cli.concurrency
+	transport.MaxConnsPerHost = 2 * cli.concurrency
+	httpClient := &http.Client{Timeout: cli.timeout, Transport: transport}
+
+	// pprof on a dedicated listener — `_ "net/http/pprof"` registers
+	// handlers on http.DefaultServeMux. Always-on at 127.0.0.1:6060
+	// since this is a load-tester binary, not a production daemon;
+	// SSH port-forward to grab profiles under load:
+	//
+	//   ssh -L 6060:127.0.0.1:6060 root@loadgen \
+	//     go tool pprof http://localhost:6060/debug/pprof/profile?seconds=10
+	go func() {
+		_ = http.ListenAndServe("127.0.0.1:6060", nil)
+	}()
 
 	ctx, cancel := context.WithCancel(context.Background())
 	if cli.duration > 0 {

tools/talis/download.go

Lines changed: 12 additions & 2 deletions
@@ -193,16 +193,26 @@ func compressAndDownload(table, localPath, user, host, sshKeyPath string) error
 	return nil
 }
 
-// sshExec runs a command on a remote host via SSH and returns the combined output.
+// sshExec runs a command on a remote host via SSH and returns stdout only.
+//
+// We intentionally do NOT use CombinedOutput here. ssh prints connection
+// chatter ("Warning: Permanently added '...' to the list of known hosts.")
+// on stderr, and a previous `CombinedOutput` revision caused
+// `fmt.Sscanf(out, "%d")` parses to silently return 0 because the leading
+// stderr line had no digits. Capturing only stdout keeps numeric output
+// parseable; -q + LogLevel=ERROR further suppresses the chatter for any
+// caller that does combine streams.
 func sshExec(user, host, sshKeyPath, command string) ([]byte, error) {
 	cmd := exec.Command("ssh",
+		"-q",
+		"-o", "LogLevel=ERROR",
 		"-o", "StrictHostKeyChecking=no",
 		"-o", "UserKnownHostsFile=/dev/null",
 		"-i", sshKeyPath,
 		fmt.Sprintf("%s@%s", user, host),
 		command,
 	)
-	return cmd.CombinedOutput()
+	return cmd.Output()
 }
 
 func sftpDownload(remotePath, localPath, user, host, sshKeyPath string) error {
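The failure mode described in the `sshExec` comment can be reproduced locally without SSH. The following is a rough sketch under two assumptions: a POSIX `sh` is on the PATH, and the shell one-liner `noisyCount` (made up for this example) is a fair stand-in for a remote command that warns on stderr and prints a count on stdout. `parseCount` is a hypothetical helper that reports malformed input instead of silently yielding 0.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// noisyCount imitates the SSH case: a warning on stderr, then the
// number the caller actually wants on stdout.
const noisyCount = `echo "Warning: Permanently added 'host' to the list of known hosts." >&2; echo 3`

// parseCount converts trimmed command output to an int and returns an
// error for anything that is not a bare number.
func parseCount(out []byte) (int, error) {
	return strconv.Atoi(strings.TrimSpace(string(out)))
}

func main() {
	// CombinedOutput interleaves stderr with stdout, so the warning
	// ends up in the bytes handed to the parser.
	combined, _ := exec.Command("sh", "-c", noisyCount).CombinedOutput()
	_, err := parseCount(combined)
	fmt.Println("combined output parses cleanly:", err == nil)

	// Output captures stdout only, so the count survives intact.
	stdout, err := exec.Command("sh", "-c", noisyCount).Output()
	if err != nil {
		fmt.Println("command failed:", err)
		return
	}
	n, err := parseCount(stdout)
	fmt.Println(n, err)
}
```

Checking the parse error, rather than discarding it, would have turned the original bug into an immediate failure instead of a five-minute wait.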

tools/talis/fibre_bootstrap_evnode.go

Lines changed: 29 additions & 5 deletions
@@ -125,21 +125,45 @@ to fetch, then SCPs to each evnode-*.`,
 				defer wg.Done()
 				log.Printf("[%s] pushing JWT + keyring", ev.Name)
 
+				// JWT is small + atomic on the receive side because
+				// it's a single file, so we push it directly.
 				if err := scpToRemote(sshUser, ev.PublicIP, sshKeyPath, localJWT, "/root/bridge-jwt.txt", false); err != nil {
 					errCh <- fmt.Errorf("[%s] push JWT: %w", ev.Name, err)
 					return
 				}
 
-				// mkdir the parent so scp lands at the exact path
-				// evnode_init.sh waits for.
-				if _, err := sshExec(sshUser, ev.PublicIP, sshKeyPath, "mkdir -p /root/keyring-fibre && rm -rf /root/keyring-fibre/keyring-test"); err != nil {
-					errCh <- fmt.Errorf("[%s] mkdir keyring-fibre: %w", ev.Name, err)
+				// Keyring push is staged through a tmp dir and
+				// promoted via mv. Without staging, evnode_init.sh's
+				// poll loop (which tests `[ -d keyring-test ]`)
+				// passes the moment scp -r mkdir's the directory,
+				// long before fibre-0.info is on disk. evnode then
+				// launches mid-scp and dies with `keyring entry
+				// "fibre-0" not found`. mv is atomic on the same
+				// filesystem so the init script either sees nothing
+				// (keep waiting) or the fully-populated dir (start
+				// the daemon cleanly).
+				stageDir := "/root/.keyring-fibre.staging"
+				prep := fmt.Sprintf(
+					"rm -rf %s && mkdir -p %s && mkdir -p /root/keyring-fibre && rm -rf /root/keyring-fibre/keyring-test",
+					stageDir, stageDir,
+				)
+				if _, err := sshExec(sshUser, ev.PublicIP, sshKeyPath, prep); err != nil {
+					errCh <- fmt.Errorf("[%s] stage keyring: %w", ev.Name, err)
 					return
 				}
-				if err := scpToRemote(sshUser, ev.PublicIP, sshKeyPath, filepath.Join(localKeyringRoot, "keyring-test"), "/root/keyring-fibre/keyring-test", true); err != nil {
+				stageDest := stageDir + "/keyring-test"
+				if err := scpToRemote(sshUser, ev.PublicIP, sshKeyPath, filepath.Join(localKeyringRoot, "keyring-test"), stageDest, true); err != nil {
 					errCh <- fmt.Errorf("[%s] push keyring: %w", ev.Name, err)
 					return
 				}
+				promote := fmt.Sprintf(
+					"mv %s /root/keyring-fibre/keyring-test && rmdir %s",
+					stageDest, stageDir,
+				)
+				if _, err := sshExec(sshUser, ev.PublicIP, sshKeyPath, promote); err != nil {
+					errCh <- fmt.Errorf("[%s] promote keyring: %w", ev.Name, err)
+					return
+				}
 
 				log.Printf("[%s] ✓ pushed; daemon should start within ~10s", ev.Name)
 			}(ev)

tools/talis/fibre_experiment.go

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+package main
+
+import (
+	"fmt"
+	"os"
+	"os/exec"
+
+	"github.com/spf13/cobra"
+)
+
+// fibreExperimentCmd is the one-command driver for the Fibre throughput
+// experiment. It assumes the operator has already populated the
+// experiment directory with a config.json + scripts/ + base config.toml +
+// app.toml (i.e. ran `talis init` + `talis add` for validators / bridge
+// / evnode / loadgen) and that build artefacts are at $rootDir/build.
+//
+// It then invokes — in order — the same subcommands the operator would
+// run by hand:
+//
+// 1. up — provision instances
+// 2. genesis -b <build> — stage validator/bridge/evnode/loadgen payloads
+// 3. deploy — ship payloads + start init scripts
+// 4. setup-fibre — register host + deposit escrow on each validator
+// 5. start-fibre — launch the fibre server on each validator
+// 6. fibre-bootstrap-evnode — scp bridge JWT + fibre keyring onto evnode-*
+//
+// Each step is invoked via os/exec on the running binary. Any failure
+// surfaces immediately; nothing is retried at this layer (the
+// individual subcommands handle their own waits + retries).
+//
+// After step 6 returns, evnode-* daemons start within ~10 s and the
+// load-gen's init script auto-launches evnode-txsim. The operator
+// reads the final TXSIM: line from the load-gen.
+func fibreExperimentCmd() *cobra.Command {
+	var (
+		rootDir  string
+		buildDir string
+	)
+
+	cmd := &cobra.Command{
+		Use:   "fibre-experiment",
+		Short: "End-to-end driver: up → genesis → deploy → setup-fibre → start-fibre → fibre-bootstrap-evnode",
+		Long: `Run every step needed to bring up a Fibre throughput experiment from a
+prepared root directory. Equivalent to invoking each subcommand in
+sequence; included so the operator doesn't have to remember the order
+or watch for inter-step races.`,
+		RunE: func(cmd *cobra.Command, args []string) error {
+			self, err := os.Executable()
+			if err != nil {
+				return fmt.Errorf("locate own binary: %w", err)
+			}
+
+			steps := []struct {
+				name string
+				args []string
+			}{
+				{"up", []string{"up", "-d", rootDir}},
+				{"genesis", []string{"genesis", "-d", rootDir, "-b", buildDir}},
+				{"deploy", []string{"deploy", "-d", rootDir}},
+				{"setup-fibre", []string{"setup-fibre", "-d", rootDir}},
+				{"start-fibre", []string{"start-fibre", "-d", rootDir}},
+				{"fibre-bootstrap-evnode", []string{"fibre-bootstrap-evnode", "-d", rootDir}},
+			}
+
+			for _, s := range steps {
+				fmt.Printf("\n=== talis %s ===\n", s.name)
+				c := exec.Command(self, s.args...)
+				c.Stdout = os.Stdout
+				c.Stderr = os.Stderr
+				c.Env = os.Environ()
+				if err := c.Run(); err != nil {
+					return fmt.Errorf("step %q failed: %w", s.name, err)
+				}
+			}
+
+			fmt.Println()
+			fmt.Println("=== fibre-experiment complete ===")
+			fmt.Println("evnode aggregator(s) start within ~10 s and load-gen init")
+			fmt.Println("scripts auto-launch evnode-txsim once evnode's /stats responds.")
+			fmt.Println("Final TXSIM: line lands at /root/txsim.log on each load-gen host.")
+			return nil
+		},
+	}
+
+	cmd.Flags().StringVarP(&rootDir, "directory", "d", ".", "experiment root directory")
+	cmd.Flags().StringVarP(&buildDir, "build-dir", "b", "./build", "directory containing the cross-compiled linux/amd64 binaries")
+
+	return cmd
+}

tools/talis/fibre_setup.go

Lines changed: 121 additions & 5 deletions
@@ -51,20 +51,104 @@ func setupFibreCmd() *cobra.Command {
 				// Build script: register host + deposit escrow for validator + all fibre accounts
 				var sb strings.Builder
 
+				// 0. Block until the chain has produced at least one block.
+				// Without this, the very next tx returns
+				// `celestia-app is not ready; please wait for first block`
+				// from the local node — the call appears to succeed at
+				// the CLI level (`--yes` returns the txhash before block
+				// inclusion), but the tx never lands. Polling explicitly
+				// avoids the `sleep 10` heuristic that used to be here.
+				sb.WriteString(
+					"echo 'waiting for chain to produce first block...'\n" +
+						"DEADLINE=$(( $(date +%s) + 300 ))\n" +
+						"while true; do\n" +
+						" H=$(celestia-appd status 2>/dev/null | " +
+						" grep -oE '\"latest_block_height\":\"[0-9]+\"' | " +
+						" grep -oE '[0-9]+' | head -1)\n" +
+						" if [ -n \"$H\" ] && [ \"$H\" -gt 0 ]; then\n" +
+						" echo \"chain is at height $H\"\n" +
+						" break\n" +
+						" fi\n" +
+						" if [ $(date +%s) -gt $DEADLINE ]; then\n" +
+						" echo 'FATAL: chain never produced a block within 5m' >&2\n" +
+						" exit 1\n" +
+						" fi\n" +
+						" sleep 3\n" +
+						"done\n",
+				)
+
 				// 1. Register fibre host address. Plain `host:port` form —
 				// x/valaddr requires it; the gRPC client dials it via the
 				// passthrough resolver. Don't prefix `dns:///` here.
+				//
+				// Retry until `query valaddr providers` shows OUR host
+				// — `--yes` returns the txhash before inclusion, so a
+				// single one-shot call can succeed at the RPC layer
+				// while the chain rejects the tx (mempool full, signer
+				// not yet in validator set, …) and we'd never know.
+				// 5-minute deadline so a stuck chain doesn't loop
+				// forever.
 				sb.WriteString(fmt.Sprintf(
-					"celestia-appd tx valaddr set-host %s:%d "+
+					"HOST=%s:%d\n"+
+						"DEADLINE=$(( $(date +%%s) + 300 ))\n"+
+						"while true; do\n"+
+						" celestia-appd tx valaddr set-host \"$HOST\" "+
 						"--from validator --keyring-backend=test --home .celestia-app "+
-						"--chain-id %s --fees %s --yes\n",
+						"--chain-id %s --fees %s --yes >/dev/null 2>&1 || true\n"+
+						" sleep 6\n"+
+						" if celestia-appd query valaddr providers --chain-id %s -o json 2>/dev/null \\\n"+
+						" | grep -q \"\\\"host\\\": *\\\"$HOST\\\"\"; then\n"+
+						" echo \"set-host confirmed: $HOST\"\n"+
+						" break\n"+
+						" fi\n"+
+						" if [ $(date +%%s) -gt $DEADLINE ]; then\n"+
+						" echo \"FATAL: set-host did not register $HOST after 5m\" >&2\n"+
+						" exit 1\n"+
+						" fi\n"+
+						" echo 'set-host pending, retrying...'\n"+
+						"done\n",
 					val.PublicIP, fibrePort,
 					cfg.ChainID, fees,
+					cfg.ChainID,
+				))
+
+				// 2. Deposit escrow for fibre-0 inside a retry loop.
+				// Same silent-failure mode as set-host: `--yes` returns
+				// the txhash before inclusion, so a single bounced tx
+				// (mempool full, signer not yet propagated, …) leaves
+				// the runner failing every upload with
+				// `escrow account not found for signer …`. fibre-0 is
+				// the one the runner actually signs with by default,
+				// so it's the only one we hard-block on.
+				sb.WriteString(fmt.Sprintf(
+					"FIBRE0_ADDR=$(celestia-appd keys show fibre-0 --keyring-backend test --home .celestia-app -a)\n"+
+						"DEADLINE=$(( $(date +%%s) + 300 ))\n"+
+						"while true; do\n"+
+						" celestia-appd tx fibre deposit-to-escrow %s "+
+						"--from fibre-0 --keyring-backend=test --home .celestia-app "+
+						"--chain-id %s --fees %s --yes >/dev/null 2>&1 || true\n"+
+						" sleep 6\n"+
+						" if celestia-appd query fibre escrow-account \"$FIBRE0_ADDR\" --chain-id %s -o json 2>/dev/null \\\n"+
+						" | grep -q '\"found\":true'; then\n"+
+						" echo \"escrow confirmed for fibre-0 ($FIBRE0_ADDR)\"\n"+
+						" break\n"+
+						" fi\n"+
+						" if [ $(date +%%s) -gt $DEADLINE ]; then\n"+
+						" echo \"FATAL: fibre-0 escrow did not land after 5m\" >&2\n"+
+						" exit 1\n"+
+						" fi\n"+
+						" echo 'fibre-0 escrow pending, retrying...'\n"+
+						"done\n",
+					escrowAmount,
+					cfg.ChainID, fees,
+					cfg.ChainID,
 				))
-				sb.WriteString("sleep 10\n")
 
-				// 2. Deposit escrow for each fibre worker account
-				for i := range fibreAccounts {
+				// 3. Best-effort fund fibre-1..N. The runner only signs
+				// with fibre-0 by default, so a missing one of these
+				// doesn't block uploads — they exist as headroom for
+				// future signer rotation.
+				for i := 1; i < fibreAccounts; i++ {
 					keyName := fmt.Sprintf("fibre-%d", i)
 					sb.WriteString(fmt.Sprintf(
 						"celestia-appd tx fibre deposit-to-escrow %s "+
@@ -103,6 +187,38 @@ func setupFibreCmd() *cobra.Command {
 			if err := waitForTmuxSessions(cfg.Validators, resolvedSSHKeyPath, SetupFibreSessionName, 10*time.Minute); err != nil {
 				return fmt.Errorf("waiting for setup-fibre sessions: %w", err)
 			}
+
+			// CLI-side verification that every validator's host is on
+			// the chain's provider list before we hand off to start-
+			// fibre / fibre-bootstrap-evnode. The per-validator script
+			// above already self-verifies its own host, but we
+			// re-check here from a single vantage point so a
+			// concurrent set-host race across validators surfaces
+			// before downstream steps cache an empty registry.
+			if len(cfg.Validators) > 0 {
+				expected := len(cfg.Validators)
+				queryHost := cfg.Validators[0].PublicIP
+				queryCmd := fmt.Sprintf(
+					"celestia-appd query valaddr providers --chain-id %s -o json 2>/dev/null | grep -o '\"host\"' | wc -l",
+					cfg.ChainID,
+				)
+				deadline := time.Now().Add(5 * time.Minute)
+				for {
+					out, err := sshExec("root", queryHost, resolvedSSHKeyPath, queryCmd)
+					if err == nil {
+						count := 0
+						_, _ = fmt.Sscanf(strings.TrimSpace(string(out)), "%d", &count)
+						if count >= expected {
+							break
+						}
+						fmt.Printf(" valaddr providers: %d/%d registered, retrying...\n", count, expected)
+					}
+					if time.Now().After(deadline) {
+						return fmt.Errorf("only some validators registered as fibre providers within 5m — re-run setup-fibre")
+					}
+					time.Sleep(5 * time.Second)
+				}
+			}
 			fmt.Println("Validator setup done!")
 
 			// Deposit escrow for encoder accounts.
