
fix: TestSyncToTipMocha — Chain.Start() fails with peer dial errors before state sync begins #7147

@rootulp

Description


Problem

TestSyncToTipMocha has a second, distinct failure mode in which mochaChain.Start() at test/docker-e2e/e2e_sync_to_tip_test.go:80 fails before Phase 1 (state sync) even begins. The node logs a storm of peer-dial errors, and the test terminates at ~145s with Error: 130 (exit code 128 + SIGINT).

This is distinct from #7138, which tracks a KPI-timeout failure mode inside Phase 2 (block sync).

Symptoms

Example from 2026-04-13 (run 24323756159):

e2e_sync_to_tip_test.go:76: Starting mocha sync-to-tip node
e2e_sync_to_tip_test.go:80: <error>
3:13AM ERR Error dialing peer err="dial tcp 65.109.83.40:28656: connect: connection refused" module=p2p
3:14AM ERR Error dialing peer err="dial tcp 152.53.33.96:12056: connect: connection refused" module=p2p
3:14AM ERR Error dialing peer err="dial tcp 177.54.156.69:26656: i/o timeout" module=p2p
3:14AM ERR Error dialing peer err="dial tcp 65.109.116.42:11656: i/o timeout" module=p2p
...
Error: 130
--- FAIL: TestCelestiaTestSuite/TestSyncToTipMocha (149.06s)

All listed peer errors are either connection refused or i/o timeout — the node cannot establish a connection to any mocha peer.
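To confirm this pattern mechanically across future runs, the dial-error kinds can be tallied from the node log. A minimal Go sketch (tallyDialErrors is a hypothetical helper, not part of the repo; the log lines in main are taken from the failure above):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tallyDialErrors counts "Error dialing peer" log lines by failure kind.
func tallyDialErrors(logs string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "Error dialing peer") {
			continue
		}
		switch {
		case strings.Contains(line, "connection refused"):
			counts["connection refused"]++
		case strings.Contains(line, "i/o timeout"):
			counts["i/o timeout"]++
		default:
			counts["other"]++
		}
	}
	return counts
}

func main() {
	sample := `3:13AM ERR Error dialing peer err="dial tcp 65.109.83.40:28656: connect: connection refused" module=p2p
3:14AM ERR Error dialing peer err="dial tcp 177.54.156.69:26656: i/o timeout" module=p2p`
	fmt.Println(tallyDialErrors(sample))
}
```

If every line in a failing run falls into one of the first two buckets, the run matches this mode rather than the #7138 KPI-timeout mode.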

Affected runs (in the last 20 nightly runs)

Short-failure (~145s) runs that look like this mode:

| Date       | Run         | Duration |
|------------|-------------|----------|
| 2026-04-01 | 23829926363 | 144s     |
| 2026-04-02 | 23881709964 | 145s     |
| 2026-04-03 | 23931900878 | 145s     |
| 2026-04-04 | 23969952391 | 145s     |
| 2026-04-09 | 24170163213 | 145s     |
| 2026-04-11 | 24273208413 | 147s     |
| 2026-04-12 | 24297392661 | 147s     |
| 2026-04-13 | 24323756159 | 149s     |

8 failures out of 20 runs. The 04-01..04-09 set may overlap with #7017 (closed 2026-04-10), but failures continued after that fix, suggesting #7017 addressed a related but distinct issue (BlockWaitTimeout) rather than the root peer-dial problem.

Hypotheses

  1. Seed/peer list is stale or intermittent: mocha peer addresses in NewMochaConfig() may contain nodes that are offline or behind firewalls that block GitHub Actions runners' egress IPs.
  2. Mocha network instability at ~03:00 UTC: all nightly runs start around 03:00 UTC, which may coincide with routine activity on mocha (e.g., a chain upgrade window or validator maintenance).
  3. GitHub runner network egress restrictions: outbound TCP to arbitrary ports (11656, 12056, 26656, 26676, 28656, 43656) may be rate-limited or shaped.
  4. Connection-count limit: the node may give up after exhausting the configured peer list without establishing enough connections.

Next steps

  1. Check what timeout is being hit at ~145s — could be inside Chain.Start() or container-level.
  2. Review networks.NewMochaConfig() peer/seed freshness; consider using a curated reliable-peers list.
  3. Add logging of peer-connection state during Chain.Start() to diagnose future failures.
  4. Consider a retry-with-backoff for Chain.Start() that re-resolves peers.
