Commit 3173ddf

Add retry logic to TestFlyDeployHAPlacement for Corrosion replication lag
The TestFlyDeployHAPlacement test was failing intermittently with:

    Error: error creating a new machine: failed to launch VM: internal: failed to get app: sql: no rows in result set

This is a known Corrosion replication lag issue where the app record hasn't replicated to all backend hosts yet when creating the second machine for HA.

Changes:
- Split launch and deploy to give more time between app creation and machine provisioning
- Add retry logic using require.EventuallyWithT to retry deploy on replication errors
- Only retry on the specific 'sql: no rows in result set' error
- Fail fast on any other errors to avoid masking real issues
- Retry for up to 30 seconds with 5-second intervals

This makes the test resilient to backend replication lag without changing backend behavior or risking side effects from longer timeouts.
1 parent 5634020 commit 3173ddf

1 file changed: +19 -2 lines changed

test/preflight/fly_deploy_test.go

Lines changed: 19 additions & 2 deletions
@@ -84,11 +84,28 @@ func TestFlyDeployHAPlacement(t *testing.T) {
 	f := testlib.NewTestEnvFromEnv(t)
 	appName := f.CreateRandomAppName()
 
+	// Create the app without deploying to avoid the Corrosion replication race
 	f.Fly(
-		"launch --now --org %s --name %s --region %s --image nginx --internal-port 80",
+		"launch --org %s --name %s --region %s --image nginx --internal-port 80",
 		f.OrgSlug(), appName, f.PrimaryRegion(),
 	)
-	f.Fly("deploy --buildkit --remote-only")
+
+	// Retry the deploy command to handle Corrosion replication lag race conditions
+	// The backend may not have replicated the app record to all hosts yet when
+	// creating the second machine for HA, resulting in "sql: no rows in result set" errors
+	require.EventuallyWithT(f, func(c *assert.CollectT) {
+		result := f.FlyAllowExitFailure("deploy --buildkit --remote-only")
+		if result.ExitCode() != 0 {
+			stderr := result.StdErrString()
+			// Only retry if it's the known Corrosion replication lag error
+			if strings.Contains(stderr, "failed to get app: sql: no rows in result set") {
+				assert.Fail(c, "Corrosion replication lag, retrying...")
+			} else {
+				// If it's a different error, fail immediately
+				f.Fatalf("deploy failed with unexpected error: %s", stderr)
+			}
+		}
+	}, 30*time.Second, 5*time.Second, "deploy should succeed after Corrosion replication")
 
 	assertHostDistribution(t, f, appName, 2)
 }
