Commit 7d19c85
planx: harden Hasura boot against slow Aurora cold starts
A planx StackSet CREATE op into account 214888068391 failed with
"ECS Deployment Circuit Breaker was triggered" on the Hasura service.
ECS doesn't surface a per-task reason in the StackSet output, but the
existing failure mode is well understood:
1. Aurora can take 5-10 minutes to come up on a fresh sandbox account.
2. entrypoint-wrapper.sh waited 5 minutes for DNS / pg_isready, then
   logged "WARNING: ... Continuing anyway..." and started Hasura
   regardless. Hasura then crashed connecting to a still-cold DB, ECS
   restarted the task, repeat → circuit breaker tripped.
3. circuitBreaker: { rollback: true } meant the entire stack got rolled
   back, deleting the very CloudWatch logs that would have told us this.
Two changes:
- entrypoint-wrapper.sh: extend the DNS + pg_isready waits to 10 minutes
  each, and exit non-zero on timeout instead of "continuing anyway". A
  fresh ECS-restarted task re-resolves DNS, so an exit fits the retry
  semantics cleanly. Continuing past a missing DB just guarantees a
  doomed Hasura process.
- compute.ts: bump healthCheckGracePeriod from 15 to 30 minutes for the
  Hasura service, and switch its circuit breaker to enable=true,
  rollback=false. ECS still stops piling on tasks once it gives up, but
  the stack stays in CREATE_FAILED with logs intact instead of
  ROLLBACK_COMPLETE with everything gone.
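The entrypoint-wrapper.sh change described above amounts to a bounded wait that fails loudly on timeout. Below is a minimal sketch of that pattern, not the actual script. The wait_for helper, the DB_HOST variable, the 5-second poll interval and the getent-based DNS check are assumptions made for illustration. Only the 10-minute (600 s) timeouts and the exit-non-zero behaviour come from this commit message.

```shell
#!/bin/sh
# Illustrative sketch only: wait_for, DB_HOST and the poll interval are
# hypothetical names/values, not copied from entrypoint-wrapper.sh.

# wait_for LABEL TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Returns 0 on success, or 1 once at
# least TIMEOUT_SECONDS of sleeping has elapsed without a success.
wait_for() {
  label=$1
  timeout=$2
  interval=$3
  shift 3
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "ERROR: $label not ready after ${timeout}s, giving up" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$label ready after ~${elapsed}s"
}

# Assumed usage in the entrypoint (600 s = the 10-minute wait per check):
#   wait_for "DNS for $DB_HOST" 600 5 getent hosts "$DB_HOST" || exit 1
#   wait_for "Postgres" 600 5 pg_isready -h "$DB_HOST" -p 5432 || exit 1
#   ...then start Hasura as before.
```

Exiting instead of continuing hands the retry back to ECS: the replacement task starts from scratch and re-resolves DNS, rather than launching Hasura against a database that is known to be unreachable.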
The other three services (api, sharedb, editor) keep their default
circuit breaker behaviour — only Hasura has the fresh-DB cold-start
problem.
1 parent d79ac74 commit 7d19c85
2 files changed
Lines changed: 26 additions & 10 deletions
File tree
- cloudformation/scenarios/planx
- cdk/lib/constructs
- docker/hasura
Lines changed: 9 additions & 2 deletions
@@ -171,9 +171,16 @@ (diff body not included)
Lines changed: 17 additions & 8 deletions
@@ -15,28 +15,37 @@ (diff body not included)