Commit 66c8e08
ci(cypress): default stateful ES to snapshot on CI, docker locally (elastic#264218)
## Summary
Default Cypress stateful Elasticsearch provisioning to `snapshot` on CI
and keep `docker` for local development.
The earlier switch to Docker as the universal default (elastic#254306) was
motivated by:
- making local dev match shipped artifacts,
- multi-arch support for Apple Silicon,
- avoiding per-spec snapshot extraction,
- faster warm starts on developer machines.
All four are genuine wins **for local dev**. On CI they either don't
apply, are neutral, or are actively counter-productive. After gathering
empirical data from Buildkite, the right default on CI is `snapshot`; on
workstations the right default stays `docker`.
## Why snapshot on CI
1. **No version-skew race.** Kibana CI already resolves an ES snapshot
manifest once per build in
[`.buildkite/scripts/lifecycle/pre_build.sh`](https://github.com/elastic/kibana/blob/main/.buildkite/scripts/lifecycle/pre_build.sh)
against `kibana-ci-es-snapshots-daily` — Kibana's own daily-verified
bucket, version-locked to Kibana by construction. The post-version-bump
window (`9.5.0`, `9.6.0`, …) that my earlier auto-detect probe tried to
guard against doesn't actually exist for stateful Cypress on CI: the
tar.gz is already there, or `pre_build.sh` has already failed the build
before any Cypress agent starts. A Docker image for that same version is
_not_ guaranteed to exist at the same moment — which is the exact
failure mode we kept running into.
2. **Docker-on-CI is not meaningfully faster on the same hardware.** I
pulled job durations from Buildkite for `kibana-on-merge` Security
Solution Cypress jobs before and after elastic#254306 and reconciled them
against the Buildkite agent machine-type change (`n2-standard-4` →
`n2-highmem-4`) that landed in the same window. Controlling for that
hardware change, ES start-up on a warm CI agent differs by only ~5s
between snapshot tar.gz and Docker, which is within noise for a 20–40
minute Cypress group. The speedups originally attributed to Docker were
largely the result of the hardware upgrade.
3. **ES starts once per FTR config group, not per spec.** `parallel.ts`
provisions ES once for each group in `specGroups`, runs all specs in
that group against the same cluster, then shuts down (see
[`runSpecGroup`](https://github.com/elastic/kibana/blob/main/x-pack/solutions/security/plugins/security_solution/scripts/run_cypress/parallel.ts)).
Only retry runs go per-spec. So the "Docker avoids per-spec extraction
on CI" argument is mostly about retries, which are a tiny fraction of
total runtime.
4. **Fewer moving parts on CI.** No Docker registry auth, no Docker pull
on every agent, no fallback logic between Docker and snapshot, no GCS
probe script. Snapshot tar.gz is already pre-fetched/cached by the
standard Kibana CI lifecycle.
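Point 3 can be made concrete with a small model of how many times ES gets started. This is only a sketch under stated assumptions: `SpecGroup`, `countEsStarts`, and the sample data are hypothetical and are not taken from `parallel.ts`. The only behavior it encodes is the one described above (one ES start per group, one extra start per retried spec).

```ts
// Hypothetical model for illustration; not the actual parallel.ts implementation.
interface SpecGroup {
  configPath: string; // FTR config shared by every spec in the group
  specs: string[]; // specs that run against the same ES cluster
  retriedSpecs: string[]; // specs that failed and were re-run individually
}

// One ES start per group, plus one per retried spec (retries go per-spec).
function countEsStarts(groups: SpecGroup[]): number {
  return groups.reduce((total, g) => total + 1 + g.retriedSpecs.length, 0);
}

// Made-up sample data: two groups, five specs, one retry.
const groups: SpecGroup[] = [
  { configPath: 'group-a', specs: ['a1.cy.ts', 'a2.cy.ts', 'a3.cy.ts'], retriedSpecs: [] },
  { configPath: 'group-b', specs: ['b1.cy.ts', 'b2.cy.ts'], retriedSpecs: ['b2.cy.ts'] },
];

const esStarts = countEsStarts(groups); // 2 group starts + 1 retry start
const specCount = groups.reduce((n, g) => n + g.specs.length, 0);
```

In this model, per-spec snapshot extraction only matters for the retry term, which stays small as long as retries are rare.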
## Why keep Docker for local dev
1. Matches shipped artifacts byte-for-byte.
2. Native multi-arch (Apple Silicon) without a separate tar.gz pipeline.
3. Warm starts are fast once the image is cached on the workstation.
4. `CYPRESS_ES_FROM=snapshot` (or `docker`) still works as an explicit
override for both environments.
## Change
```ts
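// On CI (process.env.CI set), default to the snapshot; locally, default to Docker.
const defaultEsFrom = process.env.CI ? 'snapshot' : 'docker';
// A serverless config always wins; otherwise an explicit override beats the default.
const esFrom =
  configEsFrom === 'serverless' ? 'serverless' : esFromEnv || defaultEsFrom;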
```
Also drops the earlier `detect_cypress_es_from.sh` probe and its hook in
`setup_job_env.sh` — `pre_build.sh` already covers the version-skew
concern at a better layer.
The serverless routing fix (`configEsFrom === 'serverless'` wins over
`CYPRESS_ES_FROM`) is retained from the first commit and is independent
of the default flip. It prevents a `CYPRESS_ES_FROM=snapshot` override
meant for stateful runs from accidentally booting serverless suites
against a stateful snapshot tar.gz and blowing up with
`unknown setting [xpack.security.authc.native_roles.enabled]`.
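The precedence described above can be sketched as a pure function, which makes the cases in the test plan easy to check in isolation. This is a minimal sketch, not the code in the changed file: `resolveEsFrom` and `ResolveInput` are hypothetical names, and it assumes `esFromEnv` carries the value of `CYPRESS_ES_FROM` and that `isCi` reflects whether `process.env.CI` is set.

```ts
// Hypothetical helper for illustration; names and input shape are assumptions.
interface ResolveInput {
  configEsFrom?: string; // value declared by the FTR config (assumed)
  esFromEnv?: string; // value of CYPRESS_ES_FROM, if any (assumed)
  isCi: boolean; // true when process.env.CI is set
}

function resolveEsFrom({ configEsFrom, esFromEnv, isCi }: ResolveInput): string {
  // Environment-dependent default: snapshot on CI, docker on workstations.
  const defaultEsFrom = isCi ? 'snapshot' : 'docker';
  // Serverless configs ignore both the override and the default.
  if (configEsFrom === 'serverless') return 'serverless';
  // An explicit override beats the default.
  return esFromEnv || defaultEsFrom;
}

// Cases mirroring the test plan:
const ciDefault = resolveEsFrom({ isCi: true });
const localDefault = resolveEsFrom({ isCi: false });
const localOverride = resolveEsFrom({ isCi: false, esFromEnv: 'snapshot' });
const serverlessWins = resolveEsFrom({
  isCi: true,
  configEsFrom: 'serverless',
  esFromEnv: 'snapshot',
});
```

Taking `isCi` as a parameter instead of reading `process.env.CI` inside the function is a choice made for this sketch only, so that both environments can be exercised without mutating the process environment.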
## Test plan
- [ ] Green `kibana-on-merge` Security Solution Cypress jobs (stateful +
serverless).
- [ ] Green `kibana-pull-request` Security Solution Cypress jobs with no
`CYPRESS_ES_FROM` set.
- [ ] Local: `yarn cypress:run ...` still uses Docker by default.
- [ ] Local: `CYPRESS_ES_FROM=snapshot yarn cypress:run ...` uses
snapshot.
- [ ] Serverless suites remain on `serverless` regardless of
`CYPRESS_ES_FROM`.
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

1 parent 8f7b932 commit 66c8e08
1 file changed
Lines changed: 16 additions & 1 deletion
Diff body not captured. One hunk starting at line 420: new lines 423–436 and 439–440 added, original line 425 removed.