Commit 66c8e08

ci(cypress): default stateful ES to snapshot on CI, docker locally (elastic#264218)
## Summary

Default Cypress stateful Elasticsearch provisioning to `snapshot` on CI and keep `docker` for local development.

The earlier switch to Docker as the universal default (elastic#254306) was motivated by:

- making local dev match shipped artifacts,
- multi-arch support for Apple Silicon,
- avoiding per-spec snapshot extraction,
- faster warm starts on developer machines.

All four are genuine wins **for local dev**. On CI they either don't apply, are neutral, or are actively counter-productive. Based on empirical data from Buildkite, the right default on CI is `snapshot`; on workstations the right default stays `docker`.

## Why snapshot on CI

1. **No version-skew race.** Kibana CI already resolves an ES snapshot manifest once per build in [`.buildkite/scripts/lifecycle/pre_build.sh`](https://github.com/elastic/kibana/blob/main/.buildkite/scripts/lifecycle/pre_build.sh) against `kibana-ci-es-snapshots-daily`, Kibana's own daily-verified bucket, which is version-locked to Kibana by construction. The post-version-bump window (`9.5.0`, `9.6.0`, …) that my earlier auto-detect probe tried to guard against doesn't actually exist for stateful Cypress on CI: either the tar.gz is already there, or `pre_build.sh` has already failed the build before any Cypress agent starts. A Docker image for that same version is _not_ guaranteed to exist at the same moment, which is exactly the failure mode we kept running into.
2. **Docker-on-CI is not meaningfully faster on the same hardware.** I pulled job durations from Buildkite for `kibana-on-merge` Security Solution Cypress jobs before and after elastic#254306 and reconciled them against the Buildkite agent machine-type change (`n2-standard-4` → `n2-highmem-4`) that landed in the same window. Controlling for that hardware change, ES start-up on a warm CI agent differs by only about 5 s between the snapshot tar.gz and Docker, which is within noise for a 20–40 minute Cypress group. The speedups originally attributed to Docker came largely from the hardware upgrade.
3. **ES starts once per FTR config group, not per spec.** `parallel.ts` provisions ES once for each group in `specGroups`, runs all specs in that group against the same cluster, then shuts it down (see [`runSpecGroup`](https://github.com/elastic/kibana/blob/main/x-pack/solutions/security/plugins/security_solution/scripts/run_cypress/parallel.ts)). Only retry runs provision ES per spec. So the "Docker avoids per-spec extraction on CI" argument applies mostly to retries, which are a tiny fraction of total runtime.
4. **Fewer moving parts on CI.** No Docker registry auth, no Docker pull on every agent, no fallback logic between Docker and snapshot, and no GCS probe script. The snapshot tar.gz is already pre-fetched and cached by the standard Kibana CI lifecycle.

## Why keep Docker for local dev

1. Matches shipped artifacts byte-for-byte.
2. Native multi-arch (Apple Silicon) without a separate tar.gz pipeline.
3. Warm starts are fast once the image is cached on the workstation.
4. `CYPRESS_ES_FROM=snapshot` (or `docker`) still works as an explicit override in both environments.

## Change

```ts
const defaultEsFrom = process.env.CI ? 'snapshot' : 'docker';
const esFrom = configEsFrom === 'serverless' ? 'serverless' : esFromEnv || defaultEsFrom;
```

This also drops the earlier `detect_cypress_es_from.sh` probe and its hook in `setup_job_env.sh`; `pre_build.sh` already covers the version-skew concern at a better layer.

The serverless routing fix (`configEsFrom === 'serverless'` wins over `CYPRESS_ES_FROM`) is retained from the first commit and is independent of the default flip. It prevents a stateful `CYPRESS_ES_FROM=snapshot` from accidentally booting serverless suites against a stateful snapshot tar.gz, which fails with `unknown setting [xpack.security.authc.native_roles.enabled]`.

## Test plan

- [ ] Green `kibana-on-merge` Security Solution Cypress jobs (stateful + serverless).
- [ ] Green `kibana-pull-request` Security Solution Cypress jobs with no `CYPRESS_ES_FROM` set.
- [ ] Local: `yarn cypress:run ...` still uses Docker by default.
- [ ] Local: `CYPRESS_ES_FROM=snapshot yarn cypress:run ...` uses snapshot.
- [ ] Serverless suites remain on `serverless` regardless of `CYPRESS_ES_FROM`.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
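Point 2 under "Why snapshot on CI" puts the warm-agent ES start-up difference at roughly 5 s against a 20–40 minute Cypress group, and point 3 says ES starts once per group, with per-spec starts only for retries. The back-of-the-envelope sketch below (not part of the PR) turns those figures into a share of group runtime. The helper names and the "one extra start per retried spec" model are assumptions for illustration.

```ts
// Back-of-the-envelope sketch (not from the PR): what share of a Cypress
// group's runtime a fixed per-start ES start-up delta represents.
// Assumption: ES starts once per group, plus once for each retried spec.

// Hypothetical helper: number of ES start-ups for one group.
function esStartsPerGroup(retriedSpecs: number): number {
  return 1 + retriedSpecs;
}

// Hypothetical helper: total start-up delta divided by group runtime.
function startupDeltaShare(
  deltaSeconds: number,
  groupMinutes: number,
  retriedSpecs = 0,
): number {
  const totalDeltaSeconds = deltaSeconds * esStartsPerGroup(retriedSpecs);
  return totalDeltaSeconds / (groupMinutes * 60);
}

// Figures from the commit message: ~5 s delta per start, 20–40 minute groups.
for (const minutes of [20, 40]) {
  const percent = (startupDeltaShare(5, minutes) * 100).toFixed(2);
  console.log(`${minutes} min group, no retries: ${percent}% of runtime`);
}
// 20 min group, no retries: 0.42% of runtime
// 40 min group, no retries: 0.21% of runtime
```

Under the same model, three retried specs in a 20 minute group add up to 20 s of delta, or about 1.7% of runtime.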
1 parent 8f7b932 commit 66c8e08

1 file changed

Lines changed: 16 additions & 1 deletion

x-pack/solutions/security/plugins/security_solution/scripts/run_cypress/parallel.ts

```diff
@@ -420,9 +420,24 @@ ${JSON.stringify(
 let fleetServer: StartedFleetServer | undefined;
 let shutdownEs;
 
+// `CYPRESS_ES_FROM` must only override the *stateful* ES provisioning path.
+// Serverless Cypress suites require the `kibana-ci/elasticsearch-serverless`
+// Docker image and will fail to boot when run against a stateful snapshot
+// tar.gz (e.g. `unknown setting [xpack.security.authc.native_roles.enabled]`
+// or `unknown setting [serverless.search.enable_replicas_for_instant_failover]`).
+//
+// Stateful default differs by environment:
+// - CI: `snapshot` — the `kibana-ci-es-snapshots-daily` manifest is
+// resolved in `.buildkite/scripts/lifecycle/pre_build.sh` before any
+// job runs, so the version is always in lockstep with Kibana. Avoids
+// the post-version-bump window where the ES Docker image isn't
+// published yet, and avoids a Docker registry pull on every agent.
+// - Local: `docker` — matches what we ship, multi-arch, and warm starts
+// are fast once the image is cached on the developer's machine.
 const esFromEnv = process.env.CYPRESS_ES_FROM;
 const configEsFrom = config.get('esTestCluster.from');
-const esFrom = esFromEnv || (configEsFrom === 'serverless' ? 'serverless' : 'docker');
+const defaultEsFrom = process.env.CI ? 'snapshot' : 'docker';
+const esFrom = configEsFrom === 'serverless' ? 'serverless' : esFromEnv || defaultEsFrom;
 
 try {
 shutdownEs = await pRetry(
```
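The hunk above changes both the default and the precedence of `esFrom`. The sketch below lifts the removed and added expressions into two standalone functions so their behavior can be compared case by case. The function names, the `EsFromInputs` shape, and passing `isCI` as a boolean are assumptions for illustration; in `parallel.ts` the inputs come from `process.env.CYPRESS_ES_FROM`, `config.get('esTestCluster.from')`, and the truthiness of `process.env.CI` (an environment string, so any non-empty value selects the CI default).

```ts
// Hypothetical extraction of the `esFrom` logic from parallel.ts (before and
// after this commit) into pure functions, so both can be compared directly.

interface EsFromInputs {
  configEsFrom: string | undefined; // stands in for config.get('esTestCluster.from')
  esFromEnv: string | undefined; // stands in for process.env.CYPRESS_ES_FROM
  isCI: boolean; // stands in for the truthiness of process.env.CI
}

// Before: a set CYPRESS_ES_FROM wins unconditionally, even for serverless configs.
function resolveEsFromOld({ configEsFrom, esFromEnv }: EsFromInputs): string {
  return esFromEnv || (configEsFrom === 'serverless' ? 'serverless' : 'docker');
}

// After: serverless configs always stay serverless; otherwise CYPRESS_ES_FROM
// wins, falling back to 'snapshot' on CI and 'docker' locally.
function resolveEsFromNew({ configEsFrom, esFromEnv, isCI }: EsFromInputs): string {
  const defaultEsFrom = isCI ? 'snapshot' : 'docker';
  return configEsFrom === 'serverless' ? 'serverless' : esFromEnv || defaultEsFrom;
}

// Cases modeled on the test plan. `configEsFrom: undefined` stands in for a
// stateful config: only the value 'serverless' is special-cased.
const cases: Array<[string, EsFromInputs]> = [
  ['local, no override', { configEsFrom: undefined, esFromEnv: undefined, isCI: false }],
  ['CI, no override', { configEsFrom: undefined, esFromEnv: undefined, isCI: true }],
  ['local, CYPRESS_ES_FROM=snapshot', { configEsFrom: undefined, esFromEnv: 'snapshot', isCI: false }],
  ['serverless, CYPRESS_ES_FROM=snapshot', { configEsFrom: 'serverless', esFromEnv: 'snapshot', isCI: true }],
];

for (const [label, inputs] of cases) {
  console.log(`${label}: ${resolveEsFromOld(inputs)} -> ${resolveEsFromNew(inputs)}`);
}
// local, no override: docker -> docker
// CI, no override: docker -> snapshot
// local, CYPRESS_ES_FROM=snapshot: snapshot -> snapshot
// serverless, CYPRESS_ES_FROM=snapshot: snapshot -> serverless
```

The last case is the serverless routing fix: under the old expression a stateful override leaked into serverless suites, while the new one checks the config first.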
