
[CI] Opt-in FTR solution-selective testing via PR labels#271968

Closed
shahzad31 wants to merge 1 commit into elastic:main from shahzad31:ftr-selective-solution-testing


@shahzad31 (Contributor)

Summary

FTR configs are a frequent source of flaky, unrelated failures that block PRs. This adds opt-in, solution-level selective testing for FTR, gated behind two new PR labels, so a PR confined to one solution doesn't get blocked by another solution's flaky FTR configs.

Two labels (they can be used together; if both are set, skip wins):

  • ci:skip-unaffected-ftr-configs — drop FTR configs that belong to solutions the PR doesn't touch.
  • ci:soft-fail-unaffected-ftr-configs — keep those configs running, but make their failures non-blocking (warning annotations) instead of failing the build.
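The label precedence can be expressed as a small pure function. This is a sketch only: the two label strings come from this PR, while `FtrSelectionMode`, `parseLabels`, `resolveFtrSelectionMode`, and the assumption that labels arrive as one comma-separated string are hypothetical. The on-merge override reflects the rule stated below that on-merge builds always run the full suite.

```typescript
// Sketch only: label strings are from this PR; every other name here is hypothetical.
type FtrSelectionMode = 'full' | 'skip-unaffected' | 'soft-fail-unaffected';

const SKIP_LABEL = 'ci:skip-unaffected-ftr-configs';
const SOFT_FAIL_LABEL = 'ci:soft-fail-unaffected-ftr-configs';

// Assumes labels arrive as one comma-separated string (e.g. from a CI environment variable).
function parseLabels(raw: string | undefined): string[] {
  return (raw ?? '')
    .split(',')
    .map((label) => label.trim())
    .filter((label) => label.length > 0);
}

function resolveFtrSelectionMode(labels: readonly string[], isOnMerge: boolean): FtrSelectionMode {
  // On-merge builds always run the full, blocking suite, whatever the labels say.
  if (isOnMerge) return 'full';
  // Checking the skip label first is what makes skip win when both labels are set.
  if (labels.includes(SKIP_LABEL)) return 'skip-unaffected';
  if (labels.includes(SOFT_FAIL_LABEL)) return 'soft-fail-unaffected';
  return 'full';
}
```

For example, `resolveFtrSelectionMode(parseLabels('backport:skip, ci:soft-fail-unaffected-ftr-configs'), false)` yields `'soft-fail-unaffected'`, and any PR without either label keeps today's `'full'` behaviour.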

Safety model (conservative by design)

The diff is only narrowed when every changed file — and every downstream dependent — is confined to one or more solutions' private code. The full, blocking suite runs exactly as today whenever the change touches:

  • platform / shared modules,
  • CI / test-infra / FTR manifests (critical-file list),
  • a downstream consumer that lives in platform or an unrecognized group.

On-merge builds always run the full suite. Solution modules are `visibility: private` and cannot depend on one another, so a diff that stays inside a single solution's code cannot break another solution. That invariant is what makes this safe.
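The safety model amounts to a walk over the changed files and their transitive dependents that bails out to a full run on the first module that is not private solution code. A minimal sketch, assuming a simplified module record with `group`, `visibility`, and `dependsOn` fields plus caller-supplied `ownerOf` and `isCriticalFile` lookups; all of these names are hypothetical, and the real `selective_ftr.ts` and affected-packages graph may be shaped differently. The solution group names in `SOLUTION_GROUPS` are also an assumption.

```typescript
// Sketch only: ModuleInfo, SOLUTION_GROUPS and both functions are hypothetical stand-ins.
interface ModuleInfo {
  group: string | undefined; // `group` from the module's kibana.jsonc
  visibility: 'private' | 'shared' | undefined;
  dependsOn: readonly string[]; // ids of modules this module depends on
}

// Assumed list of solution groups; anything else (including 'platform') forces a full run.
const SOLUTION_GROUPS: ReadonlySet<string> = new Set(['observability', 'security', 'search']);

// Returns the changed modules plus every module that transitively depends on them.
function expandDownstream(changed: Iterable<string>, modules: ReadonlyMap<string, ModuleInfo>): Set<string> {
  // Invert the dependency edges: dependency id -> ids of modules that depend on it.
  const dependents = new Map<string, string[]>();
  for (const [id, info] of modules) {
    for (const dep of info.dependsOn) {
      const list = dependents.get(dep) ?? [];
      list.push(id);
      dependents.set(dep, list);
    }
  }
  // Breadth-first walk from the changed modules through their dependents.
  const affected = new Set<string>(changed);
  const queue = [...affected];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const next of dependents.get(current) ?? []) {
      if (!affected.has(next)) {
        affected.add(next);
        queue.push(next);
      }
    }
  }
  return affected;
}

// Returns the affected solutions, or null meaning "run the full, blocking suite".
function resolveAffectedSolutions(
  changedFiles: readonly string[],
  ownerOf: (file: string) => string | undefined,
  modules: ReadonlyMap<string, ModuleInfo>,
  isCriticalFile: (file: string) => boolean,
): Set<string> | null {
  if (changedFiles.length === 0) return null; // nothing to reason about: stay conservative
  const changedModules = new Set<string>();
  for (const file of changedFiles) {
    if (isCriticalFile(file)) return null; // CI / test-infra / FTR manifests
    const owner = ownerOf(file);
    if (owner === undefined) return null; // file outside any known module
    changedModules.add(owner);
  }
  const solutions = new Set<string>();
  for (const id of expandDownstream(changedModules, modules)) {
    const info = modules.get(id);
    // Platform, shared, unknown, or unrecognised-group modules all force a full run.
    if (!info || info.group === undefined || !SOLUTION_GROUPS.has(info.group) || info.visibility !== 'private') {
      return null;
    }
    solutions.add(info.group);
  }
  return solutions;
}
```

Returning `null` rather than an empty set keeps "run everything" distinct from "nothing affected", so an empty diff or an unmapped file cannot accidentally narrow the run.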

How it works

  • Module group (from kibana.jsonc) is now captured per module (getModuleGroup) in the affected-packages graph.
  • selective_ftr.ts resolves the affected solution set from the diff + transitive downstream graph, bailing to a full run on anything cross-cutting.
  • pick_test_group_run_order either reduces the enabled FTR config set (skip) or emits an ftr_soft_fail_configs.json artifact (soft-fail).
  • ftr_configs.sh treats soft-fail configs as non-blocking per-config.
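Once the affected solution set is known, the skip and soft-fail modes differ only in what they do with the unaffected configs. A sketch under assumed shapes: `FtrConfig`, `FtrPlan`, `planFtrConfigs`, and `blockingFailures` are hypothetical names, and configs with no owning solution are treated as platform configs that are never narrowed. `blockingFailures` is a TypeScript rendering of the per-config decision described for `ftr_configs.sh`, not that script's actual logic.

```typescript
// Sketch only: these types and function names are hypothetical.
type FtrSelectionMode = 'full' | 'skip-unaffected' | 'soft-fail-unaffected';

interface FtrConfig {
  path: string;
  // Owning solution, or null for platform / unowned configs, which are never narrowed.
  solution: string | null;
}

interface FtrPlan {
  run: string[]; // configs to schedule
  softFail: string[]; // subset of `run` whose failures should only warn
}

function planFtrConfigs(
  configs: readonly FtrConfig[],
  affected: ReadonlySet<string> | null,
  mode: FtrSelectionMode,
): FtrPlan {
  const all = configs.map((c) => c.path);
  // null means the resolver bailed out: run the full, blocking suite.
  if (affected === null || mode === 'full') return { run: all, softFail: [] };

  const solutions: ReadonlySet<string> = affected;
  const isUnaffected = (c: FtrConfig): boolean =>
    c.solution !== null && !solutions.has(c.solution);

  if (mode === 'skip-unaffected') {
    // Drop other solutions' configs entirely.
    return { run: configs.filter((c) => !isUnaffected(c)).map((c) => c.path), softFail: [] };
  }
  // soft-fail-unaffected: run everything, but mark other solutions' configs non-blocking.
  return { run: all, softFail: configs.filter(isUnaffected).map((c) => c.path) };
}

// A failed config fails the build only if it is not in the soft-fail set.
function blockingFailures(failed: readonly string[], softFail: ReadonlySet<string>): string[] {
  return failed.filter((path) => !softFail.has(path));
}
```

In soft-fail mode, `plan.softFail` is the list that would be serialized (for example with `JSON.stringify`) into the `ftr_soft_fail_configs.json` artifact; the artifact's exact format is not shown in this PR, so treat that mapping as an assumption.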

Dry-run impact (40 recent merged PRs)

| Outcome | Share |
| --- | --- |
| Narrowed to a single solution | 50% |
| Full suite (platform/CI/test-infra) | 50% |

For a typical solution-confined PR, ~40% of the FTR suite's blocking surface (the other solutions' configs) is removed or made non-blocking. No diff touching platform/shared/CI was ever narrowed in the sample.

Test plan

  • Unit tests for solution resolution, group lookup, manifest mapping, and label parsing (node .buildkite jest — 95 tests).
  • Open a draft PR confined to one solution with ci:skip-unaffected-ftr-configs and confirm other solutions' FTR configs are dropped.
  • Same with ci:soft-fail-unaffected-ftr-configs and confirm other solutions' configs run but are non-blocking.
  • Confirm a PR touching platform/shared/CI still runs the full blocking suite.

Made with Cursor

Reduce flaky-FTR noise on PRs whose changes are confined to a single
solution. Two opt-in PR labels gate the behaviour:

- ci:skip-unaffected-ftr-configs: drop FTR configs that belong to
  solutions the PR does not touch.
- ci:soft-fail-unaffected-ftr-configs: keep those configs running but
  mark their failures non-blocking (warning annotations) so unrelated
  flakiness can't block the PR.

Detection is deliberately conservative. The diff is only narrowed when
every changed file (and every downstream dependent) is confined to one
or more solutions' private code. Anything touching platform/shared
modules, CI/test-infra, or FTR manifests — or a downstream platform
consumer — runs the full, blocking suite exactly as today. On-merge
builds always run the full suite.

Implemented on top of the affected-packages module graph: kibana.jsonc
`group` is now captured per module (getModuleGroup) and the downstream
dependency graph is used to expand the affected set before deciding
whether the change is solution-confined.

Co-authored-by: Cursor <cursoragent@cursor.com>
@shahzad31 added the release_note:skip, backport:skip, Team:actionable-obs, and ci:build-with-rspack-optimizer labels May 29, 2026
@github-actions (bot) added the author:actionable-obs label May 29, 2026
@infra-vault-gh-plugin-prod (bot) commented May 29, 2026

🤖 Jobs for this PR can be triggered through checkboxes. 🚧

ℹ️ To trigger the CI, please tick the checkbox below 👇

  • Click to trigger kibana-pull-request for this PR!
  • Click to trigger kibana-deploy-project-from-pr for this PR!
  • Click to trigger kibana-deploy-cloud-from-pr for this PR!
  • Click to trigger kibana-entity-store-performance-from-pr for this PR!
  • Click to trigger kibana-storybooks-from-pr for this PR!

@shahzad31 (Contributor, Author)

@tylersmalley @delanni I think this looks feasible, at least solution-specific skip/soft-fail for FTR configs. Let me know what you think 🙏🏼

@kibanamachine (Contributor) commented May 29, 2026

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #6 / AddLayerButton renders all compatible series types
  • [job] [logs] FTR Configs #197 / discover/esql_3 Index editor allows editing an existing index
  • [job] [logs] Scout Lane #42 - serverless-observability_complete / default / local-serverless-observability_complete - Advanced tab permissions - should NOT show Advanced tab for editor role on classic stream
  • [job] [logs] Scout Lane #42 - serverless-observability_complete / default / local-serverless-observability_complete - Advanced tab permissions - should NOT show Advanced tab for viewer role on classic stream
  • [job] [logs] Scout Lane #1 - stateful-classic / default / local-stateful-classic - Stream data routing - previewing data - should select matched filter by default when condition is set
  • [job] [logs] FTR Configs #160 / Screenshots - serverless observability UI response ops docs observability connectors server log connector server log connector screenshots

Metrics [docs]

✅ unchanged


@shahzad31 (Contributor, Author)

We have an RFC in progress from the Ops team.

@shahzad31 closed this May 29, 2026
@shahzad31 deleted the ftr-selective-solution-testing branch May 29, 2026 20:26

Labels

author:actionable-obs, backport:skip, release_note:skip, Team:actionable-obs


2 participants