[FTR] Reduce wasted wait in header.waitUntilLoadingHasFinished #271855

Draft
csr wants to merge 3 commits into elastic:main from csr:ftr-header-loading-wait-perf
csr (Member) commented May 29, 2026

Summary

Tactical reduction of the hard-coded "wait for loading indicator to appear" timeout in header.waitUntilLoadingHasFinished() from 1500ms to 500ms.

This is FTR's most-called UI sync helper (~3000 call sites). On the steady-state path (page already idle), the loader never appears and the full 1500ms is wasted; cutting it to 500ms saves ~1s per idle invocation while preserving correctness.

Refs #222306. Scout has already deprecated the equivalent helper (#246843).

Why this is "stop the bleeding", not the proper fix

The root cause of the wasted time is a misuse pattern:

Anti-Pattern: Navigating in multiple steps (e.g. Settings home → Data Views list → specific Data View) by clicking UI elements, and waiting on the global loading indicator after each click to confirm page readiness, significantly slows test execution.

Fix: Replace multi-step click sequences with a single direct URL navigation call. Use testSubjects.existOrFail('<unique-element>') to confirm the target page is ready instead of the generic pageObjects.header.waitUntilLoadingHasFinished.

The proper long-term fix is to migrate callers off waitUntilLoadingHasFinished to direct URL navigation + explicit existOrFail('<unique-element>') checks. That is a much larger, per-suite undertaking — Scout already followed that path in #246843.
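
As an illustration of the migration target, here is a minimal TypeScript sketch. The two interfaces are simplified stand-ins for the FTR `common` page object and `testSubjects` service, and the path `kibana/dataViews` and the test subject `dataViewsListPage` are hypothetical placeholders, not values taken from this PR.

```typescript
// Simplified stand-ins for the FTR services (assumed shapes, not the full API).
interface CommonPageObject {
  navigateToUrl(appName: string, subUrl: string): Promise<void>;
}

interface TestSubjectsService {
  existOrFail(selector: string): Promise<void>;
}

// Jump straight to the target page, then wait for an element that only
// that page renders. No global loading-indicator wait is involved.
async function openDataViewsList(
  common: CommonPageObject,
  testSubjects: TestSubjectsService
): Promise<void> {
  // Hypothetical app name and sub-URL, used for illustration only.
  await common.navigateToUrl('management', 'kibana/dataViews');
  // Hypothetical test subject that uniquely identifies the target page.
  await testSubjects.existOrFail('dataViewsListPage');
}
```

The readiness check is tied to the page under test rather than to a global signal, so it neither waits for a loader that never appears nor passes before the page has rendered.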

This PR is a single, narrow change that buys back ~1s per idle call across all existing callers while the proper migration happens in parallel.

Why 500ms is enough

The Kibana global loading indicator is driven by a 250ms debounce on http.getLoadingCount$:

export const LOADING_DEBOUNCE_TIME = 250;
...
http.getLoadingCount$().pipe(
  debounceTime(LOADING_DEBOUNCE_TIME),
  map((c) => c > 0)
)

If a request is in flight, the indicator becomes visible within ~250ms. 500ms gives a 2x safety margin over the debounce; longer waits are pure overhead when the page is already idle (the common case for this helper, e.g. after navigateToApp or retry.waitFor).

The disappear step (awaitGlobalLoadingIndicatorHidden, capped at defaultFindTimeout * 10 = 100s) is unchanged and remains the authoritative "loading finished" signal — any rare case where the loader appears later than 500ms is absorbed by that 100s window.
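
The appear-then-disappear structure can be sketched as follows. This is a simplified model, not the actual Kibana helper: `Clock`, `pollUntil`, `fakeClock`, the 50ms poll interval, and `waitUntilLoadingHasFinishedSketch` are all hypothetical names and values introduced for illustration, and the loader is represented by an injected boolean probe rather than a WebDriver lookup.

```typescript
// Hypothetical clock abstraction so the sketch can run against fake time.
interface Clock {
  now(): number;
  sleep(ms: number): Promise<void>;
}

// Poll `cond` until it is true or `timeoutMs` has elapsed; returns whether it became true.
async function pollUntil(
  cond: () => boolean,
  timeoutMs: number,
  clock: Clock,
  intervalMs = 50
): Promise<boolean> {
  const deadline = clock.now() + timeoutMs;
  while (true) {
    if (cond()) return true;
    if (clock.now() >= deadline) return false;
    await clock.sleep(intervalMs);
  }
}

// Two-step wait: a short, non-fatal "appear" poll, then the authoritative "hidden" wait.
// Returns the elapsed time in milliseconds according to `clock`.
async function waitUntilLoadingHasFinishedSketch(
  isLoaderVisible: () => boolean,
  clock: Clock,
  appearTimeoutMs = 500,
  hiddenTimeoutMs = 100000
): Promise<number> {
  const start = clock.now();
  // Step 1: give the loader a bounded chance to appear; timing out here is not an error.
  await pollUntil(isLoaderVisible, appearTimeoutMs, clock);
  // Step 2: wait until the loader is hidden; only this step can fail the wait.
  const hidden = await pollUntil(() => !isLoaderVisible(), hiddenTimeoutMs, clock);
  if (!hidden) throw new Error('loading indicator still visible after timeout');
  return clock.now() - start;
}

// Fake clock for experiments: sleeping advances time instantly.
function fakeClock(): Clock {
  let t = 0;
  return {
    now: () => t,
    sleep: async (ms: number) => {
      t += ms;
    },
  };
}
```

In this model, a probe that always returns `false` (idle page) costs exactly the appear timeout, so lowering it from 1500 to 500 removes 1000ms. A probe that turns visible at 250ms resolves the appear step at that point, so the total depends only on when the loader hides and is the same under either timeout.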

Risk assessment

WebDriver is notoriously slow at rendering, so 500ms is not obviously safe without empirical validation. The risk being tested here is whether 500ms is short enough that the appear step occasionally misses the loader on slow CI agents, and whether that cascades into flakiness. It shouldn't, because the disappear step still bounds the overall wait at 100s, but we want to confirm.

To validate, this PR runs the flaky-test-runner across the most timing-sensitive FTR configs that exercise this helper heavily.

Expected savings (if confirmed flake-free)

  • Per idle call: ~1000ms saved (1500 → ~500).
  • At ~70% idle hit rate × ~1500 calls per full UI run × ~1s ≈ ~17 minutes wall-clock saved per full FTR UI run (per worker; aggregate CI savings scale with the number of FTR jobs running this helper).
  • Per-call cost when a real loader is mid-flight: unchanged in practice — the appear poll resolves at ~250ms via the debounce.
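
The estimate in the second bullet can be reproduced with a few lines of TypeScript. The function name `estimateSavedMinutes` is hypothetical; the inputs are the approximate figures quoted above, so the result is only as good as those estimates.

```typescript
// Wall-clock minutes saved per run when only idle calls benefit from the shorter timeout.
function estimateSavedMinutes(
  callsPerRun: number,
  idleRate: number,
  oldTimeoutMs: number,
  newTimeoutMs: number
): number {
  const idleCalls = callsPerRun * idleRate;
  const savedMs = idleCalls * (oldTimeoutMs - newTimeoutMs);
  return savedMs / 60000;
}

// ~1500 calls per full UI run, ~70% idle, appear timeout 1500ms -> 500ms.
const savedMinutes = estimateSavedMinutes(1500, 0.7, 1500, 500);
console.log(savedMinutes.toFixed(1)); // about 17.5 minutes under these inputs
```

That is 1050 idle calls × 1s = 1050s, which matches the ~17 minute figure.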

Test plan

  • node scripts/eslint --fix — no errors.
  • node scripts/type_check --project src/platform/test/tsconfig.json — passes.
  • node scripts/check_changes.ts — passes.
  • Local FTR run attempted (status_page config) — blocked by an environmental Chrome 149 / Kibana CSP incompatibility (inline script violates 'script-src 'report-sample' 'self'' in the Kibana page) that also reproduces on main without this change. CI is the authoritative validator.
  • ci:collect-ftr-timing data confirms per-call savings.
  • Flaky-test-runner across high-traffic, timing-sensitive UI configs is green at 25–50 runs each (Lens at 50, Cases / Discover / Dashboard / Context at 25). See linked build below.

Follow-up

The proper fix is migrating callers — direct URL navigation + existOrFail('<unique-element>'). Happy to file a tracking issue if this PR lands cleanly.

The "appear" step in `waitUntilLoadingHasFinished()` polls for the
`globalLoadingIndicator` test-subj for 1500ms. Kibana's loading state is
itself driven by a 250ms debounce on `http.getLoadingCount$`
(`LOADING_DEBOUNCE_TIME` in `core/.../chrome_hooks.ts`), so when a request
is in flight the loader becomes visible within ~250ms; when no request is
in flight, the indicator never appears and the full 1500ms is wasted.

This helper is one of the most-called UI synchronization points in FTR
(~3000 call sites across `src/platform/test/`, `x-pack/`, and solutions),
and the page is already idle for the majority of those calls (after
`navigateToApp`, after `retry.waitFor`, etc.). Reducing the appear-step
timeout from 1500ms to 500ms (debounce + 250ms safety buffer) cuts ~1s
off every idle invocation while preserving correctness: the
`awaitGlobalLoadingIndicatorHidden` step still bounds the overall wait
at `defaultFindTimeout * 10` (100s) and is the authoritative
"loading finished" signal.

Refs elastic#222306

Co-authored-by: Cursor <cursoragent@cursor.com>
@csr csr added release_note:skip Skip the PR/issue when compiling release notes ci:collect-ftr-timing labels May 29, 2026
@csr csr changed the title [FTR] Reduce wasted wait in header.waitUntilLoadingHasFinished [FTR] Reduce wasted wait in header.waitUntilLoadingHasFinished May 29, 2026
@kibanamachine (Contributor)

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #79 / lens app - TSVB Open in Lens Convert to Lens action on dashboard "before all" hook for "should show notification in context menu if visualization can be converted"
  • [job] [logs] FTR Configs #155 / serverless observability UI - onboarding Onboarding Onboarding Firehose Quickstart Flow shows an AWS service when data is detected

Metrics [docs]

✅ unchanged

@kibanamachine (Contributor)

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#12505

[✅] x-pack/platform/test/functional/apps/lens/group1/config.ts: 50/50 tests passed.
[✅] x-pack/platform/test/functional_with_es_ssl/apps/cases/group1/config.ts: 25/25 tests passed.
[✅] src/platform/test/functional/apps/discover/group1/config.ts: 25/25 tests passed.
[✅] src/platform/test/functional/apps/discover/group9/config.ts: 25/25 tests passed.
[✅] src/platform/test/functional/apps/dashboard/group1/config.ts: 25/25 tests passed.
[✅] src/platform/test/functional/apps/context/config.ts: 25/25 tests passed.

see run history
