Fix flaky tests (homepage-settings.spec.js duplicate Sample Page / Sample page row match, publish-panel.spec.js focus assertion before panel close completed) #77893

Open: danluu wants to merge 1 commit into WordPress:trunk from danluu:try/test-flakes-pr

@danluu (Contributor) commented May 2, 2026

This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).

What?

Gutenberg CI tests seem to be somewhat flaky. Of the PRs I've submitted recently, it feels like most PRs that have no "real" CI failures fail on initial submission and need to be re-run once or twice to pass CI. I haven't actually run the numbers; maybe it's 25% or 30% or something. In any case, a decent number of CI runs fail due to flakes.

This PR attempts to fix #68892, #77385, and #77721. It probably makes sense to clean up more than just these, but since I'm not familiar with this codebase, as with the RTC fuzzing bugs, I wanted to start with a small PR to get people's opinions on automated AI flake fixing before submitting more or larger PRs.

AI TEXT

The revised plan is a PR boundary, not a complete flake program. Its purpose is to select fixes where the cause is local, the reviewer can understand the change from one spec file, and the original behavior remains covered by the same end-to-end assertion. That is why the best first PR is limited to homepage-settings.spec.js: the failure mechanism is a fixture and locator problem, and the fix can isolate the page fixtures and scope row selection without deleting coverage or changing the user-visible behavior under test.

The plan intentionally rejects fixes that depend on later cleanup, monitoring, or policy enforcement. It does not use skips, quarantines, broad retries, weakened assertions, or test deletion because those reduce the chance that the PR actually improves correctness. A fix is acceptable only when it replaces a race with a deterministic condition tied to the operation being tested, such as an exact locator, an already-observable UI state, or a clearly completed editor action.

The excluded areas are not dismissed as unimportant. RTC synchronization, router readiness, Openverse/media behavior, upload lifecycle semantics, REST retry policy, and reporter hygiene can all be real sources of flakiness. They are excluded from the first PR because each has a wider review surface and a higher risk of either changing product behavior or adding infrastructure that needs ongoing ownership. Those should be handled as separate focused changes only when there is concrete causal evidence and the fix preserves or strengthens the existing coverage.

Revised Fix Plan

  1. Keep the PR small and test-focused. Do not include a flake ledger, reporter rewrite, CI workflow redesign, broad requestUtils retry layer, dashboard, bot rule, skip/quarantine mechanism, or any other change that requires ongoing follow-up.
  2. Preserve the asserted behavior exactly. For each touched test, state the behavior protected by the current assertion and keep that assertion or an equivalent assertion in the same PR. Do not reduce expectations, loosen assertions, or turn an end-to-end behavior into only lower-level coverage.
  3. Do not delete tests in this PR. Deletion or material narrowing should be treated as out of scope unless the certainty is extraordinarily high and reviewers can see identical or stronger replacement coverage. The default is to rewrite the flaky interaction, not remove it.
  4. Fix only flakes with a concrete, local mechanism and a small patch. Acceptable mechanisms are fixture isolation, exact locator scoping, waiting for an already-observable UI state, and explicitly selecting an existing block before using its toolbar. Do not include entries whose latest evidence is only socket hang up.
  5. Avoid shared abstractions. Do not add cross-suite readiness APIs or package-level helpers. A small helper inside a touched spec is acceptable only if it keeps the diff simpler and has no behavior outside that file.
  6. Exclude higher-risk areas from this PR: Openverse mocking, global REST retry/backoff, flake issue deduplication, DataViews helper design, RTC synchronization, upload lifecycle semantics, and broad router synchronization. Those may deserve separate focused PRs, but they add risk here.
  7. Validate only the changed surface. Run the directly affected spec(s), preferably with --repeat-each for the modified tests, and verify the original behavior is still asserted. Do not make post-merge monitoring or future cleanup part of the PR's correctness.

Low-Risk PR Shape

  1. Best first PR: fix homepage-settings.spec.js only. It is a clear test-fixture/locator bug: isolate page fixtures and scope rows exactly while preserving the same homepage/posts-page action assertions.
  2. If a second fix is included, choose publish-panel.spec.js only if the change is a narrow wait for panel closure or aria-expanded=false before the existing focus assertion. Keep it test-only.
  3. Keep post-content-focus-mode.spec.js and classic.spec.js as follow-up PRs unless the first PR remains trivially small. They are still local, but they touch different editor flows and are easier to review separately.
  4. Keep router, DataViews, RTC, upload-save-lock, Openverse/media, REST retry/backoff, and reporter work out of this low-risk PR.
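
The fixture-isolation half of the homepage-settings.spec.js fix can be illustrated without a browser. The sketch below is a plain JavaScript model, not Gutenberg code: `createSite`, `resetPages`, and `seedPages` are hypothetical names, and the pre-existing `Sample Page` entry stands in for whatever row leaks into the test (for example, a page left behind by the install or an earlier spec).

```javascript
// Hypothetical in-memory stand-in for the site's page list.
function createSite(existingTitles = []) {
	return { pages: [...existingTitles] };
}

// Hypothetical reset step: drop every page so the spec owns its fixtures.
function resetPages(site) {
	site.pages = [];
}

// Seed the three pages the spec expects to work with.
function seedPages(site) {
	site.pages.push('Homepage', 'Sample page', 'Draft page');
}

// Without a reset, the leftover row survives next to the seeded ones.
const leaky = createSite(['Sample Page']);
seedPages(leaky);

// With a reset, only the seeded rows exist.
const isolated = createSite(['Sample Page']);
resetPages(isolated);
seedPages(isolated);

console.log(leaky.pages);
console.log(isolated.pages);
```

The point of resetting before seeding, rather than cleaning up afterwards, is that the test no longer depends on what earlier runs left behind.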

Concrete Small-Scope Fix Selection

Checked again on 2026-05-02, the only fixes that fit the low-risk idea are the two test-only fixes below. They are direct responses to the latest retained failure examples, they preserve the existing assertions, and they do not require shared helpers, product changes, retries, skips, quarantines, or follow-up ownership.

| Include | Flake issue(s) | File / test | Concrete fix | How the fix hits the flake | Coverage effect |
| --- | --- | --- | --- | --- | --- |
| Yes, first | #77385, #68892 | test/e2e/specs/site-editor/homepage-settings.spec.js / should show correct homepage actions based on current homepage or posts page | Own the page fixtures before seeding and select the intended row with exact title scoping. The safest shape is to reset page state before creating Homepage, Sample page, and Draft page, then derive row locators from exact page titles rather than broad getByLabel() matches. | The latest failure is Playwright strict mode resolving Sample Page and Sample page as two matching rows. Fixture reset removes leaked page rows; exact row selection prevents case/substring title collisions from selecting the wrong DataViews row. The same homepage and posts-page menu assertions remain in place. | No reduction. The test still verifies that the current homepage cannot be set as homepage/posts page and that the posts page action disappears after setting a posts page. |
| Yes, second only if the PR remains tiny | #77721 | test/e2e/specs/editor/various/publish-panel.spec.js / should move focus back to the Publish panel toggle button when canceling | After clicking Cancel, wait for the publish panel to actually close, preferably by waiting for the top-bar Publish toggle to have aria-expanded="false", then keep the existing toBeFocused() assertion. | The latest failure shows the Publish toggle still has aria-expanded="true" while the test is already asserting focus. Waiting for the close state ties the assertion to the operation under test instead of sampling focus while the panel is still open. | No reduction. The focus assertion remains the coverage; the added wait only establishes that the cancel/close transition completed before checking focus return. |
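
The Sample Page / Sample page collision follows from how Playwright's text-based locators match by default: case-insensitive substring matching unless `exact: true` is passed, in which case matching is case-sensitive and whole-string. The sketch below is a simplified plain JavaScript model of that rule and of strict mode, not Playwright itself; `matches` and `resolveStrict` are hypothetical helpers, and the row list is an assumed stand-in for the DataViews rows.

```javascript
// Simplified model of label matching: the default is case-insensitive
// substring matching; `exact` is case-sensitive whole-string matching
// (after trimming whitespace).
function matches(label, text, { exact = false } = {}) {
	if (exact) {
		return label.trim() === text.trim();
	}
	return label.toLowerCase().includes(text.toLowerCase());
}

// Simplified model of strict mode: an action needs exactly one match.
function resolveStrict(labels, text, options) {
	const found = labels.filter((label) => matches(label, text, options));
	if (found.length !== 1) {
		throw new Error(
			`strict mode violation: "${text}" resolved to ${found.length} elements`
		);
	}
	return found[0];
}

// Assumed rows: the three seeded pages plus one leaked "Sample Page".
const rows = ['Homepage', 'Sample Page', 'Sample page', 'Draft page'];

// The broad lookup matches both "Sample Page" and "Sample page".
let broadError = null;
try {
	resolveStrict(rows, 'Sample page');
} catch (error) {
	broadError = error;
}

// The exact lookup resolves to the single intended row.
const exactRow = resolveStrict(rows, 'Sample page', { exact: true });

console.log(broadError.message);
console.log(exactRow);
```

Exact matching alone would already avoid this particular collision; resetting the fixtures as well means the test does not rely on leaked rows having distinguishable titles.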

Actual PR Branch Fix Coverage

The created code branch is try/test-flakes-pr at commit 0aba5a3831f (Stabilize small-scope flaky e2e tests). It changes only test/e2e/specs/site-editor/homepage-settings.spec.js and test/e2e/specs/editor/various/publish-panel.spec.js.

That branch fixes these flakes:

| Fixed by branch | Flake issue(s) | Why this branch fixes it |
| --- | --- | --- |
| Yes | #77385, #68892 | The branch resets homepage/posts-page settings and deletes all pages before creating the three pages used by homepage-settings.spec.js, then replaces broad getByLabel() row filtering with exact, case-sensitive row locators. This directly targets the observed strict-mode failure where both Sample Page and Sample page matched the same intended row lookup. |
| Yes | #77721 | The branch waits for the Publish panel toggle to report aria-expanded="false" after clicking Cancel, then keeps the original toBeFocused() assertion. This directly targets the observed failure where the test asserted focus while the panel was still open (aria-expanded="true"). |
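
The publish-panel race can be modeled in a few lines. The sketch below is plain JavaScript rather than the branch's code: `createToggle`, `waitForAttribute`, and the 50 ms close delay are hypothetical, chosen only to show why sampling focus immediately after Cancel can fail while waiting for the closed state first cannot. In a Playwright spec this kind of wait is usually written as a web-first assertion such as `toHaveAttribute`; whether the branch uses exactly that call is an assumption.

```javascript
// Hypothetical model of the Publish toggle: Cancel starts closing the panel,
// but `aria-expanded` and focus only update after a short delay.
function createToggle(closeDelayMs) {
	const toggle = { ariaExpanded: 'true', focused: false };
	toggle.clickCancel = () => {
		setTimeout(() => {
			toggle.ariaExpanded = 'false';
			toggle.focused = true;
		}, closeDelayMs);
	};
	return toggle;
}

// Poll until the value read matches the expected one, or fail on timeout.
async function waitForAttribute(
	read,
	expected,
	{ timeout = 1000, interval = 10 } = {}
) {
	const deadline = Date.now() + timeout;
	while (read() !== expected) {
		if (Date.now() > deadline) {
			throw new Error(`timed out waiting for "${expected}"`);
		}
		await new Promise((resolve) => setTimeout(resolve, interval));
	}
}

async function run() {
	const toggle = createToggle(50);
	toggle.clickCancel();

	// Sampling focus right away reproduces the race: the panel is still open.
	const focusedTooEarly = toggle.focused;

	// Waiting for the closed state first makes the focus check deterministic.
	await waitForAttribute(() => toggle.ariaExpanded, 'false');
	return { focusedTooEarly, focusedAfterWait: toggle.focused };
}
```

Calling `run()` resolves to an object recording both samples. The focus check itself is unchanged; only its timing is tied to the completed close transition.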

That branch does not fix #77720 (post-content-focus-mode.spec.js) or #77705 / #36123 (classic.spec.js). Those remain intentionally excluded from the small PR because they need focused review of broader editor flows.

Do not include post-content-focus-mode.spec.js in this PR even though its latest failure is a local-looking wait for Initial content. That flake lives inside the "Show template" rendering mode and template-part embedding path; a correct fix probably needs a Post Content readiness helper and should be reviewed as a focused editor-flow PR.

Do not include classic.spec.js in this PR even though explicitly selecting the Classic block after undo might address the latest toolbar timeout. The test covers a long TinyMCE/media-modal/upload/convert/undo flow and has a long historical flake issue; keep it separate so reviewers can evaluate whether the fix actually preserves the media conversion and undo coverage rather than merely making the toolbar button clickable.

END AI TEXT


github-actions Bot commented May 2, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: danluu <danluu@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@danluu force-pushed the try/test-flakes-pr branch from 2231852 to a27bc3f (May 2, 2026 20:49)
@danluu force-pushed the try/test-flakes-pr branch from a27bc3f to 4c5a34b (May 2, 2026 21:00)
@t-hamano added the [Type] Automated Testing label (May 4, 2026)