Experimental workflow to auto-fix tests #5014

Open

malithsen wants to merge 4 commits into develop from try/auto-fix-tests

Contributor

malithsen commented Feb 12, 2026 (edited)

Changes proposed in this Pull Request:

This is an experiment to use Claude Code Action to automatically analyze test failures on PRs and open fix PRs when the failure is caused by an issue in the test itself (e.g., #5005).

How it works:

Triggers automatically via workflow_run when PHP tests, JS tests, or E2E tests workflows fail on a PR
Step 1: Claude tries to classify the failure as test_drift, test_bug, application_bug, or environment using structured JSON output, then posts an analysis comment on the PR
Step 2: If classified as test_drift or test_bug with high/medium confidence, Claude creates a separate fix PR targeting the original PR's branch
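The gate between Step 1 and Step 2 can be sketched as a small function over the model's JSON output. This is a hypothetical illustration, not the workflow's code: the category and confidence values come from the description above, but the field names `classification` and `confidence` and the function name are assumptions.

```javascript
// Hypothetical sketch: field and function names are assumptions; the allowed
// values mirror the categories and confidence levels in the PR description.
const FIXABLE_CATEGORIES = new Set(['test_drift', 'test_bug']);
const FIXABLE_CONFIDENCE = new Set(['high', 'medium']);

function shouldCreateFixPr(rawJson) {
  let analysis;
  try {
    analysis = JSON.parse(rawJson);
  } catch {
    // Malformed model output: never attempt a fix.
    return false;
  }
  // JSON.parse('null') returns null, which must not pass the checks below.
  if (analysis === null || typeof analysis !== 'object') return false;
  return FIXABLE_CATEGORIES.has(analysis.classification) &&
    FIXABLE_CONFIDENCE.has(analysis.confidence);
}

console.log(shouldCreateFixPr('{"classification":"test_drift","confidence":"medium"}')); // true
console.log(shouldCreateFixPr('{"classification":"application_bug","confidence":"high"}')); // false
console.log(shouldCreateFixPr('not json')); // false
```

Failing closed on unparseable output keeps Step 2 from running on anything the classifier did not clearly label.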

Some guardrails:

Skips claude/ branches to avoid potential loops.
Skips if a fix PR already exists for the same source PR
Post-fix validation to ensure only test files were modified
Bash tools restricted to only the required gh and git commands
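The branch-skip and test-files-only guardrails can be sketched as plain predicates. This is a hypothetical illustration: the path patterns are assumptions rather than the workflow's actual rules, and the list of changed files would come from something like `git diff --name-only` against the source PR's branch.

```javascript
// Hypothetical sketch; the test-path patterns are assumptions and would need
// to match the repository's real layout.
const TEST_PATH_PATTERNS = [
  /^tests\//,            // assumed location of PHPUnit and E2E suites
  /\.test\.[jt]sx?$/,    // assumed Jest-style file naming
  /(^|\/)__tests__\//,   // assumed Jest-style test directories
];

// Branches created by the automation itself are skipped to avoid loops.
function isClaudeBranch(branchName) {
  return branchName.startsWith('claude/');
}

// Every changed path must look like a test file; an empty change set is
// rejected rather than treated as trivially valid.
function onlyTestFilesChanged(changedPaths) {
  if (changedPaths.length === 0) return false;
  return changedPaths.every(path =>
    TEST_PATH_PATTERNS.some(pattern => pattern.test(path)));
}

console.log(isClaudeBranch('claude/fix-tests-5014')); // true
console.log(onlyTestFilesChanged(['tests/e2e/checkout.spec.js', 'includes/class-foo.php'])); // false
```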

Testing instructions

Primarily looking for a code review and feedback on the approach. We'd need to merge it to develop to actually test it.

I've tested a slightly modified version of this workflow in a fork of the repo by opening a PR that introduces a test failure.

PR: malithsen#2
Automated fix: malithsen#3

This test doesn't perfectly capture the intended scenario, but it does validate the end-to-end flow: failure detection, analysis, comment posting, and fix PR creation.

Covered with tests (or have a good reason not to test in description ☝️)
Tested on mobile (or does not apply)

Changelog entry

This Pull Request does not require a changelog entry. (Comment required below)

Changelog Entry Comment

Comment

Post merge

Added testing instructions to the Release Testing Instructions wiki page (or does not apply)
Added the needs docs label (or does not apply)
Included this PR in the Release Thread scope (or does not apply)

malithsen added 3 commits February 12, 2026 14:48

A workflow to auto-fix tests (a9e7f9e)
Minor comment adjustments
Add missing permission (b3432aa)

malithsen requested review from a team, daledupreez and wjrosa and removed request for a team

February 12, 2026 22:45


Restrict tool use (c24c961)

wjrosa approved these changes


Contributor

wjrosa left a comment

Awesome idea! Looks good to me 👍

malithsen added this to the 10.5.0 milestone

daledupreez approved these changes


Contributor

daledupreez left a comment

This is looking good to me, and I think is totally worth trying!

I have some fairly minor comments and suggestions, but none of them are blocking.

Once we get things working, it may be worth splitting up the code into separate composable actions with clear inputs and outputs, as there is a lot going on across all the steps and common state.

.github/workflows/claude-fix-tests.yml

+                          # Step 2: Check for existing fix PR & extract logs
+                          # ──────────────────────────────────────────────
+                          - name: Check for existing fix PR
+                            if: steps.resolve_pr.outputs.found == 'true' && steps.resolve_pr.outputs.is_fork != 'true'

Contributor

daledupreez Feb 17, 2026

Nit: Why not check that is_fork == 'false'? I think it makes what we are trying to find a bit clearer.

Suggested change

-                          if: steps.resolve_pr.outputs.found == 'true' && steps.resolve_pr.outputs.is_fork != 'true'
+                          if: steps.resolve_pr.outputs.found == 'true' && steps.resolve_pr.outputs.is_fork == 'false'
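One behavioral difference is worth keeping in mind with this nit: in GitHub Actions expressions, a step output that was never set (for example because the step was skipped) evaluates to an empty string. With `''`, `!= 'true'` passes while `== 'false'` fails, so the suggested form is stricter. The JavaScript model below illustrates the comparison; Actions compares strings case-insensitively, which makes no difference for these lowercase values.

```javascript
// Model of the two conditions for the string values a step output can hold.
// An unset output reads as '' in GitHub Actions expressions.
function notTrue(output) { return output !== 'true'; }   // is_fork != 'true'
function isFalse(output) { return output === 'false'; }  // is_fork == 'false'

for (const value of ['true', 'false', '']) {
  console.log(JSON.stringify(value), notTrue(value), isFalse(value));
}
// "true" false false
// "false" true true
// "" true false
```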

.github/workflows/claude-fix-tests.yml

Comment on lines +134 to +135

+                                    const failedJobs = jobs.data.jobs.filter(j => j.conclusion === 'failure');
+                                    const failedJobNames = failedJobs.map(j => j.name);

Contributor

daledupreez Feb 17, 2026

Nit: can we use `job` rather than `j` as the callback parameter name in the filters?

Suggested change

-                                    const failedJobs = jobs.data.jobs.filter(j => j.conclusion === 'failure');
-                                    const failedJobNames = failedJobs.map(j => j.name);
+                                    const failedJobs = jobs.data.jobs.filter(job => job.conclusion === 'failure');
+                                    const failedJobNames = failedJobs.map(job => job.name);
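The same filtering can be pulled into a pure function over a response shaped like the one the step reads (`jobs.data.jobs`, each entry with `name` and `conclusion`), which makes it easy to exercise outside a workflow run. The function name and sample jobs are invented for illustration.

```javascript
// Hypothetical helper; the input mirrors the { data: { jobs: [...] } } shape
// used in the diff. The sample job names are invented.
function getFailedJobs(jobs) {
  const failedJobs = jobs.data.jobs.filter(job => job.conclusion === 'failure');
  const failedJobNames = failedJobs.map(job => job.name);
  return { failedJobs, failedJobNames };
}

const sample = {
  data: {
    jobs: [
      { id: 1, name: 'PHP tests (8.1)', conclusion: 'failure' },
      { id: 2, name: 'PHP tests (8.2)', conclusion: 'success' },
      { id: 3, name: 'JS tests', conclusion: 'failure' },
    ],
  },
};

console.log(getFailedJobs(sample).failedJobNames); // [ 'PHP tests (8.1)', 'JS tests' ]
```

Jobs that time out or are cancelled may report a different `conclusion` (such as `timed_out` or `cancelled`) and are excluded by this filter. If a run can contain more jobs than fit in one page of the REST response, the real step would also need to page through results, for example with Octokit's `github.paginate`; that is outside this sketch.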

.github/workflows/claude-fix-tests.yml

Comment on lines +137 to +139

+                                    // Download logs for each failed job (truncated to last 200 lines each, max 3 jobs)
+                                    let allLogs = '';
+                                    for (const job of failedJobs.slice(0, 3)) {

Contributor

daledupreez Feb 17, 2026

Out of interest, why only 3 jobs and why 200 lines?
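One way to keep the answer to this question next to the code is to lift the limits into named constants. The sketch below is hypothetical (constant and function names are assumptions). It also trims one trailing newline before splitting: with `split('\n')`, a log that ends in a newline yields an empty final element, so `slice(-200)` keeps only 199 real lines.

```javascript
// Hypothetical constants mirroring the limits in the diff (3 jobs, 200 lines).
const MAX_FAILED_JOBS = 3;  // how many failed jobs contribute logs
const TAIL_LINES = 200;     // how many lines to keep from the end of each log

// Keep the last `count` lines of a log. A single trailing newline is dropped
// first so the empty element after it does not use up one of the kept lines.
function tailLines(text, count) {
  const lines = text.replace(/\n$/, '').split('\n');
  return lines.slice(-count).join('\n');
}

function selectJobsForLogs(failedJobs) {
  return failedJobs.slice(0, MAX_FAILED_JOBS);
}

const log = Array.from({ length: 250 }, (_, i) => `line ${i + 1}`).join('\n') + '\n';
const tail = tailLines(log, TAIL_LINES);
console.log(tail.split('\n').length); // 200
console.log(tail.split('\n')[0]);     // line 51
```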

.github/workflows/claude-fix-tests.yml

+                                        });
+                                        const logLines = log.data.split('\n');
+                                        const truncated = logLines.slice(-200).join('\n');
+                                        allLogs += `\n--- Job: ${job.name} ---\n${truncated}\n`;

Contributor

daledupreez Feb 17, 2026

Might it be worth adding more explicit separators between jobs?

Suggested change

-                                        allLogs += `\n--- Job: ${job.name} ---\n${truncated}\n`;
+                                        allLogs += `\n--- Job: ${job.name} ---\n${truncated}\n--- /end job ${job.name} ---\n`;
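The suggested separator can be isolated in a small formatter so every job's log tail gets a matching begin and end marker. The helper name and sample logs are hypothetical; the marker text is copied from the suggestion.

```javascript
// Hypothetical helper producing the begin/end markers from the suggestion.
function formatJobLog(jobName, logTail) {
  return `\n--- Job: ${jobName} ---\n${logTail}\n--- /end job ${jobName} ---\n`;
}

let allLogs = '';
for (const [name, tail] of [['JS tests', 'FAIL a.test.js'], ['E2E tests', 'Timeout']]) {
  allLogs += formatJobLog(name, tail);
}
console.log(allLogs);
```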

.github/workflows/claude-fix-tests.yml

Comment on lines +165 to +176

+                                steps.resolve_pr.outputs.is_fork != 'true' &&
+                                steps.check_existing.outputs.exists != 'true'
+                            uses: actions/checkout@v4
+                            with:
+                                ref: ${{ steps.resolve_pr.outputs.pr_branch }}
+                                fetch-depth: 0
+                          - name: Analyze test failure (Phase 1)
+                            if: >
+                                steps.resolve_pr.outputs.found == 'true' &&
+                                steps.resolve_pr.outputs.is_fork != 'true' &&
+                                steps.check_existing.outputs.exists != 'true'

Contributor

daledupreez Feb 17, 2026

As noted earlier, might it make sense to check for == 'false' rather than != 'true'?

Suggested change

-                                steps.resolve_pr.outputs.is_fork != 'true' &&
-                                steps.check_existing.outputs.exists != 'true'
+                                steps.resolve_pr.outputs.is_fork == 'false' &&
+                                steps.check_existing.outputs.exists == 'false'
                             uses: actions/checkout@v4
                             with:
                                 ref: ${{ steps.resolve_pr.outputs.pr_branch }}
                                 fetch-depth: 0
                           - name: Analyze test failure (Phase 1)
                             if: >
                                 steps.resolve_pr.outputs.found == 'true' &&
-                                steps.resolve_pr.outputs.is_fork != 'true' &&
-                                steps.check_existing.outputs.exists != 'true'
+                                steps.resolve_pr.outputs.is_fork == 'false' &&
+                                steps.check_existing.outputs.exists == 'false'

.github/workflows/claude-fix-tests.yml

+                                    ## Run URL: ${{ steps.logs.outputs.run_url }}
+                                    ## Error Logs:
+                                    ${{ steps.logs.outputs.logs }}

Contributor

daledupreez Feb 17, 2026

Might it be worth adding a ``` boundary here to explicitly wrap the log content in a markdown-native way?

Suggested change

-                                    ${{ steps.logs.outputs.logs }}
+                                    ```
+                                    ${{ steps.logs.outputs.logs }}
+                                    ```
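A plain three-backtick fence can be closed early if the log itself contains a run of three backticks. Under CommonMark, a closing fence must be at least as long as the opening one, so a fence one backtick longer than the longest run in the text is safe. A hypothetical helper like the one below could be applied in the log-extraction step before the logs are set as a step output; the function name is an assumption.

```javascript
// Hypothetical helper: wrap arbitrary text in a backtick fence that the text
// itself cannot close (the fence is longer than any backtick run inside it).
function wrapInFence(text) {
  const runs = text.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  const fence = '`'.repeat(Math.max(3, longest + 1));
  return `${fence}\n${text}\n${fence}`;
}

console.log(wrapInFence('Expected `foo` to equal `bar`'));
```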

.github/workflows/claude-fix-tests.yml

+                                    if (analysis.affected_files && analysis.affected_files.length > 0) {
+                                      body += `**Affected files:**\n`;
+                                      analysis.affected_files.forEach(f => { body += `- \`${f}\`\n`; });

Contributor

daledupreez Feb 17, 2026

To match the other code that is body-oriented, we could switch this around to use map() and join(). (Not at all blocking.)

Suggested change

-                                      analysis.affected_files.forEach(f => { body += `- \`${f}\`\n`; });
+                                      body += analysis.affected_files.map(filename => `- \`${filename}\``).join( '\n' ) + '\n';
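A quick check that the map()/join() form produces the same text as the forEach() form. The callback has to interpolate the same name it declares (`filename`), and the trailing `'\n'` keeps the list terminated the way the forEach() version leaves it. The sample file names are invented.

```javascript
// Sample input; the file names are invented for illustration.
const analysis = { affected_files: ['tests/e2e/checkout.spec.js', 'tests/phpunit/test-wc-stripe.php'] };

// Original style: one line per file, each ending in a newline.
let bodyForEach = '';
analysis.affected_files.forEach(f => { bodyForEach += `- \`${f}\`\n`; });

// map()/join() style: join() adds newlines only between items, so one more
// '\n' is appended to match the forEach() output exactly.
let bodyMapJoin = '';
bodyMapJoin += analysis.affected_files.map(filename => `- \`${filename}\``).join('\n') + '\n';

console.log(bodyForEach === bodyMapJoin); // true
```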

.github/workflows/claude-fix-tests.yml

+                                    ## Failed Workflow: ${{ steps.logs.outputs.workflow_name }}
+                                    ## Error Logs:
+                                    ${{ steps.logs.outputs.logs }}

Contributor

daledupreez Feb 17, 2026

As above, regarding wrapping the log output in ```.

Suggested change

-                                    ${{ steps.logs.outputs.logs }}
+                                    ```
+                                    ${{ steps.logs.outputs.logs }}
+                                    ```

.github/workflows/claude-fix-tests.yml

+                                    Fixes failing tests from [${{ steps.logs.outputs.workflow_name }}](${{ steps.logs.outputs.run_url }}) on PR #${{ steps.resolve_pr.outputs.pr_number }}.
+                                    ---
+                                    *Auto-generated by Claude Code*"

Contributor

daledupreez Feb 17, 2026

Maybe mention the workflow here?

Suggested change

-                                    *Auto-generated by Claude Code*"
+                                    *Auto-generated by Claude Code via the claude-fix-tests workflow*"


Labels

None yet