[Abandoned] Add selective test retry for E2E/integration tests#5355

Closed
Copilot wants to merge 30 commits into main from copilot/investigate-e2e-test-retry-approach

Contributor

Copilot AI commented Jan 28, 2026

Description

Implements a selective test retry mechanism that retries only the failed tests instead of rerunning the entire test suite. This replaces the retryCountOnTaskFailure approach, which caused pipeline timeouts and poor test result visibility.

AB#182243

Pipelines

Example of failing pipeline: https://microsofthealthoss.visualstudio.com/FhirServer/_build/results?buildId=47394&view=results

Example of working pipeline attached to this PR.

Changes

New PowerShell test retry script (build/scripts/Invoke-TestWithRetry.ps1):

  • Parses TRX files to extract failed test names
  • Retries only failed tests using --filter expressions
  • Extracts method names from complex xUnit fixture test names for reliable filtering
  • Generates separate TRX files per attempt for Azure DevOps retry-aware publishing
  • Supports code coverage collection via AdditionalArgs parameter
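
As a rough illustration of the first three bullets (this is not the PowerShell in Invoke-TestWithRetry.ps1, which is not shown on this page), the Python sketch below reads failed test names from TRX content, reduces xUnit fixture-style names to bare method names, and joins them into a dotnet test --filter expression. The helper names (failed_test_names, method_name, escape_filter_value, build_filter) are hypothetical, and the name-reduction rule is an assumption about what the fixture names look like. The FullyQualifiedName~ operator, the | separator, and backslash escaping of special characters follow the VSTest filter syntax.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, List

# XML namespace used by Visual Studio TRX result files.
TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}


def failed_test_names(trx_xml: str) -> List[str]:
    """Return the testName of every UnitTestResult whose outcome is Failed."""
    root = ET.fromstring(trx_xml)
    return [
        result.get("testName", "")
        for result in root.iterfind(".//t:UnitTestResult", TRX_NS)
        if result.get("outcome") == "Failed"
    ]


def method_name(test_name: str) -> str:
    """Hypothetical helper: reduce a display name to its bare method name.

    Assumes names shaped like 'Ns.Class(Fixture, Args).Method(arg: 1)'.
    """
    stripped = test_name
    while True:
        # Remove innermost parenthesised groups until none remain.
        reduced = re.sub(r"\([^()]*\)", "", stripped)
        if reduced == stripped:
            break
        stripped = reduced
    # The method name is whatever follows the last dot.
    return stripped.rsplit(".", 1)[-1].strip()


def escape_filter_value(value: str) -> str:
    """Backslash-escape characters that are operators in a test filter."""
    return re.sub(r"([\\()&|=!~])", r"\\\1", value)


def build_filter(test_names: Iterable[str]) -> str:
    """Join the distinct method names into an OR-ed 'contains' filter."""
    methods = sorted({method_name(name) for name in test_names})
    return "|".join(
        f"FullyQualifiedName~{escape_filter_value(m)}" for m in methods
    )
```

Filtering on the bare method name trades precision for robustness: a retry may also pick up same-named methods in other classes, but the filter no longer depends on how fixture arguments are rendered in the display name.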

Azure DevOps pipeline updates:

  • Added AllowPtrToDetectTestRunRetryFiles: true variable to enable retry-aware test result publishing
  • Tests now show as "Passed with Retry" instead of "Failed" when they succeed after initial failure
  • Updated E2E tests (e2e-tests.yml), SQL integration tests (run-sql-tests.yml), and Cosmos integration tests (run-cosmos-tests.yml) to use the new retry script
  • Added explicit PublishTestResults@2 tasks for proper TRX file publishing
  • Fixed coverage file path patterns from */coverage.cobertura.xml to **/coverage.cobertura.xml

Removed retryCountOnTaskFailure:

  • Removed from E2E test tasks
  • Removed from SQL and Cosmos integration test tasks
  • Removed from export test tasks (Cosmos and SQL)

Benefits

  • Faster retries: Only failed tests are re-run instead of the entire suite
  • Better visibility: Azure DevOps shows which tests passed after retry vs. which truly failed
  • No timeouts: Retrying a subset of tests completes much faster than full suite re-runs
  • Complete audit trail: All TRX files from all attempts are preserved and published
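
The retry behaviour behind these benefits can be sketched as a small loop. This is a Python illustration under stated assumptions, not the PR's script: the run_with_retry function, the injected runner callable, and the results_attemptN.trx naming are all hypothetical. In a real pipeline the runner would invoke dotnet test with the given filter and TRX logger path, then report which tests failed.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A runner receives (filter expression or None, TRX output path) and returns
# the names of the tests that failed in that run. It is injected so the loop
# can be exercised without invoking `dotnet test`.
Runner = Callable[[Optional[str], str], Sequence[str]]


def run_with_retry(
    runner: Runner,
    max_attempts: int = 3,
    trx_prefix: str = "results",  # hypothetical naming scheme
) -> Tuple[List[str], List[str]]:
    """Run the suite once, then re-run only the failures.

    Returns (tests still failing after the last attempt, TRX paths written).
    """
    trx_files: List[str] = []
    failed: List[str] = []
    filter_expr: Optional[str] = None  # None means "run everything"

    for attempt in range(1, max_attempts + 1):
        # One TRX file per attempt, so every attempt can be published.
        trx_path = f"{trx_prefix}_attempt{attempt}.trx"
        failed = list(runner(filter_expr, trx_path))
        trx_files.append(trx_path)
        if not failed:
            break  # everything selected in this attempt passed
        # Narrow the next attempt to the tests that just failed.
        # (Names are joined as-is here; escaping is omitted for brevity.)
        filter_expr = "|".join(f"FullyQualifiedName~{name}" for name in failed)

    return failed, trx_files
```

An empty first element in the result means the step can succeed; the list of TRX paths is what a later publishing step would pick up.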

Related issues

Addresses pipeline timeout and test result visibility issues with E2E and integration tests.

Testing

  • PowerShell syntax validated
  • Argument parsing tested with quoted strings (single and double quotes)
  • Method name extraction tested with xUnit fixture test name patterns
  • Applied to E2E tests (Cosmos, SQL, all categories) and Integration tests (Cosmos, SQL)
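
For the quoted-argument case above, one way to see the intended behaviour is a shell-style split in which a quoted value stays a single argument. The Python sketch below uses the standard shlex module; the split_additional_args name and the sample argument string are illustrative assumptions, and the script's actual parsing is not shown on this page.

```python
import shlex
from typing import List


def split_additional_args(additional_args: str) -> List[str]:
    """Split an argument string, keeping quoted values as one token each."""
    return shlex.split(additional_args)


if __name__ == "__main__":
    # Illustrative input mixing double and single quotes.
    sample = "--collect \"XPlat Code Coverage\" --logger 'console;verbosity=normal'"
    print(split_additional_args(sample))
```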

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Build, Enhancement
  • Tag the PR with Open source
  • Tag the PR with Schema Version unchanged
  • CI is green before merge
  • Review squash-merge requirements

Semver Change (docs)

Skip (build infrastructure only)

Copilot AI and others added 4 commits January 28, 2026 18:32
- Add Invoke-TestWithRetry.ps1 script that retries only failed tests
- Update e2e-tests.yml to use retry script instead of retryCountOnTaskFailure
- Remove retryCountOnTaskFailure from integration and export test tasks
- Keep infrastructure task retries unchanged

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
… script

- Add comprehensive README.md for test retry scripts
- Fix Invoke-TestWithRetry.ps1 to properly inherit environment variables
- Use & operator instead of Start-Process for better env var handling

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
- Add escaping for pipe characters in test names
- Add comment about special character handling
- Improve robustness of filter building

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
- Create VALIDATION.md with detailed validation steps
- Include expected outcomes for success, retry, and failure scenarios
- Add troubleshooting guide for common issues
- Provide clear sign-off checklist

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Investigate adding job-level retries for E2E tests" to "Replace task-level test retry with selective failed-test retry" Jan 28, 2026
Copilot AI requested a review from mikaelweave January 28, 2026 18:37
@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI and others added 2 commits January 28, 2026 20:26
- Update run-cosmos-tests.yml to use Invoke-TestWithRetry.ps1
- Update run-sql-tests.yml to use Invoke-TestWithRetry.ps1
- Replace DotNetCoreCLI@2 with PowerShell@2 + PublishTestResults@2
- Maintain code coverage collection via AdditionalArgs
- Update README.md with integration test usage example
- Ensure environment variables are properly passed

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
- Add comprehensive validation steps for integration tests
- Include code coverage validation checks
- Add integration test retry examples to expected outcomes
- Update checklist to cover both E2E and integration scenarios

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
Copilot AI changed the title from "Replace task-level test retry with selective failed-test retry" to "Add selective test retry to E2E and integration tests" Jan 28, 2026
- Delete README.md (moved content to script comments)
- Delete VALIDATION.md (validation complete)
- Enhanced Invoke-TestWithRetry.ps1 with comprehensive inline documentation:
  - Added WHERE IT'S USED section
  - Added BENEFITS OVER retryCountOnTaskFailure section
  - Added HOW IT WORKS section
  - Added CODE COVERAGE SUPPORT section
  - Added E2E and Integration test examples
  - Added notes about PublishTestResults@2 usage
- All necessary documentation now in PowerShell script for maintainability

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
Copilot AI changed the title from "Add selective test retry to E2E and integration tests" to "Consolidate test retry documentation into PowerShell script" Jan 28, 2026
Copilot stopped work on behalf of mikaelweave due to an error January 28, 2026 20:41
- Add AllowPtrToDetectTestRunRetryFiles: true to build-variables.yml
- This enables Azure DevOps to detect retry attempts from multiple TRX files
- Tests that fail initially but pass on retry now show as "Passed with Retry"
- Update PowerShell script comments to document this feature
- Reference: https://devblogs.microsoft.com/dotnet/microsoft-testing-platform-azure-retry/

Co-authored-by: mikaelweave <753437+mikaelweave@users.noreply.github.com>
Copilot AI changed the title from "Consolidate test retry documentation into PowerShell script" to "Enable retry-aware test publishing in Azure DevOps" Jan 28, 2026
@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Co-Authored-By: Warp <agent@warp.dev>
@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


- task: DotNetCoreCLI@2
  displayName: 'Export E2E ${{ parameters.version }} CosmosDB'
  retryCountOnTaskFailure: 1
Contributor

Was there a reason we removed the retry logic here but did not switch to Invoke-TestWithRetry.ps1, and instead kept using DotNetCoreCLI?

@mikaelweave
Contributor

/azp run

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

mikaelweave closed this Feb 6, 2026
mikaelweave changed the title from "Add selective test retry for E2E/integration tests" to "[Abandoned] Add selective test retry for E2E/integration tests" Feb 6, 2026
Labels

Build, No-ADR (ADR not needed), No-PaaS-breaking-change, Open source (This change is only relevant to the OSS code or release.)

3 participants