@jduo
Collaborator

@jduo jduo commented Dec 15, 2025

Summary

This PR enables Java integration tests to run on Windows using WSL (Windows Subsystem for Linux), removing the previous Windows exclusion from CI workflows. It adds WSL support for cluster management, test execution, and process cleanup.

Due to the limited resources available on GitHub Windows runners and the additional overhead of running through WSL, several optimizations were implemented:

  • Reduced Replica Count: Windows CI uses 0 replicas instead of the standard 1-4 replicas to minimize resource usage and avoid replica synchronization issues that can occur in resource-constrained environments
  • Increased Timeouts: Test timeouts were increased across multiple test classes:
    • Request timeouts: 10s → 20s for better stability under WSL
    • Test class timeouts: 10s → 15s/20s to account for WSL overhead
  • Reduced Data Sizes: Large data tests reduced from 1 << 16 (64KB) to 1 << 15 (32KB) to prevent memory pressure on Windows runners
  • WSL Overhead Considerations: The virtualization layer adds latency and resource overhead, requiring these adjustments for reliable test execution
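
For illustration, the adjustments listed above could be gathered in one place as in the sketch below. This is not code from the PR: the class `CiTestSettings`, its fields, and the `forOs` factory are hypothetical names, and the sketch assumes every value is switched per platform, which the description confirms only for the replica count. The non-Windows replica count of 1 is just one point in the 1-4 range mentioned above.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical holder for the platform-dependent values described in the summary. */
final class CiTestSettings {
    final int replicaCount;
    final Duration requestTimeout;
    final int largeDataSize;

    private CiTestSettings(int replicaCount, Duration requestTimeout, int largeDataSize) {
        this.replicaCount = replicaCount;
        this.requestTimeout = requestTimeout;
        this.largeDataSize = largeDataSize;
    }

    /** Assumption: Windows is detected from the value of the os.name system property. */
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).contains("windows");
    }

    static CiTestSettings forOs(String osName) {
        if (isWindows(osName)) {
            // Values from the summary: 0 replicas, 20s request timeout, 1 << 15 payload.
            return new CiTestSettings(0, Duration.ofSeconds(20), 1 << 15);
        }
        // Assumed previous values: 1 replica (lower end of 1-4), 10s, 1 << 16.
        return new CiTestSettings(1, Duration.ofSeconds(10), 1 << 16);
    }

    public static void main(String[] args) {
        CiTestSettings s = forOs(System.getProperty("os.name"));
        System.out.println("replicas=" + s.replicaCount
                + " timeoutSeconds=" + s.requestTimeout.getSeconds()
                + " dataSize=" + s.largeDataSize);
    }
}
```

Passing the OS name in as a parameter, rather than reading the system property inside `forOs`, keeps the selection logic checkable on any platform.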

Changes Made

CI/CD Workflow Updates

  • .github/workflows/install-shared-dependencies/action.yml: Added WSL setup with Ubuntu 22.04 and direct Valkey installation in WSL for Windows runners
  • .github/workflows/install-engine/action.yml: Added x86_64-pc-windows-msvc target support
  • .github/workflows/java-cd.yml: Removed Windows exclusion from integration tests (previously skipped with -x :integTest:test)
  • .github/workflows/java.yml: Enabled integration tests on Windows and excluded only Redis 6.2 on Windows

Java Integration Test Infrastructure

  • java/integTest/build.gradle:

    • Added getClusterCommand() helper to wrap commands with wsl prefix on Windows
    • Updated all cluster management tasks to use WSL on Windows
    • Added comprehensive cleanup in gradle.buildFinished to kill orphaned Valkey/Redis processes
    • Configured Windows to use 0 replicas (avoiding replica sync issues in CI)
    • All Python cluster_manager.py invocations now route through WSL on Windows
  • java/integTest/src/test/java/glide/cluster/ValkeyCluster.java:

    • Added getScriptPath() method to convert Windows paths to WSL format (C:\path → /mnt/c/path)
    • Updated cluster start/stop commands to use wsl prefix on Windows
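
A minimal sketch of the two ideas above, path conversion and the wsl prefix, may help reviewers picture the behavior. It is not the PR's implementation: `WslCommands`, `toWslPath`, and `wrapForWsl` are hypothetical names, and the sketch assumes drive-letter paths map to /mnt/<drive>/ as in a default WSL mount configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers; names and behavior are assumptions, not the PR's code. */
final class WslCommands {

    /** Converts a drive-letter path such as C:\path to /mnt/c/path. */
    static String toWslPath(String windowsPath) {
        // Normalize separators first so mixed paths like C:\a/b are handled.
        String normalized = windowsPath.replace('\\', '/');
        if (normalized.length() >= 2 && normalized.charAt(1) == ':') {
            char first = normalized.charAt(0);
            boolean asciiLetter = (first >= 'A' && first <= 'Z') || (first >= 'a' && first <= 'z');
            if (asciiLetter) {
                String rest = normalized.substring(2);
                if (!rest.startsWith("/")) {
                    rest = "/" + rest;
                }
                return "/mnt/" + Character.toLowerCase(first) + rest;
            }
        }
        // Not a drive-letter path: return it with separators normalized only.
        return normalized;
    }

    /** Prefixes the command with "wsl" on Windows; returns a copy unchanged elsewhere. */
    static List<String> wrapForWsl(boolean isWindows, List<String> command) {
        List<String> result = new ArrayList<>(command.size() + 1);
        if (isWindows) {
            result.add("wsl");
        }
        result.addAll(command);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toWslPath("C:\\path"));
        System.out.println(wrapForWsl(true, List.of("python3", "cluster_manager.py", "start")));
    }
}
```

The resulting list can be handed to `ProcessBuilder` in Java or to `commandLine` in a Gradle `exec` block.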

Test Adjustments for Windows

  • java/integTest/src/test/java/glide/TestUtilities.java: Added validation for empty host strings and improved error messages for invalid host formats
  • java/integTest/src/test/java/glide/ConnectionTests.java: Adjusted AZ affinity test to expect 0 replicas on Windows WSL CI
  • java/integTest/src/test/java/glide/SharedCommandTests.java: Updated waitTest and wait_timeout_check to handle 0 replicas on Windows WSL CI
  • java/integTest/src/test/java/glide/SharedClientTests.java:
    • Increased request timeout from 10s to 20s for Windows WSL CI stability
    • Reduced data size tests from 1 << 16 to 1 << 15 for Windows WSL CI compatibility
  • java/integTest/src/test/java/glide/ErrorHandlingTests.java: Increased timeout from 10s to 15s
  • java/integTest/src/test/java/glide/standalone/BatchTests.java: Increased timeout from 10s to 20s
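
The host validation added to TestUtilities.java is not shown in this description. As a rough sketch of what such validation might look like, assuming host strings use a host:port form (an assumption, as are the names `HostParsing`, `HostPort`, and `parse`):

```java
/** Hypothetical host-string validation; the host:port format is an assumption. */
final class HostParsing {

    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    static HostPort parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Host string is empty; expected \"host:port\"");
        }
        String trimmed = value.trim();
        int separator = trimmed.lastIndexOf(':');
        // Reject a missing separator, an empty host, or an empty port.
        if (separator <= 0 || separator == trimmed.length() - 1) {
            throw new IllegalArgumentException(
                    "Invalid host format '" + trimmed + "'; expected \"host:port\"");
        }
        int port;
        try {
            port = Integer.parseInt(trimmed.substring(separator + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid port in '" + trimmed + "'", e);
        }
        return new HostPort(trimmed.substring(0, separator), port);
    }

    public static void main(String[] args) {
        HostPort hp = parse("localhost:6379");
        System.out.println(hp.host + " " + hp.port);
    }
}
```

Failing early with the offending string in the message makes a misconfigured cluster address much easier to spot in CI logs than a later connection error.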

Issue link

This Pull Request is linked to issue (URL): [REPLACE ME]

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one issue.
  • Commit message has a detailed description of what changed and why.
  • Tests are added or updated.
  • CHANGELOG.md and documentation files are updated.
  • Destination branch is correct - main or release
  • Create merge commit if merging release branch into main, squash otherwise.

jduo added 3 commits December 15, 2025 14:16
- Add WSL setup in GitHub workflow for Windows runners
- Enable integration tests on Windows (remove -x :integTest:test)
- Add WSL command wrapper for all cluster_manager.py calls
- Disable ValkeyCluster.close() server shutdown
- Add comprehensive cleanup after all tests complete
- Use pkill to find and stop all running servers in WSL

This enables Java integration tests to run on Windows using WSL
without Docker networking issues or premature server shutdown.

Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
- Restore missing plugins block
- Fix variable scope in gradle.buildFinished block
- Use local isWindowsRuntime variable instead of project-level isWindows

Signed-off-by: James Duong <[email protected]>
@jduo jduo requested a review from a team as a code owner December 15, 2025 22:17
Signed-off-by: James Duong <[email protected]>
- Configure cluster to use zero replicas on Windows to avoid sync issues
- Increase request timeout from 10s to 60s for cluster client operations
- Increase ErrorHandlingTests timeout from 10s to 15s for Windows CI
- Remove class-level timeout annotation from SharedClientTests
- Update wait command test expectations to account for zero replicas on Windows
- Add --keep-folder flag to cluster cleanup to preserve state between runs
- Adjust AZ cluster test to use conditional replica counts based on platform

Signed-off-by: affonsov <[email protected]>
- Update java-cd
- Reduce clientAndDataSize so it does not stress WSL on the GitHub runner
- Revert changes to cluster_manager.py

Signed-off-by: affonsov <[email protected]>
Signed-off-by: affonsov <[email protected]>
@jduo jduo changed the title from "Prototyping WSL workflow" to "Enable Windows integration test in workflow through WSL" Jan 1, 2026
target: ${{ inputs.target }}
github-token: ${{ inputs.github-token }}

- name: Install engine
Collaborator Author

Let's change this name to more clearly show that it only applies to Linux and Mac.

  exec {
      workingDir "${project.rootDir}/../utils"
-     commandLine 'python3', 'cluster_manager.py', '--tls', 'start', '-r', '0'
+     commandLine getClusterCommand(['python3', 'cluster_manager.py', '--tls', 'start', '-r', '0'])
Collaborator Author

I was able to get Windows WSL to run with replicas by changing what we wait for when syncing topologies, but I don't remember the details now. I also had to run WSL with the networking mode set to mirrored, though that may not be available with GitHub Actions. Maybe we raise a separate issue for this.

Collaborator

That is not an issue in local WSL; this problem only happens in GitHub Actions.
In GitHub Actions, the replicas either hang or take too long to sync.

Collaborator

@yipin-chen yipin-chen left a comment

LGTM. We need to resolve the CI error.

@affonsov affonsov changed the base branch from release-2.2 to main January 2, 2026 17:49
@affonsov affonsov changed the base branch from main to release-2.2 January 2, 2026 17:50