Replace Docker cluster with cluster_manager.py#108

Merged
alexr-bq merged 19 commits into main from cluster-test-infra
Jun 10, 2026
@alexr-bq alexr-bq commented Jun 3, 2026

Collaborator

This PR changes the cluster test methodology to align with other GLIDE clients by using cluster_manager.py to create Valkey clusters for testing. While doing this, I tried enabling many tests in cluster mode that were previously disabled, which surfaced a number of issues with those tests. The failing tests are explicitly skipped in cluster mode for now and will be handled in #101.

This change:

  • Uses cluster_manager.py from core submodule for cluster setup
  • Refactors the test jobs so that test_cluster and test_standalone run against the core tests in /valkey and /lint
  • Removes some test files in /valkey that were just wrappers around tests in /lint
  • Adds a test job that covers both cluster and standalone tests
  • Skips tests in cluster mode that don't work yet, to be addressed in #101 (Close gaps in cluster functionality)

- Add Valkey::TestCluster class wrapping cluster_manager.py invocation
- Update Helper::Cluster to use TestCluster for dynamic cluster management
- Update Helper::Client with standalone server support via TestCluster
- Update CI workflow to use Python 3.11 + native Valkey installation
- Remove grokzen/redis-cluster Docker dependency
- Add rantly gem for property-based testing
- Fix hardcoded port 7000 reference in connection_options.rb

This aligns the Ruby client's cluster testing infrastructure with other
GLIDE clients (Python, Java, Go) by using the shared cluster_manager.py
script from valkey-glide/utils/.
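
For illustration, a wrapper in the spirit of Valkey::TestCluster might shell out to cluster_manager.py and parse the node list it prints. This is a rough sketch, not the class added in this PR: the `start --cluster-mode -n/-r` arguments and the `CLUSTER_FOLDER=` / `CLUSTER_NODES=` output lines are assumptions about the script's interface, and `ClusterManagerSketch` and the `SCRIPT` path are hypothetical names.

```ruby
require "open3"

# Hypothetical sketch; not the Valkey::TestCluster class from this PR.
module ClusterManagerSketch
  # Assumed location of the script inside the core submodule.
  SCRIPT = "valkey-glide/utils/cluster_manager.py"

  # Build the argv for starting a cluster (the flags are assumptions).
  def self.start_command(shards: 3, replicas: 1)
    ["python3", SCRIPT, "start", "--cluster-mode",
     "-n", shards.to_s, "-r", replicas.to_s]
  end

  # Parse assumed KEY=value lines; other log lines are ignored.
  def self.parse_output(output)
    vars = output.each_line.filter_map do |line|
      key, value = line.strip.split("=", 2)
      [key, value] if value
    end.to_h

    nodes = vars.fetch("CLUSTER_NODES", "").split(",").map do |addr|
      host, port = addr.split(":")
      [host, Integer(port)]
    end

    { folder: vars["CLUSTER_FOLDER"], nodes: nodes }
  end

  # Run the script and return the parsed result; raises if the script fails.
  def self.start(**opts)
    stdout, stderr, status = Open3.capture3(*start_command(**opts))
    raise "cluster_manager.py failed: #{stderr}" unless status.success?

    parse_output(stdout)
  end
end
```

Keeping the parsing separate from the process invocation means the parsing can be unit-tested without Python or a Valkey server installed.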

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
@alexr-bq alexr-bq changed the title feat(ruby): replace Docker cluster with cluster_manager.py Replace Docker cluster with cluster_manager.py Jun 3, 2026
alexr-bq added 3 commits June 3, 2026 11:19
packages.valkey.io DNS resolution was failing in GitHub Actions.
Build Valkey 8.0.0 from source with TLS support, matching the
approach used by valkey-glide's install-engine action.

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
- Use ENV.fetch with nil default instead of ENV[] (FetchEnvVar)
- Use modifier unless for single-line conditional (IfUnlessModifier)
- Add test_cluster.rb to Metrics/ClassLength and MethodLength exclusions
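
The first two RuboCop fixes above roughly correspond to the following pattern. This is an illustrative sketch only: `VALKEY_TEST_PORT`, `test_port`, and the 6379 fallback are hypothetical and not taken from the PR.

```ruby
# Returns the port to test against, falling back to 6379.
# `env` defaults to ENV but accepts any Hash-like object, which eases testing.
def test_port(env = ENV)
  # Style/FetchEnvVar: fetch with an explicit nil default instead of env["..."].
  port = env.fetch("VALKEY_TEST_PORT", nil)

  # Style/IfUnlessModifier: a one-statement conditional written as a modifier.
  return 6379 unless port

  Integer(port)
end
```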

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
- Rename test/valkey/ directory to test/standalone/
- Update Rakefile: test:valkey → test:standalone
- Default 'test' task now runs both test:standalone and test:cluster
- Update CI workflow to use test:standalone
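
A Rakefile layout matching this description could look roughly like the sketch below. The glob patterns and `libs` entries are assumptions, not copied from the repository's Rakefile.

```ruby
require "rake"
require "rake/testtask"

namespace :test do
  # Defines the test:standalone task.
  Rake::TestTask.new(:standalone) do |t|
    t.libs << "test"
    t.pattern = "test/standalone/**/*_test.rb" # assumed glob
  end

  # Defines the test:cluster task.
  Rake::TestTask.new(:cluster) do |t|
    t.libs << "test"
    t.pattern = "test/cluster/**/*_test.rb" # assumed glob
  end
end

# The default `test` task depends on both suites, standalone first.
desc "Run standalone and cluster tests"
task test: %w[test:standalone test:cluster]
```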

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
@alexr-bq alexr-bq marked this pull request as ready for review June 3, 2026 19:13
alexr-bq added 9 commits June 3, 2026 18:37
- Rename test/standalone/ to test/valkey/ for shared modules
- Convert test classes to ValkeyTests::* modules (like Lint::*)
- Create new test/standalone/ with test classes that include modules
- Update test/cluster/cluster_commands_test.rb to include ValkeyTests modules
- Update test_helper.rb to load valkey shared test modules
- Fix assert_not_nil -> refute_nil for Minitest compatibility
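
The shared-module layout described above can be sketched as follows. `FakeClient`, `SharedStringTests`, and the two test classes are hypothetical stand-ins so the example runs without a server; the real modules live under ValkeyTests::* and Lint::*.

```ruby
require "minitest"

# Hypothetical in-memory client so the sketch needs no running server.
class FakeClient
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
    "OK"
  end

  def get(key)
    @data[key]
  end
end

# Shared test module: it defines tests but is not itself a test class.
module SharedStringTests
  def test_set_then_get
    client.set("{tag}key", "value")
    refute_nil client.get("{tag}key") # Minitest provides refute_nil, not assert_not_nil
    assert_equal "value", client.get("{tag}key")
  end
end

# Each concrete test class supplies its own client and includes the shared tests.
class StandaloneStringTest < Minitest::Test
  include SharedStringTests

  def client
    @client ||= FakeClient.new
  end
end

class ClusterStringTest < Minitest::Test
  include SharedStringTests

  def client
    @client ||= FakeClient.new
  end
end
```

In a real suite the file would require "minitest/autorun" so the tests run at exit; plain "minitest" is used here so that loading the sketch has no side effects.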

Signed-off-by: Alex Rehnby-Martin <alexrema@amazon.com>
Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
- Split Lint modules with setup methods (GeoCommands, JsonCommands,
  ModuleCommands, VectorSearchCommands) into their own test classes
- Fix test isolation issue where GeoCommands.setup created 'Sicily' key
  that leaked into other tests like test_del
- Add ensure_otel_initialized to OpenTelemetry module setup to handle
  random test ordering
- Add skip for test_randomkey in cluster mode (requires isolated db)

Fixes standalone test failures caused by setup method conflicts when
multiple Lint modules were combined into a single test class.

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
- Skip cross-slot operations (smove, sinter, sdiff, zunion, zinter, etc.) in cluster mode
- Skip rename/renamenx in cluster mode (different hash slots)
- Skip lmove when using different keys across hash slots
- Skip EXEC/DISCARD without MULTI in cluster mode
- Fix function tests to clean up libraries before loading (rolib, policylib)
- Relax OpenTelemetry span count assertions to handle test ordering
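
As background for the cross-slot skips above: in cluster mode each key maps to one of 16384 hash slots via CRC16, and multi-key commands require all keys to share a slot. The sketch below computes the slot in plain Ruby to show why untagged keys tend to collide with this rule and how a `{tag}` avoids it. The helper names (`crc16`, `hash_tag`, `key_slot`, `same_slot?`) are illustrative and not part of the client.

```ruby
SLOT_COUNT = 16_384

# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), bit by bit.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# If the key contains a non-empty {...} section, only that part is hashed.
def hash_tag(key)
  open = key.index("{")
  return key unless open

  close = key.index("}", open + 1)
  return key if close.nil? || close == open + 1

  key[(open + 1)...close]
end

def key_slot(key)
  crc16(hash_tag(key)) % SLOT_COUNT
end

# True when every key would be served by the same hash slot.
def same_slot?(*keys)
  keys.map { |key| key_slot(key) }.uniq.size == 1
end
```

With these helpers, `same_slot?("foo", "bar")` is false while `same_slot?("{t}foo", "{t}bar")` is true, which is why tests using tagged keys can stay enabled in cluster mode.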

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
These directories contain reusable test modules (not standalone test files)
that are included by test classes in test/standalone/ and test/cluster/.

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
- Skip all MULTI/EXEC transaction tests in cluster mode (connection routing issues)
- Skip geosearchstore test (source/destination keys may be on different slots)
- Skip sorting tests with GET/STORE (cross-slot operations)
- Skip eval/evalsha tests with random keys (cross-slot operations)
- Skip large_parameter_arrays test (50 keys on different slots)

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
Wait 200ms before cleaning test files to allow any buffered spans
from previous tests to flush. This fixes flaky span count assertions
caused by async span flushing with 100ms flush interval.
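
The wait-then-clean step might be sketched like this. The 100 ms flush interval and 200 ms wait come from the commit message; the `*.json` file pattern, the `dir` argument, and the helper name are assumptions for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Spans flush every 100 ms, so waiting 200 ms leaves room for a full flush cycle.
FLUSH_INTERVAL_SECONDS = 0.1
SETTLE_SECONDS = 2 * FLUSH_INTERVAL_SECONDS

# Wait for buffered spans to be written, then remove leftover span files.
# Returns the number of files that were removed.
def clean_span_files(dir, settle: SETTLE_SECONDS)
  sleep(settle)
  files = Dir.glob(File.join(dir, "*.json")) # assumed file pattern
  FileUtils.rm_f(files)
  files.size
end
```

A fixed sleep is the simplest option; it trades a little test time for not having to synchronize with the exporter.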

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
- Skip test_del, test_del_with_array_argument (untagged keys)
- Skip test_scan (may not see all keys in cluster mode)
- Skip test_select_database (behavior varies in cluster)
- Skip test_memory_malloc_stats (returns multi-node Array)
- Skip test_pfmerge (untagged keys foo/bar/res)
- Skip test_script_execution_consistency, test_parameter_round_trip_preservation
  (random keys may be on different slots)
- Skip destructive cluster slot management tests (addslotsrange, delslotsrange,
  addslots, delslots, setslot) to prevent cluster instability
- Fix statistics tests to use _new_client helper instead of hard-coded port 7000
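
A cluster-mode skip guard of the kind listed above could look like the following. This is a sketch under assumptions: `ClusterSkipSketch`, `skip_in_cluster`, and the `VALKEY_CLUSTER_SKETCH` environment variable are hypothetical, and the real helpers may detect cluster mode differently.

```ruby
require "minitest"

module ClusterSkipSketch
  # Hypothetical mode switch used only for this example.
  def cluster_mode?
    ENV.fetch("VALKEY_CLUSTER_SKETCH", nil) == "1"
  end

  # Marks the current test as skipped when running against a cluster.
  def skip_in_cluster(reason)
    skip("cluster mode: #{reason}") if cluster_mode?
  end
end

class PfmergeSketchTest < Minitest::Test
  include ClusterSkipSketch

  def test_pfmerge
    skip_in_cluster("untagged keys foo/bar/res may map to different slots")
    assert true # the real PFMERGE assertions would go here
  end
end
```

Recording the reason in the skip message keeps the list of gaps visible in test output until the follow-up work lands.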

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
Comment thread test/valkey/opentelemetry_test.rb
Comment thread test/valkey/statistics_test.rb
alexr-bq added 3 commits June 3, 2026 20:36
- Convert rescue modifier to begin/rescue blocks in function_commands.rb
- Fix comment annotation format in commands_test.rb

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
The tests were using Valkey.new which defaults to 127.0.0.1:6379,
but in cluster mode there's no standalone server at that address.
Use _new_client helper to get a properly configured client.

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
CLIENT KILL by address doesn't work reliably in cluster mode because
the command may be routed to a different node than where the client
is connected, resulting in 'No such client' errors.

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
Comment thread Gemfile Outdated
Comment thread test/support/helper/cluster.rb
Comment thread test/support/helper/client.rb Outdated
Comment thread test/support/helper/client.rb Outdated
Comment thread test/support/helper/cluster.rb
Comment thread test/support/helper/cluster.rb
Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
Comment thread test/support/test_cluster.rb
Comment thread Rakefile
Comment thread .github/workflows/CI.yml
alexr-bq added 2 commits June 10, 2026 11:37
- Remove unused rantly gem from Gemfile
- Remove unused start_server/stop_server methods from Helper::Client
- Move test_cluster.rb from lib/ to test/support/ (not shipped in gem)
- Update docs to use test:standalone instead of test:valkey
- Update cd.yml to use rake test:standalone
- Update .rubocop.yml to remove stale exclusions

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
@alexr-bq alexr-bq merged commit 6396496 into main Jun 10, 2026
23 checks passed
@alexr-bq alexr-bq deleted the cluster-test-infra branch June 10, 2026 20:03

3 participants