@gflarity
Contributor

What type of PR is this?

Developer Tools

What this PR does / why we need it:

This PR adds a new setup-debug-cluster CLI command that creates a K3D cluster identical to the one used in E2E tests. This enables developers to easily reproduce E2E test environments locally for debugging purposes.

The command handles all setup steps including:

  • Creating the K3D cluster with the same configuration as E2E tests
  • Setting up a Docker registry
  • Pre-pulling and caching test images
  • Deploying Grove operator via Skaffold
  • Installing Kai scheduler via Helm

Key changes:

  • New operator/e2e/cmd/setup-debug-cluster/main.go CLI tool using Kong for argument parsing
  • Refactored DefaultClusterConfig() in k8s_clusters.go to be the single source of truth for cluster configuration (including node labels and taints)
  • Added DeleteCluster() and GetKubeconfig() helper functions for reuse
  • Updated shared_cluster.go to use DefaultClusterConfig() with overrides instead of duplicating configuration
  • Improved registry cleanup to check for port conflicts with other containers
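To illustrate the "defaults with overrides" pattern described for DefaultClusterConfig(), here is a minimal sketch. The `ClusterConfig` fields and default values below are invented for illustration; the real type in k8s_clusters.go may look different.

```go
package main

import "fmt"

// ClusterConfig is a hypothetical stand-in for the real configuration type.
// The field names and values are assumptions made for this example.
type ClusterConfig struct {
	Name        string
	WorkerNodes int
	NodeLabels  map[string]string
}

// DefaultClusterConfig returns a fresh value on every call, so a caller that
// overrides fields (or mutates the map) cannot affect other callers.
func DefaultClusterConfig() ClusterConfig {
	return ClusterConfig{
		Name:        "example-cluster",
		WorkerNodes: 3,
		NodeLabels:  map[string]string{"example.io/role": "worker"},
	}
}

func main() {
	// A caller such as the shared test cluster starts from the defaults
	// and overrides only what differs.
	cfg := DefaultClusterConfig()
	cfg.Name = "shared-e2e"
	cfg.NodeLabels["example.io/extra"] = "true"

	// A second call is unaffected by the overrides above.
	fresh := DefaultClusterConfig()
	fmt.Println(cfg.Name, len(cfg.NodeLabels))
	fmt.Println(fresh.Name, len(fresh.NodeLabels))
}
```

Building the map inside the function, rather than sharing a package-level value, keeps the defaults a reliable single source of truth even when callers mutate what they receive.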

Which issue(s) this PR fixes:

NONE

Special notes for your reviewer:

The CLI supports all the same configuration options as the E2E cluster setup, with sensible defaults. When run interactively, it waits for Ctrl+C and then tears down the cluster. When run non-interactively, it leaves the cluster running for manual teardown.

Example usage:

cd operator/e2e/cmd/setup-debug-cluster
go run . -v

Does this PR introduce an API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

NONE

@gflarity gflarity force-pushed the setup_debug_cluster branch from 521b477 to 7cd31f0 Compare January 22, 2026 16:47
@gflarity gflarity merged commit 5428cdd into ai-dynamo:main Jan 22, 2026
6 of 7 checks passed