@gflarity gflarity commented Jan 16, 2026

What type of PR is this?

Test Infra

What this PR does / why we need it:

Occasionally the E2E run fails before any test executes, because the shared k3d test cluster cannot be created:

2026-01-16T01:15:32.899Z	INFO	setup/shared_cluster.go:154	🚀 Setting up shared k3d cluster for all e2e tests...
2026-01-16T01:16:30.853Z	INFO	setup/k8s_clusters.go:453	✅ Cluster deleted successfully
2026-01-16T01:16:30.853Z	ERROR	tests/main_test.go:44	failed to setup shared cluster: failed to setup shared k3d cluster: failed to create cluster: Failed Cluster Start: Failed to add one or more agents: Node k3d-shared-e2e-test-cluster-agent-16 failed to get ready: error waiting for log line `successfully registered node` from node 'k3d-shared-e2e-test-cluster-agent-16': stopped returning log lines: node k3d-shared-e2e-test-cluster-agent-16 is running=true in status=restarting
FAIL	github.com/ai-dynamo/grove/operator/e2e/tests	57.982s

It's not clear why this happens, and it happens very rarely. Since the failure appears to be transient, this PR simply retries cluster creation, just as you would if you were creating the cluster by hand.
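For illustration, the retry idea can be sketched roughly as below. This is a minimal sketch and not the code in this PR: `retryCreate`, `errNodeNotReady`, the attempt count, and the delay are all hypothetical names and values chosen for the example, and the `create` and `cleanup` callbacks stand in for whatever the setup package actually uses to create and delete the k3d cluster. The cleanup step between attempts is an assumption, suggested by the "Cluster deleted successfully" line in the log above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNodeNotReady is a hypothetical stand-in for the transient
// "failed to get ready" error seen in the log above.
var errNodeNotReady = errors.New("node failed to get ready")

// retryCreate calls create up to attempts times. After each failed attempt it
// calls cleanup to remove any partially created cluster, then waits delay
// before the next attempt. It returns nil on the first success, or an error
// wrapping the last failure once all attempts are used up.
func retryCreate(attempts int, delay time.Duration, create, cleanup func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var lastErr error
	for i := 1; i <= attempts; i++ {
		lastErr = create()
		if lastErr == nil {
			return nil
		}
		fmt.Printf("attempt %d/%d failed: %v\n", i, attempts, lastErr)
		// Remove leftovers so the next attempt starts from a clean state.
		if err := cleanup(); err != nil {
			return fmt.Errorf("cleanup after attempt %d: %w", i, err)
		}
		// No point sleeping after the final attempt.
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("cluster creation failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	// Simulate a cluster that fails once and then comes up.
	calls := 0
	create := func() error {
		calls++
		if calls < 2 {
			return errNodeNotReady
		}
		return nil
	}
	cleanup := func() error { return nil }

	if err := retryCreate(3, 10*time.Millisecond, create, cleanup); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Printf("cluster ready after %d attempt(s)\n", calls)
}
```

Wrapping the last error with `%w` keeps the original failure available to `errors.Is` and `errors.As`, so a caller can still see why the final attempt failed.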

Separately, the test can sometimes hang until the full test timeout elapses. This PR fixes that as well.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce an API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:


@gflarity gflarity force-pushed the cluster_creation_retry branch 3 times, most recently from 0c95b7f to 006f804 Compare January 18, 2026 18:16
@gflarity gflarity force-pushed the cluster_creation_retry branch from 006f804 to a9bb759 Compare January 19, 2026 14:34
@gflarity gflarity force-pushed the cluster_creation_retry branch from 97611b7 to 548fcc4 Compare January 22, 2026 18:31
@gflarity gflarity merged commit e0cd00e into ai-dynamo:main Jan 22, 2026
7 checks passed
