
Set Metrics Fix: disable the metrics server in indexer testing#976

Merged
github-actions[bot] merged 1 commit into llm-d:main from Gregory-Pereira:fix-metrics-serer-for-indexer-testing on Apr 6, 2026

Conversation

@Gregory-Pereira
Member

cc @shuynh2017 @lionelvillard @asm582 @vivekk16

The test manager was created with the default manager.Options{}, which starts a metrics server on :8080. The metrics server was failing to bind (likely a port conflict), causing mgr.Start() to return immediately. This shut down the manager's cache, so subsequent tests using the cached client could never find any objects.
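The fix described above can be sketched as follows. This is a minimal illustration, not the exact PR diff: it assumes controller-runtime v0.16+ (where metrics configuration lives in metricsserver.Options) and a hypothetical helper name newTestManager. Setting BindAddress to "0" disables the metrics endpoint entirely.

```go
package indexers_test

import (
	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
)

// newTestManager builds a manager for envtest suites with the metrics
// server disabled. Tests don't need metrics, and a bind failure on :8080
// would abort mgr.Start(), tearing down the cache the test client reads from.
func newTestManager(cfg *rest.Config) (ctrl.Manager, error) {
	return ctrl.NewManager(cfg, ctrl.Options{
		// "0" tells controller-runtime not to serve metrics at all.
		Metrics: metricsserver.Options{BindAddress: "0"},
	})
}
```

With the metrics server out of the picture, mgr.Start() keeps running for the life of the suite and the cached client stays usable across specs.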

Before:

make test
/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
cp config/crd/bases/llmd.ai_variantautoscalings.yaml charts/workload-variant-autoscaler/crds/llmd.ai_variantautoscalings.yaml
/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
internal/config/saturation_scaling.go
internal/engines/pipeline/enforcer.go
internal/engines/pipeline/greedy_score_optimizer_test.go
test/e2e/scale_from_zero_test.go
go vet ./...
Setting up envtest binaries for Kubernetes version 1.34...
/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/k8s/1.34.1-darwin-arm64KUBEBUILDER_ASSETS="/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/k8s/1.34.1-darwin-arm64" PATH=/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin:/Users/gregpereirapereira/.gvm/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/gregpereirapereira/.pyenv/shims:/Users/gregpereirapereira/.local/bin:/Users/gregpereirapereira/bin:/Users/gregpereirapereira/scripts:/Users/gregpereirapereira/.cargo/bin:/Users/gregpereirapereira/go/bin:/Users/gregpereirapereira/.gvm/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/Docker.app/Contents/Resources/bin:/Users/gregpereirapereira/Documents/tech/Work/red-hat/code/claude-workflow-building:/Users/gregpereirapereira/bin go test $(go list ./... | grep -v /e2e | grep -v /benchmark) -coverprofile cover.out
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/api/v1alpha1	(cached)	coverage: 41.9% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/cmd		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/actuator	6.259s	coverage: 73.8% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/registration	(cached)	coverage: 56.4% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/source	(cached)	coverage: 18.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/source/pod	(cached)	coverage: 91.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/source/prometheus	(cached)	coverage: 67.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/config	(cached)	coverage: 64.0% of statements
?   	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/constants	[no test files]
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/controller	7.839s	coverage: 56.9% of statements
Running Suite: Indexers Suite - /Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/internal/controller/indexers
==================================================================================================================================================================
Random Seed: 1775233048

Will run 14 of 14 specs
•
------------------------------
• [FAILED] [3.023 seconds]
Indexers FindVAForDeployment [It] should return VA targeting a specific deployment
/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/internal/controller/indexers/indexers_test.go:149

  Timeline >>
  2026-04-03T09:17:30-07:00	INFO	Stopping and waiting for non leader election runnables
  2026-04-03T09:17:30-07:00	INFO	Stopping and waiting for warmup runnables
  2026-04-03T09:17:30-07:00	INFO	Stopping and waiting for leader election runnables
  2026-04-03T09:17:30-07:00	INFO	Stopping and waiting for caches
  2026-04-03T09:17:30-07:00	INFO	Stopping and waiting for webhooks
  2026-04-03T09:17:30-07:00	INFO	Stopping and waiting for HTTP servers
  2026-04-03T09:17:30-07:00	INFO	Wait completed, proceeding to shutdown the manager
  [FAILED] in [It] - /Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/internal/controller/indexers/indexers_test.go:156 @ 04/03/26 09:17:33.479
  << Timeline

  [FAILED] Timed out after 1.000s.
  Expected
      <string>:
  to equal
      <string>: va-1
  In [It] at: /Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/internal/controller/indexers/indexers_test.go:156 @ 04/03/26 09:17:33.479
------------------------------
SSSSSSSSSSSS

Summarizing 1 Failure:
  [FAIL] Indexers FindVAForDeployment [It] should return VA targeting a specific deployment
  /Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/internal/controller/indexers/indexers_test.go:156

Ran 2 of 14 Specs in 6.446 seconds
FAIL! -- 1 Passed | 1 Failed | 0 Pending | 12 Skipped
--- FAIL: TestIndexers (6.45s)
FAIL
coverage: 36.0% of statements
FAIL	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/controller/indexers	8.338s
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/datastore	(cached)	coverage: 43.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/discovery	(cached)	coverage: 73.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/analyzers/queueingmodel	(cached)	coverage: 14.6% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/analyzers/queueingmodel/tuner		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/analyzers/saturation_v2	(cached)	coverage: 92.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/common	(cached)	coverage: 100.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/executor		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/pipeline	(cached)	coverage: 90.5% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/saturation	9.017s	coverage: 36.8% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/scalefromzero	(cached)	coverage: 43.3% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/interfaces	(cached)	coverage: 70.6% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/logging		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/metrics		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/modelanalyzer		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/resources		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/saturation	(cached)	coverage: 92.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/utils	(cached)	coverage: 27.5% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/utils/pool	(cached)	coverage: 60.7% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/utils/scaletarget	(cached)	coverage: 96.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/analyzer	(cached)	coverage: 90.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/config	(cached)	coverage: 100.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/core	(cached)	coverage: 93.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/manager	(cached)	coverage: 100.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/solver	(cached)	coverage: 97.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/test/chart	2.498s	coverage: [no statements]
	github.com/llm-d/llm-d-workload-variant-autoscaler/test/testconfig		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/test/utils		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/test/utils/resources		coverage: 0.0% of statements
FAIL
make: *** [test] Error 1

After:

make test
/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
cp config/crd/bases/llmd.ai_variantautoscalings.yaml charts/workload-variant-autoscaler/crds/llmd.ai_variantautoscalings.yaml
/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
Setting up envtest binaries for Kubernetes version 1.34...
/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/k8s/1.34.1-darwin-arm64KUBEBUILDER_ASSETS="/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin/k8s/1.34.1-darwin-arm64" PATH=/Users/gregpereirapereira/Documents/tech/work/red-hat/code/opendatahub-io/workload-variant-autoscaler/bin:/Users/gregpereirapereira/.gvm/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/gregpereirapereira/.pyenv/shims:/Users/gregpereirapereira/.local/bin:/Users/gregpereirapereira/bin:/Users/gregpereirapereira/scripts:/Users/gregpereirapereira/.cargo/bin:/Users/gregpereirapereira/go/bin:/Users/gregpereirapereira/.gvm/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/Docker.app/Contents/Resources/bin:/Users/gregpereirapereira/Documents/tech/Work/red-hat/code/claude-workflow-building:/Users/gregpereirapereira/bin go test $(go list ./... | grep -v /e2e | grep -v /benchmark) -coverprofile cover.out
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/api/v1alpha1	0.616s	coverage: 41.9% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/cmd		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/actuator	18.617s	coverage: 73.8% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/registration	8.543s	coverage: 56.4% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/source	1.270s	coverage: 18.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/source/pod	4.838s	coverage: 91.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/collector/source/prometheus	2.829s	coverage: 67.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/config	3.420s	coverage: 64.0% of statements
?   	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/constants	[no test files]
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/controller	19.395s	coverage: 56.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/controller/indexers	18.858s	coverage: 76.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/datastore	5.545s	coverage: 43.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/discovery	(cached)	coverage: 73.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/analyzers/queueingmodel	4.950s	coverage: 14.6% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/analyzers/queueingmodel/tuner		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/analyzers/saturation_v2	7.354s	coverage: 92.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/common	8.383s	coverage: 100.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/executor		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/pipeline	6.846s	coverage: 90.5% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/saturation	18.648s	coverage: 36.8% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/engines/scalefromzero	6.401s	coverage: 43.3% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/interfaces	5.750s	coverage: 70.6% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/logging		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/metrics		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/modelanalyzer		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/resources		coverage: 0.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/saturation	8.499s	coverage: 92.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/utils	18.133s	coverage: 27.5% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/utils/pool	9.354s	coverage: 60.7% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/internal/utils/scaletarget	10.181s	coverage: 96.6% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/analyzer	(cached)	coverage: 90.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/config	(cached)	coverage: 100.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/core	(cached)	coverage: 93.9% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/manager	(cached)	coverage: 100.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/pkg/solver	(cached)	coverage: 97.0% of statements
ok  	github.com/llm-d/llm-d-workload-variant-autoscaler/test/chart	11.237s	coverage: [no statements]
	github.com/llm-d/llm-d-workload-variant-autoscaler/test/testconfig		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/test/utils		coverage: 0.0% of statements
	github.com/llm-d/llm-d-workload-variant-autoscaler/test/utils/resources		coverage: 0.0% of statements

…e metrics server in tests, since it's not needed for indexer testing.

Signed-off-by: greg pereira <grpereir@redhat.com>
@asm582
Collaborator

asm582 commented Apr 3, 2026

/lgtm
/approve

@github-actions github-actions bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Apr 3, 2026
@lionelvillard
Collaborator

/ok-to-test

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource       Total  Allocated  Available
GPUs           50     45         5

Cluster        Value
Nodes          16 (7 with GPUs)
Total CPU      993 cores
Total Memory   10383 Gi
GPUs required  4 (min) / 6 (recommended)

@github-actions github-actions bot merged commit aa7e6d2 into llm-d:main Apr 6, 2026
17 checks passed

Labels

lgtm Looks good to me, indicates that a PR is ready to be merged.
