Cleanup and fix E2E tests and metric emission by WheelyMcBones · Pull Request #633 · llm-d/llm-d-workload-variant-autoscaler

WheelyMcBones · 2026-01-24T10:43:04Z

Per title, this PR:

Cleans up outdated E2E tests
In E2Es and unit tests, distinguish between the VariantAutoscaling name and its scaleTargetRef name.
For consistency and compatibility with the tests, changes the emitted metric label so that variant_name carries the VariantAutoscaling name instead of the scaleTargetRef name (closes #630, "External metrics should be labelled with VariantAutoscaling name").

Copilot

Pull request overview

This pull request cleans up outdated end-to-end tests and fixes metric emission to use the VariantAutoscaling name instead of the scaleTargetRef (deployment) name. This change addresses issue #630, ensuring external metrics are correctly labeled with the VariantAutoscaling resource name for proper identification and HPA selector compatibility.

Changes:

Modified metric emission in internal/metrics/metrics.go to use va.Name instead of va.GetScaleTargetName() for the variant_name label
Updated test utility function CreateVariantAutoscalingResource to distinguish between VariantAutoscaling name and its scaleTargetRef deployment name
Removed outdated E2E test files (test/e2e/e2e_test.go and test/e2e/e2e_suite_test.go)
Updated all saturation-based E2E tests to use the new function signatures with separate VA and deployment names
Updated unit tests to reflect the distinction between VariantAutoscaling name and deployment name
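
The label change described above can be sketched as follows. This is a minimal illustration, not the code in internal/metrics/metrics.go: the VariantAutoscaling struct here is a simplified stand-in, and the metricLabels helper and the example names are hypothetical. Only the variant_name label and the switch from the scale target name to the VariantAutoscaling name come from the PR description.

```go
package main

import "fmt"

// VariantAutoscaling is a simplified stand-in for the real custom resource.
// Only the two names relevant to this change are modeled.
type VariantAutoscaling struct {
	Name            string // name of the VariantAutoscaling object itself
	ScaleTargetName string // name of the Deployment referenced by scaleTargetRef
}

// metricLabels is a hypothetical helper that builds the label set for an
// emitted metric. After this PR, variant_name carries the VariantAutoscaling
// name rather than the scale target (Deployment) name.
func metricLabels(va VariantAutoscaling) map[string]string {
	return map[string]string{
		"variant_name": va.Name, // before the change: va.ScaleTargetName
	}
}

func main() {
	va := VariantAutoscaling{Name: "llama-va", ScaleTargetName: "llama-deployment"}
	fmt.Println(metricLabels(va)["variant_name"])
}
```

Because an HPA selects external metrics by label, any selector that previously matched the Deployment name would need to match the VariantAutoscaling name after this change.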

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

File	Description
internal/metrics/metrics.go	Changed metric emission to use VariantAutoscaling name for variant_name label, fixing issue #630
test/utils/e2eutils.go	Added scaleTargetRefName parameter to CreateVariantAutoscalingResource function; removed obsolete commented code
test/e2e/e2e_test.go	Deleted outdated E2E test file (1546 lines removed)
test/e2e/e2e_suite_test.go	Deleted outdated E2E suite file (148 lines removed)
test/e2e-saturation-based/e2e_scale_to_zero_test.go	Updated to distinguish between VA name and deployment name in test setup and assertions
test/e2e-saturation-based/e2e_scale_from_zero_test.go	Updated to use separate VA and deployment names in CreateVariantAutoscalingResource calls
test/e2e-saturation-based/e2e_saturation_test.go	Updated single and multiple VA tests to properly distinguish VA names from deployment names
test/e2e-saturation-based/e2e_limiter_test.go	Updated limiter tests to use correct VA name for resource lookups
internal/engines/scalefromzero/engine_test.go	Updated unit tests to pass separate VA and deployment names to test utility functions
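
To illustrate why the tests now pass separate names, here is a small sketch of a test helper that takes the VariantAutoscaling name and the scaleTargetRef name as distinct parameters. The types, the newVariantAutoscaling function, and the example names are all hypothetical; the real CreateVariantAutoscalingResource signature is not shown in this conversation and may differ.

```go
package main

import "fmt"

// scaleTargetRef and variantAutoscaling are simplified, hypothetical types
// used only for this illustration.
type scaleTargetRef struct {
	Kind string
	Name string
}

type variantAutoscaling struct {
	Name           string
	Namespace      string
	ScaleTargetRef scaleTargetRef
}

// newVariantAutoscaling takes the VA name and the scale target name as
// separate arguments, so a test can deliberately make them differ.
func newVariantAutoscaling(namespace, vaName, scaleTargetRefName string) variantAutoscaling {
	return variantAutoscaling{
		Name:           vaName,
		Namespace:      namespace,
		ScaleTargetRef: scaleTargetRef{Kind: "Deployment", Name: scaleTargetRefName},
	}
}

func main() {
	va := newVariantAutoscaling("test-ns", "llama-va", "llama-deployment")

	// Resources indexed by VA name: a lookup by the Deployment name misses,
	// which is the kind of mix-up that identical names would hide.
	store := map[string]variantAutoscaling{va.Name: va}
	_, foundByVAName := store["llama-va"]
	_, foundByDeploymentName := store["llama-deployment"]
	fmt.Println(foundByVAName, foundByDeploymentName)
}
```

When both names are identical, code that uses the wrong one still appears to work; distinct names make such a bug fail visibly in the test.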

WheelyMcBones · 2026-01-24T12:35:31Z

The controller now properly loads the limiter config in the E2Es, and the tests explicitly check constrained scale-out, closing #635.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

WheelyMcBones · 2026-01-24T16:55:53Z

After properly configuring the limiter tests, the CI tests for the AMD GPU type failed, because the limiter can properly discover only NVIDIA GPUs:
https://github.com/llm-d-incubation/workload-variant-autoscaler/blob/27e584455227fb91c45c3b56da6c469b8810e1f8/internal/discovery/k8s_with_gpu_operator.go#L34-L44
used by:
https://github.com/llm-d-incubation/workload-variant-autoscaler/blob/27e584455227fb91c45c3b56da6c469b8810e1f8/internal/engines/saturation/engine.go#L93-L96
Therefore, the E2E tests with the limiter are skipped when the GPU type is not "nvidia", until proper discovery is implemented for other GPU types.
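
A minimal sketch of the skip condition, assuming the GPU type reaches the test as a plain string. The limiterTestsSupported function and the way it is called are hypothetical; in the actual suite the check would sit in front of a test-framework skip call rather than a print.

```go
package main

import "fmt"

// limiterTestsSupported reports whether the limiter E2E tests should run for
// the given GPU type. Per the comment above, only "nvidia" is supported until
// discovery handles other vendors.
func limiterTestsSupported(gpuType string) bool {
	return gpuType == "nvidia"
}

func main() {
	for _, gpuType := range []string{"nvidia", "amd"} {
		if !limiterTestsSupported(gpuType) {
			fmt.Printf("skipping limiter tests for GPU type %q\n", gpuType)
			continue
		}
		fmt.Printf("running limiter tests for GPU type %q\n", gpuType)
	}
}
```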

WheelyMcBones · 2026-01-26T16:35:47Z

/ok-to-test

github-actions · 2026-01-26T16:35:58Z

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Copilot · 2026-01-29T13:17:47Z

+		err := k8sClient.CoreV1().ConfigMaps(controllerNamespace).Delete(ctx, scaleToZeroConfigMapName, metav1.DeleteOptions{})
+		Expect(client.IgnoreNotFound(err)).NotTo(HaveOccurred(), fmt.Sprintf("Should be able to delete existing scale-to-zero ConfigMap: %s", scaleToZeroConfigMapName))


The error handling has been improved by capturing and checking the error from the Delete operation. However, the comment on line 110 mentions "Delete existing ConfigMap if it exists", but the code doesn't actually ignore NotFound errors before the check. Consider using client.IgnoreNotFound(err) directly on line 112 instead of checking it separately, or remove the comment since the new implementation expects the ConfigMap to exist.

WheelyMcBones · 2026-01-29T14:40:07Z

/ok-to-test

github-actions · 2026-01-29T14:40:18Z

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

WheelyMcBones · 2026-01-29T17:16:11Z

/ok-to-test

github-actions · 2026-01-29T17:16:22Z

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

WheelyMcBones · 2026-01-29T17:51:04Z

/ok-to-test

github-actions · 2026-01-29T17:51:13Z

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

WheelyMcBones · 2026-01-29T18:46:57Z

/ok-to-test

github-actions · 2026-01-29T18:47:08Z

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

WheelyMcBones · 2026-01-29T21:40:01Z

@asm582 @lionelvillard FYI E2Es passed on this one, marked it ready for review. Thanks!

asm582

/lgtm

WheelyMcBones self-assigned this Jan 24, 2026

Copilot AI review requested due to automatic review settings January 24, 2026 10:43

Copilot started reviewing on behalf of WheelyMcBones January 24, 2026 10:43

Copilot AI reviewed Jan 24, 2026

WheelyMcBones changed the title from "Cleanup E2E tests and metric emission" to "Cleanup and fix E2E tests and metric emission" Jan 24, 2026

WheelyMcBones linked an issue Jan 24, 2026 that may be closed by this pull request

Limiter E2E tests do not properly check for constrained scale out #635

Closed

WheelyMcBones marked this pull request as ready for review January 24, 2026 12:35

Copilot AI review requested due to automatic review settings January 24, 2026 12:35

Copilot started reviewing on behalf of WheelyMcBones January 24, 2026 12:36

Copilot AI reviewed Jan 24, 2026

WheelyMcBones added the WIP label Jan 24, 2026

Copilot AI review requested due to automatic review settings January 24, 2026 16:48

Copilot started reviewing on behalf of WheelyMcBones January 24, 2026 16:49

Copilot AI reviewed Jan 24, 2026

WheelyMcBones removed the WIP label Jan 24, 2026

WheelyMcBones requested a review from asm582 January 26, 2026 15:49

WheelyMcBones force-pushed the e2e-cleanup branch from 0d50e7c to da9878c Compare January 26, 2026 16:22

WheelyMcBones added 9 commits January 29, 2026 14:11

refactor(test): cleanup and refactor E2E tests, changed VA names to not be equal to corresponding Deployment

77247cf

changed variant_name label to match VariantAutoscaling name

4cbe71f

removed outdated unused tests

aa5f4f1

test: fixed VA creation in new E2Es and scale-from-zero unit tests

4942219

fix(test): fixed node selector labels for E2Es with limiter

96a14ce

fix(test): added explicit constrained scale-out check

4440637

fix(test): added controller restart to load limiter config

82d35fc

Revert "fix(test): fixed node selector labels for E2Es with limiter"

6aa04bb

This reverts commit b4f3388.

test(e2e-limiter): skipping limiter test due to discovery not finding vendors other than nvidia

3d4520e

todo comment

2cd1e67

WheelyMcBones force-pushed the e2e-cleanup branch from da9878c to 2cd1e67 Compare January 29, 2026 13:12

Copilot AI review requested due to automatic review settings January 29, 2026 13:12

Copilot started reviewing on behalf of WheelyMcBones January 29, 2026 13:12

Copilot AI reviewed Jan 29, 2026

test(ocp): changed metric ref to VariantAutoscaling name

8d8dfc7

test(ocp): cleanup and split replica check

f9c754c

Copilot AI review requested due to automatic review settings January 29, 2026 17:50

Copilot started reviewing on behalf of WheelyMcBones January 29, 2026 17:50

Copilot AI reviewed Jan 29, 2026

WheelyMcBones added 2 commits January 29, 2026 19:24

test(ocp): remove unused functions

20a36fc

chart/samples: changed HPA metric label to match VariantAutoscaling name instead of Deployment

edf534a

WheelyMcBones added the ready-for-review and e2e passing labels Jan 29, 2026

asm582 approved these changes Feb 2, 2026

WheelyMcBones merged commit 1751bfa into llm-d:main Feb 3, 2026
8 checks passed

3 participants
