Skip to content

Cleanup and fix E2E tests and metric emission#633

Merged
WheelyMcBones merged 14 commits intollm-d:mainfrom
WheelyMcBones:e2e-cleanup
Feb 3, 2026
Merged

Cleanup and fix E2E tests and metric emission#633
WheelyMcBones merged 14 commits intollm-d:mainfrom
WheelyMcBones:e2e-cleanup

Conversation

@WheelyMcBones
Copy link
Copy Markdown
Collaborator

@WheelyMcBones WheelyMcBones commented Jan 24, 2026

Per title, this PR:

  • Cleans up outdated E2E tests
  • In E2Es and unit tests, distinguish between the VariantAutoscaling name and its scaleTargetRef name.
  • For compatibility with the tests and coherency, changes the metric emission label to use the VariantAutoscaling name as variant_name, instead of the scaleTargetRef name (closes External metrics should be labelled with VariantAutoscaling name #630).

@WheelyMcBones WheelyMcBones self-assigned this Jan 24, 2026
Copilot AI review requested due to automatic review settings January 24, 2026 10:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request cleans up outdated end-to-end tests and fixes metric emission to use the VariantAutoscaling name instead of the scaleTargetRef (deployment) name. This change addresses issue #630, ensuring external metrics are correctly labeled with the VariantAutoscaling resource name for proper identification and HPA selector compatibility.

Changes:

  • Modified metric emission in internal/metrics/metrics.go to use va.Name instead of va.GetScaleTargetName() for the variant_name label
  • Updated test utility function CreateVariantAutoscalingResource to distinguish between VariantAutoscaling name and its scaleTargetRef deployment name
  • Removed outdated E2E test files (test/e2e/e2e_test.go and test/e2e/e2e_suite_test.go)
  • Updated all saturation-based E2E tests to use the new function signatures with separate VA and deployment names
  • Updated unit tests to reflect the distinction between VariantAutoscaling name and deployment name

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
internal/metrics/metrics.go Changed metric emission to use VariantAutoscaling name for variant_name label, fixing issue #630
test/utils/e2eutils.go Added scaleTargetRefName parameter to CreateVariantAutoscalingResource function; removed obsolete commented code
test/e2e/e2e_test.go Deleted outdated E2E test file (1546 lines removed)
test/e2e/e2e_suite_test.go Deleted outdated E2E suite file (148 lines removed)
test/e2e-saturation-based/e2e_scale_to_zero_test.go Updated to distinguish between VA name and deployment name in test setup and assertions
test/e2e-saturation-based/e2e_scale_from_zero_test.go Updated to use separate VA and deployment names in CreateVariantAutoscalingResource calls
test/e2e-saturation-based/e2e_saturation_test.go Updated single and multiple VA tests to properly distinguish VA names from deployment names
test/e2e-saturation-based/e2e_limiter_test.go Updated limiter tests to use correct VA name for resource lookups
internal/engines/scalefromzero/engine_test.go Updated unit tests to pass separate VA and deployment names to test utility functions

@WheelyMcBones WheelyMcBones changed the title Cleanup E2E tests and metric emission Cleanup and fix E2E tests and metric emission Jan 24, 2026
@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

Now, the limiter config in E2Es is properly checked by the controller, and constrained scale-out is checked, closing #635.

@WheelyMcBones WheelyMcBones linked an issue Jan 24, 2026 that may be closed by this pull request
@WheelyMcBones WheelyMcBones marked this pull request as ready for review January 24, 2026 12:35
Copilot AI review requested due to automatic review settings January 24, 2026 12:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings January 24, 2026 16:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

WheelyMcBones commented Jan 24, 2026

After properly configuring the limiter tests, the CI tests for the amd GPU type failed since the limiter is able to discover only NVIDIA ones properly:
https://github.com/llm-d-incubation/workload-variant-autoscaler/blob/27e584455227fb91c45c3b56da6c469b8810e1f8/internal/discovery/k8s_with_gpu_operator.go#L34-L44
used by:
https://github.com/llm-d-incubation/workload-variant-autoscaler/blob/27e584455227fb91c45c3b56da6c469b8810e1f8/internal/engines/saturation/engine.go#L93-L96
Therefore, the E2E tests with the limiter are skipped (when the GPU type is not "nvidia") waiting for proper discovery to be implemented for other GPU types.

@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

/ok-to-test

@github-actions
Copy link
Copy Markdown
Contributor

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Copilot AI review requested due to automatic review settings January 29, 2026 13:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment on lines +111 to +112
err := k8sClient.CoreV1().ConfigMaps(controllerNamespace).Delete(ctx, scaleToZeroConfigMapName, metav1.DeleteOptions{})
Expect(client.IgnoreNotFound(err)).NotTo(HaveOccurred(), fmt.Sprintf("Should be able to delete existing scale-to-zero ConfigMap: %s", scaleToZeroConfigMapName))
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling has been improved by capturing and checking the error from the Delete operation. However, the comment on line 110 mentions "Delete existing ConfigMap if it exists", but the code doesn't actually ignore NotFound errors before the check. Consider using client.IgnoreNotFound(err) directly on line 112 instead of checking it separately, or remove the comment since the new implementation expects the ConfigMap to exist.

Copilot uses AI. Check for mistakes.
@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

/ok-to-test

@github-actions
Copy link
Copy Markdown
Contributor

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

/ok-to-test

@github-actions
Copy link
Copy Markdown
Contributor

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Copilot AI review requested due to automatic review settings January 29, 2026 17:50
@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

/ok-to-test

@github-actions
Copy link
Copy Markdown
Contributor

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

/ok-to-test

@github-actions
Copy link
Copy Markdown
Contributor

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@WheelyMcBones WheelyMcBones added ready-for-review Signal that changes are ready for review e2e passing labels Jan 29, 2026
@WheelyMcBones
Copy link
Copy Markdown
Collaborator Author

@asm582 @lionelvillard FYI E2Es passed on this one, marked it ready for review. Thanks!

Copy link
Copy Markdown
Collaborator

@asm582 asm582 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@WheelyMcBones WheelyMcBones merged commit 1751bfa into llm-d:main Feb 3, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

e2e passing ready-for-review Signal that changes are ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Limiter E2E tests do not properly check for constrained scale out External metrics should be labelled with VariantAutoscaling name

3 participants