cleanup/ refactor deploy scripts by mamy-CS · Pull Request #959 · llm-d/llm-d-workload-variant-autoscaler

mamy-CS · 2026-03-31T19:54:41Z

This PR cleans up and hardens the deploy scripts for better readability and maintainability. It mostly moves code around and adds essentially no new functionality. These scripts are used only for e2e tests and development.

Local full emulated kind e2e run:

[AfterSuite] PASSED [0.070 seconds]
------------------------------

Ran 31 of 34 Specs in 520.389 seconds
SUCCESS! -- 31 Passed | 0 Failed | 0 Pending | 3 Skipped
--- PASS: TestE2E (520.39s)
PASS
ok      github.com/llm-d/llm-d-workload-variant-autoscaler/test/e2e     520.907s

==========================================
Test execution completed. Exit code: 0
==========================================

mamy-CS · 2026-03-31T20:15:33Z

/ok-to-test

github-actions · 2026-03-31T20:15:42Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-03-31T20:15:50Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-03-31T20:18:40Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	22	28

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)
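
The arithmetic behind this pre-flight check can be sketched as follows. This is only an illustration under assumptions: the function name `gpu_preflight` and its argument order are hypothetical, and the actual workflow step is not shown in this thread.

```shell
# gpu_preflight TOTAL ALLOCATED REQUIRED_MIN
# Hypothetical helper: prints the number of free GPUs and succeeds
# only when at least REQUIRED_MIN GPUs are free.
gpu_preflight() {
  local total="$1" allocated="$2" required_min="$3" available
  available=$((total - allocated))
  echo "available=$available"
  [ "$available" -ge "$required_min" ]
}
```

With the values reported above, `gpu_preflight 50 22 4` prints `available=28` and returns success, because 28 free GPUs cover the minimum of 4.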

Copilot

Pull request overview

Refactors and hardens the developer/E2E deployment scripts by modularizing deploy/install.sh into focused deploy/lib/* helpers, and shifts e2e-only InferenceObjective management from the install flow into the scale-from-zero e2e suite.
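
A thin entrypoint of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the code from this PR: `load_libs` and `run_entrypoint` are hypothetical names, and the real `deploy/install.sh` may source its helpers explicitly in dependency order rather than by glob.

```shell
# load_libs DIR: source every *.sh file directly under DIR in lexical order.
# Hypothetical helper; fails if DIR does not exist.
load_libs() {
  local lib_dir="$1" lib
  if [ ! -d "$lib_dir" ]; then
    echo "lib directory not found: $lib_dir" >&2
    return 1
  fi
  for lib in "$lib_dir"/*.sh; do
    [ -e "$lib" ] || continue   # the glob matched nothing
    . "$lib"
  done
}

# run_entrypoint DIR [ARGS...]: load the helpers, then hand the remaining
# arguments to a main function that one of the sourced files must define.
run_entrypoint() {
  local lib_dir="$1"
  shift
  load_libs "$lib_dir" || return 1
  main "$@"
}
```

An entrypoint script would then reduce to a single call such as `run_entrypoint "$(dirname "$0")/lib" "$@"`, which keeps all real logic in the sourced helper files.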

Changes:

Modularize deploy logic into deploy/lib/* (CLI parsing, prereqs, orchestration, monitoring/scaler backends, llm-d/WVA deploy, cleanup, verification).
Update scale-from-zero e2e to apply/delete InferenceObjective via Go fixtures when the CRD exists; update related docs accordingly.
Add make lint-deploy-scripts / make smoke-deploy-scripts and update deploy docs to reflect infra-first defaults (VA/HPA opt-in).
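
The contents of the `lint-deploy-scripts` target are not shown in this thread. As a rough idea of what a deploy-script lint pass could do, the sketch below syntax-checks each script with the shell's `-n` flag, which parses a file without executing it. The function name `lint_scripts` and the `LINT_SHELL` variable are hypothetical, and the real target may rely on a different tool such as ShellCheck.

```shell
# lint_scripts DIR: syntax-check every *.sh file directly under DIR.
# Hypothetical helper; LINT_SHELL selects the interpreter (default: bash).
# Returns non-zero if any script fails to parse.
lint_scripts() {
  local dir="$1" shell_bin="${LINT_SHELL:-bash}" failed=0 script
  for script in "$dir"/*.sh; do
    [ -e "$script" ] || continue   # the glob matched nothing
    if ! "$shell_bin" -n "$script"; then
      echo "syntax error: $script" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A call such as `lint_scripts deploy/lib` reports every script that fails to parse instead of stopping at the first one, which tends to be more useful in CI output.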

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 9 comments.

File	Description
test/e2e/scale_from_zero_test.go	Applies `InferenceObjective` from e2e setup (and cleans up) when API exists.
test/e2e/README.md	Documents new e2e responsibility for `InferenceObjective` creation.
test/e2e/fixtures/inference_objective.go	Adds dynamic-client fixture to create/delete `InferenceObjective`.
Makefile	Adds deploy-script lint and smoke targets.
docs/developer-guide/troubleshooting.md	Updates troubleshooting guidance around `InferenceObjective` ownership.
docs/developer-guide/testing.md	Updates infra/e2e deployment notes for flow-control + `InferenceObjective`.
deploy/README.md	Updates environment variables and examples (VA/HPA opt-in; INFRA_ONLY usage).
deploy/openshift/install.sh	Refactors OpenShift plugin to reuse shared namespace loop + CA extraction helper.
deploy/lib/wait_helpers.sh	Adds shared retry/wait helpers for deploy scripts.
deploy/lib/verify.sh	Adds verification + deployment summary helpers.
deploy/lib/scaler_runtime.sh	Adds scaler backend runtime helpers (KEDA/adapter + APIService guard).
deploy/lib/prereqs.sh	Extracts prerequisite checks and interactive prompts.
deploy/lib/kube_like_adapter.sh	Adds Kubernetes-like environment adapter for shared functions.
deploy/lib/install_core.sh	Extracts main install orchestration from `deploy/install.sh`.
deploy/lib/infra_wva.sh	Adds shared WVA deploy helpers + shared namespace loop + OpenShift CA extraction helper.
deploy/lib/infra_scaler_backend.sh	Adds shared scaler-backend orchestration wrapper.
deploy/lib/infra_monitoring.sh	Adds shared monitoring orchestration + optional Grafana deploy.
deploy/lib/infra_llmd.sh	Adds shared llm-d deployment helpers and e2e infra-only behavior adjustments.
deploy/lib/discovery.sh	Adds discovery helpers (GPU type; InferencePool API group detection).
deploy/lib/deploy_prometheus_kube_stack.sh	Adds shared kube-prometheus-stack install/uninstall for kube-like envs.
deploy/lib/constants.sh	Centralizes shared constants/selectors/timeouts for deploy scripts.
deploy/lib/common.sh	Centralizes logging + utility helpers.
deploy/lib/cli.sh	Centralizes CLI help + argument parsing.
deploy/lib/cleanup.sh	Centralizes undeploy/cleanup logic and scaler backend teardown.
deploy/kubernetes/README.md	Updates Kubernetes deploy examples to opt-in VA/HPA and INFRA_ONLY.
deploy/kubernetes/install.sh	Refactors Kubernetes plugin to source shared kube-like helpers.
deploy/kind-emulator/setup.sh	Updates shebang for portability.
deploy/kind-emulator/README.md	Updates kind-emulator examples to opt-in VA/HPA and INFRA_ONLY.
deploy/kind-emulator/install.sh	Refactors kind-emulator plugin to reuse shared kube-like helpers in parts.
deploy/install.sh	Converts to a thin entrypoint sourcing `deploy/lib/*` and invoking `main`.
deploy/inference-objective-e2e.yaml	Clarifies when install applies this manifest (non-e2e scale-to-zero only).
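
For readers unfamiliar with the kind of helper that `deploy/lib/wait_helpers.sh` is described as adding, a retry wrapper could be sketched as below. This is a minimal illustration under assumptions: the name `retry` and its argument order are hypothetical and are not taken from the PR.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds or MAX_ATTEMPTS is reached,
# sleeping DELAY_SECONDS between attempts.
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

A deploy script could wrap a flaky step as `retry 5 10 kubectl get nodes`, so transient API errors do not abort the whole install.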

github-actions · 2026-04-01T14:11:45Z

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

mamy-CS · 2026-04-01T14:37:27Z

/ok-to-test

github-actions · 2026-04-01T14:37:38Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-01T14:37:43Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-04-01T14:40:22Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	26	24

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

mamy-CS · 2026-04-01T21:06:05Z

/ok-to-test

github-actions · 2026-04-01T21:06:16Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-01T21:06:22Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

mamy-CS · 2026-04-01T21:09:17Z

The PR is ready, @lionelvillard PTAL. Let's try to merge this ASAP since it touches deployment: I have already had to rebase twice today, and merging will let folks work on top of it. Thanks.

github-actions · 2026-04-01T21:09:30Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	26	24

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

mamy-CS · 2026-04-02T17:00:33Z

/ok-to-test

github-actions · 2026-04-02T17:00:43Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-02T17:00:49Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-04-02T17:04:56Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	33	17

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

mamy-CS requested review from lionelvillard and shuynh2017 March 31, 2026 20:00

mamy-CS self-assigned this Mar 31, 2026

mamy-CS linked an issue Mar 31, 2026 that may be closed by this pull request

cleanup infra deployment #956

Closed

lionelvillard requested a review from Copilot April 1, 2026 13:51

Copilot started reviewing on behalf of lionelvillard April 1, 2026 13:52

Copilot AI reviewed Apr 1, 2026

mamy-CS force-pushed the cleanup_install_script branch from 18711b9 to 3e60474 Compare April 1, 2026 14:11

mamy-CS force-pushed the cleanup_install_script branch 2 times, most recently from 0596c5a to ccd1a8b Compare April 1, 2026 14:15

mamy-CS enabled auto-merge (squash) April 1, 2026 15:12

mamy-CS force-pushed the cleanup_install_script branch from b0380ba to 48c6273 Compare April 1, 2026 20:33

mamy-CS mentioned this pull request Apr 2, 2026

fix: lint on test case #961

Closed

mamy-CS added 3 commits April 2, 2026 12:04

phase 1 clean up

b4b7b5b

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

phase 2 decouple

452adc6

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

skip modelservice install for e2es

8e96f5d

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

mamy-CS added 2 commits April 2, 2026 12:04

apply review changes from copilot

f3be650

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

fix lint

2a8eaeb

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

mamy-CS force-pushed the cleanup_install_script branch from bb495ac to 2a8eaeb Compare April 2, 2026 16:04

rm

2394e3e

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>