cleanup/ refactor deploy scripts #959

Merged
mamy-CS merged 6 commits into llm-d:main from mamy-CS:cleanup_install_script on Apr 2, 2026
@mamy-CS mamy-CS commented Mar 31, 2026

Per #872.

This PR cleans up and hardens the deploy scripts for better readability and maintainability. It mostly moves code around and, for the most part, adds no new functionality. These scripts are used only for e2e tests and development.

Local full emulated kind e2e run:

[AfterSuite] PASSED [0.070 seconds]
------------------------------

Ran 31 of 34 Specs in 520.389 seconds
SUCCESS! -- 31 Passed | 0 Failed | 0 Pending | 3 Skipped
--- PASS: TestE2E (520.39s)
PASS
ok      github.com/llm-d/llm-d-workload-variant-autoscaler/test/e2e     520.907s

==========================================
Test execution completed. Exit code: 0
==========================================

@mamy-CS mamy-CS self-assigned this Mar 31, 2026
@mamy-CS mamy-CS linked an issue Mar 31, 2026 that may be closed by this pull request

mamy-CS commented Mar 31, 2026

/ok-to-test

@github-actions

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 22 | 28 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

Copilot AI left a comment

Pull request overview

Refactors and hardens the developer/E2E deployment scripts by modularizing deploy/install.sh into focused deploy/lib/* helpers, and shifts e2e-only InferenceObjective management from the install flow into the scale-from-zero e2e suite.

Changes:

  • Modularize deploy logic into deploy/lib/* (CLI parsing, prereqs, orchestration, monitoring/scaler backends, llm-d/WVA deploy, cleanup, verification).
  • Update scale-from-zero e2e to apply/delete InferenceObjective via Go fixtures when the CRD exists; update related docs accordingly.
  • Add make lint-deploy-scripts / make smoke-deploy-scripts and update deploy docs to reflect infra-first defaults (VA/HPA opt-in).
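
The thin-entrypoint pattern described in this overview can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `source_libs` is a hypothetical helper name, and the error handling and sourcing order are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an entrypoint helper that loads library files.
# It is NOT the implementation in deploy/install.sh.
set -eu

# source_libs LIB_DIR NAME...
# Sources LIB_DIR/NAME.sh for each NAME, in the order given.
# Returns 1 with a message on stderr if a library file is missing.
source_libs() {
  local lib_dir
  local name
  local lib_path
  lib_dir="$1"
  shift
  for name in "$@"; do
    lib_path="${lib_dir}/${name}.sh"
    if [ ! -f "$lib_path" ]; then
      echo "source_libs: missing library: ${lib_path}" >&2
      return 1
    fi
    # Functions defined by the sourced file become available to the caller.
    . "$lib_path"
  done
}
```

An entrypoint in this style might call `source_libs "${SCRIPT_DIR}/lib" common constants cli` and then finish with `main "$@"`. The library names are taken from the per-file summary in this review; which libraries are loaded, and in what order, is assumed here.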

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| test/e2e/scale_from_zero_test.go | Applies InferenceObjective from e2e setup (and cleans up) when API exists. |
| test/e2e/README.md | Documents new e2e responsibility for InferenceObjective creation. |
| test/e2e/fixtures/inference_objective.go | Adds dynamic-client fixture to create/delete InferenceObjective. |
| Makefile | Adds deploy-script lint and smoke targets. |
| docs/developer-guide/troubleshooting.md | Updates troubleshooting guidance around InferenceObjective ownership. |
| docs/developer-guide/testing.md | Updates infra/e2e deployment notes for flow-control + InferenceObjective. |
| deploy/README.md | Updates environment variables and examples (VA/HPA opt-in; INFRA_ONLY usage). |
| deploy/openshift/install.sh | Refactors OpenShift plugin to reuse shared namespace loop + CA extraction helper. |
| deploy/lib/wait_helpers.sh | Adds shared retry/wait helpers for deploy scripts. |
| deploy/lib/verify.sh | Adds verification + deployment summary helpers. |
| deploy/lib/scaler_runtime.sh | Adds scaler backend runtime helpers (KEDA/adapter + APIService guard). |
| deploy/lib/prereqs.sh | Extracts prerequisite checks and interactive prompts. |
| deploy/lib/kube_like_adapter.sh | Adds Kubernetes-like environment adapter for shared functions. |
| deploy/lib/install_core.sh | Extracts main install orchestration from deploy/install.sh. |
| deploy/lib/infra_wva.sh | Adds shared WVA deploy helpers + shared namespace loop + OpenShift CA extraction helper. |
| deploy/lib/infra_scaler_backend.sh | Adds shared scaler-backend orchestration wrapper. |
| deploy/lib/infra_monitoring.sh | Adds shared monitoring orchestration + optional Grafana deploy. |
| deploy/lib/infra_llmd.sh | Adds shared llm-d deployment helpers and e2e infra-only behavior adjustments. |
| deploy/lib/discovery.sh | Adds discovery helpers (GPU type; InferencePool API group detection). |
| deploy/lib/deploy_prometheus_kube_stack.sh | Adds shared kube-prometheus-stack install/uninstall for kube-like envs. |
| deploy/lib/constants.sh | Centralizes shared constants/selectors/timeouts for deploy scripts. |
| deploy/lib/common.sh | Centralizes logging + utility helpers. |
| deploy/lib/cli.sh | Centralizes CLI help + argument parsing. |
| deploy/lib/cleanup.sh | Centralizes undeploy/cleanup logic and scaler backend teardown. |
| deploy/kubernetes/README.md | Updates Kubernetes deploy examples to opt-in VA/HPA and INFRA_ONLY. |
| deploy/kubernetes/install.sh | Refactors Kubernetes plugin to source shared kube-like helpers. |
| deploy/kind-emulator/setup.sh | Updates shebang for portability. |
| deploy/kind-emulator/README.md | Updates kind-emulator examples to opt-in VA/HPA and INFRA_ONLY. |
| deploy/kind-emulator/install.sh | Refactors kind-emulator plugin to reuse shared kube-like helpers in parts. |
| deploy/install.sh | Converts to a thin entrypoint sourcing deploy/lib/* and invoking main. |
| deploy/inference-objective-e2e.yaml | Clarifies when install applies this manifest (non-e2e scale-to-zero only). |
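
To illustrate the kind of shared retry/wait helper that the summary attributes to deploy/lib/wait_helpers.sh, here is a minimal sketch. The function name `retry_until`, its argument order, and the fixed delay between attempts are all hypothetical; the actual helpers in the PR may look quite different.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper; NOT the code from deploy/lib/wait_helpers.sh.
set -eu

# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
# Returns 0 on success, 1 if every attempt failed.
retry_until() {
  local max_attempts
  local delay
  local attempt
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while true; do
    # Running the command as an `if` condition keeps `set -e` from aborting.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry_until: giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

A deploy script could wrap a readiness probe this way, for example `retry_until 30 5 kubectl get crd <name>`, where the attempt count, delay, and CRD name are placeholders rather than values from the PR.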

Comment thread deploy/lib/cli.sh Outdated
Comment thread deploy/lib/install_core.sh Outdated
Comment thread deploy/lib/install_core.sh
Comment thread deploy/lib/install_core.sh Outdated
Comment thread test/e2e/fixtures/inference_objective.go
Comment thread test/e2e/fixtures/inference_objective.go Outdated
Comment thread test/e2e/fixtures/inference_objective.go Outdated
Comment thread deploy/openshift/install.sh Outdated
Comment thread Makefile Outdated
@mamy-CS mamy-CS force-pushed the cleanup_install_script branch from 18711b9 to 3e60474 Compare April 1, 2026 14:11

github-actions bot commented Apr 1, 2026

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@mamy-CS mamy-CS force-pushed the cleanup_install_script branch 2 times, most recently from 0596c5a to ccd1a8b Compare April 1, 2026 14:15

mamy-CS commented Apr 1, 2026

/ok-to-test


github-actions bot commented Apr 1, 2026

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run


github-actions bot commented Apr 1, 2026

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run


github-actions bot commented Apr 1, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 26 | 24 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@mamy-CS mamy-CS enabled auto-merge (squash) April 1, 2026 15:12
@mamy-CS mamy-CS force-pushed the cleanup_install_script branch from b0380ba to 48c6273 Compare April 1, 2026 20:33

mamy-CS commented Apr 1, 2026

/ok-to-test


github-actions bot commented Apr 1, 2026

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run


github-actions bot commented Apr 1, 2026

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run


mamy-CS commented Apr 1, 2026

The PR is ready. @lionelvillard PTAL. Let's try to merge this ASAP since it touches deployment, so folks can work on top of it; I have already had to rebase twice today. Thanks.


github-actions bot commented Apr 1, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 26 | 24 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@mamy-CS mamy-CS mentioned this pull request Apr 2, 2026
mamy-CS added 3 commits April 2, 2026 12:04
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
mamy-CS added 2 commits April 2, 2026 12:04
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
@mamy-CS mamy-CS force-pushed the cleanup_install_script branch from bb495ac to 2a8eaeb Compare April 2, 2026 16:04
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

mamy-CS commented Apr 2, 2026

/ok-to-test


github-actions bot commented Apr 2, 2026

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run


github-actions bot commented Apr 2, 2026

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run


github-actions bot commented Apr 2, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 33 | 17 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

Comment thread deploy/kubernetes/install.sh
Comment thread deploy/lib/cleanup.sh
Comment thread test/e2e/README.md
Comment thread deploy/README.md
Comment thread deploy/install.sh
Comment thread internal/engines/scalefromzero/engine_test.go
Comment thread Makefile
Comment thread Makefile
Comment thread deploy/openshift/install.sh
@mamy-CS mamy-CS merged commit 669f5c4 into llm-d:main Apr 2, 2026
16 checks passed
Development

Successfully merging this pull request may close these issues.

cleanup infra deployment

4 participants