Convert multi-model deploy script from bash to Go by kahilam · Pull Request #1015 · llm-d/llm-d-workload-variant-autoscaler

kahilam · 2026-04-15T17:59:45Z

Summary

AI-assisted using Cursor IDE.

Converts deploy/install-multi-model.sh (a 393-line bash script) into a Go tool at deploy/multimodel/, addressing review feedback from #1014 (comment) to move away from bash deployment scripts. The goals are readability, performance, and enabling concurrent test execution within a single GitHub Actions workflow run.

Key improvements over the bash version:

Concurrent model deployment: Models 2..N deploy in parallel via goroutines (bash was sequential)
No Docker Hub images: Connectivity verification uses kubectl port-forward from the Go process, eliminating the in-cluster curlimages/curl:latest Job
Type-safe k8s resources: Gateway and HTTPRoute created via the dynamic client instead of heredoc YAML
Better error handling: errors are propagated explicitly in Go instead of relying on bash set -e

Files changed:

Added: deploy/multimodel/main.go, deployer.go, portforward.go
Deleted: deploy/install-multi-model.sh
Modified: Makefile — updated targets to use go run ./deploy/multimodel, added INSTALL_GATEWAY_CTRLPLANE passthrough
Modified: deploy/lib/infra_llmd.sh — guarded modelArtifacts.labels yq call for chart compatibility

Functional parity:

The Go tool accepts the same environment variables (MODELS, LLMD_NS, ENVIRONMENT, DECODE_REPLICAS, etc.) and CLI flags (--undeploy) as the bash script. It still delegates to deploy/install.sh for per-model Helm deployments.

Tested on an OpenShift cluster:

Full deploy + undeploy cycle with 2 models (Qwen/Qwen3-0.6B, unsloth/Meta-Llama-3.1-8B)
Gateway connectivity verified for both models via port-forward
Multi-model scaling benchmark ran successfully (SUCCESS! -- 1 Passed | 0 Failed)

Addresses review feedback from #1014 to move away from bash deployment scripts for readability, type safety, and concurrent model deployment. Key improvements:

- Models 2..N deploy concurrently via goroutines (bash was sequential)
- Connectivity verification uses kubectl port-forward from the Go process, eliminating the in-cluster curl Job and its Docker Hub image (curlimages/curl:latest)
- Kubernetes resources (Gateway, HTTPRoute) created via dynamic client instead of heredoc YAML
- Proper error handling and structured logging

The Go tool is invoked via `go run ./deploy/multimodel` from the same Makefile targets (deploy-multi-model-infra, undeploy-multi-model-infra).

Made-with: Cursor

github-actions · 2026-04-15T18:02:52Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	39	11

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

- Add INSTALL_GATEWAY_CTRLPLANE to Makefile passthrough (default: false for shared clusters with existing Istio)
- Set E2E_TESTS_ENABLED=true to prevent interactive prompts in install.sh
- Guard modelArtifacts.labels yq call in infra_llmd.sh to avoid schema validation errors on chart versions that don't support custom labels
- Remove unused import in main.go

Made-with: Cursor
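Below is a sketch of how the Go tool could delegate to deploy/install.sh with the passthrough variables from this commit. Only E2E_TESTS_ENABLED, INSTALL_GATEWAY_CTRLPLANE, and DECODE_REPLICAS are named in the PR. MODEL_ID and installCmd are hypothetical. The function only builds the *exec.Cmd, and the caller would run it with cmd.Run(). When a key appears more than once in Cmd.Env, os/exec uses the last value, so the appended values override anything inherited from the parent environment.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// installCmd builds (but does not run) one per-model invocation of
// deploy/install.sh. MODEL_ID is a hypothetical variable name.
func installCmd(model string, replicas int, installGatewayCtrlPlane bool) *exec.Cmd {
	cmd := exec.Command("bash", "deploy/install.sh")
	cmd.Env = append(os.Environ(),
		"E2E_TESTS_ENABLED=true", // suppress interactive prompts in install.sh
		"INSTALL_GATEWAY_CTRLPLANE="+strconv.FormatBool(installGatewayCtrlPlane),
		"DECODE_REPLICAS="+strconv.Itoa(replicas),
		"MODEL_ID="+model, // hypothetical
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// false matches the default suggested for shared clusters with Istio.
	cmd := installCmd("Qwen/Qwen3-0.6B", 1, false)
	fmt.Println(cmd.Args)
}
```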

github-actions · 2026-04-15T18:59:55Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	39	11

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

asm582 · 2026-04-15T19:07:09Z

Thanks, please test this PR locally with command similar to:

make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

Please share a snippet of the passing test output. I know this is manual for now.

kahilam · 2026-04-15T19:17:43Z

Yes, tested locally on an OpenShift cluster (wva-bench-test namespace). The full deploy + undeploy + benchmark run completed successfully:

Both models deployed and reachable through Gateway

make test-multi-model-scaling passed (exit code 0)

Commands run:
make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

asm582 · 2026-04-15T19:55:29Z

@kahilam the linter checks are failing; please fix.

- Replace fmt.Sprintf with string concatenation (perfsprint)
- Preallocate rules slice (prealloc)
- Remove unused ctx param from detectInferencePoolAPIGroup (unparam)
- Change verifyInferencePools so it no longer returns an always-nil error (unparam)
- Remove unused portStr method and strconv import (unused)

Made-with: Cursor
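The next sketch builds the HTTPRoute as an unstructured object with a preallocated rules slice, and includes an InferencePool API group detector that takes no ctx parameter, in the spirit of the lint fixes above. Several parts are assumptions made for illustration and do not reflect the PR's actual routing: the candidate groups (inference.networking.k8s.io and inference.networking.x-k8s.io), their preference order, the one-rule-per-model layout, and the path-prefix match on the model slug. In a real tool, served would come from API discovery, and the map would be created with the dynamic client, for example dyn.Resource(gvr).Namespace(ns).Create(ctx, &unstructured.Unstructured{Object: route}, metav1.CreateOptions{}). Nested values use []any and map[string]any because unstructured content must contain only JSON-compatible types.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumed candidate InferencePool API groups, in order of preference.
var inferencePoolGroups = []string{
	"inference.networking.k8s.io",
	"inference.networking.x-k8s.io",
}

// detectInferencePoolAPIGroup returns the first candidate group the cluster serves.
func detectInferencePoolAPIGroup(served map[string]bool) (string, error) {
	for _, g := range inferencePoolGroups {
		if served[g] {
			return g, nil
		}
	}
	return "", errors.New("no InferencePool API group served by the cluster")
}

// backend pairs a model slug with its (hypothetically named) InferencePool.
type backend struct {
	Slug     string
	PoolName string
}

// buildHTTPRoute returns the unstructured form of an HTTPRoute with one rule per model.
func buildHTTPRoute(name, ns, gateway, poolGroup string, backends []backend) map[string]any {
	rules := make([]any, 0, len(backends)) // preallocated, as prealloc suggests
	for _, b := range backends {
		rules = append(rules, map[string]any{
			"matches": []any{map[string]any{ // hypothetical per-model path match
				"path": map[string]any{"type": "PathPrefix", "value": "/" + b.Slug},
			}},
			"backendRefs": []any{map[string]any{
				"group": poolGroup,
				"kind":  "InferencePool",
				"name":  b.PoolName,
			}},
		})
	}
	return map[string]any{
		"apiVersion": "gateway.networking.k8s.io/v1",
		"kind":       "HTTPRoute",
		"metadata":   map[string]any{"name": name, "namespace": ns},
		"spec": map[string]any{
			"parentRefs": []any{map[string]any{"name": gateway}},
			"rules":      rules,
		},
	}
}

func main() {
	group, err := detectInferencePoolAPIGroup(map[string]bool{"inference.networking.k8s.io": true})
	if err != nil {
		fmt.Println(err)
		return
	}
	// Hypothetical names, guessed from the pod names shown later in this thread.
	route := buildHTTPRoute("multi-model-route", "wva-bench-test", "multi-model-inference-gateway", group, []backend{
		{Slug: "qwen-qwen3-0-6b", PoolName: "gaie-qwen-qwen3-0-6b"},
		{Slug: "unsloth-meta-llama-3-1-8b", PoolName: "gaie-unsloth-meta-llama-3-1-8b"},
	})
	spec := route["spec"].(map[string]any)
	fmt.Println(route["kind"], "with", len(spec["rules"].([]any)), "rules")
}
```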

- Use strconv.Itoa instead of fmt.Sprintf for int conversion (perfsprint)
- Use string concatenation for svc/ prefix (perfsprint)
- Remove trailing blank line in portforward.go (gofmt)

Made-with: Cursor

asm582 · 2026-04-15T21:40:51Z

@kahilam, with the new changes, could you please share the output of the commands above?

kahilam · 2026-04-15T22:10:02Z

Re-ran the full test with the latest lint fix commits (cac620d, fa005ec):

make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

Results:

Both models deployed and reachable through Gateway
HPA scaled both models from 1→2 replicas under load
Scaling monitor ran for 600s+ with stable 2/2 ready replicas

Ran 1 of 6 Specs in 832.610 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 5 Skipped
--- PASS: TestBenchmark (832.61s)
PASS
ok  github.com/llm-d/llm-d-workload-variant-autoscaler/test/benchmark  834.201s

==========================================
Multi-model benchmark completed. Exit code: 0
==========================================

kahilam · 2026-04-15T22:11:28Z

@asm582 FYI.

Full benchmark output:

══════════════════════════════════════════════════════════════════
  MULTI-MODEL SCALING BENCHMARK RESULTS
  Models: 2
══════════════════════════════════════════════════════════════════

  ┌────────────────────────────────────────────────────────────
  │ MODEL: Qwen/Qwen3-0.6B
  │ Slug:  qwen-qwen3-0-6b
  ├────────────────────────────────────────────────────────────
  │ Load Job:        SuccessCriteriaMet
  │ Duration:        640s
  │ Final Replicas:  spec=2 ready=2
  │ Max Replicas:    2
  │ Avg Replicas:    1.98
  ├── Prometheus Metrics ──────────────────────────────────────
  │ Avg KV Cache:    0.3467
  │ Avg Queue Depth: 9.90
  │ Avg EPP Queue:   256.83
  ├── GuideLLM Results ────────────────────────────────────────
  │ Achieved RPS:    7.76
  │ Errors:          5125
  │ Incomplete:      505
  ├── Replica Timeline (42 snapshots) ─────────────────────────
  │   t=15s  spec=1  ready=1
  │   t=30s  spec=2  ready=1
  │   t=120s spec=2  ready=2
  │   ... (stable at spec=2 ready=2 through t=630s)
  └────────────────────────────────────────────────────────────

  ┌────────────────────────────────────────────────────────────
  │ MODEL: unsloth/Meta-Llama-3.1-8B
  │ Slug:  unsloth-meta-llama-3-1-8b
  ├────────────────────────────────────────────────────────────
  │ Load Job:        SuccessCriteriaMet
  │ Duration:        640s
  │ Final Replicas:  spec=2 ready=2
  │ Max Replicas:    2
  │ Avg Replicas:    1.98
  ├── Prometheus Metrics ──────────────────────────────────────
  │ Avg KV Cache:    0.5607
  │ Avg Queue Depth: 34.21
  │ Avg EPP Queue:   183.24
  ├── GuideLLM Results ────────────────────────────────────────
  │ Achieved RPS:    6.24
  │ Errors:          6236
  │ Incomplete:      511
  ├── Replica Timeline (42 snapshots) ─────────────────────────
  │   t=15s  spec=1  ready=1
  │   t=30s  spec=2  ready=1
  │   t=150s spec=2  ready=2
  │   ... (stable at spec=2 ready=2 through t=630s)
  └────────────────────────────────────────────────────────────

Ran 1 of 6 Specs in 832.610 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 5 Skipped
--- PASS: TestBenchmark (832.61s)
PASS
ok  github.com/llm-d/llm-d-workload-variant-autoscaler/test/benchmark  834.201s

==========================================
Multi-model benchmark completed. Exit code: 0
==========================================

asm582 · 2026-04-15T22:13:14Z

/lgtm
/approve

kahilam · 2026-04-15T22:20:17Z

/ok-to-test

github-actions · 2026-04-15T22:20:26Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-15T22:20:34Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

kahilam · 2026-04-15T22:49:16Z

The VA=1 issue: the WVA controller is stuck in a transition loop. The HPA scales to 2 while the VA still desires 1. The controller sees desired (1) != current (2) as "in transition", blocks its own scale-up decision, and keeps trying to scale down to 1. I'm also seeing a pod vs. pod_name label mismatch in the dispatch rate metrics on all pods. This is an HPA/VA conflict in the test setup and is not related to the Go conversion.
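A toy model of the loop described above helps show why it never resolves. It assumes the controller treats desired != current as "in transition" and keeps its previous decision while in that state. variantState and reconcile are hypothetical and heavily simplified, and this is not the WVA controller's code. Once the HPA moves current to 2 on its own, the controller never adopts a new desired value, so desired stays pinned at 1.

```go
package main

import "fmt"

// variantState is a toy model: the controller's desired replica count and
// the replica count currently applied (here, by the HPA).
type variantState struct {
	desired int
	current int
}

// reconcile skips the new recommendation while desired != current
// ("in transition"); otherwise it adopts the recommendation.
func reconcile(s variantState, recommended int) variantState {
	if s.desired != s.current {
		return s // in transition: previous decision is kept
	}
	s.desired = recommended
	return s
}

func main() {
	s := variantState{desired: 1, current: 1}
	s.current = 2 // the HPA scales out independently under load
	for tick := 0; tick < 3; tick++ {
		s = reconcile(s, 2) // load would justify 2 replicas
		fmt.Printf("tick %d: desired=%d current=%d\n", tick, s.desired, s.current)
	}
}
```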

asm582 · 2026-04-16T02:26:35Z

Thanks for this PR. I do see both scale-up and scale-down:

  ══════════════════════════════════════════════════════════════════
    MULTI-MODEL SCALING BENCHMARK RESULTS
    Models: 2
  ══════════════════════════════════════════════════════════════════

    ┌────────────────────────────────────────────────────────────
    │ MODEL: Qwen/Qwen3-0.6B
    │ Slug:  qwen-qwen3-0-6b
    ├────────────────────────────────────────────────────────────
    │ Load Job:        SuccessCriteriaMet
    │ Duration:        640s
    │ Final Replicas:  spec=2 ready=2
    │ Max Replicas:    3
    │ Avg Replicas:    2.62
    ├── Prometheus Metrics ──────────────────────────────────────
    │ Avg KV Cache:    0.2969
    │ Avg Queue Depth: 9.55
    │ Avg EPP Queue:   71.10
    ├── GuideLLM Results ────────────────────────────────────────
    │ Achieved RPS:    11.62
    │ TTFT (ms):       {"count":7440,"max":39492.857217788696,"mean":5748.447952699918,"median":68.22490692138672,"min":28.84078025817871,"mode":39.40463066101074,"pdf":null,"percentiles":{"p001":29.954195022583008,"p01":31.963109970092773,"p05":34.35778617858887,"p10":36.066532135009766,"p25":41.10145568847656,"p50":68.22490692138672,"p75":9606.345653533936,"p90":16588.982582092285,"p95":25162.424325942993,"p99":37522.39418029785,"p999":39151.00383758545},"std_dev":8957.843630675445,"total_sum":42768452.76808739,"variance":80242962.51163264}
    │ ITL (ms):        {"count":7432560,"max":38.22515438030193,"mean":13.06922665785656,"median":6.103248806209774,"min":2.7977484721201913,"mode":2.7977484721201913,"pdf":null,"percentiles":{"p001":2.8474922772045725,"p01":3.1527236655906394,"p05":3.4249353933859394,"p10":3.610821696253749,"p25":4.0013697054293065,"p50":6.103248806209774,"p75":25.06122885046301,"p90":29.460214160464787,"p95":30.052853776169968,"p99":32.98202076473751,"p999":37.47222397301171},"std_dev":10.668411937890385,"total_sum":97137811.28811836,"variance":113.81501327652208}
    │ Throughput:      {"count":7440000,"max":121266.6357421875,"mean":12423.798953980498,"median":10246.671009771986,"min":0.22895657302290076,"mode":0.22895657302290076,"pdf":null,"percentiles":{"p001":0.22895657302290076,"p01":0.2289822721387534,"p05":1915.207305936073,"p10":3363.515637530072,"p25":5722.106412005457,"p50":10246.671009771986,"p75":16743.72854291417,"p90":24775.589351894218,"p95":30335.207403608365,"p99":43206.74057825855,"p999":60300.76601002464},"std_dev":9228.151729703304,"total_sum":92433064217.61491,"variance":85158784.3464261}
    │ Errors:          4094
    │ Incomplete:      121
    ├── Replica Timeline (42 snapshots) ─────────────────────────
    │   t=15s  spec=1  ready=1
    │   t=30s  spec=2  ready=1
    │   t=45s  spec=2  ready=1
    │   t=60s  spec=2  ready=1
    │   t=75s  spec=2  ready=1
    │   t=90s  spec=2  ready=1
    │   t=105s  spec=2  ready=1
    │   t=120s  spec=2  ready=2
    │   t=135s  spec=2  ready=2
    │   t=150s  spec=2  ready=2
    │   t=165s  spec=2  ready=2
    │   t=180s  spec=3  ready=2
    │   t=195s  spec=3  ready=2
    │   t=210s  spec=3  ready=2
    │   t=225s  spec=3  ready=2
    │   t=240s  spec=3  ready=2
    │   t=255s  spec=3  ready=2
    │   t=270s  spec=3  ready=2
    │   t=285s  spec=3  ready=2
    │   t=300s  spec=3  ready=3
    │   t=315s  spec=3  ready=3
    │   t=330s  spec=3  ready=3
    │   t=345s  spec=3  ready=3
    │   t=360s  spec=3  ready=3
    │   t=375s  spec=3  ready=3
    │   t=390s  spec=3  ready=3
    │   t=405s  spec=3  ready=3
    │   t=420s  spec=3  ready=3
    │   t=435s  spec=3  ready=3
    │   t=450s  spec=3  ready=3
    │   t=465s  spec=3  ready=3
    │   t=480s  spec=3  ready=3
    │   t=495s  spec=3  ready=3
    │   t=510s  spec=3  ready=3
    │   t=525s  spec=3  ready=3
    │   t=540s  spec=3  ready=3
    │   t=555s  spec=3  ready=3
    │   t=570s  spec=3  ready=3
    │   t=585s  spec=2  ready=2
    │   t=600s  spec=2  ready=2
    │   t=615s  spec=2  ready=2
    │   t=630s  spec=2  ready=2
    └────────────────────────────────────────────────────────────

    ┌────────────────────────────────────────────────────────────
    │ MODEL: unsloth/Meta-Llama-3.1-8B
    │ Slug:  unsloth-meta-llama-3-1-8b
    ├────────────────────────────────────────────────────────────
    │ Load Job:        SuccessCriteriaMet
    │ Duration:        640s
    │ Final Replicas:  spec=2 ready=2
    │ Max Replicas:    3
    │ Avg Replicas:    2.48
    ├── Prometheus Metrics ──────────────────────────────────────
    │ Avg KV Cache:    0.3402
    │ Avg Queue Depth: 11.98
    │ Avg EPP Queue:   76.33
    ├── GuideLLM Results ────────────────────────────────────────
    │ Achieved RPS:    5.64
    │ TTFT (ms):       {"count":3612,"max":35652.39071846008,"mean":8797.05326283889,"median":400.432825088501,"min":74.51033592224121,"mode":128.53074073791504,"pdf":null,"percentiles":{"p001":76.00045204162598,"p01":79.52380180358887,"p05":83.17351341247559,"p10":86.70234680175781,"p25":142.32182502746582,"p50":400.432825088501,"p75":16011.017084121704,"p90":27655.20977973938,"p95":32447.837591171265,"p99":34792.26303100586,"p999":35578.76014709473},"std_dev":11445.213173063228,"total_sum":31774956.38537407,"variance":130992904.57686004}
    │ ITL (ms):        {"count":3608388,"max":47.51579253165214,"mean":19.854888642225824,"median":19.79256797958542,"min":5.188541011409359,"mode":5.188541011409359,"pdf":null,"percentiles":{"p001":5.541122234142101,"p01":6.002523519613363,"p05":6.436811672435986,"p10":6.652015107530016,"p25":7.275045335710466,"p50":19.79256797958542,"p75":31.282470987604427,"p90":38.150784250971554,"p95":40.05243398763754,"p99":42.999861118671774,"p999":46.20621464512608},"std_dev":12.756438617742925,"total_sum":71644141.91794395,"variance":162.72672620824304}
    │ Throughput:      {"count":3612000,"max":89328.576959574,"mean":6022.24663721846,"median":4500.326180257511,"min":0.24693105606477062,"mode":0.24789795527357833,"pdf":null,"percentiles":{"p001":0.24693105606477062,"p01":0.24789795527357833,"p05":726.7624561915939,"p10":1473.0872238165603,"p25":2622.527797753948,"p50":4500.326180257511,"p75":7728.291623740843,"p90":12693.84180716234,"p95":16336.618384454037,"p99":25556.848027897267,"p999":38486.42719422505},"std_dev":5234.371706959363,"total_sum":21752354853.633076,"variance":27398647.166616675}
    │ Errors:          7956
    │ Incomplete:      119
    ├── Replica Timeline (42 snapshots) ─────────────────────────
    │   t=15s  spec=1  ready=1
    │   t=30s  spec=1  ready=1
    │   t=45s  spec=1  ready=1
    │   t=60s  spec=2  ready=1
    │   t=75s  spec=2  ready=1
    │   t=90s  spec=2  ready=1
    │   t=105s  spec=2  ready=1
    │   t=120s  spec=2  ready=1
    │   t=135s  spec=2  ready=1
    │   t=150s  spec=2  ready=1
    │   t=165s  spec=2  ready=1
    │   t=180s  spec=2  ready=2
    │   t=195s  spec=2  ready=2
    │   t=210s  spec=2  ready=2
    │   t=225s  spec=2  ready=2
    │   t=240s  spec=2  ready=2
    │   t=255s  spec=2  ready=2
    │   t=270s  spec=2  ready=2
    │   t=285s  spec=2  ready=2
    │   t=300s  spec=3  ready=2
    │   t=315s  spec=3  ready=2
    │   t=330s  spec=3  ready=2
    │   t=345s  spec=3  ready=2
    │   t=360s  spec=3  ready=2
    │   t=375s  spec=3  ready=2
    │   t=390s  spec=3  ready=2
    │   t=405s  spec=3  ready=2
    │   t=420s  spec=3  ready=3
    │   t=435s  spec=3  ready=3
    │   t=450s  spec=3  ready=3
    │   t=465s  spec=3  ready=3
    │   t=480s  spec=3  ready=3
    │   t=495s  spec=3  ready=3
    │   t=510s  spec=3  ready=3
    │   t=525s  spec=3  ready=3
    │   t=540s  spec=3  ready=3
    │   t=555s  spec=3  ready=3
    │   t=570s  spec=3  ready=3
    │   t=585s  spec=3  ready=3
    │   t=600s  spec=3  ready=3
    │   t=615s  spec=3  ready=3
    │   t=630s  spec=3  ready=3
    └────────────────────────────────────────────────────────────

  STEP: Saving multi-model benchmark results to file @ 04/15/26 22:23:37.201
  Results saved to /tmp/multi-model-benchmark-results.json
  Multi-model benchmark complete — cleaning up
    Scaled ms-qwen-qwen3-0-6b-llm-d-modelservice-decode back to 1
    Scaled ms-unsloth-meta-llama-3-1-8b-llm-d-modelservice-decode back to 1
• [768.479 seconds]
------------------------------
[AfterSuite] 
/Users/abhishekmalvankar/go-conv/llm-d-workload-variant-autoscaler/test/benchmark/suite_test.go:131
  STEP: Killing Prometheus port-forward @ 04/15/26 22:23:43.289
[AfterSuite] PASSED [0.000 seconds]
------------------------------

Ran 1 of 6 Specs in 769.826 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 5 Skipped
--- PASS: TestBenchmark (769.83s)
PASS
ok      github.com/llm-d/llm-d-workload-variant-autoscaler/test/benchmark       771.049s

==========================================
Multi-model benchmark completed. Exit code: 0
==========================================
abhishekmalvankar@wecm-9-67-148-249 llm-d-workload-variant-autoscaler % date
Wed Apr 15 22:24:07 EDT 2026
abhishekmalvankar@wecm-9-67-148-249 llm-d-workload-variant-autoscaler % oc project
Using project "asmalvan-test-6" on server "https://api.pokprod001.ete14.res.ibm.com:6443".
abhishekmalvankar@wecm-9-67-148-249 llm-d-workload-variant-autoscaler % oc get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
gaie-qwen-qwen3-0-6b-epp-746d56c64f-r789x                         1/1     Running   0          14m
gaie-unsloth-meta-llama-3-1-8b-epp-6cfd55c595-s9xxd               1/1     Running   0          14m
ms-qwen-qwen3-0-6b-llm-d-modelservice-decode-75c84475f8-mx5xr     1/1     Running   0          20m
ms-unsloth-meta-llama-3-1-8b-llm-d-modelservice-decode-779phn6v   1/1     Running   0          17m
multi-model-inference-gateway-istio-f5f74b8f9-5wkbz               1/1     Running   0          17m
workload-variant-autoscaler-controller-manager-79b48b6cc5-9b9hh   1/1     Running   0          18m
workload-variant-autoscaler-controller-manager-79b48b6cc5-gjdqt   1/1     Running   0          19m

asm582 · 2026-04-16T02:27:00Z

/lgtm
/approve

Remove the modelArtifacts.labels guard that was added as a workaround for older chart versions. This change is out of scope for the bash-to-Go conversion PR. Made-with: Cursor

kahilam · 2026-04-16T16:16:13Z

Re-tested after reverting deploy/lib/infra_llmd.sh to match main (commit f860acf):

make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

Results:

Both models deployed and reachable through Gateway
Llama scaled to spec=4 ready=4, Qwen stable at spec=2 ready=2
Reverting infra_llmd.sh did not break the deployment

Ran 1 of 6 Specs in 807.235 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 5 Skipped
--- PASS: TestBenchmark (807.24s)
PASS
ok  github.com/llm-d/llm-d-workload-variant-autoscaler/test/benchmark  808.106s

==========================================
Multi-model benchmark completed. Exit code: 0
==========================================

kahilam · 2026-04-16T16:17:25Z

@asm582 Reverted deploy/lib/infra_llmd.sh to match main. Re-tested the full deploy + benchmark cycle — all passing. Results posted above.

asm582 · 2026-04-16T17:15:17Z

/lgtm
/approve

kahilam · 2026-04-16T17:18:24Z

@lionelvillard, would you please approve this PR?

lionelvillard · 2026-04-16T20:16:51Z

/ok-to-test

github-actions · 2026-04-16T20:17:01Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-16T20:17:05Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

kahilam · 2026-04-16T20:20:43Z

/ok-to-test

github-actions · 2026-04-16T20:21:04Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-04-16T20:42:13Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

kahilam requested a review from asm582 April 15, 2026 18:59

kahilam requested a review from lionelvillard April 15, 2026 19:18

kahilam added 2 commits April 15, 2026 14:29

Fix remaining golangci-lint errors (fa005ec)


github-actions bot added the lgtm label Apr 15, 2026

github-actions bot approved these changes Apr 15, 2026


kahilam enabled auto-merge (squash) April 15, 2026 23:24

github-actions bot previously approved these changes Apr 16, 2026


asm582 disabled auto-merge April 16, 2026 02:27

asm582 reviewed Apr 16, 2026


Comment thread on deploy/lib/infra_llmd.sh (outdated)

Revert deploy/lib/infra_llmd.sh to match main (f860acf)


kahilam dismissed github-actions[bot]’s stale review via f860acf April 16, 2026 15:46

github-actions bot removed the lgtm label Apr 16, 2026

github-actions bot added the lgtm label Apr 16, 2026

github-actions bot approved these changes Apr 16, 2026


asm582 enabled auto-merge (squash) April 16, 2026 17:16


3 participants
