[sync] upstream llm-d main branch b745c47 [2026-03-26] #40
zdtsw merged 32 commits into opendatahub-io:main from
Conversation
Replaced by centralized nightly checks in llm-d/llm-d-infra#83. The nightly-org-checks workflow now runs typo, link, and upstream checks across all repos in both llm-d and llm-d-incubation orgs. Signed-off-by: Andrew Anderson <andy@clubanderson.com>
…lm-d#863) Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* docs: recommend one-shot setup and document InferencePool CRD race condition
In docs/developer-guide/development.md:
- Recommend 'CREATE_CLUSTER=true make deploy-wva-emulated-on-kind' as the preferred one-shot setup to avoid a CRD timing race
- Preserve the two-step alternative with a note pointing to Known Setup Issues
- Add a 'Known Setup Issues' section with symptom, cause, and two remediation options for the InferencePool CRD-not-found error
Signed-off-by: Srujan Reddy <srjnreddy33@gmail.com>
* dummy commit
---------
Signed-off-by: Srujan Reddy <srjnreddy33@gmail.com>
…scalingv2 (llm-d#871) Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* Basic model tuner functionality
* queue analyzer engine skeleton
* tuner integration init
* initial implementation of computeAllVariantCapacities
* Register queueing model metrics queries for arrival rate and max batch size
* integration v2
* wire queueing model analyzer to engine v1
* handle error cases
* set go version to 1.24.0
* code refactoring
* add comments and rearrange code
* fix openshift ci e2e gate for fork (llm-d#843)
* fix openshift ci e2e gate for fork
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
* clarify openshift e2e triggers
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
* update comments to be explicit
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
---------
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
* document wva release process (llm-d#845)
* document wva release process
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
* nit
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
---------
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
* wire queueing model analyzer to engine v1
* set go version to 1.24.0
* add QueueingModelScalingConfig struct
* add QueueingModelConfigMapName constant and helper
* add config map functionality that lets user set config for queue model
* rename QueueingModel config types and functions to QMAnalyzer
Rename types, struct fields, and functions to use the QMAnalyzer naming convention instead of QueueingModel for config-related code in internal/config and internal/controller packages, plus callers in internal/engines/saturation.
* Renames the internal analyzer config type QueueingModelConfig to QMConfig for consistency with the QMAnalyzer naming used in the config/controller layer.
* guess slo using all variants of model
* added unit tests for parameters and utils
* fix lint-and-test errors
* add model_name filtering to dispatch rate query and use 1m rate window
* added pod based tuning and weighted sum of metrics
* move queueing model configmap to deploy/ directory
---------
Signed-off-by: vishakha-ramani <92736776+vishakha-ramani@users.noreply.github.com>
Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>
Co-authored-by: tantawi <tantawi@us.ibm.com>
Co-authored-by: Mohammed Munir Abdi <abdimamy@gmail.com>
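The queueing-model commits above (arrival-rate metrics queries, computeAllVariantCapacities, SLO guessing) revolve around turning an observed request arrival rate into a replica count. A minimal sketch of that idea, where the function name and the utilization-bound formula are illustrative assumptions rather than the actual WVA implementation:

```go
package main

import (
	"fmt"
	"math"
)

// requiredReplicas is a hypothetical sketch of a queueing-model capacity
// estimate: given an arrival rate (req/s), a per-replica service rate
// (req/s), and a target utilization bound, it returns the smallest replica
// count that keeps utilization at or below the bound.
func requiredReplicas(arrivalRate, serviceRate, maxUtilization float64) int {
	if arrivalRate <= 0 || serviceRate <= 0 || maxUtilization <= 0 {
		return 0
	}
	// utilization = arrivalRate / (replicas * serviceRate) must stay <= maxUtilization
	return int(math.Ceil(arrivalRate / (serviceRate * maxUtilization)))
}

func main() {
	// 120 req/s arriving, each replica serves 25 req/s, keep utilization <= 0.8
	fmt.Println(requiredReplicas(120, 25, 0.8)) // 120/(25*0.8) = 6
}
```

A real analyzer would also fold in batch-size limits and SLO latency targets; this only captures the throughput dimension.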
* feat: add priority, roles, multi-analyzer scoring, and GreedyByScore optimizer
Implement five interconnected enhancements to the V2 saturation engine:
1. Per-model priority config with "{modelID}#{namespace}" lookup and
default fallback via resolveSaturationConfig()
2. P/D role types (prefill/decode/both) on VariantReplicaState,
VariantDecision, and VariantCapacity, injected from deployment
labels (llm-d.ai/role)
3. Per-role capacity aggregation in the V2 analyzer via
aggregateByRole(), with scheduler queue demand split by role:
prefill gets inputTokens, decode gets inputTokens+outputTokens
4. Multi-analyzer scoring infrastructure with AnalyzerScoreConfig
and composite score = priority * sum(requiredCapacity_i * score_i)
5. Rename GreedyBySaturationOptimizer to GreedyByScoreOptimizer with
score-based priority ordering, per-role work units for P/D
disaggregated models, and proportional P/D balancing
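The composite score in item 4 above can be sketched as follows; the struct and function names here are illustrative stand-ins inferred from the commit message, not the actual AnalyzerScoreConfig API:

```go
package main

import "fmt"

// analyzerResult is a hypothetical per-analyzer output pairing the capacity
// an analyzer demands with its configured score weight.
type analyzerResult struct {
	RequiredCapacity float64
	Score            float64
}

// compositeScore implements the rule described above:
// score = priority * sum(requiredCapacity_i * score_i)
func compositeScore(priority float64, results []analyzerResult) float64 {
	sum := 0.0
	for _, r := range results {
		sum += r.RequiredCapacity * r.Score
	}
	return priority * sum
}

func main() {
	// priority 2.0; two analyzers: 3 capacity units at weight 1.0, 1 unit at weight 0.5
	fmt.Println(compositeScore(2.0, []analyzerResult{
		{RequiredCapacity: 3, Score: 1.0},
		{RequiredCapacity: 1, Score: 0.5},
	})) // 2 * (3 + 0.5) = 7
}
```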
* refactor: demand-proportional P/D distribution in GreedyByScoreOptimizer
Replace post-hoc proportionalPDBalance() with demand-proportional
allocation integrated into the fair-share algorithm itself.
Previously, disaggregated models created separate per-role work units
that competed independently in fair-share, then a post-processing step
patched up the P:D ratio based on initial replica counts.
Now each model enters fair-share as a single entity. When allocating
replicas, allocateByRole() distributes them between prefill/decode
roles proportional to their per-role RequiredCapacity. If one role
can't use GPUs (e.g. accelerator exhausted), subsequent iterations
let the other role absorb its share naturally.
Removed: proportionalPDBalance, initTargetsForRole, mergeWorkTargets
Added: allocateByRole, allocateToVariants, roleDemands on modelWork
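The demand-proportional split described above can be sketched as follows; this illustrates the proportional-allocation idea only (names and rounding policy are assumptions, not the actual allocateByRole):

```go
package main

import (
	"fmt"
	"math"
)

// allocateByRole splits a batch of replicas between prefill and decode in
// proportion to each role's required capacity, with decode absorbing the
// rounding remainder. If neither role has demand, everything goes to decode.
func allocateByRole(replicas int, prefillDemand, decodeDemand float64) (prefill, decode int) {
	total := prefillDemand + decodeDemand
	if total <= 0 {
		return 0, replicas
	}
	prefill = int(math.Round(float64(replicas) * prefillDemand / total))
	if prefill > replicas {
		prefill = replicas
	}
	return prefill, replicas - prefill
}

func main() {
	// 10 replicas; prefill needs 2 capacity units, decode needs 3
	p, d := allocateByRole(10, 2, 3)
	fmt.Println(p, d) // 4 6
}
```

In the design above, when one role cannot actually place its share (accelerator exhausted), later fair-share iterations let the other role absorb the remainder; this sketch only shows the proportional split itself.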
* fix: prevent role absorption in demand-proportional P/D allocation
When one role cannot fully allocate (e.g., accelerator exhausted), consume
its unallocated share from remaining so it does not overflow to other roles
in subsequent fair-share iterations.
* refactor: use an Analyzers list for V2 selection with per-analyzer threshold overrides
Replace analyzerName-based V2 detection with IsV2() that checks for
Analyzers list (new-style) or analyzerName (backward compat). Add
per-analyzer ScaleUpThreshold/ScaleDownBoundary overrides on
AnalyzerScoreConfig with global fallback via EffectiveScaleUpThreshold/
EffectiveScaleDownBoundary methods.
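The global-fallback pattern this commit names (EffectiveScaleUpThreshold) can be sketched as follows; the struct shape is inferred from the commit message, not copied from the source:

```go
package main

import "fmt"

// analyzerScoreConfig sketches a per-analyzer config where a nil threshold
// means "fall back to the global value".
type analyzerScoreConfig struct {
	ScaleUpThreshold *float64 // per-analyzer override; nil = use global
}

// EffectiveScaleUpThreshold returns the per-analyzer override when set,
// otherwise the global default, mirroring the method named in the commit.
func (c analyzerScoreConfig) EffectiveScaleUpThreshold(global float64) float64 {
	if c.ScaleUpThreshold != nil {
		return *c.ScaleUpThreshold
	}
	return global
}

func main() {
	override := 0.9
	withOverride := analyzerScoreConfig{ScaleUpThreshold: &override}
	withFallback := analyzerScoreConfig{}
	fmt.Println(withOverride.EffectiveScaleUpThreshold(0.8)) // 0.9
	fmt.Println(withFallback.EffectiveScaleUpThreshold(0.8)) // 0.8
}
```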
* refactor: convert interfaces tests to Ginkgo
Convert saturation_scaling_test.go from standard testing to Ginkgo
Describe/It/Expect style with DescribeTable for validation cases.
Add suite_test.go for Ginkgo test runner bootstrap.
* refactor: unify enforcer to single EnforcePolicyOnDecisions entry point
Remove V1-specific EnforcePolicy (map-based) and its helpers (applyScaleToZero,
ensureMinimumReplicas). The V1 path now converts targets to decisions first,
then uses EnforcePolicyOnDecisions — the same path as V2. This eliminates the
dependency on VariantSaturationAnalysis for cost data in the enforcer.
* refactor: move SaturationScalingConfig types from interfaces to config package
Move SaturationScalingConfig, AnalyzerScoreConfig, and related constants
(DefaultPriority, DefaultScaleUpThreshold, DefaultScaleDownBoundary) from
internal/interfaces/ to internal/config/ where they belong alongside the
rest of the configuration types. Update all imports across 15 files.
)
* 🐛 Enable scale-from-zero on CKS and OCP with KEDA support
- Remove environment skip in scale_from_zero_test.go: the test now runs on all platforms (KEDA must be pre-installed on the cluster)
- Add retry logic to detect_inference_pool_api_group() to handle the race where InferencePool instances haven't been created yet after helmfile deploy
- Make deploy_keda() skip helm install when the KEDA CRD already exists (pre-installed on OCP via the CMA operator, on CKS via helm)
- Remove environment guard on SCALER_BACKEND=keda: supported everywhere
Signed-off-by: Andy Anderson <andy@clubanderson.com>
Signed-off-by: Andrew Anderson <andy@clubanderson.com>
* 🐛 Increase deploy wait timeout from 60s to 600s for model loading
The kubectl wait --timeout=60s for all deployments in the llm-d namespace was too short for model-serving pods (vLLM) that need to download and load large models (e.g. Meta-Llama-3.1-8B) into GPU memory. This caused both OCP and CKS nightly E2E to fail at the "Deploy guide via WVA install.sh" step. Default is now 600s (10 min), overridable via the DEPLOY_WAIT_TIMEOUT env var. The vLLM startupProbe already allows up to 30 minutes.
Signed-off-by: Andrew Anderson <andy@clubanderson.com>
* 🐛 Address Copilot review feedback on KEDA and scale-from-zero
- deploy_keda(): Check operator pods + APIService, not just the CRD, to avoid a false skip when a stale CRD remains after a prior uninstall
- detect_inference_pool_api_group(): Implement actual namespace-first then cluster-wide fallback (the comment said fallback but the code didn't)
- Pin KEDA chart version (KEDA_CHART_VERSION, default 2.19.0) for reproducible installs
- Fix ENABLE_SCALE_TO_ZERO default inconsistency in helm --set
- Add a Skip guard in the scale-from-zero test for non-KEDA environments where HPA rejects minReplicas=0
- Fix a misleading comment that said scale-from-zero requires KEDA
- Document per-environment KEDA_NAMESPACE values in suite_test.go
Signed-off-by: Andrew Anderson <andy@clubanderson.com>
---------
Signed-off-by: Andy Anderson <andy@clubanderson.com>
Signed-off-by: Andrew Anderson <andy@clubanderson.com>
…-d#864)
* api: add minReplicas and maxReplicas to VariantAutoscalingSpec
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* api: add behavior field to VariantAutoscalingConfigSpec for HPA scaling policies
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* test: fix VA fixtures for maxReplicas validation and add CRD field tests for minReplicas, maxReplicas, and behavior
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): change default maxReplicas from 10 to 2
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): remove behavior field to align with release plan
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
---------
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* feat: add min/max replicas as VA annotations with optimizer integration
Add per-variant min/max replica bounds via VA annotations (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and integrate them into both V1 and V2 scaling paths:
- Parse bounds from VA annotations in BuildVariantStates
- Respect maxReplicas in CostAwareOptimizer (spillover to the next variant)
- Respect minReplicas in costAwareScaleDown (hard floor per variant)
- Respect maxReplicas in GreedyBySaturationOptimizer allocateForModel
- Respect min/max in the V1 limiter allocateForDecision
- Clamp targets in V1 CalculateSaturationTargets
- Disable scale-to-zero enforcement when any variant has minReplicas > 0
- Propagate bounds through VariantDecision for observability
* refactor: use VA spec fields for min/max replicas instead of annotations
Remove annotation-based min/max replica bounds (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and read directly from the VA spec.MinReplicas and spec.MaxReplicas fields added in llm-d#864. This eliminates the annotation parsing layer and aligns with the CRD as the single source of truth.
* fix(e2e): align VA min/max replicas with test expectations
- Set explicit MinReplicas=1 and MaxReplicas=10 in VA builder defaults (was implicit MinReplicas via kubebuilder default and MaxReplicas=2)
- Add VAOption functional options (WithMinReplicas, WithMaxReplicas) for tests that need custom replica bounds
- Scale-to-zero and scale-from-zero tests now create VAs with MinReplicas=0 so the engine allows scaling to zero replicas
- MaxReplicas raised from 2 to 10 to match HPA maxReplicas and avoid artificially capping scale-up in load tests
* fix: address PR review: enforce minReplicas in GreedyByScore scale-down
- Pass stateMap to costAwareScaleDown in GreedyByScoreOptimizer so minReplicas is respected during scale-down (was missing)
- Update doc comments: "VA annotation" → "VA spec field" in VariantDecision, VariantReplicaState, and the saturation analyzer
- Add tests verifying mixed-minReplicas behavior: a variant with minReplicas=0 scales to zero while a sibling with minReplicas>0 is preserved (CostAware and GreedyByScore)
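The min/max bounds handling described in these commits amounts to clamping an optimizer target into the [MinReplicas, MaxReplicas] interval from the VA spec, where a nil bound means "unbounded". A hypothetical sketch (clampReplicas is an illustrative helper, not the actual WVA code):

```go
package main

import "fmt"

// clampReplicas clamps a target replica count to optional per-variant
// bounds; nil means "no bound". When both bounds are set and conflict,
// the max cap wins because it is applied last.
func clampReplicas(target int32, minR, maxR *int32) int32 {
	if minR != nil && target < *minR {
		target = *minR
	}
	if maxR != nil && target > *maxR {
		target = *maxR
	}
	return target
}

func main() {
	lo, hi := int32(1), int32(10)
	fmt.Println(clampReplicas(0, &lo, &hi))  // floored to 1
	fmt.Println(clampReplicas(25, &lo, &hi)) // capped to 10
	fmt.Println(clampReplicas(5, nil, nil))  // unbounded: 5
}
```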
…eAffinity (llm-d#882)
* Auto discover accelerator name from deployment nodeSelector/nodeAffinity
* address comments
* add back comment
* add comment
* nil check
* fix lint/merge error
Signed-off-by: Sum Huynh <31661254+shuynh2017@users.noreply.github.com>
…lm-d#894)
* api: change NumReplicas to *int32 pointer for scale-to-zero support
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
Made-with: Cursor
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* fix: dereference *int32 NumReplicas in fmt prints and revert unrelated go fmt changes
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
---------
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
unparseable is a valid English variant of unparsable, used intentionally in configmap_helpers.go. Signed-off-by: Andrew Anderson <andy@clubanderson.com>
* stabilizationWindow recommendation
* address comment
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com> Signed-off-by: Vivek Karunai Kiri Ragavan <60690810+vivekk16@users.noreply.github.com>
…d#886)
Adds a third value for SCALER_BACKEND in deploy/install.sh:
- 'none': skips all scaler backend deployment (no Prometheus Adapter install, no KEDA install, no APIService patching)
Use case: llmd benchmark scripts that install WVA on GPU clusters where KEDA is already pre-installed. Previously, the default SCALER_BACKEND=prometheus-adapter would deploy Prometheus Adapter and patch the v1beta1.external.metrics.k8s.io APIService away from KEDA, breaking HPA scaling for all cluster workloads.
Changes:
- deploy/install.sh: add a 'none' branch in the main deploy flow, verify_deployment, and print_summary; update the SCALER_BACKEND comment and env var docs
- Makefile: update the SCALER_BACKEND inline comment to document all three values
Usage: SCALER_BACKEND=none make deploy-wva-emulated-on-kind
Signed-off-by: Srujan Reddy <srjnreddy33@gmail.com>
…in "config" for OCP (llm-d#852)
* fix: cert name and make mount optional
- keep only one replica for the deployment via Kustomize
- add the secret for the SA as an optional mount: if our EPP disables --metrics-endpoint-auth it is not needed, but GIE sets it true by default, so we need it or metrics collection will fail
- fix cert from CM openshift-service-ca.crt
- add missing RBAC on inferencepool and deployment/scale
- add missing rolebinding on the controller for the monitoring view
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: code review
- remove replica set in the deployment
- make mount not optional
- use existing path in the pod, skip configmap and mount
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: code review - remove comments
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
---------
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…for new VAs (llm-d#922)
* fix: resolve accelerator bootstrapping deadlock for new VAs in multi-VA E2E test
New VAs without prior metrics get stuck in applySaturationDecisions() because acceleratorName cannot be resolved from an empty VA status or currentAllocations. This prevents HPA metric emission, creating a 3+ minute blind period where the HPA cannot scale the deployment. Add fallback accelerator resolution from deployment nodeSelector/nodeAffinity and the VA label. Pre-load the guidellm image into the Kind cluster to eliminate runtime pull delays. Increase the load job timeout from 5 to 8 minutes to account for tokenizer download on first run.
* fix: use burst load for multi-VA test to trigger simulator KV cache tracking
The simulator only tracks KV cache for /v1/completions requests. guidellm defaults to /v1/chat/completions, which bypasses KV cache tracking entirely. This causes avgSpareKv to remain high (0.8) despite active load, preventing the saturation engine from triggering scale-up. Switch to burst load (curl) targeting /v1/completions directly, matching the pattern used by the working smoke scale-up test. Use 2400 prompts with 400 output tokens to sustain load across multiple engine cycles.
* fix: wait for scale-up instead of job completion in multi-VA E2E test
The burst load jobs send 2400 requests at ~42s each, which takes ~84 minutes to complete, far exceeding the 10-minute test timeout. On Kind the lower network latency masks this, but on OpenShift the jobs always time out. Match the proven smoke test pattern: verify load jobs are running, wait for the saturation engine to detect load and scale up VA-A, then check the cost preference assertion.
… is provided (llm-d#925) Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* skip multiple va test
Signed-off-by: Lionel Villard <villard@us.ibm.com>
* remove Focus
Signed-off-by: Lionel Villard <villard@us.ibm.com>
---------
Signed-off-by: Lionel Villard <villard@us.ibm.com>
* Add shuynh2017 to the OWNERS list
* update go to 1.25.0
* update workflows docker docs
* fix e2e test
* Update .github/workflows/ci-e2e-openshift.yaml
Co-authored-by: Mohammed Munir Abdi <abdimamy@gmail.com>
Signed-off-by: Sum Huynh <31661254+shuynh2017@users.noreply.github.com>
* Update .github/workflows/ci-e2e-openshift.yaml
Co-authored-by: Mohammed Munir Abdi <abdimamy@gmail.com>
Signed-off-by: Sum Huynh <31661254+shuynh2017@users.noreply.github.com>
* fix format
* fix format
---------
Signed-off-by: Sum Huynh <31661254+shuynh2017@users.noreply.github.com>
Co-authored-by: Mohammed Munir Abdi <abdimamy@gmail.com>
Add a check-code-changes job to the OpenShift E2E workflow that detects documentation-only PRs and skips the entire E2E pipeline, saving GPU runner time. Uses the same dorny/paths-filter config as ci-pr-checks.yaml. Comment triggers (/ok-to-test, /retest) and workflow_dispatch always run. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Unify /ok-to-test as single trigger for all E2E suites
Replace /trigger-e2e-full, /test-e2e-full, and /test-full with /ok-to-test in ci-pr-checks.yaml so maintainers use one command to trigger both the full Kind E2E and OpenShift E2E workflows. /retest remains as an OpenShift-only re-run command.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Address review comments: fix permission wording and clarify concurrency
- Docs: change "Maintainers/admins only" to "Users with write access" to match the actual workflow permission check (admin/maintain/write)
- Clarify the concurrency header comment to note workflow_dispatch falls back to run_id since no PR number is available
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lionel Villard <villard@us.ibm.com>
---------
Signed-off-by: Lionel Villard <villard@us.ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…m-d#909) Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
# Conflicts:
#	.github/workflows/ci-e2e-openshift.yaml
#	.github/workflows/ci-pr-checks.yaml
#	Dockerfile
#	OWNERS
#	api/v1alpha1/variantautoscaling_types.go
#	api/v1alpha1/zz_generated.deepcopy.go
#	deploy/install.sh
#	docs/developer-guide/testing.md
#	internal/controller/configmap_bootstrap.go
#	test/e2e/fixtures/va_builder.go
#	test/e2e/scale_from_zero_test.go
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
📝 Walkthrough
The PR introduces Go 1.25 support, upgrades the Kubernetes autoscaling API from v1 to v2, adds min/max replica bounds to the VariantAutoscaling CRD, and implements a queueing-model analyzer for SLO-driven capacity analysis. It refactors saturation configuration from
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable Issues
1. Queueing Model Analyzer Parameter Storage Race Condition
File:
The
CWE-366: Improper Locking or Synchronization
2. Accelerator Resolution Precedence Not Documented
Files:
Risk: Silent acceptance of conflicting infrastructure intent.
3. MinReplicas/MaxReplicas Validation Gap in CRD
File:
CEL validation rule allows
CWE-99: Improper Control of Resource Identifiers
4.
Actionable comments posted: 1
Note: Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
test/e2e/parallel_load_scaleup_test.go (1)
369-379: ⚠️ Potential issue | 🟠 Major: Use a local optimized replica count inside this poll.
`scaledOptimized` survives across `Eventually` iterations and is only overwritten on non-nil reads. If `NumReplicas` drops back to nil on a later reconcile, this assertion reuses the previous value and can pass on stale state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/parallel_load_scaleup_test.go` around lines 369-379, the poll uses a persisted variable `scaledOptimized` that can carry stale values across `Eventually` iterations; reset or use a local variable inside the poll closure so each iteration starts fresh: declare a new local `optimizedCount` (or reassign `scaledOptimized = 0`) immediately before reading `va.Status.DesiredOptimizedAlloc.NumReplicas`, set it only when `NumReplicas != nil`, and use that local variable in `GinkgoWriter.Printf` and in the `g.Expect` assertion (referencing `scaledOptimized`/`optimizedCount` and `va.Status.DesiredOptimizedAlloc.NumReplicas`) so a nil read won't reuse a previous non-nil value.
internal/controller/configmap_helpers.go (1)
37-49: ⚠️ Potential issue | 🟠 Major: Decode these ConfigMaps in strict mode.
Both parsers use `yaml.Unmarshal`, which silently drops unknown keys. `QueueingModelScalingConfig.Validate()` accepts zero values, so a typo can turn into a no-op config that is still counted as valid and applied cluster-wide. Use a `yaml.Decoder` with `KnownFields(true)` before `Validate()`.
As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".
Suggested fix:
+import "strings"
...
-	if err := yaml.Unmarshal([]byte(yamlStr), &qmConfig); err != nil {
+	dec := yaml.NewDecoder(strings.NewReader(yamlStr))
+	dec.KnownFields(true)
+	if err := dec.Decode(&qmConfig); err != nil {
		logger.Error(err, "Failed to parse queueing model config entry", "key", key)
		continue
	}
Also applies to: 60-70
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/configmap_helpers.go` around lines 37-49, the code uses yaml.Unmarshal to decode ConfigMap entries into config.SaturationScalingConfig (and similarly for QueueingModelScalingConfig), which silently ignores unknown fields and can let typos produce zero-value/no-op configs that still pass Validate(); replace the unmarshal call with a yaml.Decoder created from the entry bytes, call decoder.KnownFields(true) before Decode to enforce strict decoding, handle and log decode errors, then proceed to call satConfig.Validate() (and the analogous Validate() for QueueingModelScalingConfig) and only count/store the config if both decode and validate succeed.
internal/engines/pipeline/enforcer.go (1)
147-169: ⚠️ Potential issue | 🟠 Major: Minimum-replica recovery ignores `MinReplicas`/`MaxReplicas`.
Lines 147-159 can pick the cheapest variant even when its `MaxReplicas` is 0, and line 163 hardcodes the fallback to 1 instead of the configured minimum. Either this violates an explicit bound, or a later bounds clamp puts the model back at zero and defeats the no-scale-to-zero policy. Filter out variants that cannot legally run one replica, and seed the survivor to its `MinReplicas` when present.
Suggested fix:
-	// Total is 0 — find cheapest variant and set it to 1
+	// Total is 0 — find the cheapest variant that can legally keep replicas
	cheapestIdx := -1
	cheapestCost := float64(-1)
	for i := range decisions {
		d := &decisions[i]
		if d.ModelID != modelID || d.Namespace != namespace {
			continue
		}
+		if d.MaxReplicas != nil && *d.MaxReplicas < 1 {
+			continue
+		}
		cost := d.Cost
		if cost <= 0 {
			cost = saturation.DefaultVariantCost
		}
		if cheapestCost < 0 || cost < cheapestCost || (cost == cheapestCost && d.VariantName < decisions[cheapestIdx].VariantName) {
@@
	}
	if cheapestIdx >= 0 {
-		decisions[cheapestIdx].TargetReplicas = 1
+		target := 1
+		if decisions[cheapestIdx].MinReplicas != nil && *decisions[cheapestIdx].MinReplicas > target {
+			target = *decisions[cheapestIdx].MinReplicas
+		}
+		decisions[cheapestIdx].TargetReplicas = target
		updateDecisionAction(&decisions[cheapestIdx], optimizerName)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/engines/pipeline/enforcer.go` around lines 147-169, the current cheapest-variant selection ignores variant replica bounds and then forces TargetReplicas=1; instead, when scanning decisions for the cheapest match (use decisions[i], ModelID, Namespace, cost logic as in the current loop), skip any variant whose MaxReplicas == 0 (or MaxReplicas < 1) so you only consider variants that can legally run replicas; after choosing cheapestIdx, compute target := decisions[cheapestIdx].MinReplicas if MinReplicas > 0 else 1, then clamp target to decisions[cheapestIdx].MaxReplicas (i.e., target = min(target, MaxReplicas)); set decisions[cheapestIdx].TargetReplicas = target, call updateDecisionAction(&decisions[cheapestIdx], optimizerName), and log using logger.Info with the variant name and resulting target/cost (retain cheapestCost and tie-break behavior).
.github/workflows/ci-e2e-openshift.yaml (1)
291-315: ⚠️ Potential issue | 🟠 Major: Fork PRs on pull_request events bypass approval requirements; no blocking status is created.
When the gate job sets `should_run='false'` for fork PRs, the e2e-openshift job skips via its job condition (`if: needs.gate.outputs.should_run == 'true'`). Skipped jobs report as "Success" and do not block required checks per GitHub's branch protection behavior. Additionally, the `report-status` job only runs on `issue_comment` events (line 1180), meaning no commit status is created on the initial `pull_request` event. As a result, a branch protection rule requiring "e2e-openshift" has no status to enforce on pull_request, allowing the PR to merge without approval.
Create an explicit pending commit status in the gate job on pull_request for fork PRs using the exact context name ("CI - OpenShift E2E Tests / e2e-openshift") that branch protection will check. This ensures the merge is blocked until a maintainer comments `/ok-to-test` and the e2e-openshift job completes successfully via issue_comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-e2e-openshift.yaml around lines 291 - 315, When detecting a fork PR where you set core.setOutput('should_run','false') (the external-contributor branch in the gate logic around tryPostComment and botComment), add an explicit GitHub commit status with state "pending" using the exact context "CI - OpenShift E2E Tests / e2e-openshift" so branch protection sees a blocking status; implement this by calling the GitHub Statuses API (via octokit.rest.repos.createCommitStatus or actions/github-script) for the PR head SHA only on pull_request events, include a short description and optional target_url pointing at the workflow run, and keep the existing comment/posting behavior unchanged so maintainers can later trigger /ok-to-test which will flip the status when e2e-openshift runs.
🟠 Major comments (23)
Dockerfile-2-2 (1)
2-2: ⚠️ Potential issue | 🟠 Major: Pin the Go builder image by digest.
`quay.io/projectquay/golang:1.25` is mutable, so a retagged or compromised upstream image can change the toolchain under the same commit and produce different controller binaries (CWE-494). Use an immutable digest here, like the Dockerfile.konflux already does.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` at line 2, the FROM line currently uses a mutable tag "quay.io/projectquay/golang:1.25 AS builder"; replace this with an immutable image digest form (e.g., "quay.io/projectquay/golang@sha256:<digest> AS builder") so the Go builder image is pinned; obtain the correct sha256 digest for the 1.25 image (as done in Dockerfile.konflux) and update the FROM instruction accordingly to ensure reproducible builds.
deploy/kind-emulator/install.sh-223-239 (1)
223-239: ⚠️ Potential issue | 🟠 Major: Validate cached GUIDELLM_IMG architecture matches the target platform.
The code computes a target `platform` based on `KIND_IMAGE_PLATFORM` or the host architecture, but if the image tag already exists locally (`docker image inspect` succeeds on line 232), it skips the `docker pull --platform` call entirely and loads whatever is cached. On mixed-architecture setups, a cached image from a previous run with a different architecture can cause `kind load docker-image` to preload the wrong image, resulting in `exec format error` at runtime.
Either always pull `GUIDELLM_IMG` with `--platform "$platform"` before loading (unconditional pull), or validate that the cached image's architecture matches the target:
docker image inspect --format '{{.Architecture}}' "$GUIDELLM_IMG"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/kind-emulator/install.sh` around lines 223-239, the script currently skips pulling GUIDELLM_IMG when a local image exists, which can leave a wrong-architecture image loaded into Kind; update the logic around GUIDELLM_IMG to either always pull with --platform "$platform" before loading or, if keeping the cache check, validate the cached image architecture using docker image inspect --format '{{.Architecture}}' "$GUIDELLM_IMG" and compare it to the desired $platform (map linux/arm64 -> arm64, linux/amd64 -> amd64) and only reuse the cached image when they match; otherwise perform docker pull --platform "$platform" "$GUIDELLM_IMG" and proceed to load.
internal/saturation/analyzer_test.go-8-8 (1)
8-8: 🛠️ Refactor suggestion | 🟠 Major: Variable shadows imported package name.
`config := config.SaturationScalingConfig{...}` shadows the `config` package import. This compiles but is confusing and prevents accessing other `config` exports in the same scope. Rename the local variable.
Proposed fix (apply to all occurrences at lines 20, 83, 145, 197, 231, 271, 328):
-	config := config.SaturationScalingConfig{
+	satCfg := config.SaturationScalingConfig{
		KvCacheThreshold:     0.80,
		QueueLengthThreshold: 5,
		KvSpareTrigger:       0.10,
		QueueSpareTrigger:    3,
	}
	// ... later in the test:
-	config,
+	satCfg,
Also applies to: 20-25
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/saturation/analyzer_test.go` at line 8, the test file declares a local variable named "config" that shadows the imported config package (e.g., config := config.SaturationScalingConfig{...}); rename the local variable (suggestion: cfg or localCfg) wherever it is used (all occurrences constructing SaturationScalingConfig) so code references to the imported package (config.SaturationScalingConfig, other config exports) are not shadowed and remain accessible.
internal/engines/pipeline/greedy_saturation_algorithm.go-112-120 (1)
112-120: ⚠️ Potential issue | 🟠 Major: Apply the `MinReplicas` floor before GPU allocation.
By the time lines 114-115 raise `TargetReplicas`, the allocator has already been asked for the smaller pre-floor delta. A case like current=1, target=2, min=3 returns `TargetReplicas=3` with one replica's worth of GPUs and `WasLimited=false`, even if spare GPUs exist. Fold the min floor into `replicasNeeded` before `TryAllocate`, or mark the post-floor delta as limited.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/engines/pipeline/greedy_saturation_algorithm.go` around lines 112-120, the MinReplicas floor is applied after GPU allocation, which causes under-requesting of GPUs; update the logic so MinReplicas is folded into replicasNeeded before calling TryAllocate (or adjust replicasNeeded = max(replicasNeeded, *d.MinReplicas)) so the allocator is asked for the post-floor delta, and then set d.TargetReplicas from that result; ensure WasLimited is computed relative to the post-floor comparison (replicasAllocated < replicasNeeded) so the limited flag is correct. Reference: MinReplicas, TargetReplicas, replicasNeeded, replicasAllocated, TryAllocate, WasLimited.
test/e2e/parallel_load_scaleup_test.go-239-245 (1)
239-245: ⚠️ Potential issue | 🟠 Major

Don't treat a nil optimizer result as stabilized when `minReplicas` is 0.

When scale-to-zero is enabled, this coerces a missing `NumReplicas` to `0`, so the `Eventually` can pass before the VA has published any optimizer output. That turns the later assertions into a race on uninitialized status.

Fix

```diff
- var optimized int32
- if currentVA.Status.DesiredOptimizedAlloc.NumReplicas != nil {
-     optimized = *currentVA.Status.DesiredOptimizedAlloc.NumReplicas
- }
+ g.Expect(currentVA.Status.DesiredOptimizedAlloc.NumReplicas).NotTo(BeNil(),
+     "VA should publish DesiredOptimizedAlloc before stabilization")
+ optimized := *currentVA.Status.DesiredOptimizedAlloc.NumReplicas
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/parallel_load_scaleup_test.go` around lines 239 - 245, the test is treating a nil DesiredOptimizedAlloc.NumReplicas as 0 which lets the Eventually pass when hpaMinReplicas==0 before the optimizer actually published a value; change the check to require NumReplicas to be non-nil before evaluating its value: in the block that reads currentVA.Status.DesiredOptimizedAlloc.NumReplicas (and computes optimized) ensure you first assert/currently wait that NumReplicas != nil (or explicitly fail/wait when nil), then read *NumReplicas and compare it to hpaMinReplicas (update the GinkgoWriter.Printf and the Expect that references optimized and hpaMinReplicas accordingly) so the test won't consider an absent optimizer result as stabilized.

test/e2e/scale_from_zero_test.go-39-45 (1)
39-45: ⚠️ Potential issue | 🟠 Major

Skip when flow-control prerequisites are absent.

This gate only checks backend/flag state. If GIE flow control or the required `InferenceObjective` is missing, the spec still runs and then spends several minutes waiting for a queue metric that can never appear. Probe the prerequisite resource/metric up front and `Skip(...)` early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/scale_from_zero_test.go` around lines 39 - 45, update the scale-from-zero test gate to also detect missing GIE flow-control and the required InferenceObjective before running the spec: in the same pre-test check near the existing cfg.ScalerBackend / cfg.ScaleToZeroEnabled block in scale_from_zero_test.go, probe (using the cluster test client used elsewhere in tests) whether the GIE flow-control resource/metric is present and whether the named InferenceObjective exists; if either is absent, call Skip(...) with a clear message (similar style to the existing Skip) so the test exits early instead of waiting for a metric that cannot appear.

charts/workload-variant-autoscaler/values.yaml-87-90 (1)
87-90: ⚠️ Potential issue | 🟠 Major

Do not ship `H100` as the default accelerator fallback.

With the new auto-discovery path, a hardcoded fallback masks discovery failures and misclassifies unresolved workloads as `H100`. That skews cost-based optimization and allocation on any non-`H100` cluster. Leave this empty by default and require an explicit override only when operators really need a manual fallback.

As per coding guidelines, "REVIEW PRIORITIES: 2. Architectural issues and anti-patterns".

Suggested fix

```diff
- accelerator: H100
+ accelerator: ""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/workload-variant-autoscaler/values.yaml` around lines 87 - 90, the values.yaml currently hardcodes accelerator: H100 which masks auto-discovery failures; remove the hardcoded fallback so the accelerator key is empty/unset by default (or set to null/""), forcing explicit operator override when needed; update any docs/comments near the accelerator key and any code reading .Values.accelerator to treat empty/null as "no fallback" and fall back to auto-discovery logic in functions that reference accelerator.

test/e2e/smoke_test.go-124-132 (1)
124-132: ⚠️ Potential issue | 🟠 Major

Scope the pre-test cleanup to this suite's resources.

Line 125 deletes every `VariantAutoscaling` in `cfg.LLMDNamespace`, and Lines 126-132 only warn if that sweep fails. In a shared namespace or parallel run, this can remove another spec's fixture and make the suite order-dependent. Limit cleanup to the objects this suite created (for example by label or name prefix), and fail setup if that scoped cleanup cannot complete.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/smoke_test.go` around lines 124 - 132, the pre-test cleanup currently calls utils.DeleteAllVariantAutoscalings(ctx, crClient, cfg.LLMDNamespace) which removes every VariantAutoscaling in the namespace; change this to a scoped cleanup that only targets objects created by this suite (e.g., filter by a label or name prefix), by either using or adding a helper such as utils.DeleteVariantAutoscalingsByLabel(ctx, crClient, cfg.LLMDNamespace, "<suite-label>") or adding an optional selector parameter to DeleteAllVariantAutoscalings; also make the setup fail (return error / call Fail/Expect) if the scoped cleanup returns an error instead of merely logging a warning so test ordering/shared namespaces cannot be impacted. Ensure you update the test invocation to pass the suite-specific label/prefix and replace the current GinkgoWriter warning branch with a hard failure on error.

test/e2e/fixtures/va_builder.go-136-145 (1)
136-145: ⚠️ Potential issue | 🟠 Major

The new default `MinReplicas` blocks scale-to-zero callers from opting out.

Lines 136-145 now hardcode `Spec.MinReplicas=1` for every fixture VA, but the `*WithDefaults` helpers still don't accept `VAOption`s. That means callers can set the HPA minimum to `0` while the controller-side VA bound remains `1`, so min-replica enforcement will clamp the variant back above zero.

Suggested fix

```diff
 func CreateVariantAutoscalingWithDefaults(
     ctx context.Context,
     crClient client.Client,
     namespace, name, deploymentName, modelID, accelerator string,
-    controllerInstance string,
+    controllerInstance string,
+    opts ...VAOption,
 ) error {
-    return CreateVariantAutoscaling(ctx, crClient, namespace, name, deploymentName, modelID, accelerator, 30.0, controllerInstance)
+    return CreateVariantAutoscaling(ctx, crClient, namespace, name, deploymentName, modelID, accelerator, 30.0, controllerInstance, opts...)
 }
@@
 func EnsureVariantAutoscalingWithDefaults(
     ctx context.Context,
     crClient client.Client,
     namespace, name, deploymentName, modelID, accelerator string,
-    controllerInstance string,
+    controllerInstance string,
+    opts ...VAOption,
 ) error {
-    return EnsureVariantAutoscaling(ctx, crClient, namespace, name, deploymentName, modelID, accelerator, 30.0, controllerInstance)
+    return EnsureVariantAutoscaling(ctx, crClient, namespace, name, deploymentName, modelID, accelerator, 30.0, controllerInstance, opts...)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/fixtures/va_builder.go` around lines 136 - 145, the fixture currently hardcodes Spec.MinReplicas = ptr.To(int32(1)) which prevents callers from opting into scale-to-zero; update the VA builder so MinReplicas is left nil by default and only set to 1 if no VAOption overrides it: remove the immediate assignment of MinReplicas in the va literal (the va variable), apply opts(va) as currently done, then after applying options set va.Spec.MinReplicas = ptr.To(int32(1)) only when it is still nil; ensure this change is reflected in the WithDefaults helpers and that VAOption functions can set va.Spec.MinReplicas to 0 to opt out.

test/e2e/smoke_test.go-736-738 (1)
736-738: ⚠️ Potential issue | 🟠 Major

Reset `desiredReplicas` on each poll before reading the pointer.

Lines 736-738 only update the outer `desiredReplicas` when `NumReplicas` is non-nil. If the status goes nil after a previous non-nil sample, the assertion keeps using the stale value and the `Eventually` can pass on old data.

Suggested fix

```diff
- if va.Status.DesiredOptimizedAlloc.NumReplicas != nil {
-     desiredReplicas = int(*va.Status.DesiredOptimizedAlloc.NumReplicas)
- }
+ currentDesired := 0
+ if va.Status.DesiredOptimizedAlloc.NumReplicas != nil {
+     currentDesired = int(*va.Status.DesiredOptimizedAlloc.NumReplicas)
+ }
+ desiredReplicas = currentDesired
@@
- scaleUpDetected := desiredReplicas > int(initialOptimized)
+ scaleUpDetected := currentDesired > int(initialOptimized)
@@
- g.Expect(desiredReplicas).To(BeNumerically(">", int(initialOptimized)),
-     fmt.Sprintf("VA should recommend more replicas than initial under load (current: %d, initial: %d, elapsed: %v)", desiredReplicas, initialOptimized, elapsed))
+ g.Expect(currentDesired).To(BeNumerically(">", int(initialOptimized)),
+     fmt.Sprintf("VA should recommend more replicas than initial under load (current: %d, initial: %d, elapsed: %v)", currentDesired, initialOptimized, elapsed))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/smoke_test.go` around lines 736 - 738, reset desiredReplicas at the start of each poll iteration before reading va.Status.DesiredOptimizedAlloc.NumReplicas: ensure the polling loop sets desiredReplicas = 0 (or another sentinel) before checking va.Status.DesiredOptimizedAlloc.NumReplicas so that if the pointer is nil on a later sample you don't reuse a stale value; update the block that currently checks va.Status.DesiredOptimizedAlloc.NumReplicas to assign into desiredReplicas only when the pointer is non-nil and rely on the reset to represent nil cases for the Eventually assertion.

internal/engines/saturation/engine_queueing_model.go-157-165 (1)
157-165: ⚠️ Potential issue | 🟠 Major

Apply default TTFT/ITL targets from the `default` QM entry.

Lines 157-165 only copy `TuningEnabled` and `SLOMultiplier` from `allConfigs["default"]`. A valid default `targetTTFT`/`targetITL` pair is silently dropped, so cluster-wide SLO overrides never reach the analyzer.

Suggested fix

```diff
 if defaultCfg, ok := allConfigs["default"]; ok {
     if defaultCfg.TuningEnabled != nil {
         cfg.TuningEnabled = *defaultCfg.TuningEnabled
     }
     if defaultCfg.SLOMultiplier > 1.0 {
         cfg.SLOMultiplier = defaultCfg.SLOMultiplier
     }
+    if defaultCfg.TargetTTFT > 0 && defaultCfg.TargetITL > 0 {
+        modelKey := queueingmodel.MakeModelKey(namespace, modelID)
+        cfg.SLOTargets = map[string]*queueingmodel.SLOTarget{
+            modelKey: {
+                TargetTTFT: defaultCfg.TargetTTFT,
+                TargetITL:  defaultCfg.TargetITL,
+            },
+        }
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/engines/saturation/engine_queueing_model.go` around lines 157 - 165, the default QM entry handling currently only copies TuningEnabled and SLOMultiplier from allConfigs["default"]; extend that block to also propagate default targetTTFT and targetITL into the working cfg so cluster-wide SLO targets reach the analyzer. Concretely, inside the if defaultCfg, ok := allConfigs["default"] { ... } block check the defaultCfg fields (e.g., TargetTTFT and TargetITL, or their pointer equivalents) and assign them to cfg.TargetTTFT and cfg.TargetITL when the defaults are present (respect existing non-zero/non-nil values on cfg so you don't overwrite explicit per-QM targets). Ensure you use the same presence checks used for TuningEnabled and SLOMultiplier to detect whether the default targets are set.

.github/workflows/ci-pr-checks.yaml-18-20 (1)
18-20: ⚠️ Potential issue | 🟠 Major

Concurrency guard substring matching allows DoS from untrusted commenters.

Line 19 uses `contains(...)` for substring matching in the concurrency condition, but the `check-full-tests` job (line 208) validates with exact-match `validCommands.includes(comment)` on the trimmed body. A comment like `"please /ok-to-test this"` passes the concurrency check (substring present, joins shared PR group) but fails validation (exact match required). This lets untrusted commenters cancel legitimate runs even though their comments are rejected: a trivial CI DoS. Use exact-match validation in the concurrency condition to match the command validator.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-pr-checks.yaml around lines 18 - 20, the concurrency condition currently uses contains(github.event.comment.body, '/ok-to-test') which allows substring matches; change it to perform the same exact-trimmed match used by the check-full-tests job (which uses validCommands.includes(comment)). Replace the contains(...) check with an exact comparison against the trimmed comment body (e.g., compare github.event.comment.body trimmed to the literal '/ok-to-test' or mirror the validCommands.includes(comment) logic) so the concurrency guard and the check-full-tests validation use identical exact-match semantics.

internal/interfaces/queueing_model_scaling.go-54-67 (1)
54-67: ⚠️ Potential issue | 🟠 Major

Reject non-finite SLO values during validation (CWE-20).

Lines 54-67 only do range checks, so YAML values like `.nan` or `.inf` still pass `Validate()`: comparisons with `NaN` are false, and `+Inf` is greater than zero. That lets invalid config propagate into the analyzer and turn scaling math into NaNs/Infs.

Suggested fix

```diff
-import "fmt"
+import (
+    "fmt"
+    "math"
+)
@@
 // SLOMultiplier: 0 = use default, >1 = valid, <=1 = invalid
+if math.IsNaN(c.SLOMultiplier) || math.IsInf(c.SLOMultiplier, 0) {
+    return fmt.Errorf("sloMultiplier must be finite, got %v", c.SLOMultiplier)
+}
 if c.SLOMultiplier != 0 && c.SLOMultiplier <= 1.0 {
     return fmt.Errorf("sloMultiplier must be > 1.0, got %.2f (k=1 means rho=0, no load tolerance; k<=1 is physically meaningless)", c.SLOMultiplier)
 }
+if math.IsNaN(float64(c.TargetTTFT)) || math.IsInf(float64(c.TargetTTFT), 0) {
+    return fmt.Errorf("targetTTFT must be finite, got %v", c.TargetTTFT)
+}
 if c.TargetTTFT < 0 {
     return fmt.Errorf("targetTTFT must be >= 0, got %.2f", c.TargetTTFT)
 }
+if math.IsNaN(float64(c.TargetITL)) || math.IsInf(float64(c.TargetITL), 0) {
+    return fmt.Errorf("targetITL must be finite, got %v", c.TargetITL)
+}
 if c.TargetITL < 0 {
     return fmt.Errorf("targetITL must be >= 0, got %.2f", c.TargetITL)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/interfaces/queueing_model_scaling.go` around lines 54 - 67, in the Validate() method (file queueing_model_scaling.go) add non-finite checks for SLOMultiplier, TargetTTFT and TargetITL before the existing range checks: reject values that are NaN or +/-Inf (use math.IsNaN and math.IsInf or equivalent) and return an error naming the offending field (SLOMultiplier, TargetTTFT, TargetITL) so configs like .nan or .inf are rejected outright before the existing comparisons and SLO target co-dependence check.

.github/workflows/ci-pr-checks.yaml-281-292 (1)
281-292: ⚠️ Potential issue | 🟠 Major

Verify the Kind binary before installing it as root (CWE-494).

The binary download lacks integrity verification. Lines 289-291 fetch an executable and install it with elevated privileges without checking authenticity. Kind releases include SHA256 checksums; they should be downloaded and verified before execution.
Suggested fix
```diff
- curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.25.0/kind-linux-${KIND_ARCH}"
+ KIND_VERSION="v0.25.0"
+ curl -fsSLo ./kind "https://kind.sigs.k8s.io/dl/${KIND_VERSION}/kind-linux-${KIND_ARCH}"
+ curl -fsSLo ./kind.sha256sum "https://kind.sigs.k8s.io/dl/${KIND_VERSION}/kind-linux-${KIND_ARCH}.sha256sum"
+ echo "$(cat ./kind.sha256sum) ./kind" | sha256sum -c -
  chmod +x ./kind
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-pr-checks.yaml around lines 281 - 292, the "Install Kind" step downloads and installs the Kind binary without integrity checks; update the step to fetch the official SHA256 checksum for the matching KIND_ARCH and verify the downloaded binary before running chmod/sudo mv. Specifically, after computing KIND_ARCH (from ARCH) and downloading with curl, download the corresponding SHA256SUM (or .sha256) for v0.25.0, verify the binary with sha256sum -c (or compare computed sha256sum) and exit non-zero on mismatch, only then run chmod +x ./kind and sudo mv ./kind /usr/local/bin/kind; ensure the verification failure path logs an error and stops the workflow.

internal/utils/utils.go-494-495 (1)
494-495: ⚠️ Potential issue | 🟠 Major

Reject ambiguous "In" operators with multiple values to prevent incorrect capacity tracking.

When `NodeSelectorOpIn` specifies multiple GPU products, the Kubernetes scheduler may select any of them, but the code returns `expr.Values[0]` unconditionally. This bakes an arbitrary accelerator choice into capacity calculations and allocation decisions, causing wrong recommendations.

Only accept single-value expressions:

Fix

```diff
- if expr.Operator == corev1.NodeSelectorOpIn && len(expr.Values) > 0 {
-     return expr.Values[0]
+ if expr.Operator == corev1.NodeSelectorOpIn {
+     if len(expr.Values) == 1 {
+         return expr.Values[0]
+     }
+     return ""
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/utils/utils.go` around lines 494 - 495, the code currently returns expr.Values[0] for NodeSelectorOpIn unconditionally; change the branch that checks expr.Operator == corev1.NodeSelectorOpIn to only accept a single value: if len(expr.Values) == 1 return that value, otherwise treat it as unsupported (return an error or a clear "ambiguous selector" sentinel) instead of picking the first entry; update the surrounding function that handles expr.Operator/expr.Values to propagate this error/unsupported result so callers don't bake an arbitrary accelerator choice into capacity calculations.

internal/utils/utils.go-475-481 (1)
475-481: ⚠️ Potential issue | 🟠 Major

Remove preferred node affinity from accelerator inference logic.

Lines 475-481 use `PreferredDuringSchedulingIgnoredDuringExecution` to infer the accelerator type. This is unreliable: Kubernetes allows the scheduler to ignore preferred affinity and place the pod on a different node type. When that occurs, the inferred accelerator (e.g., MI300 from affinity) will not match the actual GPU on the scheduled node (e.g., A100), causing GPU accounting errors and incorrect scaling decisions downstream. Restrict inference to hard constraints (`nodeSelector` and `RequiredDuringSchedulingIgnoredDuringExecution` only), or validate against the actual scheduled pod/node labels before using the value.

Additionally, line 494 (`expr.Values[0]`) selects the first value from a multi-value `In` operator without justification. All values in `In` are semantically equivalent alternatives; there is no specified precedence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/utils/utils.go` around lines 475 - 481, remove use of nodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution in accelerator inference: stop extracting GPU from the preferred list (delete the loop that calls extractGPUFromNodeSelectorTerm for PreferredDuringSchedulingIgnoredDuringExecution) and only infer accelerators from hard constraints (nodeSelector and nodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution) or otherwise validate against the actual scheduled node labels before using the inferred value; also replace the direct use of expr.Values[0] (in the code handling an "In" operator) with logic that handles all expr.Values (e.g., iterate over expr.Values or otherwise treat them as equivalent alternatives) instead of arbitrarily choosing the first value so multi-value "In" selectors are handled correctly.

internal/engines/pipeline/greedy_score_optimizer.go-112-115 (1)
112-115: ⚠️ Potential issue | 🟠 Major

`remaining` changes meaning from score to raw capacity.

Lines 112-115 seed `remaining` with `req.Result.Score`, but Lines 347-349 subtract raw `PerReplicaCapacity`. Once priority or analyzer weights differ from `1`, high-score models never converge correctly and can keep absorbing GPUs until caps are hit. Track score ordering separately from capacity still needed, or decrement by the same weighted contribution used to build `Score`.

Also applies to: 347-349
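A minimal sketch of the separation this finding asks for; every name here (`allocateReplicas`, `remainingScore`, `remainingCapacity`, `weight`) is illustrative and not the optimizer's real field, and the loop is a simplification under the assumption that score is built as weighted capacity:

```go
package main

import "fmt"

// allocateReplicas keeps the weighted score (selection ordering) and the raw
// capacity still needed (bookkeeping) in two separate variables, so a weight
// other than 1 cannot make a high-score model absorb GPUs past its need.
func allocateReplicas(score, requiredCapacity, perReplicaCapacity, weight float64, maxReplicas int) int {
	remainingScore := score               // ordering / urgency signal
	remainingCapacity := requiredCapacity // raw capacity still needed
	replicas := 0
	for replicas < maxReplicas && remainingCapacity > 0 {
		replicas++
		remainingCapacity -= perReplicaCapacity
		// Decrement the score by the same weighted contribution used to
		// build it, keeping the weighted and raw views consistent.
		remainingScore -= perReplicaCapacity * weight
	}
	_ = remainingScore // a real optimizer would re-rank candidates on this
	return replicas
}

func main() {
	// 10 units of capacity needed, 4 per replica: stops after 3 replicas
	// regardless of the weight inflating the score.
	fmt.Println(allocateReplicas(20.0, 10.0, 4.0, 2.0, 8))
}
```

The point of the design: convergence is decided by `remainingCapacity` alone, so a weight of 2 (or any analyzer priority) changes ordering but never the stopping condition.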
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/engines/pipeline/greedy_score_optimizer.go` around lines 112 - 115, the variable remaining currently conflates score and raw capacity (seeded from req.Result.Score or RequiredCapacity) but later is decremented by raw PerReplicaCapacity, causing incorrect behavior when weights differ; fix by introducing two separate variables (e.g., remainingScore and remainingCapacity) so ordering/selection uses remainingScore (seeded from req.Result.Score) while capacity bookkeeping uses remainingCapacity (seeded from req.Result.RequiredCapacity), and when assigning replicas decrement remainingScore by the exact weighted contribution used to compute Score (the priority/analyzer-weighted delta) and decrement remainingCapacity by PerReplicaCapacity; update the code paths that subtract (the block referencing PerReplicaCapacity and the seeding logic around req.Result.Score/RequiredCapacity) to use the appropriate variable.

deploy/install.sh-536-543 (1)
536-543: ⚠️ Potential issue | 🟠 Major

Do not infer the pool group from other namespaces.

If Lines 527-534 race before the current release creates its `InferencePool`, Lines 537-541 fall back to `-A` and can pick a stale pool from another namespace. On shared clusters that upgrades WVA to the wrong `poolGroup`, and scale-from-zero watches the wrong resource type. Prefer API discovery/CRD detection, or only trust objects from `LLMD_NS`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/install.sh` around lines 536 - 543, the fallback currently uses cluster-wide `kubectl get ... -A` which can pick up stale `InferencePool` from other namespaces; change the logic so `DETECTED_POOL_GROUP` is only set from resources in the target namespace (`LLMD_NS`) or determined via API/CRD discovery instead of `-A`: first try `kubectl get inferencepools.inference.networking.k8s.io -n "$LLMD_NS"` and `kubectl get inferencepools.inference.networking.x-k8s.io -n "$LLMD_NS"`, and if neither exists, use API discovery (e.g., `kubectl api-resources` / CRD lookup) to detect the correct group rather than scanning all namespaces. Ensure all checks reference `LLMD_NS` and only fall back to CRD-based detection when the namespaced checks fail, then set `DETECTED_POOL_GROUP` accordingly.

deploy/install.sh-1117-1127 (1)
1117-1127: ⚠️ Potential issue | 🟠 Major

This KEDA short-circuit does not verify who owns `external.metrics.k8s.io`.

Line 1122 only proves that the APIService object exists. If Prometheus Adapter (or another provider) still owns that APIService, this returns early and `SCALER_BACKEND=keda` comes up without a working KEDA metrics server. Check `.spec.service.name` and `.spec.service.namespace` before treating KEDA as operational.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/install.sh` around lines 1117 - 1127, the current short-circuit only checks that the APIService object exists; update the apiservice check (the kubectl get apiservice v1beta1.external.metrics.k8s.io branch) to fetch and validate .spec.service.name and .spec.service.namespace (e.g. via kubectl get apiservice v1beta1.external.metrics.k8s.io -o jsonpath=...) and ensure they point to the KEDA metrics service (compare against the expected service name/namespace used by the KEDA operator deployment, or derive them from the operator pod/service labels), and only call log_success "KEDA CRD, operator, and metrics APIService detected — skipping helm install" and return when those fields match; otherwise emit the existing log_warning and proceed with helm install.

internal/engines/pipeline/greedy_score_optimizer.go-331-339 (1)
331-339: ⚠️ Potential issue | 🟠 Major

Explicit `maxReplicas: 0` is ignored.

Line 332 only enforces the bound when `*state.MaxReplicas > 0`. With the new semantics, `nil` means "no cap", so a non-nil zero should still forbid allocation. Drop the `> 0` guard.

Suggested fix

```diff
- if state.MaxReplicas != nil && *state.MaxReplicas > 0 {
+ if state.MaxReplicas != nil {
      maxAdd := *state.MaxReplicas - w.targets[vc.VariantName]
      if maxAdd <= 0 {
          continue // already at max
      }
      if n > maxAdd {
          n = maxAdd
      }
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/engines/pipeline/greedy_score_optimizer.go` around lines 331 - 339, the current cap logic in the greedy scorer skips enforcement when state.MaxReplicas is zero because of the `> 0` guard; change the check in the block that references state.MaxReplicas, w.targets[vc.VariantName], vc.VariantName and variable n so that any non-nil state.MaxReplicas (including 0) is enforced: remove the `> 0` condition, compute maxAdd := *state.MaxReplicas - w.targets[vc.VariantName], and keep the existing behavior of continuing when maxAdd <= 0 and setting n = maxAdd when n > maxAdd.

.github/workflows/ci-e2e-openshift.yaml-83-89 (1)
83-89: ⚠️ Potential issue | 🟠 Major

Pin the new actions to immutable SHAs.

`actions/checkout@v4` and `dorny/paths-filter@v3` are tag-based refs, vulnerable to retagging attacks (CWE-927). Use full commit SHAs for both `uses:` entries before merging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-e2e-openshift.yaml around lines 83 - 89, the workflow uses tag-based action refs "actions/checkout@v4" and "dorny/paths-filter@v3" which are mutable; replace each with the corresponding full immutable commit SHA for the desired release (pin the "uses:" values to the full commit SHAs), e.g., update the "uses:" entries for actions/checkout and dorny/paths-filter to their canonical commit SHAs so the CI references an immutable revision.

deploy/install.sh-1756-1759 (1)
1756-1759: ⚠️ Potential issue | 🟠 Major

Fail fast for `SCALER_BACKEND=none` when no metrics API is present.

This branch only logs an assumption. On a misconfigured cluster the script still exits green, but HPA can never resolve `external.metrics.k8s.io`. Add an explicit APIService check here and abort if it is missing.

Suggested fix

```diff
 if [ "$SCALER_BACKEND" = "keda" ]; then
     deploy_keda
 elif [ "$SCALER_BACKEND" = "none" ]; then
+    if ! kubectl get apiservice v1beta1.external.metrics.k8s.io >/dev/null 2>&1; then
+        log_error "SCALER_BACKEND=none requires a working external.metrics.k8s.io APIService"
+    fi
     log_info "Skipping scaler backend deployment (SCALER_BACKEND=none)"
     log_info "Assumes an external metrics API (e.g. KEDA) is already installed on the cluster"
 elif [ "$DEPLOY_PROMETHEUS_ADAPTER" = "true" ]; then
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/install.sh` around lines 1756 - 1759, when SCALER_BACKEND is "none" the script only logs an assumption but should fail if the external metrics API is not present; update the branch handling SCALER_BACKEND="none" to verify the Kubernetes APIService for "v1beta1.external.metrics.k8s.io" (or the canonical external.metrics.k8s.io APIService name used in the cluster) using kubectl/oc and, if the APIService is absent or not Available, call log_error and exit non-zero instead of continuing; keep the existing log_info messages but add the APIService existence/availability check before returning control so HPA resolution failures are caught early.

internal/actuator/actuator.go-41-53 (1)
41-53: ⚠️ Potential issue | 🟠 Major

`deployment.Status.Replicas >= 0` will always be true, causing the fallback logic to never execute.

Since `Status.Replicas` is defined as an `int32` that should be non-negative per the Kubernetes API spec, the comparison at line 47 always succeeds, even for newly created Deployments where the controller hasn't yet populated the status. This causes the function to return `0` for a fresh Deployment, bypassing the intended fallback to `spec.replicas` at lines 52–53.

Validate that the Deployment controller has observed the current spec generation before trusting the status:
Suggested fix
```diff
 func (a *Actuator) GetCurrentDeploymentReplicasFromDeployment(va *llmdOptv1alpha1.VariantAutoscaling, deployment *appsv1.Deployment) (int32, error) {
     if deployment == nil {
         return 0, fmt.Errorf("deployment cannot be nil for %s/%s", va.Namespace, va.GetScaleTargetName())
     }
-    // Prefer status replicas (actual current state)
-    if deployment.Status.Replicas >= 0 {
+    // Prefer status once the deployment controller has observed this spec.
+    if deployment.Status.ObservedGeneration >= deployment.Generation {
         return deployment.Status.Replicas, nil
     }
-    // Fallback to spec if status not ready
+    // Fallback to spec if status is still stale.
     if deployment.Spec.Replicas != nil {
         return *deployment.Spec.Replicas, nil
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/actuator/actuator.go` around lines 41 - 53, The GetCurrentDeploymentReplicasFromDeployment function currently trusts deployment.Status.Replicas unconditionally; change the logic to only use status.Replicas when the controller has observed the current generation (compare deployment.Status.ObservedGeneration to deployment.Generation) and otherwise fall back to deployment.Spec.Replicas (or 0/error if spec is nil). Update the checks inside GetCurrentDeploymentReplicasFromDeployment to: validate deployment non-nil, if Status.ObservedGeneration == Generation then return Status.Replicas, else if Spec.Replicas != nil return *Spec.Replicas, otherwise return an appropriate error or 0; keep references to the function name GetCurrentDeploymentReplicasFromDeployment and the VariantAutoscaling parameter to locate the code.
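The ObservedGeneration-based fallback described in this finding can be sketched outside the controller; `deploymentView` is a hypothetical stand-in for the relevant `appsv1.Deployment` fields, not the real API type:

```go
package main

import "fmt"

// deploymentView mimics the Deployment fields the fix relies on.
type deploymentView struct {
	Generation         int64  // metadata.generation: bumped on each spec change
	ObservedGeneration int64  // status.observedGeneration: last spec the controller saw
	StatusReplicas     int32  // status.replicas
	SpecReplicas       *int32 // spec.replicas
}

// currentReplicas trusts status only once the deployment controller has
// observed the latest spec generation; otherwise it falls back to spec.
func currentReplicas(d deploymentView) (int32, error) {
	if d.ObservedGeneration >= d.Generation {
		return d.StatusReplicas, nil
	}
	if d.SpecReplicas != nil {
		return *d.SpecReplicas, nil
	}
	return 0, fmt.Errorf("replicas not yet known")
}

func main() {
	three := int32(3)
	// Fresh object: status not yet populated, so spec wins.
	fresh := deploymentView{Generation: 1, ObservedGeneration: 0, StatusReplicas: 0, SpecReplicas: &three}
	n, _ := currentReplicas(fresh)
	fmt.Println(n)
	// Settled object: status has caught up and is authoritative.
	settled := deploymentView{Generation: 2, ObservedGeneration: 2, StatusReplicas: 5, SpecReplicas: &three}
	n, _ = currentReplicas(settled)
	fmt.Println(n)
}
```

The guard replaces the always-true `Status.Replicas >= 0` check: a fresh Deployment reports generation 1 with observed generation 0, so the spec value is returned instead of a misleading status zero.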
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 85a478fb-508d-4cae-896b-4036aa1b3335
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (114)
.github/aw/github-agentic-workflows.md, .github/workflows/ci-e2e-openshift.yaml, .github/workflows/ci-pr-checks.yaml, .github/workflows/link-checker.lock.yml, .github/workflows/link-checker.md, .github/workflows/typo-checker.lock.yml, .github/workflows/typo-checker.md, .github/workflows/upstream-monitor.lock.yml, .github/workflows/upstream-monitor.md, CONTRIBUTING.md, Dockerfile, Dockerfile.konflux, Makefile, OWNERS, README.md, _typos.toml, api/v1alpha1/variantautoscaling_types.go, api/v1alpha1/variantautoscaling_types_test.go, api/v1alpha1/zz_generated.deepcopy.go, charts/workload-variant-autoscaler/README.md, charts/workload-variant-autoscaler/crds/llmd.ai_variantautoscalings.yaml, charts/workload-variant-autoscaler/templates/manager/wva-configmap.yaml, charts/workload-variant-autoscaler/templates/manager/wva-deployment-controller-manager.yaml, charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-prom.yaml, charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-wva.yaml, charts/workload-variant-autoscaler/values-dev.yaml, charts/workload-variant-autoscaler/values.yaml, config/crd/bases/llmd.ai_variantautoscalings.yaml, config/samples/hpa-integration.yaml, deploy/README.md, deploy/configmap-queueing-model.yaml, deploy/install.sh, deploy/kind-emulator/install.sh, deploy/kubernetes/README.md, deploy/openshift/README.md, docs/developer-guide/development.md, docs/developer-guide/testing.md, docs/user-guide/crd-reference.md, go.mod, internal/actuator/actuator.go, internal/actuator/actuator_test.go, internal/collector/registration/queueing_model.go, internal/collector/replica_metrics.go, internal/collector/source/pod_va_mapper_test.go, internal/config/config.go, internal/config/config_test.go, internal/config/helpers.go, internal/config/helpers_test.go, internal/config/loader_test.go, internal/config/saturation_scaling.go, internal/config/saturation_scaling_test.go, internal/config/suite_test.go, internal/controller/configmap_bootstrap.go, internal/controller/configmap_helpers.go, internal/controller/configmap_reconciler.go, internal/controller/indexers/indexers.go, internal/controller/indexers/indexers_test.go, internal/controller/predicates.go, internal/controller/variantautoscaling_controller.go, internal/controller/variantautoscaling_controller_test.go, internal/engines/analyzers/queueingmodel/analyzer.go, internal/engines/analyzers/queueingmodel/config.go, internal/engines/analyzers/queueingmodel/defaults.go, internal/engines/analyzers/queueingmodel/parameters.go, internal/engines/analyzers/queueingmodel/parameters_test.go, internal/engines/analyzers/queueingmodel/tuner/defaults.go, internal/engines/analyzers/queueingmodel/tuner/utils.go, internal/engines/analyzers/queueingmodel/utils.go, internal/engines/analyzers/queueingmodel/utils_test.go, internal/engines/analyzers/saturation_v2/analyzer.go, internal/engines/analyzers/saturation_v2/analyzer_test.go, internal/engines/common/cache.go, internal/engines/common/cache_test.go, internal/engines/pipeline/cost_aware_optimizer.go, internal/engines/pipeline/cost_aware_optimizer_test.go, internal/engines/pipeline/enforcer.go, internal/engines/pipeline/enforcer_test.go, internal/engines/pipeline/greedy_saturation_algorithm.go, internal/engines/pipeline/greedy_saturation_algorithm_test.go, internal/engines/pipeline/greedy_saturation_optimizer.go, internal/engines/pipeline/greedy_score_optimizer.go, internal/engines/pipeline/greedy_score_optimizer_test.go, internal/engines/pipeline/optimizer_interfaces.go, internal/engines/saturation/engine.go, internal/engines/saturation/engine_queueing_model.go, internal/engines/saturation/engine_test.go, internal/engines/saturation/engine_v2.go, internal/engines/saturation/engine_v2_test.go, internal/engines/scalefromzero/engine.go, internal/engines/scalefromzero/engine_test.go, internal/interfaces/analyzer.go, internal/interfaces/queueing_model_scaling.go, internal/interfaces/queueing_model_scaling_test.go, internal/interfaces/saturation_analyzer.go, internal/interfaces/saturation_scaling_test.go, internal/saturation/analyzer.go, internal/saturation/analyzer_test.go, internal/utils/allocation.go, internal/utils/
tls.gointernal/utils/tls_test.gointernal/utils/utils.gointernal/utils/utils_test.gointernal/utils/variant.gointernal/utils/variant_test.gotest/e2e/fixtures/va_builder.gotest/e2e/limiter_test.gotest/e2e/parallel_load_scaleup_test.gotest/e2e/saturation_test.gotest/e2e/scale_from_zero_test.gotest/e2e/scale_to_zero_test.gotest/e2e/smoke_test.gotest/e2e/suite_test.gotest/utils/debug_helpers.gotest/utils/e2eutils.go
💤 Files with no reviewable changes (11)
- deploy/openshift/README.md
- .github/aw/github-agentic-workflows.md
- .github/workflows/upstream-monitor.lock.yml
- .github/workflows/typo-checker.md
- deploy/kubernetes/README.md
- .github/workflows/upstream-monitor.md
- .github/workflows/link-checker.md
- internal/interfaces/saturation_scaling_test.go
- .github/workflows/link-checker.lock.yml
- internal/engines/pipeline/greedy_saturation_optimizer.go
- .github/workflows/typo-checker.lock.yml
NIS        float64     // Normalized Innovation Squared
Covariance [][]float64 // state covariance matrix
Return and store copies, not shared *LearnedParameters.
Get hands out the in-map pointer and Set stores the caller's pointer verbatim. That lets any goroutine mutate Covariance, NIS, or LastUpdated after the lock is released, so this store can still race and corrupt shared state.
Proposed fix
+func cloneLearnedParameters(in *LearnedParameters) *LearnedParameters {
+ if in == nil {
+ return nil
+ }
+
+ out := *in
+ if in.Covariance != nil {
+ out.Covariance = make([][]float64, len(in.Covariance))
+ for i := range in.Covariance {
+ out.Covariance[i] = append([]float64(nil), in.Covariance[i]...)
+ }
+ }
+ return &out
+}
+
// Get retrieves parameters for a variant (nil if does not exist)
func (s *ParameterStore) Get(namespace, variantName string) *LearnedParameters {
s.mu.RLock()
defer s.mu.RUnlock()
key := makeVariantKey(namespace, variantName)
- return s.params[key]
+ return cloneLearnedParameters(s.params[key])
}
// Set stores parameters for a variant (overrides any earlier parameters)
func (s *ParameterStore) Set(namespace, variantName string, params *LearnedParameters) {
s.mu.Lock()
defer s.mu.Unlock()
key := makeVariantKey(namespace, variantName)
- s.params[key] = params
+ s.params[key] = cloneLearnedParameters(params)
}Also applies to: 35-47
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/engines/analyzers/queueingmodel/parameters.go` around lines 21 - 22,
Get currently returns the pointer from the map and Set stores the caller's
pointer directly, allowing callers to mutate shared state; change the Get and
Set methods for the LearnedParameters map to always produce and consume deep
copies: when getting, allocate a new LearnedParameters, copy primitive fields
(NIS, LastUpdated), and deep-copy Covariance by allocating outer and inner
slices and copying floats; when setting, make the same deep copy of the incoming
*LearnedParameters before storing it in the map so the map never holds or
exposes caller-owned slice pointers.
@shuynh2017, agreed — this is a real race condition (CWE-362). Both Get and Set hand out/store raw pointers to LearnedParameters, so any goroutine can mutate Covariance, NIS, or LastUpdated after the lock is released. The cloneLearnedParameters helper correctly isolates ownership on both sides.
- Add `cloneLearnedParameters` helper that deep-copies `Covariance` (nested slice)
- Update `Get` to return a clone instead of the raw map pointer
- Update `Set` to store a clone instead of the caller's pointer
Do you want to open a PR upstream for this fix?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
/approve
Description
Syncs the llm-d/llm-d-workload-variant-autoscaler main branch into the ODH main branch, with two updates:
created by update: add agent to do upstream sync #41
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit
New Features
Documentation
Removals