feat: enhance conformance evidence collection with gateway, webhook, and HPA scale-down tests
Enhance the evidence collection script and regenerate all evidence with
additional checks inspired by the Go-based conformance validator:
Script enhancements:
- Gateway: verify GatewayClass Accepted and Gateway Programmed conditions
(not just existence)
- Robust operator: add webhook rejection test (submit invalid CR, verify
webhook denies it)
- HPA: add scale-down verification after scale-up (replace GPU workload
with idle container, verify HPA scales back to minReplicas)
- HPA: fix pod Error status during scale-down by deleting deployment
cleanly before creating idle replacement
- Fix capture function to strip absolute paths from command display
- Fix namespace deletion race with kubectl wait --for=delete
- Tighten HPA verdict to require actual scaling for PASS
- Add early exit for unhealthy pods in HPA wait loop
- Remove readOnlyRootFilesystem from DRA test manifests (blocks CDI
device injection)
- Replace gpu-burn references with CUDA N-Body Simulation
- Sanitize AMI ID in cluster-autoscaling evidence
Evidence regenerated:
- All 8 conformance requirements: PASS
- No leaked local paths or sensitive information
- Consistent format across all evidence documents
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
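
The "strip absolute paths from command display" fix in the list above could look roughly like the following. This is a minimal sketch, not the script's actual capture function: `display_command` is a hypothetical helper name, and the rule it applies (reduce any argument that starts with `/` to its last path component) is an assumption about how the display string is sanitized.

```shell
# Hypothetical helper: render a command line for the evidence file without
# leaking local paths. Any argument beginning with "/" is reduced to its
# final path component; all other arguments are passed through unchanged.
display_command() {
    local out="" arg
    for arg in "$@"; do
        case "${arg}" in
            /*) arg="${arg##*/}" ;;   # drop everything up to the last slash
        esac
        out="${out:+${out} }${arg}"   # join arguments with single spaces
    done
    printf '%s\n' "${out}"
}
```

Only the displayed string changes under this approach; the command itself would still be executed with its original arguments.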
     capture "InferencePools" kubectl get inferencepools -A
     capture "HTTPRoutes" kubectl get httproutes -A
 
-    # Verdict
+    # Verdict — check both GatewayClass Accepted and Gateway Programmed
     echo "" >> "${EVIDENCE_FILE}"
-    local gw_count
-    gw_count=$(kubectl get gateways -A --no-headers 2>/dev/null | wc -l | tr -d ' ')
-    if [ "${gw_count}" -gt 0 ]; then
-        echo "**Result: PASS** — kgateway controller running, Gateway API and inference extension CRDs installed, active Gateway programmed with external address." >> "${EVIDENCE_FILE}"
+    local gw_accepted gw_programmed
+    gw_accepted=$(kubectl get gatewayclass kgateway -o jsonpath='{.status.conditions[?(@.type=="Accepted")].status}' 2>/dev/null)
+    gw_programmed=$(kubectl get gateway inference-gateway -n kgateway-system -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' 2>/dev/null)
+    if [ "${gw_accepted}" = "True" ] && [ "${gw_programmed}" = "True" ]; then
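
The branch bodies of the new Gateway check are not part of this excerpt. As a hedged illustration of the decision it makes, the sketch below isolates the verdict logic in a function that takes the two condition statuses as plain strings. `gateway_verdict` is a hypothetical name, and treating every value other than `True` (including the empty string left behind when `kubectl` fails and its stderr is discarded) as a failure is an assumption about the missing `else` branch.

```shell
# Hypothetical helper: map the GatewayClass "Accepted" status and the Gateway
# "Programmed" status to a verdict. Both must be exactly "True" to pass.
gateway_verdict() {
    local accepted="$1" programmed="$2"
    if [ "${accepted}" = "True" ] && [ "${programmed}" = "True" ]; then
        echo "PASS"
    else
        # Covers "False", "Unknown", and "" (resource missing or query failed).
        echo "FAIL"
    fi
}

# Intended use against a live cluster (not executed here):
#   gateway_verdict "${gw_accepted}" "${gw_programmed}"
```

Keeping the comparison separate from the `kubectl` queries makes the pass/fail rule checkable without a cluster.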

-        echo "**Result: PASS** — HPA successfully read gpu_utilization metric and scaled replicas when utilization exceeded target threshold." >> "${EVIDENCE_FILE}"
+    if [ "${hpa_scaled}" = "true" ] && [ "${hpa_scaled_down}" = "true" ]; then
+        echo "**Result: PASS** — HPA successfully scaled up when GPU utilization exceeded target, and scaled back down when load was removed." >> "${EVIDENCE_FILE}"
+    elif [ "${hpa_scaled}" = "true" ]; then
+        echo "**Result: PASS** — HPA successfully read gpu_utilization metric and scaled replicas when utilization exceeded target threshold. Scale-down not verified within timeout." >> "${EVIDENCE_FILE}"
     else
         echo "**Result: FAIL** — HPA did not scale replicas within the timeout. Check GPU workload, DCGM exporter, and prometheus-adapter configuration." >> "${EVIDENCE_FILE}"
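
The code that sets `hpa_scaled_down` is outside this excerpt. One way the scale-down verification described in the commit message could be implemented is a bounded polling loop, sketched below. `wait_for_replicas` and its argument order are hypothetical, and the replica count is read through whatever command the caller passes in, so the loop itself makes no assumption about resource names.

```shell
# Hypothetical helper: poll until a command prints the target replica count.
# Usage: wait_for_replicas TARGET TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND...
# INTERVAL_SECONDS must be greater than zero.
# Returns 0 once COMMAND prints TARGET, 1 if the timeout elapses first.
wait_for_replicas() {
    local target="$1" timeout="$2" interval="$3"
    shift 3
    local elapsed=0 current
    while [ "${elapsed}" -le "${timeout}" ]; do
        current="$("$@" 2>/dev/null)"   # run the caller-supplied reader
        if [ "${current}" = "${target}" ]; then
            return 0
        fi
        sleep "${interval}"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Intended use against a live cluster (names are placeholders, not executed):
#   if wait_for_replicas 1 300 10 kubectl get hpa <hpa-name> -n <namespace> \
#          -o jsonpath='{.status.currentReplicas}'; then
#       hpa_scaled_down=true
#   fi
```

A timeout here leaves `hpa_scaled_down` unset, which matches the `elif` branch above: scale-up alone still yields PASS, with a note that scale-down was not verified.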