feat: enhance conformance evidence collection with gateway, webhook, and HPA scale-down tests

Enhance the evidence collection script and regenerate all evidence with
additional checks inspired by the Go-based conformance validator:

Script enhancements:
- Gateway: verify GatewayClass Accepted and Gateway Programmed conditions
  (not just existence)
- Robust operator: add webhook rejection test (submit invalid CR, verify
  webhook denies it)
- HPA: add scale-down verification after scale-up (replace GPU workload
  with idle container, verify HPA scales back to minReplicas)
- HPA: fix pod Error status during scale-down by deleting deployment
  cleanly before creating idle replacement
- Fix capture function to strip absolute paths from command display
- Fix namespace deletion race with kubectl wait --for=delete
- Tighten HPA verdict to require actual scaling for PASS
- Add early exit for unhealthy pods in HPA wait loop
- Remove readOnlyRootFilesystem from DRA test manifests (blocks CDI
  device injection)
- Replace gpu-burn references with CUDA N-Body Simulation
- Sanitize AMI ID in cluster-autoscaling evidence
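
The namespace deletion race mentioned in the list can be illustrated with a minimal sketch. The function name `delete_namespace_and_wait` and its default timeout are hypothetical and not taken from the script; only `kubectl delete` and `kubectl wait --for=delete` come from the commit message. The idea is that the delete call returns immediately and a separate wait blocks until the namespace object is gone, so a later test cannot recreate a namespace that is still terminating.

```shell
# Hypothetical helper (name and default timeout are assumptions, not from the script).
# Deletes a namespace, then blocks until the API server no longer reports it.
delete_namespace_and_wait() {
    local ns="$1"
    local timeout="${2:-120s}"

    # --ignore-not-found keeps the call idempotent; --wait=false returns right away.
    kubectl delete namespace "${ns}" --ignore-not-found --wait=false || return 1

    # Blocks until the namespace object is deleted, or fails after the timeout.
    kubectl wait --for=delete "namespace/${ns}" --timeout="${timeout}"
}
```

A caller would typically run `delete_namespace_and_wait my-test-ns 180s` before recreating the namespace for the next test.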

Evidence regenerated:
- All 8 conformance requirements: PASS
- No leaked local paths or sensitive information
- Consistent format across all evidence documents

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
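
The claim that the regenerated evidence contains no leaked local paths or AMI IDs could be spot-checked with something like the sketch below. Both function names and the patterns (home-directory prefixes, `ami-` followed by 8 to 17 hex characters) are illustrative assumptions; the commit does not show how the script sanitizes its output.

```shell
# Hypothetical leak scan; the patterns are assumptions, not the script's actual rules.
# Prints matching lines (with line numbers) that look like local paths or raw AMI IDs.
find_leaks() {
    grep -nE '(/home/|/Users/)[^[:space:]]+|ami-[0-9a-f]{8,17}' "$1"
}

# Returns 0 when the file is clean, 1 when something suspicious is found,
# and 2 when the file cannot be read.
evidence_is_clean() {
    [ -r "$1" ] || return 2
    if find_leaks "$1" > /dev/null; then
        return 1
    fi
    return 0
}
```

Running `find_leaks` on each evidence document before committing gives a quick list of lines to review by hand.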
     capture "InferencePools" kubectl get inferencepools -A
     capture "HTTPRoutes" kubectl get httproutes -A

-    # Verdict
+    # Verdict — check both GatewayClass Accepted and Gateway Programmed
     echo "" >> "${EVIDENCE_FILE}"
-    local gw_count
-    gw_count=$(kubectl get gateways -A --no-headers 2>/dev/null | wc -l | tr -d ' ')
-    if [ "${gw_count}" -gt 0 ]; then
-        echo "**Result: PASS** — kgateway controller running, Gateway API and inference extension CRDs installed, active Gateway programmed with external address." >> "${EVIDENCE_FILE}"
+    local gw_accepted gw_programmed
+    gw_accepted=$(kubectl get gatewayclass kgateway -o jsonpath='{.status.conditions[?(@.type=="Accepted")].status}' 2>/dev/null)
+    gw_programmed=$(kubectl get gateway inference-gateway -n kgateway-system -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' 2>/dev/null)
+    if [ "${gw_accepted}" = "True" ] && [ "${gw_programmed}" = "True" ]; then
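
The two jsonpath lookups in the hunk above follow the same pattern, so they could be factored into one helper. The sketch below is an illustration only: `condition_status` and `gateway_verdict` are hypothetical names, and the FAIL message format is an assumption. The resource names (`kgateway`, `inference-gateway`, `kgateway-system`) are the ones used in the diff.

```shell
# Hypothetical helper: print the status of one condition type on a resource.
# Usage: condition_status <ConditionType> <kubectl get arguments...>
condition_status() {
    local cond="$1"
    shift
    kubectl get "$@" \
        -o jsonpath="{.status.conditions[?(@.type==\"${cond}\")].status}" 2>/dev/null
}

# Hypothetical verdict: PASS only when both conditions report "True".
gateway_verdict() {
    local accepted programmed
    accepted="$(condition_status Accepted gatewayclass kgateway)"
    programmed="$(condition_status Programmed gateway inference-gateway -n kgateway-system)"
    if [ "${accepted}" = "True" ] && [ "${programmed}" = "True" ]; then
        echo "PASS"
    else
        # An empty value means the condition (or the resource) was not found.
        echo "FAIL (Accepted=${accepted:-unknown}, Programmed=${programmed:-unknown})"
    fi
}
```

Reporting the observed values in the failure branch makes it easier to tell a missing resource from one that exists but is not yet programmed.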
-        echo "**Result: PASS** — HPA successfully read gpu_utilization metric and scaled replicas when utilization exceeded target threshold." >> "${EVIDENCE_FILE}"
+    if [ "${hpa_scaled}" = "true" ] && [ "${hpa_scaled_down}" = "true" ]; then
+        echo "**Result: PASS** — HPA successfully scaled up when GPU utilization exceeded target, and scaled back down when load was removed." >> "${EVIDENCE_FILE}"
+    elif [ "${hpa_scaled}" = "true" ]; then
+        echo "**Result: PASS** — HPA successfully read gpu_utilization metric and scaled replicas when utilization exceeded target threshold. Scale-down not verified within timeout." >> "${EVIDENCE_FILE}"
     else
         echo "**Result: FAIL** — HPA did not scale replicas within the timeout. Check GPU workload, DCGM exporter, and prometheus-adapter configuration." >> "${EVIDENCE_FILE}"
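
The verdict above depends on an `hpa_scaled_down` flag, but the loop that sets it is not part of the visible hunk. One way such a flag could be produced is a poll loop like the sketch below. The function name, the timeout and interval defaults, and the use of `.status.currentReplicas` versus `.spec.minReplicas` are assumptions made for illustration, not the script's actual implementation.

```shell
# Hypothetical poll loop: succeed once the HPA reports currentReplicas == minReplicas.
# Usage: wait_for_scale_down <hpa-name> <namespace> [timeout-seconds] [interval-seconds]
wait_for_scale_down() {
    local hpa="$1" ns="$2"
    local timeout="${3:-600}" interval="${4:-15}"
    local elapsed=0 current min

    min="$(kubectl get hpa "${hpa}" -n "${ns}" -o jsonpath='{.spec.minReplicas}' 2>/dev/null)"
    min="${min:-1}"   # Kubernetes defaults minReplicas to 1 when it is unset

    while [ "${elapsed}" -lt "${timeout}" ]; do
        current="$(kubectl get hpa "${hpa}" -n "${ns}" -o jsonpath='{.status.currentReplicas}' 2>/dev/null)"
        if [ -n "${current}" ] && [ "${current}" = "${min}" ]; then
            return 0
        fi
        sleep "${interval}"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

A caller could then set the flag with `if wait_for_scale_down my-hpa my-ns; then hpa_scaled_down=true; else hpa_scaled_down=false; fi`, which also stays safe under `set -e`.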