Releases: pmady/keda-gpu-scaler
v0.4.0: Advanced GPU Metrics & Health Probes
Release v0.4.0 - June 8, 2026
🚀 Major Features
HTTP Health Probes
- Added /healthz and /readyz endpoints for Kubernetes liveness/readiness probes
- Configurable via the --probe-port flag (0 to disable)
- Helm integration with conditional probe configuration
- Production ready for Kubernetes deployments
- Thanks to @ibobgunardi for this excellent contribution!
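For a quick check outside the kubelet, the probe endpoints can be polled from any HTTP client. The sketch below is a minimal Python example, assuming the probes answer a plain HTTP GET with a 2xx status when healthy (these notes do not state the response codes). The helper name probe and its opener parameter are illustrative and not part of the project.

```python
import urllib.error
import urllib.request


def probe(base_url, path, timeout=2.0, opener=urllib.request.urlopen):
    """Return True when GET base_url + path answers with a 2xx status.

    Assumption: a healthy scaler replies 2xx on /healthz and /readyz.
    `opener` is injectable so the helper can be exercised without a network.
    """
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx HTTP errors all count as "not OK".
        return False
```

With the scaler started with --probe-port=8081 and that port reachable locally, probe("http://localhost:8081", "/healthz") and probe("http://localhost:8081", "/readyz") would report liveness and readiness respectively.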
PCIe and NVLink Bandwidth Metrics
- New advanced metrics for GPU bandwidth monitoring and autoscaling
- PCIe metrics: pcie_tx_kbps, pcie_rx_kbps - CPU↔GPU data transfer throughput
- NVLink metrics: nvlink_tx_mbps, nvlink_rx_mbps - GPU↔GPU communication bandwidth
- New profile: distributed-training, optimized for NVLink systems with sane defaults
- Hardware-aware - Graceful handling of non-NVLink hardware (metrics return 0)
- Comprehensive docs with practical examples and hardware-specific guidance
- Thanks to @venkata22a for this outstanding implementation!
📚 Documentation & Configuration
Complete metricType Reference
- Full table of all 10 supported metric types with units and descriptions
- Advanced metrics section explaining PCIe vs NVLink use cases
- Practical examples with realistic bandwidth targets
- Hardware guidance for NVLink vs PCIe system optimization
Prometheus Metrics Enhancement
- New gauges: PCIe/NVLink throughput with direction labels
- Device count metric for cluster monitoring
- Complete coverage of all GPU telemetry
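To illustrate what a gauge with a direction label looks like on the wire, the sketch below renders samples in the Prometheus text exposition format. The metric name gpu_pcie_throughput_kbps and the gpu label are hypothetical placeholders chosen for this example; these notes do not list the names the scaler actually exports.

```python
def render_gauge(name, help_text, samples):
    """Render a gauge in Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Labels are emitted
    in sorted key order so the output is deterministic.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name and label set, for illustration only.
example = render_gauge(
    "gpu_pcie_throughput_kbps",
    "PCIe throughput per GPU and direction",
    [
        ({"gpu": "0", "direction": "tx"}, 1200.0),
        ({"gpu": "0", "direction": "rx"}, 950.0),
    ],
)
```

Using one metric with a direction label, rather than separate tx and rx metrics, keeps both directions queryable with a single selector.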
🔧 Infrastructure & Quality
Testing & Reliability
- 100% test coverage for new features
- Dedicated test functions for PCIe/NVLink extraction
- Distributed training profile validation
- All tests pass ✅ | Lint clean ✅
Community Contributions
This release features significant contributions from two community members:
- Bobi Gunardi (CodeLabs Indonesia) - Health probes implementation
- Venkata Edara - PCIe/NVLink metrics with comprehensive documentation
⚠️ Important Notes
Hardware Compatibility
- NVLink metrics return 0 on hardware without NVLink (T4, A10, etc.) - this is normal behavior
- PCIe metrics work on all GPU systems and are recommended for non-NVLink hardware
- Automatic detection - scaler adapts to available hardware capabilities
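The fallback described above can be sketched on the consumer side as follows. This is a minimal Python illustration, not the scaler's own detection logic: the function name and the sample dictionary are hypothetical, and only the four metric names come from this release. Because a zero NVLink reading cannot by itself distinguish "no NVLink" from "NVLink idle", the sketch simply treats zero traffic in both directions as a reason to use PCIe.

```python
from typing import Dict, Tuple


def pick_bandwidth_metric(sample: Dict[str, float]) -> Tuple[str, float]:
    """Choose which transmit-bandwidth metric to scale on.

    Prefers NVLink when any NVLink traffic is reported; otherwise falls
    back to PCIe, which these notes recommend for non-NVLink hardware.
    """
    nvlink_tx = sample.get("nvlink_tx_mbps", 0.0)
    nvlink_rx = sample.get("nvlink_rx_mbps", 0.0)
    if nvlink_tx > 0 or nvlink_rx > 0:
        return "nvlink_tx_mbps", nvlink_tx
    # Zero in both directions: assume no usable NVLink (e.g. T4, A10).
    return "pcie_tx_kbps", sample.get("pcie_tx_kbps", 0.0)
```

Note that the two metric families use different units (kbps for PCIe, mbps for NVLink), so a target value has to be chosen per metric rather than shared.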
Configuration Changes
- Health probes are disabled by default (set --probe-port=8081 to enable)
- New metric types expand scaling possibilities for enterprise workloads
- Backward compatible - all existing configurations continue to work
🎯 Use Cases Enabled
Data-Parallel Training
- NVLink bandwidth monitoring for distributed training workloads
- Communication bottleneck detection when GPU utilization appears low
- Automatic scaling based on inter-GPU communication saturation
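As a rough illustration of the bottleneck case, the check below flags a GPU as communication-bound when utilization is low while NVLink traffic is high. It is a sketch under stated assumptions: the function name is hypothetical, and both default thresholds are placeholders that would need tuning for real hardware.

```python
def is_comm_bound(gpu_util_pct, nvlink_tx_mbps, nvlink_rx_mbps,
                  util_floor_pct=50.0, link_ceiling_mbps=800.0):
    """Return True when a GPU looks stalled on inter-GPU communication.

    Heuristic: compute utilization sits below `util_floor_pct` while the
    busier NVLink direction is at or above `link_ceiling_mbps`.
    Both thresholds are placeholder values, not project defaults.
    """
    busiest_link = max(nvlink_tx_mbps, nvlink_rx_mbps)
    return gpu_util_pct < util_floor_pct and busiest_link >= link_ceiling_mbps
```

A utilization-only scaler would see the same GPU as underused and might scale in; pairing the two signals is what lets the low-utilization, high-traffic case be recognized.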
Enterprise GPU Monitoring
- PCIe bandwidth tracking for CPU↔GPU data transfer bottlenecks
- Hardware utilization insights beyond basic GPU utilization
- Cost optimization through precise autoscaling decisions
📦 Installation
# Helm install with health probes enabled
helm install keda-gpu-scaler ./deploy/helm/keda-gpu-scaler \
  --set probes.enabled=true \
  --set probes.port=8081
# Example distributed training ScaledObject
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: distributed-training
spec:
  scaleTargetRef:
    name: training-app
  triggers:
    - type: external
      metadata:
        scalerAddress: "keda-gpu-scaler.keda.svc.cluster.local:6000"
        profile: "distributed-training"
        targetValue: "800"
        activationThreshold: "100"
🏛️ Industry Recognition & Standards
CNCF TAG Infrastructure — AI Technical Community Group
This project is actively contributing to the CNCF TAG Infrastructure — AI Technical Community Group to advance GPU-aware autoscaling capabilities in the cloud native ecosystem.
Community Engagement:
- CNCF TOC Submission: GPU-Aware Autoscaling in Cloud Native AI Infrastructure
- Whitepaper in Progress: "GPU-Aware Autoscaling in Cloud Native AI Infrastructure" - Technical deep-dive addressing GPU scaling blind spots in Kubernetes
- TAG Infrastructure AI TCG: Active participant in shaping AI/ML workload orchestration standards
Industry Impact:
- Addressing critical GPU telemetry gaps in Kubernetes
- Building production-ready solutions for enterprise AI workloads
- Contributing to cloud native GPU scheduling standards
- Establishing GPU autoscaling as a first-class HPC concern
🙏 Acknowledgments
Special thanks to our community contributors for making this release possible:
- @ibobgunardi - Health probes implementation with comprehensive test coverage
- @venkata22a - Advanced GPU metrics with exceptional documentation
📊 Stats
- Files changed: 10
- Lines added: 310+
- Test coverage: 100% for new features
- Contributors: 2 community members
- New metrics: 4 advanced bandwidth metrics
- New profiles: 1 (distributed-training)
Previous release: v0.3.0
Full changelog: CHANGELOG.md
v0.3.0
What changed
- 11 e2e tests covering gRPC health, all profiles, scale-out/in, streaming, error paths, aggregation
- Helm chart: securityContext (privileged for NVML), optional hostPath mounts, updateStrategy, terminationGracePeriod
- CI: added e2e test job, docker build gated on e2e pass
- gofmt -s applied to all Go files
- Added Apache 2.0 LICENSE file
Install
helm install gpu-scaler deploy/helm/keda-gpu-scaler --namespace gpu-scaler --create-namespace
v0.2.0
What's Changed
- test: Add GPU collector package tests — MetricsCollector interface coverage, boundary conditions, interface compliance
- deps: Dependabot updates (grpc 1.81.1, zap 1.28.0, golangci-lint-action v9, actions/checkout v6, actions/setup-go v6)
Full Changelog: v0.1.0...v0.2.0
v0.1.0
What is this?
A KEDA External Scaler that reads NVIDIA GPU metrics directly from NVML and autoscales Kubernetes GPU workloads. No Prometheus, no dcgm-exporter, no PromQL.
Highlights
- gRPC server implementing KEDA ExternalScaler protocol
- Direct NVML metrics: utilization, memory, temperature, power
- Pre-built scaling profiles: vllm-inference, triton-inference, training, batch
- Multi-GPU aggregation (max, min, avg, sum)
- Scale-to-zero support
- DaemonSet manifests + Helm chart
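The four multi-GPU aggregation modes listed above reduce per-GPU readings to a single value. The sketch below shows what each mode computes; it is written in Python for illustration and is not the scaler's implementation. The function name and the choice to reject an empty input are assumptions.

```python
import math
from statistics import fmean

# Map each aggregation mode named in the highlights to a reducer.
AGGREGATORS = {
    "max": max,
    "min": min,
    "avg": fmean,
    "sum": math.fsum,
}


def aggregate(values, mode):
    """Collapse per-GPU metric values into one number.

    Raises ValueError for an unknown mode or an empty list (the empty-list
    behavior is an assumption made for this sketch).
    """
    if mode not in AGGREGATORS:
        raise ValueError(f"unknown aggregation mode: {mode!r}")
    if not values:
        raise ValueError("no GPU values to aggregate")
    return float(AGGREGATORS[mode](values))
```

For per-GPU utilization readings of 40, 80, and 60, max tracks the hottest GPU (80), avg smooths across the node (60), and sum suits additive quantities such as power draw or throughput.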
Install