Description
kubernetes-sigs/karpenter
is a sub-project under sig-autoscaling. Because we are an autoscaling project, performance is an extremely high priority for us. We've wanted a comprehensive scale-testing suite for Karpenter's core scheduling and consolidation logic for a while now, and we are finally ready to prioritize it on our side.
The primary question at this point is how to make sure the compute infrastructure that runs the scale tests is large enough that we get accurate results and aren't CPU-throttled or heavily memory-constrained. We tried running on the GitHub Actions runners provided to our repository and on a kind cluster, but both led to significant throttling that slowed the scale tests down and skewed our results.
I'd love to hear recommendations from the community on what we should be doing here. Ideally, we could get something like an EKS or GKE cluster and run the Karpenter installation directly on it with as much CPU and memory as it needs to avoid throttling. We'd run the kwok cloudprovider version of Karpenter, so all node scale-ups and scale-downs would be "fake" and wouldn't consume compute from actual instances.
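For sizing, the main thing we'd change from the defaults is giving the controller generous requests and dropping its CPU limit. Roughly something like the following (a sketch only; the values key path and the numbers are assumptions we'd tune from profiling, not the chart's documented schema):

```yaml
# Hypothetical Helm values for the Karpenter controller on the scale-test cluster.
# Requests are sized so the scheduling/consolidation loops get dedicated cores,
# and the CPU limit is omitted so the controller can burst rather than be throttled.
controller:
  resources:
    requests:
      cpu: "8"          # placeholder; tune against profiling of the scale tests
      memory: 16Gi
    limits:
      memory: 16Gi      # memory limit only; no CPU limit, to avoid CFS throttling
```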
TLDR: We need a managed Kubernetes cluster where we can deploy Karpenter to launch fake (kwok) nodes for our scale testing without getting throttled.
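For reference, the "fake" capacity would be driven by a NodePool that points at the kwok cloudprovider's node class, roughly like the sketch below (the API versions and field names here are assumptions; the kwok examples in the repo are the source of truth):

```yaml
# Hypothetical NodePool/KWOKNodeClass pair for the kwok cloudprovider.
# Nodes created from this pool are simulated by kwok, so no real instances are launched.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: scale-test
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.kwok.sh     # assumed group for the kwok node class
        kind: KWOKNodeClass
        name: default
---
apiVersion: karpenter.kwok.sh/v1alpha1   # assumed version; check the repo's kwok CRDs
kind: KWOKNodeClass
metadata:
  name: default
```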