Add secondary node group failover and continuous execution mode #12
    // the autoscaler runs once and exits (suitable for running as a cron job).
    //
    // Example: 300 for 5 minutes
    int32 execution_interval_seconds = 6;
I'm not really convinced we should add a feature like this. What platform are you trying to run this on that doesn't support cron jobs? Plain UNIX has crontab, Kubernetes has cron jobs, etc.
We are on Kubernetes. I used to use cron jobs to run it every minute, but I found that I want it to run every 10 seconds so that it can bring up workers faster, and cron schedules cannot go below one minute. We have no builds at night, so we let workers scale down to 0. Our build volumes fluctuate quite a lot during the day.
It's totally okay and understandable if this feature isn't accepted; we will just keep a fork. Let me know what you think.
    // Optional: Secondary node group that scales up to fill capacity gaps
    // when the primary has health issues (e.g., spot capacity shortage).
    string secondary_node_group_name = 3;
I'm always a big fan of the "zero one infinity" rule: https://en.wikipedia.org/wiki/Zero_one_infinity_rule
Instead of going down this route, can't we turn node_group_name into something like this:
    // Names of the managed node groups in the EKS cluster, specified in the order in which resource allocation should be attempted.
    repeated string node_group_names = 2;
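To illustrate the suggestion, here is a rough sketch of how an ordered `node_group_names` list could be consumed: nodes are requested from each group in priority order, up to the headroom that group can still provide, and any remainder spills over to the next group. The primary/secondary setup from this PR is then just a two-element list. The `nodeGroup` type, its `Available` field, and `distribute` are hypothetical; how headroom would actually be determined (for example during a spot capacity shortage) is left open.

```go
package main

import "fmt"

// nodeGroup is a hypothetical view of one managed node group and how many
// more nodes it can currently provide (0 if, e.g., spot capacity is exhausted).
type nodeGroup struct {
	Name      string
	Available int
}

// distribute assigns `needed` extra nodes across groups in priority order,
// filling each group up to its headroom before spilling over to the next.
// It returns the extra nodes per group and any shortfall no group could absorb.
func distribute(groups []nodeGroup, needed int) (map[string]int, int) {
	plan := make(map[string]int, len(groups))
	for _, g := range groups {
		if needed <= 0 {
			break
		}
		take := g.Available
		if take > needed {
			take = needed
		}
		if take > 0 {
			plan[g.Name] = take
			needed -= take
		}
	}
	return plan, needed
}

func main() {
	groups := []nodeGroup{
		{Name: "workers-spot", Available: 2},      // preferred, but short on capacity
		{Name: "workers-on-demand", Available: 10}, // fallback
	}
	plan, shortfall := distribute(groups, 5)
	fmt.Println(plan, shortfall) // map[workers-on-demand:3 workers-spot:2] 0
}
```

Because the list is open-ended, a third fallback group needs no new config field.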
Key Changes
1. New Continuous Execution Mode
   - `execution_interval_seconds` configuration option
2. Secondary Node Group Failover Support
   - `secondary_node_group_name` field added to `EKSManagedNodeGroupConfiguration`
3. Code Refactoring
   - Autoscaling logic moved from `main.go` into a new `autoscaler.go` file
   - New `Autoscaler` struct that holds reusable clients (Prometheus, AWS, Kubernetes)
   - `NewAutoscaler()` constructor initializes all clients once
   - `RunOnce()` method executes a single autoscaling cycle

Files Changed
| File | Changes |
|---|---|
| `cmd/bb_autoscaler/autoscaler.go` | New `Autoscaler` struct. Holds reusable clients (Prometheus, AWS, Kubernetes). Prevents resource leaks from recreating HTTP clients in continuous mode. |
| `cmd/bb_autoscaler/main.go` | Now uses `Autoscaler` |
| `pkg/proto/.../bb_autoscaler.proto` | |
| `cmd/bb_autoscaler/BUILD.bazel` | |