Skip to content

feat: Added UnavailableOfferingsTTL exposed as the flag & env variable in the karpenter #8013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ankitjain28may
Copy link

Fixes #N/A

Description
Exposed UnavailableOfferingsTTL as both a flag and an environment variable in Karpenter. The default value of 3 minutes was too high for our environment, which operates with highly volatile workloads.

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ankitjain28may ankitjain28may requested a review from a team as a code owner April 24, 2025 10:16
Copy link

netlify bot commented Apr 24, 2025

Deploy Preview for karpenter-docs-prod ready!

Name Link
🔨 Latest commit e5eff64
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/680a10c6f1cb330008fb93fa
😎 Deploy Preview https://deploy-preview-8013--karpenter-docs-prod.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@ankitjain28may ankitjain28may force-pushed the f/unavailableOfferingsTTL branch from 911cce0 to e5eff64 Compare April 24, 2025 10:21
Copy link
Contributor

@DerekFrank DerekFrank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some preliminary style+structure feedback, will need to discuss with the provider working group to determine if there are any unintended consequences\product reasons to push back on accepting this PR. Would love if you could attend and share your use case directly!

https://karpenter.sh/docs/contributing/community-meetings/

@@ -53,6 +54,7 @@ func (o *Options) AddFlags(fs *coreoptions.FlagSet) {
fs.Float64Var(&o.VMMemoryOverheadPercent, "vm-memory-overhead-percent", utils.WithDefaultFloat64("VM_MEMORY_OVERHEAD_PERCENT", 0.075), "The VM memory overhead as a percent that will be subtracted from the total memory for all instance types when cached information is unavailable.")
fs.StringVar(&o.InterruptionQueue, "interruption-queue", env.WithDefaultString("INTERRUPTION_QUEUE", ""), "Interruption queue is the name of the SQS queue used for processing interruption events from EC2. Interruption handling is disabled if not specified. Enabling interruption handling may require additional permissions on the controller service account. Additional permissions are outlined in the docs.")
fs.IntVar(&o.ReservedENIs, "reserved-enis", env.WithDefaultInt("RESERVED_ENIS", 0), "Reserved ENIs are not included in the calculations for max-pods or kube-reserved. This is most often used in the VPC CNI custom networking setup https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html.")
fs.IntVar(&o.UnavailableOfferingsTTL, "unavailable-offerings-ttl", env.WithDefaultInt("UNAVAILABLE_OFFERINGS_TTL", 180), "The Unavailable offerings TTL is the time before offerings that were marked as unavailable are removed from the cache and are available for launch again.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason to switch this TTL to seconds over minutes? Either way, the description should include a '... in minutes ...' or an '... in seconds ...' to indicate to the user the correct unit.

@@ -37,7 +39,8 @@ type UnavailableOfferings struct {
SeqNum uint64
}

func NewUnavailableOfferings() *UnavailableOfferings {
func NewUnavailableOfferings(ctx context.Context) *UnavailableOfferings {
UnavailableOfferingsTTL := time.Duration(int64(options.FromContext(ctx).UnavailableOfferingsTTL)) * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'd prefer to pass in only the TTL as an argument than the whole ctx object just to keep the function footprint small. Additionally, why store the option as an int just to cast it to an int64 just to convert it to a time.Duration?

@jmdeal
Copy link
Contributor

jmdeal commented Apr 28, 2025

The default value of 3 minutes was too high for our environment, which operates with highly volatile workloads

Are you able to elaborate on this? Your workloads shouldn't have any impact on what this value should be since they don't affect how long it will take for a spot pool to become available again. Generally speaking, I'm not convinced there is a use-case for tuning this value, but would like to know more about your use-case.

@jmdeal jmdeal added the triage/needs-information Marks that the issue still needs more information to properly triage label May 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage/needs-information Marks that the issue still needs more information to properly triage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants