Commit 83e4fdc

dezmodue authored and hakman committed
Add ability to pull user defined images in the warm pool
1 parent 9184618 commit 83e4fdc

14 files changed, +105 -2 lines changed

docs/instance_groups.md

Lines changed: 15 additions & 1 deletion
@@ -319,18 +319,32 @@ spec:
 You can also specify defaults for all instance groups of type Node or APIServer by setting the `warmPool` field in the cluster spec.
 If warm pools are enabled at the cluster spec level, you can disable them at the instance group level by setting `maxSize: 0`.
 
+### Additional container images
+In some cases it can be convenient to download large container images during the warming phase to reduce the startup time of workloads after a node has been requested and joined the cluster. Additional images can be specified via the `additionalImages` field.
+
+```yaml
+spec:
+  warmPool:
+    additionalImages:
+    - nvcr.io/nvidia/tritonserver:24.10-py3
+    - nvcr.io/nvidia/tritonserver:25.11-vllm-python-py3
+```
+
 ### Lifecycle hook
 
 By default AWS does not guarantee that the kOps configuration will run to completion. Nor that the instance will timely shut down after completion if the instance is allowed to run that long. In order to guarantee this, a lifecycle hook is needed.
 
 **You have to ensure your metadata API is protected if you enable this. If not, any Pod in the cluster will be able to complete the lifecycle hook with the `ABANDONED` result, preventing any instance from ever joining the cluster.**
 
-The following config will enable the lifecycle hook as well as protect the metadata API from abuse:
+By default the lifecycle hook will timeout after 600s. A custom timeout can be set via the `lifecycleHookTimeout` field in the `warmPool` spec. This might be needed if larger additional container images are pulled during the warmup phase so these tasks can finish.
+
+The following config will enable the lifecycle hook, set the timeout to 900s as well as protect the metadata API from abuse:
 
 ```yaml
 spec:
   warmPool:
     enableLifecycleHook: true
+    lifecycleHookTimeout: 900
   instanceMetadata:
     httpPutResponseHopLimit: 1
     httpTokens: required
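
The documentation changed above mentions cluster-level warm pool defaults, the `maxSize: 0` opt-out, and the new `additionalImages` field. The fragment below is a minimal sketch of how these could be combined, not something taken from the commit: it assumes the `kops.k8s.io/v1alpha2` API version, and the cluster and instance group names are hypothetical.

```yaml
# Cluster-level warm pool defaults for Node and APIServer instance groups.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local            # hypothetical cluster name
spec:
  warmPool:
    additionalImages:
    - nvcr.io/nvidia/tritonserver:24.10-py3
---
# An instance group that opts out of the cluster-level warm pool.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-no-warm-pool           # hypothetical instance group name
  labels:
    kops.k8s.io/cluster: example.k8s.local
spec:
  role: Node
  warmPool:
    maxSize: 0                       # disables the warm pool for this group
```

How other instance-group-level `warmPool` fields interact with the cluster-level defaults is not shown in this diff, so that behaviour should be checked against the kOps documentation before relying on it.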

k8s/crds/kops.k8s.io_clusters.yaml

Lines changed: 11 additions & 0 deletions
@@ -6541,11 +6541,22 @@ spec:
 description: WarmPool defines the default warm pool settings for instance
   groups (AWS only).
 properties:
+  additionalImages:
+    description: AdditionalImages is a list of additional container
+      images to pull into the warm pool instances.
+    items:
+      type: string
+    type: array
   enableLifecycleHook:
     description: |-
       EnableLifecycleHook determines if an ASG lifecycle hook will be added ensuring that nodeup runs to completion.
       Note that the metadata API must be protected from arbitrary Pods when this is enabled.
     type: boolean
+  lifecycleHookTimeout:
+    description: LifecycleHookTimeout is the timeout for the ASG lifecycle
+      hook in seconds.
+    format: int32
+    type: integer
   maxSize:
     description: |-
       MaxSize is the maximum size of the warm pool. The desired size of the instance group

k8s/crds/kops.k8s.io_instancegroups.yaml

Lines changed: 11 additions & 0 deletions
@@ -1189,11 +1189,22 @@ spec:
 description: WarmPool configures an ASG warm pool for the instance
   group
 properties:
+  additionalImages:
+    description: AdditionalImages is a list of additional container
+      images to pull into the warm pool instances.
+    items:
+      type: string
+    type: array
   enableLifecycleHook:
     description: |-
       EnableLifecycleHook determines if an ASG lifecycle hook will be added ensuring that nodeup runs to completion.
       Note that the metadata API must be protected from arbitrary Pods when this is enabled.
     type: boolean
+  lifecycleHookTimeout:
+    description: LifecycleHookTimeout is the timeout for the ASG lifecycle
+      hook in seconds.
+    format: int32
+    type: integer
   maxSize:
     description: |-
       MaxSize is the maximum size of the warm pool. The desired size of the instance group

pkg/apis/kops/cluster.go

Lines changed: 4 additions & 0 deletions
@@ -1106,6 +1106,10 @@ type WarmPoolSpec struct {
 	// EnableLifecyleHook determines if an ASG lifecycle hook will be added ensuring that nodeup runs to completion.
 	// Note that the metadata API must be protected from arbitrary Pods when this is enabled.
 	EnableLifecycleHook bool `json:"enableLifecycleHook,omitempty"`
+	// LifecycleHookTimeout is the timeout for the ASG lifecycle hook in seconds.
+	LifecycleHookTimeout *int32 `json:"lifecycleHookTimeout,omitempty"`
+	// AdditionalImages is a list of additional container images to pull into the warm pool instances.
+	AdditionalImages []string `json:"additionalImages,omitempty"`
 }
 
 func (in *WarmPoolSpec) IsEnabled() bool {

pkg/apis/kops/v1alpha2/cluster.go

Lines changed: 4 additions & 0 deletions
@@ -898,4 +898,8 @@ type WarmPoolSpec struct {
 	// EnableLifecycleHook determines if an ASG lifecycle hook will be added ensuring that nodeup runs to completion.
 	// Note that the metadata API must be protected from arbitrary Pods when this is enabled.
 	EnableLifecycleHook bool `json:"enableLifecycleHook,omitempty"`
+	// LifecycleHookTimeout is the timeout for the ASG lifecycle hook in seconds.
+	LifecycleHookTimeout *int32 `json:"lifecycleHookTimeout,omitempty"`
+	// AdditionalImages is a list of additional container images to pull into the warm pool instances.
+	AdditionalImages []string `json:"additionalImages,omitempty"`
 }

pkg/apis/kops/v1alpha2/zz_generated.conversion.go

Lines changed: 4 additions & 0 deletions
Some generated files are not rendered by default.

pkg/apis/kops/v1alpha2/zz_generated.deepcopy.go

Lines changed: 10 additions & 0 deletions
Some generated files are not rendered by default.

pkg/apis/kops/v1alpha3/cluster.go

Lines changed: 4 additions & 0 deletions
@@ -862,4 +862,8 @@ type WarmPoolSpec struct {
 	// EnableLifecycleHook determines if an ASG lifecycle hook will be added ensuring that nodeup runs to completion.
 	// Note that the metadata API must be protected from arbitrary Pods when this is enabled.
 	EnableLifecycleHook bool `json:"enableLifecycleHook,omitempty"`
+	// LifecycleHookTimeout is the timeout for the ASG lifecycle hook in seconds.
+	LifecycleHookTimeout *int32 `json:"lifecycleHookTimeout,omitempty"`
+	// AdditionalImages is a list of additional container images to pull into the warm pool instances.
+	AdditionalImages []string `json:"additionalImages,omitempty"`
 }

pkg/apis/kops/v1alpha3/zz_generated.conversion.go

Lines changed: 4 additions & 0 deletions
Some generated files are not rendered by default.

pkg/apis/kops/v1alpha3/zz_generated.deepcopy.go

Lines changed: 10 additions & 0 deletions
Some generated files are not rendered by default.
