forked from NVIDIA/k8s-nim-operator
apps.nvidia.com_nimcaches.yaml
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.16.2
  name: nimcaches.apps.nvidia.com
spec:
  group: apps.nvidia.com
  names:
    kind: NIMCache
    listKind: NIMCacheList
    plural: nimcaches
    singular: nimcache
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - jsonPath: .status.state
      name: Status
      type: string
    - jsonPath: .status.pvc
      name: PVC
      type: string
    - format: date-time
      jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    name: v1alpha1
    schema:
      openAPIV3Schema:
        description: NIMCache is the Schema for the nimcaches API.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: NIMCacheSpec defines the desired state of NIMCache.
            properties:
              certConfig:
                description: |-
                  CertConfig is the name of the ConfigMap containing the custom certificates
                  for secure communication.
                  Deprecated: use `Proxy` instead to configure custom certificates for using proxy.
                properties:
                  mountPath:
                    description: MountPath is the path where the certificates should
                      be mounted in the container.
                    type: string
                  name:
                    description: Name of the ConfigMap containing the certificate
                      data.
                    type: string
                required:
                - mountPath
                - name
                type: object
              env:
                description: Env are the additional custom environment variables for
                  the caching job
                items:
                  description: EnvVar represents an environment variable present in
                    a Container.
                  properties:
                    name:
                      description: Name of the environment variable. Must be a C_IDENTIFIER.
                      type: string
                    value:
                      description: |-
                        Variable references $(VAR_NAME) are expanded
                        using the previously defined environment variables in the container and
                        any service environment variables. If a variable cannot be resolved,
                        the reference in the input string will be unchanged. Double $$ are reduced
                        to a single $, which allows for escaping the $(VAR_NAME) syntax: i.e.
                        "$$(VAR_NAME)" will produce the string literal "$(VAR_NAME)".
                        Escaped references will never be expanded, regardless of whether the variable
                        exists or not.
                        Defaults to "".
                      type: string
                    valueFrom:
                      description: Source for the environment variable's value. Cannot
                        be used if value is not empty.
                      properties:
                        configMapKeyRef:
                          description: Selects a key of a ConfigMap.
                          properties:
                            key:
                              description: The key to select.
                              type: string
                            name:
                              default: ""
                              description: |-
                                Name of the referent.
                                This field is effectively required, but due to backwards compatibility is
                                allowed to be empty. Instances of this type with an empty value here are
                                almost certainly wrong.
                                More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                              type: string
                            optional:
                              description: Specify whether the ConfigMap or its key
                                must be defined
                              type: boolean
                          required:
                          - key
                          type: object
                          x-kubernetes-map-type: atomic
                        fieldRef:
                          description: |-
                            Selects a field of the pod: supports metadata.name, metadata.namespace, `metadata.labels['<KEY>']`, `metadata.annotations['<KEY>']`,
                            spec.nodeName, spec.serviceAccountName, status.hostIP, status.podIP, status.podIPs.
                          properties:
                            apiVersion:
                              description: Version of the schema the FieldPath is
                                written in terms of, defaults to "v1".
                              type: string
                            fieldPath:
                              description: Path of the field to select in the specified
                                API version.
                              type: string
                          required:
                          - fieldPath
                          type: object
                          x-kubernetes-map-type: atomic
                        resourceFieldRef:
                          description: |-
                            Selects a resource of the container: only resources limits and requests
                            (limits.cpu, limits.memory, limits.ephemeral-storage, requests.cpu, requests.memory and requests.ephemeral-storage) are currently supported.
                          properties:
                            containerName:
                              description: 'Container name: required for volumes,
                                optional for env vars'
                              type: string
                            divisor:
                              anyOf:
                              - type: integer
                              - type: string
                              description: Specifies the output format of the exposed
                                resources, defaults to "1"
                              pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                              x-kubernetes-int-or-string: true
                            resource:
                              description: 'Required: resource to select'
                              type: string
                          required:
                          - resource
                          type: object
                          x-kubernetes-map-type: atomic
                        secretKeyRef:
                          description: Selects a key of a secret in the pod's namespace
                          properties:
                            key:
                              description: The key of the secret to select from.  Must
                                be a valid secret key.
                              type: string
                            name:
                              default: ""
                              description: |-
                                Name of the referent.
                                This field is effectively required, but due to backwards compatibility is
                                allowed to be empty. Instances of this type with an empty value here are
                                almost certainly wrong.
                                More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                              type: string
                            optional:
                              description: Specify whether the Secret or its key must
                                be defined
                              type: boolean
                          required:
                          - key
                          type: object
                          x-kubernetes-map-type: atomic
                      type: object
                  required:
                  - name
                  type: object
                type: array
              groupID:
                description: GroupID is the group ID for the caching job
                format: int64
                type: integer
              nodeSelector:
                additionalProperties:
                  type: string
                description: NodeSelector is the node selector labels to schedule
                  the caching job.
                type: object
              proxy:
                description: ProxySpec defines the proxy configuration for NIMService.
                properties:
                  certConfigMap:
                    type: string
                  httpProxy:
                    type: string
                  httpsProxy:
                    type: string
                  noProxy:
                    type: string
                type: object
              resources:
                description: Resources defines the minimum resources required for
                  the caching job to run (cpu, memory, gpu).
                properties:
                  cpu:
                    anyOf:
                    - type: integer
                    - type: string
                    description: CPU indicates the minimum number of CPUs to use while
                      caching NIM
                    pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                    x-kubernetes-int-or-string: true
                  memory:
                    anyOf:
                    - type: integer
                    - type: string
                    description: |-
                      Memory indicates the minimum amount of memory to use while caching NIM
                      Valid values are numbers followed by one of the suffixes Ki, Mi, Gi, or Ti (e.g. "4Gi", "4096Mi").
                    pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                    x-kubernetes-int-or-string: true
                type: object
              runtimeClassName:
                description: RuntimeClassName is the runtimeclass for the caching
                  job
                type: string
              source:
                description: Source is the NIM model source to cache
                properties:
                  dataStore:
                    description: DataStore represents models stored in NVIDIA NeMo
                      DataStore service
                    properties:
                      authSecret:
                        description: AuthSecret is the name of the secret containing
                          the "HF_TOKEN" token
                        minLength: 1
                        type: string
                      datasetName:
                        description: DatasetName is the name of the dataset
                        type: string
                      endpoint:
                        description: Endpoint is the HuggingFace endpoint from NeMo
                          DataStore
                        pattern: ^https?://.*/v1/hf/?$
                        type: string
                      modelName:
                        description: ModelName is the name of the model
                        type: string
                      modelPuller:
                        description: ModelPuller is the containerized huggingface-cli
                          image to pull the data
                        minLength: 1
                        type: string
                      namespace:
                        default: default
                        description: Namespace is the namespace within NeMo DataStore
                        type: string
                      pullSecret:
                        description: PullSecret is the name of the image pull secret
                          for the modelPuller image
                        minLength: 1
                        type: string
                      revision:
                        description: Revision is the revision of the object to be
                          cached. This is either a commit hash, branch name or tag.
                        minLength: 1
                        type: string
                    required:
                    - authSecret
                    - endpoint
                    - modelPuller
                    - namespace
                    - pullSecret
                    type: object
                    x-kubernetes-validations:
                    - message: Exactly one of modelName or datasetName must be defined
                      rule: '(has(self.modelName) ? 1 : 0) + (has(self.datasetName)
                        ? 1 : 0) == 1'
                  hf:
                    description: HuggingFaceHub represents models stored in HuggingFace
                      Hub
                    properties:
                      authSecret:
                        description: AuthSecret is the name of the secret containing
                          the "HF_TOKEN" token
                        minLength: 1
                        type: string
                      datasetName:
                        description: DatasetName is the name of the dataset
                        type: string
                      endpoint:
                        description: Endpoint is the HuggingFace endpoint
                        pattern: ^https?://.*$
                        type: string
                      modelName:
                        description: ModelName is the name of the model
                        type: string
                      modelPuller:
                        description: ModelPuller is the containerized huggingface-cli
                          image to pull the data
                        minLength: 1
                        type: string
                      namespace:
                        description: Namespace is the namespace within the HuggingFace
                          Hub
                        minLength: 1
                        type: string
                      pullSecret:
                        description: PullSecret is the name of the image pull secret
                          for the modelPuller image
                        minLength: 1
                        type: string
                      revision:
                        description: Revision is the revision of the object to be
                          cached. This is either a commit hash, branch name or tag.
                        minLength: 1
                        type: string
                    required:
                    - authSecret
                    - endpoint
                    - modelPuller
                    - namespace
                    - pullSecret
                    type: object
                    x-kubernetes-validations:
                    - message: Exactly one of modelName or datasetName must be defined
                      rule: '(has(self.modelName) ? 1 : 0) + (has(self.datasetName)
                        ? 1 : 0) == 1'
                  ngc:
                    description: NGCSource represents models stored in NGC
                    properties:
                      authSecret:
                        description: The name of an existing pull secret containing
                          the NGC_API_KEY
                        type: string
                      model:
                        description: Model spec for caching
                        properties:
                          buildable:
                            description: Buildable indicates generic model profiles
                              that can be optimized with an NVIDIA engine for any
                              GPUs
                            type: boolean
                          engine:
                            description: Engine is the backend engine (tensorrt_llm,
                              vllm)
                            type: string
                          gpus:
                            description: GPU is the spec for matching GPUs for caching
                              optimized models
                            items:
                              description: GPUSpec is the spec required to cache models
                                for selected gpu type.
                              properties:
                                ids:
                                  description: IDs are the device-ids for a specific
                                    GPU SKU
                                  items:
                                    type: string
                                  type: array
                                product:
                                  description: Product is the GPU product string (h100,
                                    a100, l40s)
                                  type: string
                              type: object
                            type: array
                          lora:
                            description: Lora indicates a fine-tuned model with LoRA
                              adapters
                            type: boolean
                          precision:
                            description: Precision is the precision for model quantization
                            type: string
                          profiles:
                            description: Profiles are the specific model profiles
                              to cache. When these are provided, the rest of the model
                              parameters for profile selection are ignored
                            items:
                              type: string
                            type: array
                          qosProfile:
                            description: QoSProfile is the supported QoS profile types
                              for the models (throughput, latency)
                            type: string
                          tensorParallelism:
                            description: TensorParallelism is the minimum GPUs required
                              for the model computations
                            type: string
                        type: object
                      modelEndpoint:
                        description: ModelEndpoint is the endpoint for the model to
                          be cached for Universal NIM
                        type: string
                      modelPuller:
                        description: ModelPuller is the container image that can pull
                          the model
                        type: string
                        x-kubernetes-validations:
                        - message: modelPuller is an immutable field. Please create
                            a new NIMCache resource instead when you want to change
                            this container.
                          rule: self == oldSelf
                      pullSecret:
                        description: PullSecret to pull the model puller image
                        type: string
                    required:
                    - authSecret
                    - modelPuller
                    type: object
                    x-kubernetes-validations:
                    - message: Only one of 'model' or 'modelEndpoint' can be specified
                      rule: '!(has(self.model) && has(self.modelEndpoint))'
                type: object
                x-kubernetes-validations:
                - message: Exactly one of ngc, dataStore, or hf must be defined
                  rule: '(has(self.ngc) ? 1 : 0) + (has(self.dataStore) ? 1 : 0) +
                    (has(self.hf) ? 1 : 0) == 1'
              storage:
                description: Storage is the target storage for caching NIM model
                properties:
                  hostPath:
                    description: |-
                      HostPath is the host path volume for caching NIM
                      Deprecated: use PVC instead.
                    type: string
                  pvc:
                    description: PersistentVolumeClaim is the pvc volume used for
                      caching NIM
                    properties:
                      annotations:
                        additionalProperties:
                          type: string
                        description: Annotations for the PVC
                        type: object
                      create:
                        description: |-
                          Create specifies whether to create a new PersistentVolumeClaim (PVC).
                          If set to false, an existing PVC must be referenced via the `Name` field.
                        type: boolean
                      name:
                        description: Name of the PVC to use. Required if `Create`
                          is false (i.e., using an existing PVC).
                        type: string
                      size:
                        description: Size of the NIM cache in Gi, used during PVC
                          creation
                        type: string
                      storageClass:
                        description: |-
                          StorageClass to be used for PVC creation. Leave it empty if the PVC is already created or
                          a default storage class is set in the cluster.
                        type: string
                      subPath:
                        description: SubPath is the path inside the PVC that should
                          be mounted
                        type: string
                      volumeAccessMode:
                        description: VolumeAccessMode is the volume access mode of
                          the PVC
                        type: string
                    type: object
                type: object
              tolerations:
                description: Tolerations for running the job to cache the NIM model
                items:
                  description: |-
                    The pod this Toleration is attached to tolerates any taint that matches
                    the triple <key,value,effect> using the matching operator <operator>.
                  properties:
                    effect:
                      description: |-
                        Effect indicates the taint effect to match. Empty means match all taint effects.
                        When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
                      type: string
                    key:
                      description: |-
                        Key is the taint key that the toleration applies to. Empty means match all taint keys.
                        If the key is empty, operator must be Exists; this combination means to match all values and all keys.
                      type: string
                    operator:
                      description: |-
                        Operator represents a key's relationship to the value.
                        Valid operators are Exists and Equal. Defaults to Equal.
                        Exists is equivalent to wildcard for value, so that a pod can
                        tolerate all taints of a particular category.
                      type: string
                    tolerationSeconds:
                      description: |-
                        TolerationSeconds represents the period of time the toleration (which must be
                        of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
                        it is not set, which means tolerate the taint forever (do not evict). Zero and
                        negative values will be treated as 0 (evict immediately) by the system.
                      format: int64
                      type: integer
                    value:
                      description: |-
                        Value is the taint value the toleration matches to.
                        If the operator is Exists, the value should be empty, otherwise just a regular string.
                      type: string
                  type: object
                type: array
              userID:
                description: UserID is the user ID for the caching job
                format: int64
                type: integer
            required:
            - source
            - storage
            type: object
          status:
            description: NIMCacheStatus defines the observed state of NIMCache.
            properties:
              conditions:
                items:
                  description: Condition contains details for one aspect of the current
                    state of this API Resource.
                  properties:
                    lastTransitionTime:
                      description: |-
                        lastTransitionTime is the last time the condition transitioned from one status to another.
                        This should be when the underlying condition changed.  If that is not known, then using the time when the API field changed is acceptable.
                      format: date-time
                      type: string
                    message:
                      description: |-
                        message is a human readable message indicating details about the transition.
                        This may be an empty string.
                      maxLength: 32768
                      type: string
                    observedGeneration:
                      description: |-
                        observedGeneration represents the .metadata.generation that the condition was set based upon.
                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
                        with respect to the current state of the instance.
                      format: int64
                      minimum: 0
                      type: integer
                    reason:
                      description: |-
                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
                        Producers of specific condition types may define expected values and meanings for this field,
                        and whether the values are considered a guaranteed API.
                        The value should be a CamelCase string.
                        This field may not be empty.
                      maxLength: 1024
                      minLength: 1
                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                      type: string
                    status:
                      description: status of the condition, one of True, False, Unknown.
                      enum:
                      - "True"
                      - "False"
                      - Unknown
                      type: string
                    type:
                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
                      maxLength: 316
                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                      type: string
                  required:
                  - lastTransitionTime
                  - message
                  - reason
                  - status
                  - type
                  type: object
                type: array
              profiles:
                items:
                  description: NIMProfile defines the profiles that were cached.
                  properties:
                    config:
                      additionalProperties:
                        type: string
                      type: object
                    model:
                      type: string
                    name:
                      type: string
                    release:
                      type: string
                  type: object
                type: array
              pvc:
                type: string
              state:
                type: string
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
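# ---------------------------------------------------------------------------
# Illustrative sketch only, not part of the generated CRD: a minimal NIMCache
# manifest that should satisfy the schema above. It is kept commented out so
# that applying this file still installs only the CRD.
#
# Assumptions (none of these values come from this file): the resource name,
# the Secret names, the modelPuller image reference, the PVC size, and the
# access mode are hypothetical placeholders to replace with real values.
#
# apiVersion: apps.nvidia.com/v1alpha1
# kind: NIMCache
# metadata:
#   name: example-nimcache              # hypothetical name
# spec:
#   source:                             # required; exactly one of ngc, dataStore, hf
#     ngc:
#       authSecret: ngc-api-secret      # hypothetical Secret holding NGC_API_KEY
#       modelPuller: registry.example.com/nim/model:tag   # placeholder image; immutable once set
#       pullSecret: ngc-pull-secret     # hypothetical image pull secret
#       model:                          # cannot be combined with modelEndpoint
#         engine: tensorrt_llm
#         tensorParallelism: "1"        # the schema types this field as a string
#   storage:                            # required
#     pvc:
#       create: true                    # when false, an existing PVC must be given in `name`
#       size: 50Gi                      # assumed size
#       volumeAccessMode: ReadWriteOnce # assumed access mode
#
# Once such a resource exists, `kubectl get nimcaches` would list it with the
# Status, PVC, and Age printer columns declared near the top of this file.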