9
9
- [Proposal](#proposal)
10
10
- [Design Details](#design-details)
11
11
- [Device Class API](#device-class-api)
12
- - [Resource Slice API (Alternative to Device Class API)](#resource-slice-api-alternative-to-device-class-api)
13
12
- [Resource Claim API](#resource-claim-api)
14
13
- [Pod API](#pod-api)
15
14
- [Scheduling for Extended Resource backed by DRA](#scheduling-for-extended-resource-backed-by-dra)
@@ -76,11 +75,11 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
76
75
77
76
Extended resource provides a simple, concise approach to describe resource
78
77
capacity and resource consumption. In contrast, Dynamic Resource
79
- Allocation (DRA) provides a more expressive, flexible, powerful approach, yet
78
+ Allocation (DRA) provides a more expressive, flexible approach, yet
80
79
is more complicated and harder to use.
81
80
82
81
This KEP provides a solution to enable cluster administrators to advertise the
83
- dynamic resources (in `ResourceSlice`) as extended resource in `DeviceClass`,
82
+ dynamic resources (in `ResourceSlice`) as extended resources via `DeviceClass`,
84
83
and enables application developers and operators to continue using
85
84
extended resources to request such resources.
86
85
@@ -211,8 +210,8 @@ non-goals of this KEP.
211
210
212
211
### Goals
213
212
214
- * Introduce the ability for DRA to advertise extended resources, and for the
215
- scheduler to consider them for allocation.
213
+ * Introduce the ability to advertise DRA resources as extended resources, and
214
+ for the scheduler to consider them for allocation.
216
215
217
216
* Enable application operators to use the existing extended resource request in
218
217
pod spec to request DRA resources.
@@ -224,16 +223,15 @@ non-goals of this KEP.
224
223
* Device plugin API must not change. The existing device plugin drivers must
225
224
continue working without change.
226
225
227
- * DRA driver API change must be minimal, if there is any. Core kubernetes
228
- (kube-scheduler, kubelet) is preferred over DRA driver for any change needed
229
- to support the feature.
226
+ * DRA driver API must not change. Core Kubernetes (kube-scheduler, kubelet) is
227
+ preferred over DRA driver for any change needed to support the feature.
230
228
231
229
### Non-Goals
232
230
233
231
* Minimize kubelet or kube-scheduler changes. The feature requires necessary
234
232
changes in both scheduling and actuation.
235
233
236
- * Keep advertising pod .status.Capacity for extended resources backed by DRA.
234
+ * Keep advertising `node.status.Capacity` for extended resources backed by DRA.
237
235
It is used for extended resources backed by device plugin only.
238
236
239
237
## Proposal
@@ -242,51 +240,48 @@ The basic idea is the following:
242
240
243
241
1. Introduce `extended resource backed by DRA`. It is like the current extended
244
242
resource backed by device plugin in that it has a string name and a
245
- discrete countable quantity. Its capacity is provided through dynamic
246
- resource `ResourceSlice`, its consumption is specified through pod's extended
243
+ discrete countable quantity. Its capacity is provided through DRA
244
+ `ResourceSlice`; its consumption is specified through the pod's extended
247
245
resource request.
248
246
1. Introduce a field `ExtendedResourceName` to `DeviceClass` to allow cluster
249
247
administrators to advertise a certain class of devices as an extended resource.
250
- 1. Alternatively, introduce a field `ExtendedResourceName` to `ResourceSlice`
251
- and `Device` to allow cluster administrators to configure DRA device driver
252
- to advertise certain devices as extended resource.
253
248
1. Introduce a special `ResourceClaim` object to keep track of device allocations
254
249
for all extended resource requests backed by DRA for a pod. kube-scheduler
255
250
uses DRA scheduling algorithm to fit pod's extended resource request to a
256
- node that advertises the extended resource in DRA `ResorceSlice` or traditional
257
- extended resources. When using DRA devices, it creates a special `ResourceClaim`
258
- for the pod with the allocation result recording which devices were picked. More
259
- details on this special `ResourceClaim` follow below. When using extended
260
- resources advertised for a node by device plugin, the existing resource
261
- tracking reserves them.
251
+ node that advertises the extended resource in DRA `ResourceSlice` or extended
252
+ resources backed by device plugin. When using DRA devices, it creates a
253
+ special `ResourceClaim` for the pod with the allocation result recording
254
+ which devices were picked. More details on this special `ResourceClaim`
255
+ follow below. When using extended resources advertised for a node by device
256
+ plugin, the existing resource tracking reserves them.
262
257
1. kubelet asks DRA driver to prepare devices in the special `ResourceClaim`,
263
- and pass the devices to containers with the extended resource requests.
258
+ and passes the devices to the containers in the pod that have extended resource requests.
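
The scheduling step in the list above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the kube-scheduler implementation: `node`, `claim`, and `fit` are hypothetical names, and a node is reduced to two maps of free capacity. The order in which the two backends are checked is also an assumption.

```go
package main

import "fmt"

// node is a hypothetical summary of what one node offers.
type node struct {
	pluginFree map[string]int64    // free device-plugin capacity by resource name
	draFree    map[string][]string // free DRA device names by extended resource name
}

// claim stands in for the special ResourceClaim that records allocated devices.
type claim struct {
	Devices []string
}

// fit tries to satisfy one extended resource request on the node.
// It returns a claim only when DRA devices were picked; for device-plugin
// capacity the existing counter is decremented and no claim is created.
func fit(n *node, resource string, count int64) (*claim, bool) {
	if count <= 0 {
		return nil, false
	}
	if free, ok := n.pluginFree[resource]; ok && free >= count {
		n.pluginFree[resource] = free - count
		return nil, true
	}
	if free := n.draFree[resource]; int64(len(free)) >= count {
		picked := append([]string(nil), free[:count]...)
		n.draFree[resource] = free[count:]
		return &claim{Devices: picked}, true
	}
	return nil, false
}

func main() {
	n := &node{draFree: map[string][]string{"example.com/gpu": {"gpu-0", "gpu-1"}}}
	c, ok := fit(n, "example.com/gpu", 1)
	fmt.Println(ok, c.Devices)
}
```

The returned claim is what the kubelet step would later read to prepare the devices.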
264
259
265
260
Some quick clarifications around the basic concepts: extended resource backed by
266
261
device plugin, extended resource backed by DRA, and dynamic resource.
267
262
268
263
* extended resource backed by device plugin uses pod's
269
264
spec.containers[].resources.requests to request resources; it consumes the capacity
270
- from node's status.capacity. It is of type : string, int64
265
+ from node's status.capacity. It is of type (string, int64).
271
266
* dynamic resource uses `ResourceClaim` to request for resources, and
272
267
`ResourceSlice` to provide resource capacity. A pod asks for resources through
273
268
resource claim requests in pod's spec.resources.claims. Dynamic resource type
274
269
is described in the resource slice; simply put, it is a list of devices, with
275
270
each device being described as structured parameters.
276
271
* extended resource backed by DRA is a combination of the two above. It uses the pod's
277
272
spec.containers[].resources.requests to request for resources, and uses
278
- `ResourceSlice` to provide resource capacity. Hence, it is of type : string,
279
- int64 on the consumption side, and list of devices with a common
273
+ `ResourceSlice` to provide resource capacity. Hence, it is of type (string, int64)
274
+ on the consumption side, and a list of devices with a common
280
275
`ExtendedResourceName` on the capacity side.
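
To make the two sides concrete, here is a minimal Go sketch of how a (string, int64) capacity could be derived from a list of devices. The `Device` and `DeviceClass` types below are simplified, hypothetical stand-ins, not the real API types, and the `Class` field is an assumption that replaces the real device selection logic.

```go
package main

import "fmt"

// Device is a simplified stand-in for a device published in a ResourceSlice.
type Device struct {
	Name  string
	Class string // assumption: the name of the DeviceClass this device belongs to
}

// DeviceClass is a simplified stand-in carrying the proposed field.
type DeviceClass struct {
	Name                 string
	ExtendedResourceName *string // nil means "not advertised as extended resource"
}

// capacity counts, per extended resource name, the devices whose class
// advertises that name. The result is the (string, int64) view of the devices.
func capacity(classes []DeviceClass, devices []Device) map[string]int64 {
	nameByClass := make(map[string]string)
	for _, c := range classes {
		if c.ExtendedResourceName != nil {
			nameByClass[c.Name] = *c.ExtendedResourceName
		}
	}
	out := make(map[string]int64)
	for _, d := range devices {
		if name, ok := nameByClass[d.Class]; ok {
			out[name]++
		}
	}
	return out
}

func main() {
	gpu := "example.com/gpu"
	classes := []DeviceClass{
		{Name: "gpu-class", ExtendedResourceName: &gpu},
		{Name: "nic-class"},
	}
	devices := []Device{
		{Name: "gpu-0", Class: "gpu-class"},
		{Name: "gpu-1", Class: "gpu-class"},
		{Name: "nic-0", Class: "nic-class"},
	}
	fmt.Println(capacity(classes, devices))
}
```

Devices in a class without `ExtendedResourceName` stay reachable through resource claims only and do not appear in the result.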
281
276
282
277
With these additions in place, the DRA devices can be consumed by extended resource
283
278
requests, or by DRA resource claims. The scheduler has everything it needs to support
284
279
the dynamic allocation of devices to requests made through extended resource and
285
280
resource claims. No static partition of resources between extended resources and
286
281
resource claims is needed. The kubelet and DRA driver have everything they need
287
- to admit and pass the allocated devices to the pod to run.
282
+ to admit a pod and pass the allocated devices to the containers in the pod to run.
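
The "no static partition" point can be illustrated with a small Go sketch in which both request styles draw from the same set of free devices. The `pool` type and its methods are hypothetical and exist only for this illustration; real allocation is done by the DRA scheduling algorithm.

```go
package main

import "fmt"

// pool is a hypothetical set of free DRA devices on one node.
type pool struct {
	free []string
}

// takeCount serves an extended resource request: any n free devices will do.
func (p *pool) takeCount(n int) ([]string, bool) {
	if n <= 0 || n > len(p.free) {
		return nil, false
	}
	picked := append([]string(nil), p.free[:n]...)
	p.free = p.free[n:]
	return picked, true
}

// takeNamed serves a resource claim that selected one specific device.
func (p *pool) takeNamed(name string) bool {
	for i, d := range p.free {
		if d == name {
			p.free = append(p.free[:i], p.free[i+1:]...)
			return true
		}
	}
	return false
}

func main() {
	p := &pool{free: []string{"gpu-0", "gpu-1", "gpu-2"}}
	fmt.Println(p.takeNamed("gpu-1")) // a resource claim takes one device
	fmt.Println(p.takeCount(2))       // an extended resource request takes the rest
	fmt.Println(p.takeCount(1))       // nothing is left for a further request
}
```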
288
283
289
- Note the following cluster setup requirement and constraint :
284
+ Note the following cluster setup configuration and constraint:
290
285
291
286
* One node in the cluster has an extended resource backed by DRA, and another node in the
292
287
cluster has the same named extended resource backed by device plugin.
@@ -339,53 +334,6 @@ type DeviceClassSpec struct {
339
334
}
340
335
```
341
336
342
- ### Resource Slice API (Alternative to Device Class API)
343
- The exact set of proposed API changes on Resource Slice can be seen below:
344
- ```go
345
- // ResourceSliceSpec contains the information published by the driver in one ResourceSlice.
346
- type ResourceSliceSpec struct {
347
- ...
348
-
349
- // The extended resource name for all the devices in the ResourceSlice
350
- // advertised as
351
- //
352
- // +optional
353
- ExtendedResourceName *string
354
- }
355
-
356
- // Device represents one individual hardware instance that can be selected based
357
- // on its attributes. Besides the name, exactly one field must be set.
358
- // +k8s:deepcopy-gen=true
359
- type Device struct {
360
- // Name is unique identifier among all devices managed by
361
- // the driver in the pool. It must be a DNS label.
362
- //
363
- // +required
364
- Name string `json:"name"`
365
- ...
366
-
367
- // ExtendedResourceName is the extended resource name
368
- // the device is advertised as. It must be a DNS label.
369
- // It overrides the ExtendedResourceName at ResourceSlice if both are
370
- // present.
371
- //
372
- // +optional
373
- ExtendedResourceName *string
374
- }
375
- ```
376
-
377
- The devices can be advertised with an extended resource name. The extended
378
- resource name can be specified on each individual device. Different
379
- devices can be advertised as different extended resource name, or not
380
- advertised as extended resource at all.
381
-
382
- Alternatively, the extended resource name can be specified at the
383
- `ResourceSlice` level, then all the devices in the resource slice are
384
- advertised as the given extended resource name. If a device has a different
385
- extended resource name than that given in the ` ResoureSlice ` , the device's
386
- extended resource name is used for that device.
387
-
388
-
389
337
### Resource Claim API
390
338
391
339
A special resource claim object is created to keep track of device allocations for
@@ -415,8 +363,8 @@ garbage collector.
415
363
preBind phase. The in-memory one in the assumed cache is created earlier
416
364
during Reserve phase.
417
365
* It is *deleted*
418
- * together with the owning pod's deletion.
419
- * by the scheduler dynamic resource plugin during unReserve phase.
366
+ * either together with the owning pod's deletion,
367
+ * or by the scheduler dynamic resource plugin during unReserve phase.
420
368
* It is *read* by the kubelet DRA device driver to prepare the devices listed
421
369
therein when preparing to run the pod.
422
370
@@ -446,7 +394,8 @@ then the name of the `DeviceRequest` is "c0-e2".
446
394
447
395
A new field `extendedResourceClaimStatus` is added to Pod's status to track
448
396
the special `ResourceClaim` object created for the extended resource requests
449
- in the pod.
397
+ in the pod. This is needed for the kubelet to pass the devices allocated by the driver
398
+ to the containers in the pod.
450
399
451
400
```go
452
401
// PodExtendedResourceClaimStatus is stored in the PodStatus for each extended
@@ -506,11 +455,11 @@ status:
506
455
- names:
507
456
- container-name
508
457
- foo.domain/bar
509
- - c1-e2
458
+ - c0-e2
510
459
resourceClaimName: ccc-gpu-57999b9c4c-vpq68-gpu-8s27z
511
460
```
512
- where `deviceRequest` name is "c1-e2", and container-name is the 2nd container
513
- in the pod, foo.domain/bar is the 3rd extended resource in the container.
461
+ where `deviceRequest` name is "c0-e2", and container-name is the first container
462
+ in the pod, foo.domain/bar is the 3rd extended resource in the container's requests.
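
The naming scheme described above can be sketched as a small Go helper. `deviceRequestName` is a hypothetical function written for illustration; it assumes both indexes are zero-based, which matches "c0-e2" denoting the first container and the 3rd extended resource. How the order of a container's requests is determined is not covered by this sketch.

```go
package main

import "fmt"

// deviceRequestName builds the request name from zero-based indexes:
// containerIdx is the container's position in the pod, resourceIdx is the
// position of the extended resource within that container's requests.
func deviceRequestName(containerIdx, resourceIdx int) string {
	return fmt.Sprintf("c%d-e%d", containerIdx, resourceIdx)
}

func main() {
	// First container, third extended resource.
	fmt.Println(deviceRequestName(0, 2))
}
```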
514
463
515
464
Note the validations for extendedResourceClaimStatus are different from the
516
465
validations for resourceClaimStatuses.