
DRA: Derived Attributes #6081

Open
gauravkghildiyal wants to merge 2 commits into kubernetes:master from gauravkghildiyal:dra-derived-attributes


@gauravkghildiyal
Member

@gauravkghildiyal gauravkghildiyal commented May 15, 2026

  • One-line PR description: Add KEP-6080 introducing derivedAttributes for flexible DRA device co-allocation.
  • Other comments: This KEP introduces scoped CEL expressions (derivedAttributes) to DeviceRequest to synthesize virtual grouping keys on the fly.

/sig scheduling
/wg device-management

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. labels May 15, 2026
@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Scheduling May 15, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gauravkghildiyal
Once this PR has been reviewed and has the lgtm label, please assign sanposhiho for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 15, 2026
@k8s-ci-robot k8s-ci-robot requested review from dom4ha and macsko May 15, 2026 00:23
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 15, 2026
@gauravkghildiyal gauravkghildiyal mentioned this pull request May 15, 2026
4 tasks
@gauravkghildiyal gauravkghildiyal force-pushed the dra-derived-attributes branch from 217ed13 to 7d68c53 Compare May 15, 2026 00:33
@gauravkghildiyal gauravkghildiyal force-pushed the dra-derived-attributes branch from 7d68c53 to df713b1 Compare May 15, 2026 00:35
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- kubelet
Member

I doubt kubelet needs to know about these

Member Author

Makes sense, removed.

(I'm not too sure about kube-controller-manager either, but maybe it's needed there because it clones the ResourceClaimTemplate into ResourceClaims.)

## Proposal

We propose extending `.devices.requests` with `derivedAttributes` and
`.devices.constraints` with `matchDerivedAttribute`. A derived attribute defines
Member

We should not change the constraints. These should just look like any other attribute to the entire constraints section. The only API addition needed should be the derivedAttributes field.

Member Author

Great idea! Even better.

Updated.
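
To illustrate the suggestion above, here is a minimal Go sketch of a constraint lookup in which derived values are resolved through the same path as driver-published attributes, so `matchAttribute` needs no new field. The `device` type, its `static` and `derived` maps, and the rule that a derived value shadows a static one of the same name are all assumptions made for this sketch, not the KEP's settled design.

```go
package main

import "fmt"

// device is a hypothetical stand-in for a candidate device in the allocator.
type device struct {
	name    string
	static  map[string]string // attributes published by the driver
	derived map[string]string // assumed: results of derivedAttributes expressions
}

// attribute resolves a name the way a matchAttribute constraint might.
// Assumption of this sketch: a derived value shadows a static one.
func (d device) attribute(name string) (string, bool) {
	if v, ok := d.derived[name]; ok {
		return v, true
	}
	v, ok := d.static[name]
	return v, ok
}

// matchAttribute reports whether every device has the attribute and
// all of the values agree.
func matchAttribute(name string, devices []device) bool {
	var want string
	for i, d := range devices {
		v, ok := d.attribute(name)
		if !ok {
			return false
		}
		if i == 0 {
			want = v
		} else if v != want {
			return false
		}
	}
	return true
}

func main() {
	gpu := device{name: "gpu-0", derived: map[string]string{"numaNode": "0"}}
	nic := device{name: "nic-0", derived: map[string]string{"numaNode": "0"}}
	far := device{name: "nic-1", derived: map[string]string{"numaNode": "1"}}
	fmt.Println(matchAttribute("numaNode", []device{gpu, nic}))
	fmt.Println(matchAttribute("numaNode", []device{gpu, far}))
}
```

The constraint code only ever calls `attribute`, which is the sense in which the derived values "look like any other attribute" to it.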

@johnbelamaric
Member

@thameem-abbas I think this might be useful for your use cases.

@johnbelamaric
Member

cc @pohly

@pravk03
Contributor

pravk03 commented May 15, 2026

/cc

@k8s-ci-robot k8s-ci-robot requested a review from pravk03 May 15, 2026 03:21
Contributor

@pohly pohly left a comment

Looks okay to me.

Shadow API review: we need to sort out some details, see comments.

- **Risk**: A new CEL expression needs to be evaluated for each candidate device
(in addition to any CEL expressions evaluated for device selectors).
- **Mitigation**: This evaluation adds a constant time increment to each device
evaluation. The remainder of the scheduling cycle—including constraint
Contributor

It's constant per device, but each device might be considered more than once. Caching derived attributes mitigates that.
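
The caching idea can be sketched in Go as a memo table keyed by device and attribute name, so that a device revisited during allocation does not trigger a second evaluation. Everything here is hypothetical: `evalFunc` stands in for a compiled CEL program, and the key would likely also need to be scoped to the request that defines the attribute.

```go
package main

import "fmt"

// evalFunc stands in for a compiled CEL expression (hypothetical type).
type evalFunc func(deviceID string) (any, error)

// cacheKey identifies one derived attribute on one device.
type cacheKey struct{ deviceID, attrName string }

// result stores both the value and the error, so failures are cached too.
type result struct {
	value any
	err   error
}

type derivedCache struct {
	entries map[cacheKey]result
	evals   int // number of real evaluations, for demonstration only
}

func newDerivedCache() *derivedCache {
	return &derivedCache{entries: make(map[cacheKey]result)}
}

// get returns the cached result or evaluates the expression once and stores it.
func (c *derivedCache) get(deviceID, attrName string, eval evalFunc) (any, error) {
	key := cacheKey{deviceID, attrName}
	if r, ok := c.entries[key]; ok {
		return r.value, r.err
	}
	v, err := eval(deviceID)
	c.evals++
	c.entries[key] = result{v, err}
	return v, err
}

func main() {
	cache := newDerivedCache()
	numaOf := func(deviceID string) (any, error) { return "numa-0", nil }
	// The same device is considered three times, as can happen during backtracking.
	for i := 0; i < 3; i++ {
		v, _ := cache.get("gpu-0", "numaNode", numaOf)
		fmt.Println(v)
	}
	fmt.Println("evaluations:", cache.evals) // prints "evaluations: 1"
}
```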

type DerivedAttribute struct {
// Name is the identifier for this derived attribute, used in constraints.
//
// +required
Contributor

Document the valid string format here.

// for each candidate device.
// +featureGate=DRADerivedAttributes
// +optional
DerivedAttributes []DerivedAttribute `json:"derivedAttributes,omitempty"`
Contributor

Size limit?


// Expression is a CEL expression evaluated against each candidate device.
// The expression must evaluate to a primitive scalar (string, integer,
// boolean) or a list of scalars ([]string, []int64, []bool) to act as a
Contributor

We also have semver. Are those valid results?

- **Environment**: The CEL environment for `Expression` is exactly the same
as that for `CELDeviceSelector`, containing a single variable `device`.
- **Return Type**: The CEL expression must evaluate to a scalar (string,
integer, boolean) or a list of scalars (`[]string`, `[]int64`, `[]bool`).
Contributor

This "must evaluate to" cannot be validated during admission time (depends on unknown input types).

How are evaluation errors at runtime handled?
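
Since the result type cannot be checked at admission, the check has to happen when the expression is evaluated. A minimal Go sketch of such a runtime check follows, assuming the CEL result has already been converted to a native Go value. It accepts only the types listed in the quoted text. `checkResult` and `errUnsupportedType` are hypothetical names, and what the allocator does with the returned error is left open here, as in the question above.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnsupportedType marks a result outside the accepted set (hypothetical name).
var errUnsupportedType = errors.New("derived attribute: unsupported result type")

// checkResult accepts the scalar and list-of-scalar types named in the KEP text
// and rejects everything else. Version-typed results are not modelled here.
func checkResult(v any) error {
	switch v.(type) {
	case string, int64, bool, []string, []int64, []bool:
		return nil
	default:
		return fmt.Errorf("%w: %T", errUnsupportedType, v)
	}
}

func main() {
	samples := []any{"numa-0", int64(1), true, []string{"a", "b"}, 1.5, map[string]string{}}
	for _, v := range samples {
		fmt.Printf("%T: %v\n", v, checkResult(v))
	}
}
```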

// of the attributes which were prefixed by "dra.example.com").
// - capacity (map[string]object): the device's capacities, grouped by prefix.
//
// +required
Contributor

Size limit?

- "@gauravkghildiyal"
owning-sig: sig-scheduling
participating-sigs:
- sig-network
Member

I can review from SIG Network.

object. The resulting value acts as a virtual attribute that can be referenced
directly by existing `.devices.constraints[].matchAttribute` fields.

### Core Manifest Example: GPU & NIC NUMA Alignment
Member

This is a great example. cc: @ffromani @pravk03

syntactic boundary between static (FQDN) and derived (bare) attributes,
eliminating shadowing risks but preventing direct attribute overrides.

Recommended: **Option 1 (Allowing FQDNs)**. Most manifest authors will naturally
Member

I also think that Option 2 will break existing users of the well-defined pcieRoot standard attribute. If a new driver that does not implement the standard attribute wants to use it, I think it should have to override it with the FQDN.

Contributor

@yongruilin yongruilin left a comment

Declarative Validation plans to support validation for new API fields directly, without a handwritten validation counterpart.
But complicated validation (e.g. checking that a CEL expression compiles) might still need to rely on handwritten code.
/cc @jpbetz

Comment on lines +316 to +317
// +optional
DerivedAttributes []DerivedAttribute `json:"derivedAttributes,omitempty"`
Contributor

Suggested change
// +optional
DerivedAttributes []DerivedAttribute `json:"derivedAttributes,omitempty"`
// +optional
// +k8s:optional
// +k8s:maxItems=<int>
DerivedAttributes []DerivedAttribute `json:"derivedAttributes,omitempty"`
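
For comparison, a handwritten check covering the same ground as the `+k8s:maxItems` tag, plus the required fields and the expression size limit raised elsewhere in this review, might look like the Go sketch below. The two limits are placeholder values chosen for illustration, since the PR leaves them open (`<int>`), and `validateDerivedAttributes` is a hypothetical helper, not existing Kubernetes validation code.

```go
package main

import "fmt"

// DerivedAttribute mirrors the two fields visible in the diff context.
type DerivedAttribute struct {
	Name       string
	Expression string
}

const (
	maxDerivedAttributes = 8    // placeholder; the real limit is undecided
	maxExpressionLength  = 1024 // placeholder; the real limit is undecided
)

// validateDerivedAttributes returns one error per violated rule.
func validateDerivedAttributes(attrs []DerivedAttribute) []error {
	var errs []error
	if len(attrs) > maxDerivedAttributes {
		errs = append(errs, fmt.Errorf("derivedAttributes: must have at most %d items, got %d",
			maxDerivedAttributes, len(attrs)))
	}
	for i, a := range attrs {
		if a.Name == "" {
			errs = append(errs, fmt.Errorf("derivedAttributes[%d].name: required", i))
		}
		if a.Expression == "" {
			errs = append(errs, fmt.Errorf("derivedAttributes[%d].expression: required", i))
		}
		if len(a.Expression) > maxExpressionLength {
			errs = append(errs, fmt.Errorf("derivedAttributes[%d].expression: must be at most %d bytes",
				i, maxExpressionLength))
		}
	}
	return errs
}

func main() {
	ok := []DerivedAttribute{{Name: "numaNode", Expression: "device.attributes"}}
	fmt.Println(len(validateDerivedAttributes(ok)))
	fmt.Println(validateDerivedAttributes([]DerivedAttribute{{}}))
}
```

Checking that the expression compiles is deliberately left out, since that is the part expected to stay handwritten against the real CEL environment.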

@k8s-ci-robot k8s-ci-robot requested a review from jpbetz May 15, 2026 21:27
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 18, 2026
@kad
Copy link
Copy Markdown
Member

kad commented May 18, 2026

/cc

@k8s-ci-robot k8s-ci-robot requested a review from kad May 18, 2026 14:07
@liggitt liggitt added this to @liggitt May 20, 2026
@liggitt liggitt moved this to Todo in @liggitt May 20, 2026
@liggitt liggitt moved this to Assigned in API Reviews May 20, 2026

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status: Todo
Status: Assigned
Status: 👀 In review
Status: Needs Triage


9 participants