Description
Extend the SriovResourceFilter CRD to allow administrators to add custom attributes to devices that match specific filtering criteria, with support for configurable attribute prefixes. This enhancement enables administrators to tag and categorize SR-IOV Virtual Functions with additional metadata beyond the hardware-based filtering currently available, while providing namespace isolation through custom attribute prefixes.
Motivation
Currently, SriovResourceFilter allows filtering devices based on hardware characteristics:
- Vendor ID (vendors)
- Device ID (devices)
- PCI addresses (pciAddresses)
- Physical Function names (pfNames)
- Root devices (rootDevices)
- NUMA nodes (numaNodes)
- Driver bindings (drivers)
While this filtering system effectively selects devices, it doesn't provide a way to add additional metadata or attributes to the matched devices. This limits the ability to:
- Tag devices with custom labels for application-specific selection
- Add capability indicators (e.g., "high-bandwidth", "low-latency", "production", "testing")
- Specify performance characteristics (e.g., link speed, supported protocols)
- Indicate network topology (e.g., rack location, switch connectivity)
- Mark devices for specific workload types (e.g., "ml-training", "storage", "telco")
- Add business logic metadata (e.g., cost tier, SLA level)
- Use organization-specific attribute namespaces to avoid conflicts in shared or multi-tenant environments
- Integrate with external systems using custom attribute prefixes matching existing tooling
Current Implementation
The current SriovResourceFilter API (pkg/api/sriovdra/v1alpha1/api.go) only supports filtering:
type ResourceFilter struct {
	Vendors      []string `json:"vendors,omitempty"`
	Devices      []string `json:"devices,omitempty"`
	PciAddresses []string `json:"pciAddresses,omitempty"`
	PfNames      []string `json:"pfNames,omitempty"`
	RootDevices  []string `json:"rootDevices,omitempty"`
	NumaNodes    []string `json:"numaNodes,omitempty"`
	Drivers      []string `json:"drivers,omitempty"`
}
Devices matching these filters are exposed with a resourceName attribute but no additional custom metadata.
Proposed Solution
Add an additionalAttributes field to the Config struct: a map of custom key-value attributes where the key can optionally include a prefix.
Key Format:
- With prefix: "prefix/attributeName" (e.g., "acme.corp/performanceTier")
- Without prefix: "attributeName" (e.g., "performanceTier"); defaults to the sriovnetwork.openshift.io prefix
This allows administrators to:
- Tag devices with custom metadata beyond hardware characteristics
- Use custom attribute prefixes for organizational namespacing by including the prefix in the key
- Mix multiple prefixes in a single configuration
- Enable fine-grained device selection using CEL expressions with custom attributes
1. Update API Definition
Modify pkg/api/sriovdra/v1alpha1/api.go:
type Config struct {
	ResourceName         string            `json:"resourceName,omitempty"`
	ResourceFilters      []ResourceFilter  `json:"resourceFilters,omitempty"`
	AdditionalAttributes map[string]string `json:"additionalAttributes,omitempty"` // NEW
}
type ResourceFilter struct {
	Vendors      []string `json:"vendors,omitempty"`
	Devices      []string `json:"devices,omitempty"`
	PciAddresses []string `json:"pciAddresses,omitempty"`
	PfNames      []string `json:"pfNames,omitempty"`
	RootDevices  []string `json:"rootDevices,omitempty"`
	NumaNodes    []string `json:"numaNodes,omitempty"`
	Drivers      []string `json:"drivers,omitempty"`
}
Notes:
- The additionalAttributes field is at the Config level, meaning all devices matching any of the resourceFilters for a given resourceName will receive the same additional attributes.
- Attribute keys can include an optional prefix using the format "prefix/attributeName" (e.g., "acme.corp/performanceTier")
- If no prefix is specified in the key (no / character), the default prefix sriovnetwork.openshift.io is used
- Multiple prefixes can be mixed within a single additionalAttributes map
2. Update Device State Management
Modify the device discovery and filtering logic in pkg/devicestate/discovery.go and pkg/devicestate/state.go to:
- Parse attribute keys to extract prefix and attribute name (split on first /)
- Use default prefix sriovnetwork.openshift.io if no prefix is found in the key
- Apply additional attributes to devices matching filter criteria under their respective prefixes
- Store these attributes alongside existing device attributes
- Expose attributes through the DRA API for CEL-based selection
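The key-parsing step above can be sketched as follows. This is a minimal illustration, not the driver's implementation: splitAttributeKey and defaultAttributePrefix are hypothetical names, and the only behavior taken from the proposal is "split on the first /, otherwise use the default prefix".

```go
package main

import (
	"fmt"
	"strings"
)

// defaultAttributePrefix is the default prefix named in this proposal.
// The constant name itself is an assumption.
const defaultAttributePrefix = "sriovnetwork.openshift.io"

// splitAttributeKey (hypothetical helper) splits a key on the first "/".
// Keys without "/" fall back to the default prefix and keep the whole
// key as the attribute name.
func splitAttributeKey(key string) (prefix, name string) {
	if p, n, found := strings.Cut(key, "/"); found {
		return p, n
	}
	return defaultAttributePrefix, key
}

func main() {
	for _, key := range []string{"performanceTier", "acme.corp/performanceTier"} {
		prefix, name := splitAttributeKey(key)
		fmt.Printf("%s -> prefix=%s name=%s\n", key, prefix, name)
	}
}
```

Because only the first / is treated as the separator, a key such as "a/b/c" yields the name "b/c"; rejecting such names is left to validation.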
3. Update Controller Logic
Update pkg/controller/resourcefiltercontroller.go to:
- Extract additional attributes from SriovResourceFilter
- Parse attribute keys to identify prefixed vs non-prefixed attributes (split on first /)
- Validate attribute key format (prefix must be a valid DNS subdomain if present)
- Validate attribute names (ensure they don't conflict with reserved names)
- Pass parsed attributes to device state manager
- Handle attribute updates when SriovResourceFilter is modified
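A rough sketch of the key validation the controller would perform, using the rules listed under Validation Requirements in this proposal. All identifiers (validateAttributeKey, validatePrefix, the reserved lists) are hypothetical, the reserved lists contain only the examples the proposal names, and treating subdomains of reserved prefixes as reserved is an assumption. The length limits are the ones proposed here; the DRA API's own limits on attribute names may be tighter, so the constants should be read as placeholders.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const (
	maxPrefixLen = 253 // prefix limit stated in this proposal
	maxNameLen   = 63  // attribute-name limit stated in this proposal
)

var (
	// Lowercase DNS subdomain: dot-separated labels of alphanumerics and
	// hyphens, each label starting and ending with an alphanumeric.
	prefixRe = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
	// Alphanumerics with hyphens/underscores, alphanumeric at both ends.
	nameRe = regexp.MustCompile(`^[A-Za-z0-9]([-_A-Za-z0-9]*[A-Za-z0-9])?$`)

	// Only the examples named in the proposal; a real list would be longer.
	reservedPrefixes = []string{"k8s.io", "kubernetes.io"}
	reservedNames    = map[string]bool{"resourceName": true, "pciAddress": true}
)

// validatePrefix checks length, DNS subdomain syntax, and reserved prefixes.
func validatePrefix(prefix string) error {
	if len(prefix) > maxPrefixLen || !prefixRe.MatchString(prefix) {
		return fmt.Errorf("prefix %q is not a valid DNS subdomain", prefix)
	}
	for _, r := range reservedPrefixes {
		// Assumption: subdomains of a reserved prefix are reserved too.
		if prefix == r || strings.HasSuffix(prefix, "."+r) {
			return fmt.Errorf("prefix %q is reserved", prefix)
		}
	}
	return nil
}

// validateAttributeKey validates the optional prefix and the attribute name.
func validateAttributeKey(key string) error {
	name := key
	if p, n, found := strings.Cut(key, "/"); found {
		if err := validatePrefix(p); err != nil {
			return err
		}
		name = n
	}
	if len(name) > maxNameLen || !nameRe.MatchString(name) {
		return fmt.Errorf("attribute name %q is invalid", name)
	}
	if reservedNames[name] {
		return fmt.Errorf("attribute name %q is reserved", name)
	}
	return nil
}

func main() {
	keys := []string{"performanceTier", "acme.corp/performanceTier", "kubernetes.io/tier", "resourceName"}
	for _, key := range keys {
		fmt.Printf("%-28s %v\n", key, validateAttributeKey(key))
	}
}
```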
Example Usage
Example 1: Network Performance Tiers (Using Default Prefix)
apiVersion: sriovnetwork.openshift.io/v1alpha1
kind: SriovResourceFilter
metadata:
  name: network-performance-tiers
  namespace: dra-sriov-driver
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-node-1
  configs:
  - resourceName: "premium-network"
    resourceFilters:
    - vendors: ["15b3"] # Mellanox/NVIDIA
      devices: ["101b"] # ConnectX-6
      pfNames: ["eth0"]
    # Keys without "/" use default prefix "sriovnetwork.openshift.io"
    additionalAttributes:
      performanceTier: "premium"
      maxBandwidth: "100Gbps"
      rdmaSupport: "true"
      networkSegment: "production"
      slaLevel: "gold"
  - resourceName: "standard-network"
    resourceFilters:
    - vendors: ["8086"] # Intel
      pfNames: ["eth1"]
    # Keys without "/" use default prefix "sriovnetwork.openshift.io"
    additionalAttributes:
      performanceTier: "standard"
      maxBandwidth: "25Gbps"
      rdmaSupport: "false"
      networkSegment: "development"
      slaLevel: "silver"
Devices will have attributes under the default prefix: device.attributes["sriovnetwork.openshift.io"].performanceTier
Example 2: Workload-Specific Tagging (Using Custom Prefix)
apiVersion: sriovnetwork.openshift.io/v1alpha1
kind: SriovResourceFilter
metadata:
  name: workload-specific-vfs
  namespace: dra-sriov-driver
spec:
  configs:
  - resourceName: "ml-training-vfs"
    resourceFilters:
    - pfNames: ["mlx5_0"]
      numaNodes: ["0"]
    # Keys with "prefix/" format use the specified custom prefix
    additionalAttributes:
      "acme.corp/workloadType": "ml-training"
      "acme.corp/topology": "numa-local"
      "acme.corp/optimizedFor": "throughput"
      "acme.corp/costCenter": "ai-research"
  - resourceName: "storage-vfs"
    resourceFilters:
    - pfNames: ["eth2"]
      drivers: ["vfio-pci"]
    # Different custom prefix in the keys
    additionalAttributes:
      "storage.acme.corp/workloadType": "storage"
      "storage.acme.corp/protocol": "nvme-tcp"
      "storage.acme.corp/optimizedFor": "latency"
      "storage.acme.corp/storagePool": "fast-tier"
Devices will have attributes under their respective custom prefixes:
- ML training VFs: device.attributes["acme.corp"].workloadType
- Storage VFs: device.attributes["storage.acme.corp"].workloadType
Example 3: Mixing Multiple Prefixes in One Configuration
You can mix multiple prefixes (and unprefixed keys) in a single additionalAttributes map:
apiVersion: sriovnetwork.openshift.io/v1alpha1
kind: SriovResourceFilter
metadata:
  name: mixed-prefix-example
  namespace: dra-sriov-driver
spec:
  configs:
  - resourceName: "enterprise-network"
    resourceFilters:
    - vendors: ["15b3"]
      pfNames: ["eth0"]
    # Mix default prefix, custom prefixes, and organizational prefixes
    additionalAttributes:
      # These use default prefix (sriovnetwork.openshift.io)
      performanceTier: "high"
      certified: "true"
      # These use custom organizational prefix
      "acme.corp/department": "engineering"
      "acme.corp/costCenter": "infra-12345"
      # These use billing-specific prefix
      "billing.acme.corp/chargeCode": "PROJ-789"
      "billing.acme.corp/tier": "premium"
Devices will have attributes under multiple prefixes:
- device.attributes["sriovnetwork.openshift.io"].performanceTier
- device.attributes["acme.corp"].department
- device.attributes["billing.acme.corp"].chargeCode
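To illustrate how a mixed additionalAttributes map could end up under several prefixes, here is a small sketch that groups keys into a prefix -> name -> value structure mirroring the device.attributes["prefix"].name form used in CEL. groupByPrefix is a hypothetical helper, and the nested-map layout is an assumption made for illustration, not the driver's internal representation.

```go
package main

import (
	"fmt"
	"strings"
)

// Default prefix named in this proposal.
const defaultPrefix = "sriovnetwork.openshift.io"

// groupByPrefix (hypothetical) buckets attributes by prefix. Keys without
// "/" are placed under the default prefix.
func groupByPrefix(attrs map[string]string) map[string]map[string]string {
	grouped := make(map[string]map[string]string)
	for key, value := range attrs {
		prefix, name := defaultPrefix, key
		if p, n, found := strings.Cut(key, "/"); found {
			prefix, name = p, n
		}
		if grouped[prefix] == nil {
			grouped[prefix] = make(map[string]string)
		}
		grouped[prefix][name] = value
	}
	return grouped
}

func main() {
	grouped := groupByPrefix(map[string]string{
		"performanceTier":              "high",
		"acme.corp/department":         "engineering",
		"billing.acme.corp/chargeCode": "PROJ-789",
	})
	fmt.Println(grouped["sriovnetwork.openshift.io"]["performanceTier"]) // high
	fmt.Println(grouped["acme.corp"]["department"])                      // engineering
	fmt.Println(grouped["billing.acme.corp"]["chargeCode"])              // PROJ-789
}
```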
Example 4: Multiple Filters with Shared Attributes
When you have multiple filter criteria but want all matching devices to share the same attributes:
apiVersion: sriovnetwork.openshift.io/v1alpha1
kind: SriovResourceFilter
metadata:
  name: high-performance-pool
  namespace: dra-sriov-driver
spec:
  configs:
  - resourceName: "high-performance"
    resourceFilters:
    - vendors: ["15b3"] # Mellanox/NVIDIA devices
      pfNames: ["eth0", "eth1"]
    - vendors: ["8086"] # OR Intel devices
      devices: ["159b", "1592"] # with specific device IDs
      numaNodes: ["0"]
    # All devices matching ANY of the above filters get these attributes:
    additionalAttributes:
      performanceTier: "high"
      certified: "true"
      environment: "production"
In this example, devices from either Mellanox on eth0/eth1 OR Intel with specific device IDs on NUMA 0 all receive the same additionalAttributes.
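The matching semantics of this example can be sketched as below. Device and Filter are simplified, hypothetical stand-ins for the driver's types, covering only the fields used here. The sketch assumes that an empty field places no constraint, that the fields within one filter are ANDed (as the example's wording suggests), and that the filters of one config are ORed.

```go
package main

import "fmt"

// Device is a simplified, hypothetical view of a discovered VF.
type Device struct {
	Vendor, DeviceID, PfName, NumaNode string
}

// Filter mirrors a subset of the ResourceFilter fields.
type Filter struct {
	Vendors, Devices, PfNames, NumaNodes []string
}

// allows reports whether value passes a field: an empty list means
// "no constraint", otherwise the value must be listed.
func allows(allowed []string, value string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, a := range allowed {
		if a == value {
			return true
		}
	}
	return false
}

// matches requires every non-empty field of the filter to match (AND).
func (f Filter) matches(d Device) bool {
	return allows(f.Vendors, d.Vendor) &&
		allows(f.Devices, d.DeviceID) &&
		allows(f.PfNames, d.PfName) &&
		allows(f.NumaNodes, d.NumaNode)
}

// matchesAny reports whether the device matches at least one filter (OR).
func matchesAny(filters []Filter, d Device) bool {
	for _, f := range filters {
		if f.matches(d) {
			return true
		}
	}
	return false
}

func main() {
	filters := []Filter{
		{Vendors: []string{"15b3"}, PfNames: []string{"eth0", "eth1"}},
		{Vendors: []string{"8086"}, Devices: []string{"159b", "1592"}, NumaNodes: []string{"0"}},
	}
	mellanox := Device{Vendor: "15b3", DeviceID: "101b", PfName: "eth0", NumaNode: "1"}
	intelNuma1 := Device{Vendor: "8086", DeviceID: "159b", PfName: "eth4", NumaNode: "1"}
	fmt.Println(matchesAny(filters, mellanox))   // true
	fmt.Println(matchesAny(filters, intelNuma1)) // false
}
```

Every device for which matchesAny returns true would receive the config's additionalAttributes.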
Example 5: Using Attributes for NUMA-Aware Device Selection
Custom attributes can be used to enable advanced selection criteria, such as NUMA alignment for multi-device workloads:
apiVersion: sriovnetwork.openshift.io/v1alpha1
kind: SriovResourceFilter
metadata:
  name: numa-aware-vfs
  namespace: dra-sriov-driver
spec:
  configs:
  - resourceName: "numa0-highperf"
    resourceFilters:
    - pfNames: ["eth0", "eth1"]
      numaNodes: ["0"]
    # Add NUMA-specific attributes for device alignment
    additionalAttributes:
      numaOptimized: "true"
      "topology.acme.corp/numaNode": "0"
      "topology.acme.corp/affinityGroup": "compute-0"
  - resourceName: "numa1-highperf"
    resourceFilters:
    - pfNames: ["eth2", "eth3"]
      numaNodes: ["1"]
    additionalAttributes:
      numaOptimized: "true"
      "topology.acme.corp/numaNode": "1"
      "topology.acme.corp/affinityGroup": "compute-1"
Workload requesting devices aligned by custom affinity group attribute:
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: ml-workload
  name: numa-aligned-resources
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.openshift.io
          count: 1
          selectors:
          - cel:
              expression: |
                device.attributes["sriovnetwork.openshift.io"].numaOptimized == "true"
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com
          count: 1
      # Use the custom additional attribute to align VF and GPU
      constraints:
      - matchAttribute: "topology.acme.corp/affinityGroup"
        requests: ["vf", "gpu"]
---
apiVersion: v1
kind: Pod
metadata:
  namespace: ml-workload
  name: ml-training-pod
spec:
  containers:
  - name: trainer
    image: ml-framework:latest
    resources:
      claims:
      - name: devices
  resourceClaims:
  - name: devices
    resourceClaimTemplateName: numa-aligned-resources
In this example:
- VFs are tagged with custom topology attributes including topology.acme.corp/affinityGroup
- The GPU devices would also need to be tagged with matching topology.acme.corp/affinityGroup attributes
- The matchAttribute constraint uses the custom attribute to ensure VF and GPU have the same affinity group value
- This enables flexible, organization-specific device alignment beyond standard Kubernetes attributes
- Devices in affinity group compute-0 or compute-1 will be co-located based on the custom tagging scheme
Example 6: Selecting Devices with Additional Attributes
Once devices have additional attributes, workloads can select them using CEL expressions.
Selecting with default prefix:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: premium-network-vf
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.openshift.io
          selectors:
          - cel:
              expression: |
                device.attributes["sriovnetwork.openshift.io"].performanceTier == "premium" &&
                device.attributes["sriovnetwork.openshift.io"].rdmaSupport == "true" &&
                device.attributes["sriovnetwork.openshift.io"].slaLevel == "gold"
Selecting with custom prefix:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: ml-optimized-vf
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.openshift.io
          selectors:
          - cel:
              expression: |
                device.attributes["acme.corp"].workloadType == "ml-training" &&
                device.attributes["acme.corp"].topology == "numa-local"
Combining multiple prefixes in selection:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: certified-storage-vf
spec:
  spec:
    devices:
      requests:
      - name: vf
        exactly:
          deviceClassName: sriovnetwork.openshift.io
          selectors:
          - cel:
              expression: |
                device.attributes["storage.acme.corp"].workloadType == "storage" &&
                device.attributes["storage.acme.corp"].optimizedFor == "latency" &&
                device.attributes["sriovnetwork.openshift.io"].resourceName == "storage-vfs"
Implementation Details
Files to modify:
- pkg/api/sriovdra/v1alpha1/api.go
  - Add AdditionalAttributes map[string]string to Config struct
  - Update API documentation and comments
- pkg/controller/resourcefiltercontroller.go
  - Extract additional attributes from SriovResourceFilter
  - Parse attribute keys to extract prefix (split on first /)
  - Validate attribute key format (prefix part must be valid DNS subdomain if present)
  - Validate attribute names (ensure they don't conflict with reserved names)
  - Pass parsed attributes to device state manager
- pkg/devicestate/state.go
  - Modify device filtering logic to apply additional attributes with custom or default prefix
  - Implement logic to parse attribute keys:
    - If key contains /, use part before / as prefix and part after as attribute name
    - If key has no /, use default prefix (sriovnetwork.openshift.io) and full key as attribute name
  - Store attributes under the appropriate prefix in device metadata
  - Ensure attributes are properly namespaced and serialized for DRA
- pkg/devicestate/discovery.go
  - Update device structure to support custom attribute prefixes
  - Merge additional attributes under their respective prefixes with hardware-detected attributes
- pkg/consts/consts.go
  - Ensure default attribute prefix constant is defined and accessible
- CRD Definition
  - Update deployments/helm/dra-driver-sriov/templates/sriovnetwork.openshift.io_sriovresourcefilters.yaml
  - Add OpenAPI schema validation for additionalAttributes
- Documentation
  - Update README.md with examples
  - Add new demo in demo/custom-attributes/
  - Document attribute naming conventions and best practices
Validation Requirements
- Attribute Key Validation:
  - Parse keys to extract prefix and attribute name (split on first /)
  - If key contains /:
    - Prefix part (before /) must follow DNS subdomain format (e.g., example.com, storage.acme.corp)
    - Prefix maximum length: 253 characters
    - Prefix must consist of lowercase alphanumeric characters, hyphens, or dots
    - Prefix must start and end with an alphanumeric character
    - Validate that prefix doesn't conflict with Kubernetes reserved prefixes (k8s.io, kubernetes.io, etc.)
  - If key has no /:
    - Entire key is the attribute name
    - Default prefix sriovnetwork.openshift.io will be used
  - Attribute name part (after / or full key if no /):
    - Follow Kubernetes label naming conventions
    - Alphanumeric with hyphens/underscores
    - Maximum length of 63 characters
    - Reserved attribute names (e.g., resourceName, pciAddress, etc.) should be rejected
- Attribute Value Validation:
  - Values should be valid strings
  - Consider length limits (e.g., max 256 characters per value)
- Conflict Handling:
  - When a device matches multiple Config entries (different resourceName values), it receives attributes from all matching configs
  - If multiple configs use different prefixes (in their keys), attributes are stored under their respective prefixes (no conflict)
  - If multiple configs use the same prefix and assign the same attribute name with different values:
    - Later configs override earlier ones, OR
    - Reject conflicting attribute assignments with clear error messages
  - Within a single config, all resourceFilters are ORed together, and devices matching ANY filter get the config's additionalAttributes
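The two conflict-handling options listed above (later configs override, or conflicts are rejected) could be sketched as a single merge function with a switch. mergeAttributes and qualify are hypothetical names. The sketch assumes keys are first normalized to a fully prefixed form, so that performanceTier and sriovnetwork.openshift.io/performanceTier are treated as the same attribute.

```go
package main

import (
	"fmt"
	"strings"
)

// Default prefix named in this proposal.
const defaultPrefix = "sriovnetwork.openshift.io"

// qualify (hypothetical) returns the key with an explicit prefix.
func qualify(key string) string {
	if strings.Contains(key, "/") {
		return key
	}
	return defaultPrefix + "/" + key
}

// mergeAttributes merges the attribute maps of all configs a device
// matches, in config order. With rejectConflicts set, a key assigned two
// different values is an error; otherwise the later config wins.
func mergeAttributes(rejectConflicts bool, configs ...map[string]string) (map[string]string, error) {
	merged := make(map[string]string)
	for _, attrs := range configs {
		for key, value := range attrs {
			q := qualify(key)
			if old, ok := merged[q]; ok && old != value && rejectConflicts {
				return nil, fmt.Errorf("attribute %q has conflicting values %q and %q", q, old, value)
			}
			merged[q] = value
		}
	}
	return merged, nil
}

func main() {
	first := map[string]string{"performanceTier": "high", "acme.corp/department": "engineering"}
	second := map[string]string{"performanceTier": "standard"}

	merged, _ := mergeAttributes(false, first, second)
	fmt.Println(merged["sriovnetwork.openshift.io/performanceTier"]) // standard

	_, err := mergeAttributes(true, first, second)
	fmt.Println(err != nil) // true
}
```

Rejecting conflicts is the stricter option and surfaces misconfiguration early; the override option depends on config ordering, which would need to be documented if chosen.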
Benefits
- Flexible Device Categorization: Administrators can create logical groupings beyond hardware characteristics
- Business Logic Integration: Attach cost, SLA, or organizational metadata to devices
- Workload-Specific Allocation: Applications can request devices with specific capabilities
- Network Topology Awareness: Tag devices based on network location, switch connectivity, etc.
- Multi-Tenancy Support: Label devices for specific tenants or projects
- Policy Enforcement: Enable policy-based device allocation using custom attributes
- Migration Path: Easier migration from other device plugins with custom labeling
- Namespace Isolation: Custom attribute prefixes allow different organizations or teams to manage their own attribute namespaces without conflicts
- Integration with External Systems: Use organization-specific prefixes (e.g., company.com) for better integration with existing tooling and policies
- Attribute Organization: Different prefixes can separate concerns (e.g., network.acme.corp for network attributes, billing.acme.corp for cost attributes)
- Advanced Resource Alignment: Enable sophisticated multi-device allocation strategies using custom attributes with DRA constraints (e.g., NUMA-aware co-location of VFs and GPUs)
- Performance Optimization: Tag devices with topology and locality information to support performance-critical workload placement
Use Cases
- Performance-Based Selection: Tag devices by performance tier (premium/standard/basic)
- Compliance & Security: Mark devices certified for specific compliance requirements
- Cost Optimization: Tag devices with cost information for workload placement decisions
- Testing vs. Production: Separate device pools by environment
- Geographic/Topology Awareness: Label devices by datacenter, rack, or network segment
- Workload Affinity: Pre-tag devices optimized for specific workload types (AI/ML, storage, telco)
- NUMA-Aware Allocation: Tag devices with NUMA topology information for use with resource alignment constraints in multi-device workloads
- Multi-Organization Deployments: Use custom prefixes per organization in shared clusters (e.g., org-a.shared-cluster.com, org-b.shared-cluster.com)
- Department-Specific Attributes: Different departments can use their own prefixes for attributes (e.g., engineering.acme.corp, finance.acme.corp)
- Service-Level Separation: Use prefixes to separate service concerns (e.g., network.infra, storage.infra, compute.infra)
- External Integration: Custom prefixes matching external asset management systems or CMDBs
- Device Affinity Groups: Create logical affinity groups for devices that should be co-located or share specific characteristics
Testing Requirements
- Unit tests for attribute key parsing (split on first /)
- Unit tests for attribute name validation
- Unit tests for attribute value validation
- Unit tests for prefix validation (DNS subdomain format when key contains /)
- Unit tests for attribute merging and conflict resolution
- Unit tests for default prefix behavior when key has no /
- Unit tests for custom prefix behavior when key contains prefix/attributeName
- Unit tests for mixing multiple prefixes in a single additionalAttributes map
- Integration tests with multiple SriovResourceFilters using different prefixes in keys
- E2E tests with CEL-based device selection using both default prefix (unprefixed keys) and custom prefixes (prefixed keys)
- E2E tests combining attributes from multiple prefixes in a single CEL expression
- Validation that reserved attribute names are rejected
- Validation that invalid prefixes (in keys) are rejected
Acceptance Criteria
- Config API includes additionalAttributes field
- Attribute keys are properly parsed to extract prefix and attribute name (split on first /)
- When key has no /, default prefix sriovnetwork.openshift.io is used
- When key contains /, the part before / is used as custom prefix
- Attributes are properly applied to matching devices under the correct prefix
- Attributes are exposed through DRA device attributes API under their respective prefixes
- CEL expressions can select devices using additional attributes with default prefix (unprefixed keys)
- CEL expressions can select devices using additional attributes with custom prefix (prefixed keys)
- CEL expressions can combine attributes from multiple prefixes in a single expression
- Multiple prefixes can be mixed in a single additionalAttributes map
- Attribute name validation prevents conflicts with reserved names
- Attribute value validation ensures valid strings
- Prefix validation ensures DNS subdomain format when key contains /
- Invalid or malformed prefixes (in keys) are rejected with clear error messages
- CRD schema is updated with proper OpenAPI validation for additionalAttributes
- Documentation includes examples with both default prefix (unprefixed keys) and custom prefixes (prefixed keys)
- Documentation includes examples of mixing multiple prefixes in one configuration
- Documentation includes best practices for choosing attribute prefixes
- Unit and integration tests validate functionality with both default and custom prefixes
- Helm chart templates support the new field
Backward Compatibility
This is a non-breaking change:
- additionalAttributes is optional (omitempty)
- Existing SriovResourceFilters without this field continue to work unchanged
- When additionalAttributes is not specified, no additional attributes are added (existing behavior)
- When attribute keys have no /, the default prefix sriovnetwork.openshift.io is used (consistent with current behavior)
- No migration required for existing deployments
Best Practices for Attribute Keys and Prefixes
When defining attribute keys in additionalAttributes, consider the following guidelines:
- Use Domain Names You Control: For custom prefixes, use domains you own (e.g., acme.com/attributeName, network.acme.com/bandwidth)
- Follow DNS Conventions: Write prefixes as lowercase DNS subdomains (e.g., network.acme.com/tier)
- Hierarchical Organization: Use dots in prefixes to create logical hierarchies (e.g., network.production.acme.com/sla, network.development.acme.com/tier)
- Avoid Conflicts: Don't use prefixes that might conflict with Kubernetes or other systems (avoid k8s.io, kubernetes.io, etc.)
- Keep It Simple: Use meaningful, short prefixes that are easy to remember and type
- Document Your Prefixes: Maintain documentation of which prefixes are used for what purposes
- Consistency: Use consistent naming across your organization
- Default for Standard Attributes: Use unprefixed keys (default prefix) for standard, widely-understood attributes
- Prefix for Organization-Specific: Use custom prefixes for organization or domain-specific attributes
Examples of Good Attribute Keys:
- performanceTier - Uses default prefix (sriovnetwork.openshift.io)
- acme.corp/department - Simple organization prefix
- network.acme.corp/bandwidth - Service-specific prefix
- telco.edge.acme.corp/latency - Hierarchical, service and location-specific
- billing.acme.com/chargeCode - Domain-specific prefix
Examples to Avoid:
- kubernetes.io/tier - Reserved for Kubernetes
- k8s.io/zone - Reserved for Kubernetes
- localhost/attr - Not meaningful in distributed systems
- test/value - Too generic, likely to cause conflicts
Future Enhancements
Potential future extensions to this feature:
- Support for structured attributes (nested maps/lists)
- Attribute templates with variable substitution
- Automatic attribute inference from device properties
- Attribute-based device grouping and constraints
- Integration with Kubernetes device plugin feature gates
- Attribute prefix wildcards for CEL expressions (e.g., *.acme.corp)
- Prefix-based RBAC for attribute management
References
- Kubernetes CEL in Device Selection: https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-selection
- DRA Device Attributes: https://kubernetes.io/docs/reference/config-api/resource.v1/
- Label Naming Conventions: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set