Commit adf2583: Design proposal for the platform-orchestrator-abstraction

Signed-off-by: Sebastian Sch <[email protected]>

File tree

1 file changed

+290
-0
lines changed

1 file changed

+290
-0
lines changed
Lines changed: 290 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,290 @@

---
title: Platform and Orchestrator Abstraction
authors:
- sriov-network-operator team
reviewers:
- TBD
creation-date: 21-07-2025
last-updated: 21-07-2025
---

# Platform and Orchestrator Abstraction

## Summary

This design document describes the introduction of platform and orchestrator abstraction layers in the SR-IOV Network Operator. These abstractions separate platform-specific (infrastructure provider) logic from orchestrator-specific (Kubernetes distribution) logic, making it easier to add support for new infrastructure platforms and Kubernetes distributions.

## Motivation

The SR-IOV Network Operator has historically been tightly coupled to specific infrastructure platforms and Kubernetes distributions, particularly OpenShift. As the operator expanded to support additional infrastructure platforms such as OpenStack, AWS, and Oracle, as well as various Kubernetes distributions, the need for a clean abstraction layer became apparent.

### Use Cases

1. **Multi-Platform Support**: Enable the operator to run efficiently on different infrastructure platforms (bare metal, OpenStack, AWS, Oracle, etc.) with platform-specific optimizations
2. **Multi-Orchestrator Support**: Support different Kubernetes distributions (vanilla Kubernetes, OpenShift, etc.) with orchestrator-specific behaviors
3. **Extensibility**: Make it easy to add new platforms and orchestrators without modifying core operator logic
4. **Testing**: Enable better unit testing with mockable interfaces for platform- and orchestrator-specific operations

### Goals

* Create a clean abstraction layer that separates platform-specific logic from orchestrator-specific logic
* Implement support for bare metal and OpenStack platforms
* Implement support for Kubernetes and OpenShift orchestrators
* Provide a plugin architecture that makes it easy to add new platforms and orchestrators
* Maintain backward compatibility with existing functionality
* Enable better testability through interface-based design

### Non-Goals

* Support all possible infrastructure platforms in the initial implementation
* Change existing API structures or user-facing interfaces

## Proposal

### Workflow Description

The operator will use two main abstraction layers:

1. **Platform Interface**: Handles infrastructure-specific operations such as device discovery, bridge management, and plugin selection
2. **Orchestrator Interface**: Handles Kubernetes distribution-specific operations such as cluster type detection, additional node draining logic, and cluster-specific configuration

The platform is detected at startup based on node metadata and environment variables, while the orchestrator is detected based on cluster characteristics and available APIs.
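
### API Extensions

#### Platform Interface

```golang
import (
	sriovnetworkv1 "github.com/k8snetworkplumbingwg/sriov-network-operator/api/v1"
	"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/helper"
	plugin "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/plugins"
)

type Interface interface {
	// Initialization and access to the host helpers.
	Init() error
	GetHostHelpers() helper.HostHelpersInterface

	// Discovery of SR-IOV devices and bridges on the node.
	DiscoverSriovDevices() ([]sriovnetworkv1.InterfaceExt, error)
	DiscoverBridges() (sriovnetworkv1.Bridges, error)

	// Plugin selection: main plugin plus additional plugins, and the systemd-mode plugin.
	GetPlugins(ns *sriovnetworkv1.SriovNetworkNodeState) (plugin.VendorPlugin, []plugin.VendorPlugin, error)
	SystemdGetPlugin(phase string) (plugin.VendorPlugin, error)
}
```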
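
#### Orchestrator Interface

```golang
import (
	"context"
	corev1 "k8s.io/api/core/v1"
	"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/consts"
)

type Interface interface {
	ClusterType() consts.ClusterType // Kubernetes or OpenShift
	Flavor() consts.ClusterFlavor    // e.g. vanilla Kubernetes, OpenShift, or Hypershift
	// Drain hooks, called before a node is drained and after the drain completes.
	BeforeDrainNode(context.Context, *corev1.Node) (bool, error)
	AfterCompleteDrainNode(context.Context, *corev1.Node) (bool, error)
}
```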

### Implementation Details/Notes/Constraints

#### Platform Implementations

1. **Bare Metal Platform (`pkg/platform/baremetal/`)**:
   - Uses standard SR-IOV device discovery
   - Supports vendor-specific plugins (Intel, Mellanox)
   - Handles bridge discovery and management
   - Supports both daemon and systemd configuration modes

2. **OpenStack Platform (`pkg/platform/openstack/`)**:
   - Uses virtual device discovery based on OpenStack metadata
   - Reads device information from the config drive or the metadata service
   - Uses the virtual plugin for VF configuration
   - Does not support systemd mode or bridge management

#### Orchestrator Implementations

1. **Kubernetes Orchestrator (`pkg/orchestrator/kubernetes/`)**:
   - Simple implementation with minimal cluster-specific logic
   - No special drain handling (returns true for all drain operations)
   - Vanilla Kubernetes flavor

2. **OpenShift Orchestrator (`pkg/orchestrator/openshift/`)**:
   - Complex drain handling with Machine Config Pool (MCP) management
   - Supports both regular OpenShift and Hypershift flavors
   - Manages MCP pausing during node operations

#### Platform Detection

Platform detection occurs in the daemon startup code based on:
- Node provider ID examination
- Environment variables
- Available metadata services

```golang
// Platform detection logic: match the node's provider ID against the keys of
// vars.PlatformsMap (e.g. an OpenStack node's provider ID contains "openstack").
providerID := strings.ToLower(nodeInfo.Spec.ProviderID)
for key, pType := range vars.PlatformsMap {
	if strings.Contains(providerID, strings.ToLower(key)) {
		vars.PlatformType = pType
		break // stop at the first matching platform
	}
}
```
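
The detection loop ranges over a map, and Go randomizes map iteration order, so if two keys could both match one provider ID the chosen platform could vary between runs. The sketch below is a deterministic variant written as a standalone function; `PlatformType` and its constants are local stand-ins for `consts.PlatformTypes`, and falling back to `Baremetal` when nothing matches is an assumption that mirrors the zero value of the `iota` constants.

```go
package main

import (
	"sort"
	"strings"
)

// PlatformType is a hypothetical stand-in for consts.PlatformTypes.
type PlatformType int

const (
	Baremetal PlatformType = iota
	VirtualOpenStack
	VirtualAWS
)

// detectPlatform returns the platform whose key appears in the provider ID,
// falling back to Baremetal. Keys are checked in sorted order so the result
// does not depend on Go's randomized map iteration order.
func detectPlatform(providerID string, platforms map[string]PlatformType) PlatformType {
	keys := make([]string, 0, len(platforms))
	for k := range platforms {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	id := strings.ToLower(providerID)
	for _, k := range keys {
		if strings.Contains(id, strings.ToLower(k)) {
			return platforms[k]
		}
	}
	return Baremetal
}
```

Substring matching is permissive; a stricter variant could compare only the provider ID scheme (the text before `://`).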

#### Factory Pattern

Both the platform and orchestrator packages use a factory function for instantiation:

```golang
// Platform factory (pkg/platform)
func New(hostHelpers helper.HostHelpersInterface) (Interface, error) {
	switch vars.PlatformType {
	case consts.Baremetal:
		return baremetal.New(hostHelpers)
	case consts.VirtualOpenStack:
		return openstack.New(hostHelpers)
	default:
		return nil, fmt.Errorf("unknown platform type %v", vars.PlatformType)
	}
}

// Orchestrator factory (pkg/orchestrator)
func New() (Interface, error) {
	switch vars.ClusterType {
	case consts.ClusterTypeOpenshift:
		return openshift.New()
	case consts.ClusterTypeKubernetes:
		return kubernetes.New()
	default:
		return nil, fmt.Errorf("unknown orchestration type: %s", vars.ClusterType)
	}
}
```
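
The proposal uses a `switch` in each factory. A common Go alternative, sketched here for comparison and not part of the proposal, is a registry that each platform fills from an `init` function, so adding a platform does not touch the factory. `Platform`, `Factory`, `Register`, `Registered`, and the toy `baremetal` type are hypothetical, and host helpers are omitted.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Platform is a hypothetical stand-in for platform.Interface, reduced to one method.
type Platform interface{ Name() string }

// Factory builds a platform instance.
type Factory func() (Platform, error)

var (
	mu       sync.RWMutex
	registry = map[string]Factory{}
)

// Register makes a factory available under name; duplicate names are a programming error.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("platform already registered: " + name)
	}
	registry[name] = f
}

// Registered returns the registered platform names in sorted order.
func Registered() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// New looks up the factory for name and calls it.
func New(name string) (Platform, error) {
	mu.RLock()
	f, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown platform type %s (registered: %v)", name, Registered())
	}
	return f()
}

// A platform package would register itself like this.
type baremetal struct{}

func (baremetal) Name() string { return "baremetal" }

func init() {
	Register("baremetal", func() (Platform, error) { return baremetal{}, nil })
}
```

The trade-off is visibility: a `switch` lists every supported platform in one place, while a registry relies on import side effects to populate itself.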

### Upgrade & Downgrade considerations

The abstraction layer is designed to be backward compatible. Existing configurations and behaviors are preserved, with the abstraction layer providing the same functionality through the new interface structure.

No user-facing API changes are required, and existing SR-IOV configurations will continue to work without modification.

### Test Plan

The implementation includes comprehensive unit tests for both platform and orchestrator abstractions:

1. **Platform Tests**: Test device discovery, plugin loading, and platform-specific behaviors for both bare metal and OpenStack platforms
2. **Orchestrator Tests**: Test cluster type detection, drain handling, and orchestrator-specific behaviors for both Kubernetes and OpenShift
3. **Integration Tests**: Ensure the abstractions work correctly with the existing daemon and operator logic
4. **Mock Interfaces**: Generated mock interfaces enable comprehensive unit testing of components that depend on platform and orchestrator abstractions

## Benefits for Adding New Platforms

### 1. Clear Separation of Concerns

The abstraction separates infrastructure-specific logic (platform) from Kubernetes distribution-specific logic (orchestrator), making it easier to reason about and implement support for new platforms.

### 2. Standardized Interface

New platforms only need to implement the well-defined `Platform Interface`, which includes:
- Device discovery methods
- Plugin selection logic
- Platform-specific initialization

### 3. Minimal Core Changes

Adding a new platform requires:
1. Creating a new package under `pkg/platform/<platform-name>/`
2. Implementing the `Platform Interface`
3. Adding the platform to the factory function
4. Adding platform detection logic

No changes to core operator logic, existing platforms, or user-facing APIs are required.
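
For step 2, a common Go idiom (not mandated by the proposal) is a compile-time assertion that the new type satisfies the interface, so a missing or mistyped method fails the build instead of surfacing at the factory. `Platform` below is a reduced, hypothetical stand-in for the platform interface, and `string` stands in for `plugin.VendorPlugin`.

```go
package main

import "errors"

// Platform is a reduced, hypothetical stand-in for the platform Interface;
// string stands in for plugin.VendorPlugin.
type Platform interface {
	Init() error
	SystemdGetPlugin(phase string) (string, error)
}

// Aws is a toy implementation used only to illustrate the check.
type Aws struct{ initialized bool }

func (a *Aws) Init() error {
	a.initialized = true
	return nil
}

func (a *Aws) SystemdGetPlugin(_ string) (string, error) {
	return "", errors.New("aws platform not supported in systemd")
}

// Compile-time assertion: the build fails if *Aws stops satisfying Platform,
// for example after a method is added to the interface.
var _ Platform = (*Aws)(nil)
```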

### 4. Plugin Architecture

The platform interface includes plugin selection methods, allowing each platform to:
- Choose appropriate vendor plugins
- Use platform-specific plugins (like the virtual plugin for OpenStack)
- Support different configuration modes (daemon vs systemd)
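
A minimal sketch of how a bare-metal-style platform might select plugins: a generic main plugin plus one vendor plugin per supported vendor found on the node. The `Device`, `VendorPlugin`, and `namedPlugin` types, the `"generic"` main plugin, and the vendor-to-plugin table are assumptions for illustration; the vendor IDs are the standard PCI IDs for Intel (`8086`) and Mellanox (`15b3`). The error return of `GetPlugins` is omitted.

```go
package main

import "sort"

// Device is a hypothetical stand-in for a discovered SR-IOV device.
type Device struct {
	Name   string
	Vendor string // PCI vendor ID in lowercase hex, e.g. "8086"
}

// VendorPlugin is a reduced stand-in for plugin.VendorPlugin.
type VendorPlugin interface{ Name() string }

type namedPlugin string

func (p namedPlugin) Name() string { return string(p) }

// vendorPlugins maps PCI vendor IDs to vendor plugins (0x8086 Intel, 0x15b3 Mellanox).
var vendorPlugins = map[string]VendorPlugin{
	"8086": namedPlugin("intel"),
	"15b3": namedPlugin("mellanox"),
}

// getPlugins returns a generic main plugin plus one additional plugin per
// supported vendor present on the node, ordered by vendor ID so the result is stable.
func getPlugins(devices []Device) (VendorPlugin, []VendorPlugin) {
	present := map[string]bool{}
	for _, d := range devices {
		if _, ok := vendorPlugins[d.Vendor]; ok {
			present[d.Vendor] = true
		}
	}
	vendors := make([]string, 0, len(present))
	for v := range present {
		vendors = append(vendors, v)
	}
	sort.Strings(vendors)

	additional := make([]VendorPlugin, 0, len(vendors))
	for _, v := range vendors {
		additional = append(additional, vendorPlugins[v])
	}
	return namedPlugin("generic"), additional
}
```

A virtualized platform would instead return its virtual plugin as the main plugin and an empty list of additional plugins, as the AWS example's `GetPlugins` does.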

### 5. Independent Development and Testing

Each platform implementation is self-contained, enabling:
- Independent development of platform support
- Platform-specific unit tests
- Mock-based testing of platform interactions
- Easier debugging and maintenance

### Example: Adding a New Platform (AWS Implementation)

The AWS platform implementation demonstrates how to add support for a new cloud platform:

1. Create `pkg/platform/aws/aws.go`:
```golang
package aws

import (
	"fmt"

	sriovnetworkv1 "github.com/k8snetworkplumbingwg/sriov-network-operator/api/v1"
	"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/helper"
	plugin "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/plugins"
	virtualplugin "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/plugins/virtual"
)

type Aws struct {
	hostHelpers       helper.HostHelpersInterface
	loadedDevicesInfo sriovnetworkv1.InterfaceExts
}

func New(hostHelpers helper.HostHelpersInterface) (*Aws, error) {
	return &Aws{hostHelpers: hostHelpers}, nil
}

// GetPlugins uses the virtual plugin as the main plugin; no additional plugins are needed.
func (a *Aws) GetPlugins(_ *sriovnetworkv1.SriovNetworkNodeState) (plugin.VendorPlugin, []plugin.VendorPlugin, error) {
	virtual, err := virtualplugin.NewVirtualPlugin(a.hostHelpers)
	return virtual, []plugin.VendorPlugin{}, err
}

func (a *Aws) DiscoverSriovDevices() ([]sriovnetworkv1.InterfaceExt, error) {
	// AWS-specific device discovery using the EC2 metadata service:
	// fetches MAC addresses and subnet IDs from the metadata service
	// and maps devices to the AWS network configuration.
	// (The discovery body is elided in this excerpt.)
	return nil, fmt.Errorf("aws device discovery is not shown in this excerpt")
}

// SystemdGetPlugin reports that the systemd configuration mode is not supported on AWS.
func (a *Aws) SystemdGetPlugin(_ string) (plugin.VendorPlugin, error) {
	return nil, fmt.Errorf("aws platform not supported in systemd")
}

// Implement other interface methods...
```

2. Add to the platform factory in `pkg/platform/platform.go`:
```golang
func New(hostHelpers helper.HostHelpersInterface) (Interface, error) {
	switch vars.PlatformType {
	case consts.Baremetal:
		return baremetal.New(hostHelpers)
	case consts.VirtualOpenStack:
		return openstack.New(hostHelpers)
	case consts.VirtualAWS: // New addition
		return aws.New(hostHelpers)
	default:
		return nil, fmt.Errorf("unknown platform type %v", vars.PlatformType)
	}
}
```

3. Add platform detection logic in `pkg/vars/vars.go`:
```golang
// Keys are matched case-insensitively against the node's provider ID.
var PlatformsMap = map[string]consts.PlatformTypes{
	"openstack": consts.VirtualOpenStack,
	"aws":       consts.VirtualAWS, // New addition
}
```

4. Add platform constant in `pkg/consts/platforms.go`:
```golang
const (
	Baremetal PlatformTypes = iota
	VirtualOpenStack
	VirtualAWS // New addition
)
```

Key features of the AWS implementation:
- **Metadata Service Integration**: Uses AWS EC2 metadata service to discover network configuration
- **Virtual Plugin Usage**: Leverages the existing virtual plugin for SR-IOV VF management
- **Subnet ID Mapping**: Maps network interfaces to AWS subnet IDs for proper network filtering
- **Comprehensive Testing**: Includes extensive unit tests with mocked HTTP calls
- **Error Handling**: Robust error handling for metadata service failures

This approach keeps infrastructure-specific code behind the platform interface, so the SR-IOV Network Operator can support new platforms without changes to its core logic while the code stays clean and maintainable.
