Commit d12edf3

Design proposal for the platform-orchestrator-abstraction
Signed-off-by: Sebastian Sch <[email protected]>
1 parent 8b60d24 commit d12edf3

1 file changed: 284 additions and 0 deletions
@@ -0,0 +1,284 @@
---
title: Platform and Orchestrator Abstraction
authors:
  - sriov-network-operator team
reviewers:
  - TBD
creation-date: 21-07-2025
last-updated: 21-07-2025
---

# Platform and Orchestrator Abstraction

## Summary

This design document describes the introduction of platform and orchestrator abstraction layers in the SR-IOV Network Operator. These abstractions separate platform-specific (infrastructure provider) logic from orchestrator-specific (Kubernetes distribution) logic, making it easier to add support for new infrastructure platforms and Kubernetes distributions.

## Motivation

The SR-IOV Network Operator has historically been tightly coupled to specific infrastructure platforms and Kubernetes distributions, particularly OpenShift. As the operator expanded to support virtualization platforms such as OpenStack, AWS, and Oracle, as well as various Kubernetes distributions, the need for a clean abstraction layer became apparent.

### Use Cases

1. **Multi-Platform Support**: Enable the operator to run efficiently on different infrastructure platforms (bare metal, OpenStack, AWS, Oracle, etc.) with platform-specific optimizations
2. **Multi-Orchestrator Support**: Support different Kubernetes distributions (vanilla Kubernetes, OpenShift, etc.) with orchestrator-specific behaviors

### Goals

* Create a clean abstraction layer that separates platform-specific logic from orchestrator-specific logic
* Re-implement existing support for bare metal and OpenStack platforms using the new abstraction layer
* Re-implement existing support for Kubernetes and OpenShift orchestrators using the new abstraction layer
* Provide a plugin architecture that makes it easy to add new platforms and orchestrators
* Maintain backward compatibility with existing functionality
* Enable better testability through interface-based design

### Non-Goals

* Support all possible infrastructure platforms in the initial implementation
* Change existing SR-IOV CRD API structures or user-facing configuration interfaces

## Proposal

### Workflow Description

1. **Daemon Startup**: The SR-IOV daemon detects the platform type by examining the node's provider ID and environment variables
2. **Platform Initialization**: The appropriate platform implementation is instantiated using the factory pattern and initialized
3. **Orchestrator Detection**: The orchestrator type is detected based on cluster APIs and characteristics
4. **Device Discovery**: The platform interface discovers available SR-IOV devices using platform-specific methods
5. **Plugin Selection**: The platform selects appropriate vendor plugins based on discovered devices and platform constraints
6. **Configuration Application**: When SR-IOV configurations change, the daemon uses the platform interface to apply changes through the selected plugins
7. **Node Management**: During node operations, the orchestrator interface handles any distribution-specific logic like cordon/uncordon coordination

*NOTE:* The platform is detected at startup based on node metadata and environment variables, while the orchestrator is detected based on cluster characteristics and available APIs.

```mermaid
flowchart TD
    A[Daemon Startup] --> B[Platform Detection]
    B --> C{Platform Type?}
    C -->|Bare Metal| D[Bare Metal Platform]
    C -->|OpenStack| E[OpenStack Platform]
    C -->|Other| F[Other Platform]

    D --> G[Platform Initialization]
    E --> G
    F --> G

    G --> H[Orchestrator Detection]
    H --> I{Orchestrator Type?}
    I -->|OpenShift| J[OpenShift Orchestrator]
    I -->|Kubernetes| K[Kubernetes Orchestrator]

    J --> L[Device Discovery]
    K --> L

    L --> M[Plugin Selection]
    M --> N[Ready for Configuration]

    N --> O[SR-IOV Config Change?]
    O -->|Yes| P[Apply Configuration via Platform]
    O -->|No| Q[Node Operation?]

    P --> R[Update Device State]
    R --> O

    Q -->|Yes| S[Orchestrator Cordon/Uncordon Logic]
    Q -->|No| O
    S --> T[Platform Plugin Operations]
    T --> O

    style A fill:#e1f5fe
    style N fill:#c8e6c9
    style P fill:#fff3e0
    style S fill:#fce4ec
```
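
The startup half of the workflow (platform initialization, orchestrator detection, device discovery) might be wired together roughly as follows. This is only a sketch: `Platform`, `Orchestrator`, `startup`, and the two stub types are simplified stand-ins invented for illustration, with device lists reduced to strings so the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// Platform is a reduced, hypothetical stand-in for the platform interface.
type Platform interface {
	Init() error
	DiscoverSriovDevices() ([]string, error)
}

// Orchestrator is a reduced, hypothetical stand-in for the orchestrator interface.
type Orchestrator interface {
	ClusterType() string
}

// stubPlatform and stubOrchestrator exist only to make the sketch runnable.
type stubPlatform struct{ devices []string }

func (p *stubPlatform) Init() error                             { return nil }
func (p *stubPlatform) DiscoverSriovDevices() ([]string, error) { return p.devices, nil }

type stubOrchestrator struct{ kind string }

func (o *stubOrchestrator) ClusterType() string { return o.kind }

// startup initializes the platform and then discovers devices, failing
// fast on the first error. It returns a short summary of what was found.
func startup(p Platform, o Orchestrator) (string, error) {
	if p == nil || o == nil {
		return "", errors.New("platform and orchestrator must be set")
	}
	if err := p.Init(); err != nil {
		return "", fmt.Errorf("platform init: %w", err)
	}
	devices, err := p.DiscoverSriovDevices()
	if err != nil {
		return "", fmt.Errorf("device discovery: %w", err)
	}
	return fmt.Sprintf("%s: %d device(s)", o.ClusterType(), len(devices)), nil
}

func main() {
	summary, err := startup(
		&stubPlatform{devices: []string{"0000:3b:00.0"}},
		&stubOrchestrator{kind: "kubernetes"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary)
}
```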

### API Extensions

#### Platform Interface

```golang
type Interface interface {
	// Init initializes the platform-specific components and validates the platform environment
	Init() error

	// GetHostHelpers returns the platform-specific host helpers interface for system operations
	// This allows platforms to provide platform-specific implementations for file operations,
	// command execution, and system interactions while maintaining testability through mocking
	GetHostHelpers() helper.HostHelpersInterface

	// DiscoverSriovDevices discovers and returns all available SR-IOV capable devices on the platform
	// The discovery method varies by platform (PCI scanning for bare metal, metadata service for cloud)
	DiscoverSriovDevices() ([]sriovnetworkv1.InterfaceExt, error)

	// DiscoverBridges discovers and returns bridge configuration information
	// Not supported on all platforms (e.g., not available in some virtualized environments)
	DiscoverBridges() (sriovnetworkv1.Bridges, error)

	// GetPlugins returns the appropriate vendor plugins for the platform based on the node state
	// Returns the selected plugin and a list of all available plugins for the platform
	GetPlugins(ns *sriovnetworkv1.SriovNetworkNodeState) (plugin.VendorPlugin, []plugin.VendorPlugin, error)

	// SystemdGetPlugin returns the appropriate plugin for systemd-based configuration phases
	// Not supported on all platforms (returns error for platforms that don't support systemd mode)
	SystemdGetPlugin(phase string) (plugin.VendorPlugin, error)
}
```

#### Orchestrator Interface

```golang
type Interface interface {
	// ClusterType returns the detected Kubernetes distribution type (e.g., OpenShift, Kubernetes)
	ClusterType() consts.ClusterType

	// Flavor returns the specific flavor/variant of the orchestrator (e.g., Vanilla, Hypershift for OpenShift)
	Flavor() consts.ClusterFlavor

	// BeforeNodeCordonProcess performs orchestrator-specific actions before cordoning a node
	// Returns true if the cordon operation should proceed, false to skip cordoning
	BeforeNodeCordonProcess(context.Context, *corev1.Node) (bool, error)

	// AfterNodeUncordonProcess performs orchestrator-specific cleanup actions after uncordoning completes
	// Returns true if post-uncordon operations completed successfully
	AfterNodeUncordonProcess(context.Context, *corev1.Node) (bool, error)
}
```

### Implementation Details/Notes/Constraints

#### Platform Implementations

1. **Bare Metal Platform (`pkg/platform/baremetal/`)**:
   - Uses standard SR-IOV device discovery
   - Supports vendor-specific plugins (Intel, Mellanox)
   - Handles bridge discovery and management
   - Supports both daemon and systemd configuration modes

2. **OpenStack Platform (`pkg/platform/openstack/`)**:
   - Uses virtual device discovery based on OpenStack metadata
   - Reads device information from config-drive or metadata service
   - Uses virtual plugin for VF configuration
   - Does not support systemd mode or bridge management

#### Orchestrator Implementations

1. **Kubernetes Orchestrator (`pkg/orchestrator/kubernetes/`)**:
   - Simple implementation with minimal cluster-specific logic
   - No special cordon/uncordon handling (returns true for all cordon/uncordon operations)
   - Vanilla Kubernetes flavor

2. **OpenShift Orchestrator (`pkg/orchestrator/openshift/`)**:
   - Complex cordon/uncordon handling with Machine Config Pool management
   - Supports both regular OpenShift and Hypershift flavors
   - Manages MCP pausing during node operations

#### Platform Detection

Platform detection occurs in the daemon startup code based on:
- Node provider ID examination
- Environment variables
- Available metadata services

```golang
// Platform detection logic
for key, pType := range vars.PlatformsMap {
	if strings.Contains(strings.ToLower(nodeInfo.Spec.ProviderID), strings.ToLower(key)) {
		vars.PlatformType = pType
	}
}
```
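
The loop above ranges over a Go map, whose iteration order is unspecified, so if more than one key ever matched the same provider ID the result could vary between runs. A self-contained variant that checks keys in sorted order and returns an explicit fallback might look like the sketch below. `detectPlatform`, the sample map contents, and the `"baremetal"` fallback value are assumptions made for illustration, not the operator's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// detectPlatform returns the platform whose key occurs in providerID
// (case-insensitive). Keys are checked in sorted order so the result is
// deterministic; fallback is returned when no key matches.
func detectPlatform(providerID string, platforms map[string]string, fallback string) string {
	keys := make([]string, 0, len(platforms))
	for k := range platforms {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	id := strings.ToLower(providerID)
	for _, k := range keys {
		if strings.Contains(id, strings.ToLower(k)) {
			return platforms[k]
		}
	}
	return fallback
}

func main() {
	// Hypothetical map contents, standing in for vars.PlatformsMap.
	platforms := map[string]string{"openstack": "virtual/openstack"}

	fmt.Println(detectPlatform("openstack:///0a1b2c", platforms, "baremetal"))
	fmt.Println(detectPlatform("", platforms, "baremetal"))
}
```

Passing the map and fallback as parameters, rather than reading package-level variables, also makes the function straightforward to unit test.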

#### Factory Pattern

Both the platform and the orchestrator are instantiated through factory functions, so adding a new implementation only requires a new `case`. The two `New` functions below belong to the platform and orchestrator packages respectively:

```golang
// Platform factory
func New(hostHelpers helper.HostHelpersInterface) (Interface, error) {
	switch vars.PlatformType {
	case consts.Baremetal:
		return baremetal.New(hostHelpers)
	case consts.VirtualOpenStack:
		return openstack.New(hostHelpers)
	default:
		return nil, fmt.Errorf("unknown platform type %s", vars.PlatformType)
	}
}

// Orchestrator factory
func New() (Interface, error) {
	switch vars.ClusterType {
	case consts.ClusterTypeOpenshift:
		return openshift.New()
	case consts.ClusterTypeKubernetes:
		return kubernetes.New()
	default:
		return nil, fmt.Errorf("unknown orchestration type: %s", vars.ClusterType)
	}
}
```
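
The proposal uses a `switch` in each factory. For comparison only, a registry-based factory is a common alternative in Go that lets a new platform register itself without editing the factory function. The sketch below is not part of the proposal; `Platform`, `Constructor`, `Register`, and the string platform types are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Platform is a reduced, hypothetical stand-in for the platform interface.
type Platform interface {
	Name() string
}

// Constructor builds a platform implementation.
type Constructor func() (Platform, error)

// registry maps a platform type to its constructor.
var registry = map[string]Constructor{}

// Register adds a constructor for platformType. A platform package would
// typically call this from its init function.
func Register(platformType string, c Constructor) {
	registry[platformType] = c
}

// New looks up and runs the constructor registered for platformType.
func New(platformType string) (Platform, error) {
	c, ok := registry[platformType]
	if !ok {
		known := make([]string, 0, len(registry))
		for k := range registry {
			known = append(known, k)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("unknown platform type %q (registered: %v)", platformType, known)
	}
	return c()
}

// baremetal is a trivial implementation used to exercise the registry.
type baremetal struct{}

func (baremetal) Name() string { return "baremetal" }

func main() {
	Register("baremetal", func() (Platform, error) { return baremetal{}, nil })

	p, err := New("baremetal")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name())

	_, err = New("examplecloud")
	fmt.Println(err)
}
```

The trade-off is that a `switch` keeps all supported types visible in one place, while a registry relies on import side effects to populate the map.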

### Upgrade & Downgrade considerations

Existing configurations and behaviors are preserved, with the abstraction layer providing the same functionality through the new interface structure.

No user-facing API changes are required, and existing SR-IOV configurations will continue to work without modification.

### Test Plan

The implementation includes comprehensive unit tests for both platform and orchestrator abstractions:

1. **Platform Tests**: Test device discovery, plugin loading, and platform-specific behaviors for both bare metal and OpenStack platforms
2. **Orchestrator Tests**: Test cluster type detection, cordon/uncordon handling, and orchestrator-specific behaviors for both Kubernetes and OpenShift
3. **Integration Tests**: Ensure the abstractions work correctly with the existing daemon and operator logic
4. **Mock Interfaces**: Generated mock interfaces enable comprehensive unit testing of components that depend on platform and orchestrator abstractions

## Benefits for Adding New Platforms

### 1. Clear Separation of Concerns

The abstraction separates infrastructure-specific logic (platform) from Kubernetes distribution-specific logic (orchestrator), making it easier to reason about and implement support for new platforms.

### 2. Standardized Interface

New platforms only need to implement the platform `Interface`, which covers:
- Device discovery methods
- Plugin selection logic
- Platform-specific initialization

### 3. Minimal Core Changes

Adding a new platform requires:
1. Creating a new package under `pkg/platform/<platform-name>/`
2. Implementing the platform `Interface`
3. Adding the platform to the factory function
4. Adding platform detection logic

No changes to core operator logic, existing platforms, or user-facing APIs are required.

### 4. Plugin Architecture

The platform interface includes plugin selection methods, allowing each platform to:
- Choose appropriate vendor plugins
- Use platform-specific plugins (like the virtual plugin for OpenStack)
- Support different configuration modes (daemon vs systemd)

### 5. Independent Development and Testing

Each platform implementation is self-contained, enabling:
- Independent development of platform support
- Platform-specific unit tests
- Mock-based testing of platform interactions
- Easier debugging and maintenance

### Example: Adding a New Platform

Adding support for a new platform follows a standardized process:

1. **Create Platform Package**: Create a new package under `pkg/platform/<platform-name>/` that implements the Platform Interface
2. **Platform Detection**: Add platform detection logic to identify the new platform based on node metadata or environment variables
3. **Factory Registration**: Register the new platform in the factory function to enable instantiation
4. **Plugin Integration**: Implement platform-specific plugin selection and device discovery logic
5. **Testing**: Add comprehensive unit tests for the new platform implementation

This standardized approach ensures that new platforms integrate seamlessly with the existing operator architecture without requiring changes to core logic or other platform implementations.
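
As a rough starting point for step 1, a new platform package could begin from a skeleton like the one below. Everything here is hypothetical: `ExampleCloud` is an invented platform, and `Platform` is a reduced copy of the interface with return types simplified to strings so the sketch compiles on its own. The `var _ Platform = ...` line is a standard Go idiom that makes the compiler verify the type implements every method.

```go
package main

import (
	"errors"
	"fmt"
)

// Platform is a reduced stand-in for the platform interface.
type Platform interface {
	Init() error
	DiscoverSriovDevices() ([]string, error)
	DiscoverBridges() ([]string, error)
	SystemdGetPlugin(phase string) (string, error)
}

// ExampleCloud is a hypothetical new platform implementation.
type ExampleCloud struct {
	initialized bool
}

// Compile-time check that *ExampleCloud satisfies Platform.
var _ Platform = (*ExampleCloud)(nil)

// NewExampleCloud returns an uninitialized platform; callers must run Init.
func NewExampleCloud() *ExampleCloud { return &ExampleCloud{} }

// Init would validate the environment; here it only records that it ran.
func (p *ExampleCloud) Init() error {
	p.initialized = true
	return nil
}

// DiscoverSriovDevices refuses to run before Init. A real platform would
// query its metadata source here instead of returning an empty list.
func (p *ExampleCloud) DiscoverSriovDevices() ([]string, error) {
	if !p.initialized {
		return nil, errors.New("platform not initialized")
	}
	return []string{}, nil
}

// DiscoverBridges is unsupported on this hypothetical platform.
func (p *ExampleCloud) DiscoverBridges() ([]string, error) {
	return nil, errors.New("bridge discovery is not supported on examplecloud")
}

// SystemdGetPlugin is unsupported on this hypothetical platform.
func (p *ExampleCloud) SystemdGetPlugin(phase string) (string, error) {
	return "", fmt.Errorf("systemd mode (phase %q) is not supported on examplecloud", phase)
}

func main() {
	p := NewExampleCloud()
	if _, err := p.DiscoverSriovDevices(); err != nil {
		fmt.Println("before Init:", err)
	}
	if err := p.Init(); err != nil {
		panic(err)
	}
	devices, err := p.DiscoverSriovDevices()
	fmt.Println(len(devices), err)
}
```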