| 1 | +--- |
| 2 | +title: Platform and Orchestrator Abstraction |
| 3 | +authors: |
| 4 | + - sriov-network-operator team |
| 5 | +reviewers: |
| 6 | + - TBD |
| 7 | +creation-date: 21-07-2025 |
| 8 | +last-updated: 21-07-2025 |
| 9 | +--- |
| 10 | + |
| 11 | +# Platform and Orchestrator Abstraction |
| 12 | + |
| 13 | +## Summary |
| 14 | + |
| 15 | +This design document describes the introduction of platform and orchestrator abstraction layers in the SR-IOV Network Operator. These abstractions separate platform-specific (infrastructure provider) logic from orchestrator-specific (Kubernetes distribution) logic, making it easier to add support for new infrastructure platforms and Kubernetes distributions. |
| 16 | + |
| 17 | +## Motivation |
| 18 | + |
| 19 | +The SR-IOV Network Operator has historically been tightly coupled to specific infrastructure platforms and Kubernetes distributions, particularly OpenShift. As the operator expanded to support virtualization platforms such as OpenStack, AWS, and Oracle, as well as various Kubernetes distributions, the need for a clean abstraction layer became apparent. |
| 20 | + |
| 21 | +### Use Cases |
| 22 | + |
| 23 | +1. **Multi-Platform Support**: Enable the operator to run efficiently on different infrastructure platforms (bare metal, OpenStack, AWS, Oracle, etc.) with platform-specific optimizations |
| 24 | +2. **Multi-Orchestrator Support**: Support different Kubernetes distributions (vanilla Kubernetes, OpenShift, etc.) with orchestrator-specific behaviors |
| 25 | + |
| 26 | +### Goals |
| 27 | + |
| 28 | +* Create a clean abstraction layer that separates platform-specific logic from orchestrator-specific logic |
| 29 | +* Re-implement existing support for bare metal and OpenStack platforms using the new abstraction layer |
| 30 | +* Re-implement existing support for Kubernetes and OpenShift orchestrators using the new abstraction layer |
| 31 | +* Provide a plugin architecture that makes it easy to add new platforms and orchestrators |
| 32 | +* Maintain backward compatibility with existing functionality |
| 33 | +* Enable better testability through interface-based design |
| 34 | + |
| 35 | +### Non-Goals |
| 36 | + |
| 37 | +* Support all possible infrastructure platforms in the initial implementation |
| 38 | +* Change existing SR-IOV CRD API structures or user-facing configuration interfaces |
| 39 | + |
| 40 | +## Proposal |
| 41 | + |
| 42 | +### Workflow Description |
| 43 | + |
| 44 | +1. **Daemon Startup**: The SR-IOV daemon detects the platform type by examining the node's provider ID and environment variables |
| 45 | +2. **Platform Initialization**: The appropriate platform implementation is instantiated using the factory pattern and initialized |
| 46 | +3. **Orchestrator Detection**: The orchestrator type is detected based on cluster APIs and characteristics |
| 47 | +4. **Device Discovery**: The platform interface discovers available SR-IOV devices using platform-specific methods |
| 48 | +5. **Plugin Selection**: The platform selects appropriate vendor plugins based on discovered devices and platform constraints |
| 49 | +6. **Configuration Application**: When SR-IOV configurations change, the daemon uses the platform interface to apply changes through the selected plugins |
| 50 | +7. **Node Management**: During node operations, the orchestrator interface handles any distribution-specific logic like cordon/uncordon coordination |
| 51 | + |
| 52 | +*NOTE:* The platform is detected at startup based on node metadata and environment variables, while the orchestrator is detected based on cluster characteristics and available APIs. |
| 53 | + |
| 54 | +```mermaid |
| 55 | +flowchart TD |
| 56 | + A[Daemon Startup] --> B[Platform Detection] |
| 57 | + B --> C{Platform Type?} |
| 58 | + C -->|Bare Metal| D[Bare Metal Platform] |
| 59 | + C -->|OpenStack| E[OpenStack Platform] |
| 60 | + C -->|Other| F[Other Platform] |
| 61 | + |
| 62 | + D --> G[Platform Initialization] |
| 63 | + E --> G |
| 64 | + F --> G |
| 65 | + |
| 66 | + G --> H[Orchestrator Detection] |
| 67 | + H --> I{Orchestrator Type?} |
| 68 | + I -->|OpenShift| J[OpenShift Orchestrator] |
| 69 | + I -->|Kubernetes| K[Kubernetes Orchestrator] |
| 70 | + |
| 71 | + J --> L[Device Discovery] |
| 72 | + K --> L |
| 73 | + |
| 74 | + L --> M[Plugin Selection] |
| 75 | + M --> N[Ready for Configuration] |
| 76 | + |
| 77 | + N --> O{SR-IOV Config Change?} |
| 78 | + O -->|Yes| P[Apply Configuration via Platform] |
| 79 | + O -->|No| Q{Node Operation?} |
| 80 | + |
| 81 | + P --> R[Update Device State] |
| 82 | + R --> O |
| 83 | + |
| 84 | + Q -->|Yes| S[Orchestrator Cordon/Uncordon Logic] |
| 85 | + Q -->|No| O |
| 86 | + S --> T[Platform Plugin Operations] |
| 87 | + T --> O |
| 88 | + |
| 89 | + style A fill:#e1f5fe |
| 90 | + style N fill:#c8e6c9 |
| 91 | + style P fill:#fff3e0 |
| 92 | + style S fill:#fce4ec |
| 93 | +``` |
| 94 | + |
| 95 | +### API Extensions |
| 96 | + |
| 97 | +#### Platform Interface |
| 98 | + |
| 99 | +```golang |
| 100 | +type Interface interface { |
| 101 | + // Init initializes the platform-specific components and validates the platform environment |
| 102 | + Init() error |
| 103 | + |
| 104 | + // GetHostHelpers returns the platform-specific host helpers interface for system operations |
| 105 | + // This allows platforms to provide platform-specific implementations for file operations, |
| 106 | + // command execution, and system interactions while maintaining testability through mocking |
| 107 | + GetHostHelpers() helper.HostHelpersInterface |
| 108 | + |
| 109 | + // DiscoverSriovDevices discovers and returns all available SR-IOV capable devices on the platform |
| 110 | + // The discovery method varies by platform (PCI scanning for bare metal, metadata service for cloud) |
| 111 | + DiscoverSriovDevices() ([]sriovnetworkv1.InterfaceExt, error) |
| 112 | + |
| 113 | + // DiscoverBridges discovers and returns bridge configuration information |
| 114 | + // Not supported on all platforms (e.g., not available in some virtualized environments) |
| 115 | + DiscoverBridges() (sriovnetworkv1.Bridges, error) |
| 116 | + |
| 117 | + // GetPlugins returns the appropriate vendor plugins for the platform based on the node state |
| 118 | + // Returns the selected plugin and a list of all available plugins for the platform |
| 119 | + GetPlugins(ns *sriovnetworkv1.SriovNetworkNodeState) (plugin.VendorPlugin, []plugin.VendorPlugin, error) |
| 120 | + |
| 121 | + // SystemdGetPlugin returns the appropriate plugin for systemd-based configuration phases |
| 122 | + // Not supported on all platforms (returns error for platforms that don't support systemd mode) |
| 123 | + SystemdGetPlugin(phase string) (plugin.VendorPlugin, error) |
| 124 | +} |
| 125 | +``` |
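
Several methods above are documented as "not supported on all platforms". The following is a minimal sketch of one way a caller could handle that, assuming unsupported operations are signalled with a sentinel error. `ErrNotSupported`, `BridgeDiscoverer`, `virtualPlatform`, `bridgesOrEmpty`, and the simplified `Bridges` type are hypothetical stand-ins for illustration and are not taken from the operator's code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotSupported is a hypothetical sentinel error. The operator may signal
// unsupported operations in a different way.
var ErrNotSupported = errors.New("operation not supported on this platform")

// Bridges is a simplified stand-in for sriovnetworkv1.Bridges.
type Bridges struct{ Names []string }

// BridgeDiscoverer is the small slice of the platform interface used here.
type BridgeDiscoverer interface {
	DiscoverBridges() (Bridges, error)
}

// virtualPlatform models a platform without bridge management.
type virtualPlatform struct{}

// DiscoverBridges always reports that bridge discovery is unavailable.
func (virtualPlatform) DiscoverBridges() (Bridges, error) {
	return Bridges{}, fmt.Errorf("discover bridges: %w", ErrNotSupported)
}

// bridgesOrEmpty treats "not supported" as an empty result and propagates
// every other error unchanged.
func bridgesOrEmpty(p BridgeDiscoverer) (Bridges, error) {
	b, err := p.DiscoverBridges()
	if errors.Is(err, ErrNotSupported) {
		return Bridges{}, nil
	}
	return b, err
}

func main() {
	b, err := bridgesOrEmpty(virtualPlatform{})
	fmt.Println(len(b.Names), err)
}
```

Wrapping the sentinel with `%w` keeps the error message descriptive while still letting callers match it with `errors.Is`.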
| 126 | + |
| 127 | +#### Orchestrator Interface |
| 128 | + |
| 129 | +```golang |
| 130 | +type Interface interface { |
| 131 | + // ClusterType returns the detected Kubernetes distribution type (e.g., OpenShift, Kubernetes) |
| 132 | + ClusterType() consts.ClusterType |
| 133 | + |
| 134 | + // Flavor returns the specific flavor/variant of the orchestrator (e.g., Vanilla, Hypershift for OpenShift) |
| 135 | + Flavor() consts.ClusterFlavor |
| 136 | + |
| 137 | + // BeforeNodeCordonProcess performs orchestrator-specific actions before cordoning a node |
| 138 | + // Returns true if the cordon operation should proceed, false to skip cordoning |
| 139 | + BeforeNodeCordonProcess(context.Context, *corev1.Node) (bool, error) |
| 140 | + |
| 141 | + // AfterNodeUncordonProcess performs orchestrator-specific cleanup actions after uncordoning completes |
| 142 | + // Returns true if post-uncordon operations completed successfully |
| 143 | + AfterNodeUncordonProcess(context.Context, *corev1.Node) (bool, error) |
| 144 | +} |
| 145 | +``` |
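
To illustrate how the two cordon hooks might bracket a node operation, here is a rough sketch under simplifying assumptions. `Node` stands in for `*corev1.Node`, and `Hooks`, `vanilla`, and `withCordon` are hypothetical names. What the real daemon does when `BeforeNodeCordonProcess` returns `false` is not specified here, so the sketch simply skips the work.

```go
package main

import (
	"context"
	"fmt"
)

// Node is a simplified stand-in for *corev1.Node.
type Node struct{ Name string }

// Hooks mirrors the two cordon-related methods of the orchestrator interface.
type Hooks interface {
	BeforeNodeCordonProcess(ctx context.Context, n *Node) (bool, error)
	AfterNodeUncordonProcess(ctx context.Context, n *Node) (bool, error)
}

// vanilla behaves like the plain Kubernetes orchestrator described in this
// document: no extra work, always proceed.
type vanilla struct{}

func (vanilla) BeforeNodeCordonProcess(context.Context, *Node) (bool, error) { return true, nil }

func (vanilla) AfterNodeUncordonProcess(context.Context, *Node) (bool, error) { return true, nil }

// withCordon is a hypothetical caller. It runs work between the two hooks
// and skips it when the orchestrator says not to proceed.
func withCordon(ctx context.Context, h Hooks, n *Node, work func() error) (bool, error) {
	proceed, err := h.BeforeNodeCordonProcess(ctx, n)
	if err != nil {
		return false, fmt.Errorf("before cordon %s: %w", n.Name, err)
	}
	if !proceed {
		return false, nil // orchestrator asked to skip; the caller may retry later
	}
	if err := work(); err != nil {
		return false, err
	}
	return h.AfterNodeUncordonProcess(ctx, n)
}

func main() {
	done, err := withCordon(context.Background(), vanilla{}, &Node{Name: "worker-0"}, func() error {
		fmt.Println("cordon, drain, configure, uncordon")
		return nil
	})
	fmt.Println(done, err)
}
```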
| 146 | + |
| 147 | +### Implementation Details/Notes/Constraints |
| 148 | + |
| 149 | +#### Platform Implementations |
| 150 | + |
| 151 | +1. **Bare Metal Platform (`pkg/platform/baremetal/`)**: |
| 152 | + - Uses standard SR-IOV device discovery |
| 153 | + - Supports vendor-specific plugins (Intel, Mellanox) |
| 154 | + - Handles bridge discovery and management |
| 155 | + - Supports both daemon and systemd configuration modes |
| 156 | + |
| 157 | +2. **OpenStack Platform (`pkg/platform/openstack/`)**: |
| 158 | + - Uses virtual device discovery based on OpenStack metadata |
| 159 | + - Reads device information from config-drive or metadata service |
| 160 | + - Uses virtual plugin for VF configuration |
| 161 | + - Does not support systemd mode or bridge management |
| 162 | + |
| 163 | +#### Orchestrator Implementations |
| 164 | + |
| 165 | +1. **Kubernetes Orchestrator (`pkg/orchestrator/kubernetes/`)**: |
| 166 | + - Simple implementation with minimal cluster-specific logic |
| 167 | + - No special cordon/uncordon handling (returns true for all cordon/uncordon operations) |
| 168 | + - Vanilla Kubernetes flavor |
| 169 | + |
| 170 | +2. **OpenShift Orchestrator (`pkg/orchestrator/openshift/`)**: |
| 171 | + - Complex cordon/uncordon handling with Machine Config Pool management |
| 172 | + - Supports both regular OpenShift and Hypershift flavors |
| 173 | + - Manages MCP pausing during node operations |
| 174 | + |
| 175 | +#### Platform Detection |
| 176 | + |
| 177 | +Platform detection occurs in the daemon startup code based on: |
| 178 | +- Node provider ID examination |
| 179 | +- Environment variables |
| 180 | +- Available metadata services |
| 181 | + |
| 182 | +```golang |
| 183 | +// Platform detection logic |
| 184 | +for key, pType := range vars.PlatformsMap { |
| 185 | + if strings.Contains(strings.ToLower(nodeInfo.Spec.ProviderID), strings.ToLower(key)) { |
| 186 | + vars.PlatformType = pType |
| 187 | + } |
| 188 | +} |
| 189 | +``` |
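
Because Go randomizes map iteration order, a loop like the one above could pick different platforms across runs if more than one key matched the provider ID. The sketch below shows one possible way to make the lookup deterministic by visiting keys in sorted order. `platformsMap`, `detectPlatform`, the constant values, and the caller-supplied default are illustrative assumptions and do not reflect the operator's actual definitions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PlatformType is a stand-in for the operator's platform constant type.
type PlatformType string

// Hypothetical values; the real constants may differ.
const (
	Baremetal        PlatformType = "Baremetal"
	VirtualOpenStack PlatformType = "VirtualOpenStack"
)

// platformsMap is a hypothetical stand-in for vars.PlatformsMap.
var platformsMap = map[string]PlatformType{
	"openstack": VirtualOpenStack,
}

// detectPlatform returns the platform whose key appears in providerID and
// falls back to def when nothing matches. Keys are visited in sorted order
// so the result does not depend on Go's randomized map iteration.
func detectPlatform(providerID string, def PlatformType) PlatformType {
	id := strings.ToLower(providerID)
	keys := make([]string, 0, len(platformsMap))
	for k := range platformsMap {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if strings.Contains(id, strings.ToLower(k)) {
			return platformsMap[k]
		}
	}
	return def
}

func main() {
	fmt.Println(detectPlatform("openstack:///0a1b2c", Baremetal))
	fmt.Println(detectPlatform("", Baremetal))
}
```

Returning the match from a function, rather than assigning to a package-level variable, also makes the detection logic easy to unit test.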
| 190 | + |
| 191 | +#### Factory Pattern |
| 192 | + |
| 193 | +Both the platform and the orchestrator are instantiated through factory functions, so adding a new implementation only requires a new case in the corresponding factory: |
| 194 | + |
| 195 | +```golang |
| 196 | +// Platform factory |
| 197 | +func New(hostHelpers helper.HostHelpersInterface) (Interface, error) { |
| 198 | + switch vars.PlatformType { |
| 199 | + case consts.Baremetal: |
| 200 | + return baremetal.New(hostHelpers) |
| 201 | + case consts.VirtualOpenStack: |
| 202 | + return openstack.New(hostHelpers) |
| 203 | + default: |
| 204 | + return nil, fmt.Errorf("unknown platform type %s", vars.PlatformType) |
| 205 | + } |
| 206 | +} |
| 207 | + |
| 208 | +// Orchestrator factory |
| 209 | +func New() (Interface, error) { |
| 210 | + switch vars.ClusterType { |
| 211 | + case consts.ClusterTypeOpenshift: |
| 212 | + return openshift.New() |
| 213 | + case consts.ClusterTypeKubernetes: |
| 214 | + return kubernetes.New() |
| 215 | + default: |
| 216 | + return nil, fmt.Errorf("unknown orchestrator type: %s", vars.ClusterType) |
| 217 | + } |
| 218 | +} |
| 219 | +``` |
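
As a point of comparison only, a registry-based factory is a common alternative to a `switch`. It is not what this design proposes, and every name below (`Register`, `registry`, `Constructor`, the reduced `Platform` interface) is hypothetical. The sketch shows how implementations could register a constructor under their platform type, so that the factory itself never needs editing.

```go
package main

import (
	"fmt"
)

// PlatformType is a stand-in for the operator's platform constant type.
type PlatformType string

// Platform is a reduced stand-in for the platform interface.
type Platform interface {
	Init() error
}

// Constructor builds a platform implementation.
type Constructor func() (Platform, error)

// registry maps platform types to constructors (hypothetical alternative to
// the switch-based factory shown in this document).
var registry = map[PlatformType]Constructor{}

// Register adds a constructor and rejects duplicate registrations.
func Register(t PlatformType, c Constructor) error {
	if _, ok := registry[t]; ok {
		return fmt.Errorf("platform type %s already registered", t)
	}
	registry[t] = c
	return nil
}

// New looks up the constructor for t and invokes it.
func New(t PlatformType) (Platform, error) {
	c, ok := registry[t]
	if !ok {
		return nil, fmt.Errorf("unknown platform type %s", t)
	}
	return c()
}

// baremetal is a trivial placeholder implementation.
type baremetal struct{}

func (baremetal) Init() error { return nil }

func main() {
	if err := Register("Baremetal", func() (Platform, error) { return baremetal{}, nil }); err != nil {
		fmt.Println(err)
		return
	}
	p, err := New("Baremetal")
	fmt.Println(p != nil, err)
	_, err = New("Unknown")
	fmt.Println(err)
}
```

The trade-off is that a registry hides the set of supported platforms behind registration calls, whereas the `switch` keeps that set visible in one place.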
| 220 | + |
| 221 | +### Upgrade & Downgrade considerations |
| 222 | + |
| 223 | +Existing configurations and behaviors are preserved, with the abstraction layer providing the same functionality through the new interface structure. |
| 224 | + |
| 225 | +No user-facing API changes are required, and existing SR-IOV configurations will continue to work without modification. |
| 226 | + |
| 227 | +### Test Plan |
| 228 | + |
| 229 | +The implementation includes comprehensive unit tests for both platform and orchestrator abstractions: |
| 230 | + |
| 231 | +1. **Platform Tests**: Test device discovery, plugin loading, and platform-specific behaviors for both bare metal and OpenStack platforms |
| 232 | +2. **Orchestrator Tests**: Test cluster type detection, cordon/uncordon handling, and orchestrator-specific behaviors for both Kubernetes and OpenShift |
| 233 | +3. **Integration Tests**: Ensure the abstractions work correctly with the existing daemon and operator logic |
| 234 | +4. **Mock Interfaces**: Generated mock interfaces enable comprehensive unit testing of components that depend on platform and orchestrator abstractions |
| 235 | + |
| 236 | +## Benefits for Adding New Platforms |
| 237 | + |
| 238 | +### 1. Clear Separation of Concerns |
| 239 | + |
| 240 | +The abstraction separates infrastructure-specific logic (platform) from Kubernetes distribution-specific logic (orchestrator), making it easier to reason about and implement support for new platforms. |
| 241 | + |
| 242 | +### 2. Standardized Interface |
| 243 | + |
| 244 | +New platforms only need to implement the platform `Interface`, which covers: |
| 245 | +- Device discovery methods |
| 246 | +- Plugin selection logic |
| 247 | +- Platform-specific initialization |
| 248 | + |
| 249 | +### 3. Minimal Core Changes |
| 250 | + |
| 251 | +Adding a new platform requires: |
| 252 | +1. Creating a new package under `pkg/platform/<platform-name>/` |
| 253 | +2. Implementing the platform `Interface` |
| 254 | +3. Adding the platform to the factory function |
| 255 | +4. Adding platform detection logic |
| 256 | + |
| 257 | +Beyond these additions to the factory and detection code, no changes to core operator logic, existing platforms, or user-facing APIs are required. |
| 258 | + |
| 259 | +### 4. Plugin Architecture |
| 260 | + |
| 261 | +The platform interface includes plugin selection methods, allowing each platform to: |
| 262 | +- Choose appropriate vendor plugins |
| 263 | +- Use platform-specific plugins (like the virtual plugin for OpenStack) |
| 264 | +- Support different configuration modes (daemon vs systemd) |
| 265 | + |
| 266 | +### 5. Independent Development and Testing |
| 267 | + |
| 268 | +Each platform implementation is self-contained, enabling: |
| 269 | +- Independent development of platform support |
| 270 | +- Platform-specific unit tests |
| 271 | +- Mock-based testing of platform interactions |
| 272 | +- Easier debugging and maintenance |
| 273 | + |
| 274 | +### Example: Adding a New Platform |
| 275 | + |
| 276 | +Adding support for a new platform follows a standardized process: |
| 277 | + |
| 278 | +1. **Create Platform Package**: Create a new package under `pkg/platform/<platform-name>/` that implements the Platform Interface |
| 279 | +2. **Platform Detection**: Add platform detection logic to identify the new platform based on node metadata or environment variables |
| 280 | +3. **Factory Registration**: Register the new platform in the factory function to enable instantiation |
| 281 | +4. **Plugin Integration**: Implement platform-specific plugin selection and device discovery logic |
| 282 | +5. **Testing**: Add comprehensive unit tests for the new platform implementation |
| 283 | + |
| 284 | +This standardized approach ensures that new platforms integrate seamlessly with the existing operator architecture without requiring changes to core logic or other platform implementations. |
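
The steps above can be sketched as a skeleton for a made-up platform. Everything here is an assumption for illustration: `exampleCloud` is not a real platform, `Device` stands in for `sriovnetworkv1.InterfaceExt`, the `Platform` interface is reduced to three methods, and the plugin return type is simplified to a string.

```go
package main

import (
	"errors"
	"fmt"
)

// Device is a simplified stand-in for sriovnetworkv1.InterfaceExt.
type Device struct {
	PCIAddress string
	NumVfs     int
}

// Platform is a reduced version of the platform interface. The plugin
// return type is simplified to a string for this sketch.
type Platform interface {
	Init() error
	DiscoverSriovDevices() ([]Device, error)
	SystemdGetPlugin(phase string) (string, error)
}

// exampleCloud is a hypothetical platform. readMetadata is injected so that
// unit tests can replace the metadata source.
type exampleCloud struct {
	readMetadata func() ([]Device, error)
}

// Compile-time check that exampleCloud satisfies Platform.
var _ Platform = (*exampleCloud)(nil)

// Init validates that the platform has what it needs to run.
func (p *exampleCloud) Init() error {
	if p.readMetadata == nil {
		return errors.New("examplecloud: no metadata source configured")
	}
	return nil
}

// DiscoverSriovDevices reads devices from the injected metadata source.
func (p *exampleCloud) DiscoverSriovDevices() ([]Device, error) {
	devs, err := p.readMetadata()
	if err != nil {
		return nil, fmt.Errorf("examplecloud: read metadata: %w", err)
	}
	return devs, nil
}

// SystemdGetPlugin reports that systemd mode is unavailable on this platform.
func (p *exampleCloud) SystemdGetPlugin(phase string) (string, error) {
	return "", fmt.Errorf("examplecloud: systemd mode not supported (phase %q)", phase)
}

func main() {
	p := &exampleCloud{readMetadata: func() ([]Device, error) {
		return []Device{{PCIAddress: "0000:00:05.0", NumVfs: 1}}, nil
	}}
	if err := p.Init(); err != nil {
		fmt.Println(err)
		return
	}
	devs, err := p.DiscoverSriovDevices()
	fmt.Println(devs, err)
}
```

Injecting the metadata reader keeps the platform testable without a live metadata service, in line with the interface-based testing goal stated earlier.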