---
title: Platform and Orchestrator Abstraction
authors:
  - sriov-network-operator team
reviewers:
  - TBD
creation-date: 21-07-2025
last-updated: 21-07-2025
---

# Platform and Orchestrator Abstraction

## Summary

This design document describes the introduction of platform and orchestrator abstraction layers in the SR-IOV Network Operator. These abstractions separate platform-specific (infrastructure provider) logic from orchestrator-specific (Kubernetes distribution) logic, making it easier to add support for new infrastructure platforms and Kubernetes distributions.

## Motivation

The SR-IOV Network Operator has historically been tightly coupled to specific infrastructure platforms and Kubernetes distributions, particularly OpenShift. As the operator expanded to support virtualized platforms such as OpenStack, AWS, and Oracle, as well as various Kubernetes distributions, the need for a clean abstraction layer became apparent.

### Use Cases

1. **Multi-Platform Support**: Enable the operator to run efficiently on different infrastructure platforms (bare metal, OpenStack, AWS, Oracle, etc.) with platform-specific optimizations
2. **Multi-Orchestrator Support**: Support different Kubernetes distributions (vanilla Kubernetes, OpenShift, etc.) with orchestrator-specific behaviors
3. **Extensibility**: Make it easy to add new platforms and orchestrators without modifying core operator logic
4. **Testing**: Enable better unit testing with mockable interfaces for platform- and orchestrator-specific operations

### Goals

* Create a clean abstraction layer that separates platform-specific logic from orchestrator-specific logic
* Implement support for bare metal and OpenStack platforms
* Implement support for Kubernetes and OpenShift orchestrators
* Provide a plugin architecture that makes it easy to add new platforms and orchestrators
* Maintain backward compatibility with existing functionality
* Enable better testability through interface-based design

### Non-Goals

* Support all possible infrastructure platforms in the initial implementation
* Change existing API structures or user-facing interfaces

## Proposal

### Workflow Description

The operator will use two main abstraction layers:

1. **Platform Interface**: Handles infrastructure-specific operations like device discovery, bridge management, and plugin selection
2. **Orchestrator Interface**: Handles Kubernetes distribution-specific operations like cluster type detection, additional node draining logic, and cluster-specific configurations

The platform is detected at startup based on node metadata and environment variables, while the orchestrator is detected based on cluster characteristics and available APIs.
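The orchestrator half of detection can be sketched as follows. This is a minimal illustration. It assumes the cluster is classified by whether the API server serves the `config.openshift.io` API group, which OpenShift provides and vanilla Kubernetes does not. The `ClusterType` values and `detectClusterType` are hypothetical stand-ins for the operator's constants and detection code. In a real controller, the group names would come from client-go's discovery client (for example `ServerGroups()`). HyperShift flavor detection is not covered.

```golang
package main

import "fmt"

// ClusterType is a hypothetical stand-in for consts.ClusterType.
type ClusterType string

const (
	ClusterTypeKubernetes ClusterType = "kubernetes"
	ClusterTypeOpenshift  ClusterType = "openshift"
)

// openshiftConfigGroup is an API group that OpenShift serves and vanilla
// Kubernetes does not.
const openshiftConfigGroup = "config.openshift.io"

// detectClusterType classifies the cluster from the names of the API groups
// served by the API server; anything without the OpenShift config group is
// treated as vanilla Kubernetes.
func detectClusterType(apiGroups []string) ClusterType {
	for _, g := range apiGroups {
		if g == openshiftConfigGroup {
			return ClusterTypeOpenshift
		}
	}
	return ClusterTypeKubernetes
}

func main() {
	fmt.Println(detectClusterType([]string{"apps", "batch", "config.openshift.io"})) // prints "openshift"
	fmt.Println(detectClusterType([]string{"apps", "batch"}))                        // prints "kubernetes"
}
```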
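
### API Extensions

#### Platform Interface

```golang
package platform

// Interface is implemented once per infrastructure platform.
type Interface interface {
	// Init performs platform-specific initialization.
	Init() error
	// GetHostHelpers returns the host helpers used by the platform.
	GetHostHelpers() helper.HostHelpersInterface

	// DiscoverSriovDevices returns the SR-IOV devices visible on the node.
	DiscoverSriovDevices() ([]sriovnetworkv1.InterfaceExt, error)
	// DiscoverBridges returns the bridges present on the node.
	DiscoverBridges() (sriovnetworkv1.Bridges, error)

	// GetPlugins returns the main plugin and any additional plugins for the node state.
	GetPlugins(ns *sriovnetworkv1.SriovNetworkNodeState) (plugin.VendorPlugin, []plugin.VendorPlugin, error)
	// SystemdGetPlugin returns the plugin to run in systemd mode for the given phase.
	SystemdGetPlugin(phase string) (plugin.VendorPlugin, error)
}
```

#### Orchestrator Interface

```golang
// Interface is implemented once per Kubernetes distribution (package orchestrator).
type Interface interface {
	// ClusterType returns the detected cluster type (e.g. OpenShift or Kubernetes).
	ClusterType() consts.ClusterType
	// Flavor returns the distribution flavor (e.g. vanilla Kubernetes or HyperShift).
	Flavor() consts.ClusterFlavor
	// BeforeDrainNode runs pre-drain work; true means the drain may proceed.
	BeforeDrainNode(context.Context, *corev1.Node) (bool, error)
	// AfterCompleteDrainNode runs post-drain work; true means that work has finished.
	AfterCompleteDrainNode(context.Context, *corev1.Node) (bool, error)
}
```
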
### Implementation Details/Notes/Constraints

#### Platform Implementations

1. **Bare Metal Platform (`pkg/platform/baremetal/`)**:
   - Uses standard SR-IOV device discovery
   - Supports vendor-specific plugins (Intel, Mellanox)
   - Handles bridge discovery and management
   - Supports both daemon and systemd configuration modes

2. **OpenStack Platform (`pkg/platform/openstack/`)**:
   - Uses virtual device discovery based on OpenStack metadata
   - Reads device information from the config drive or the metadata service
   - Uses the virtual plugin for VF configuration
   - Does not support systemd mode or bridge management
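To make the OpenStack metadata lookup concrete, here is a minimal sketch that maps NIC MAC addresses to Neutron network IDs. It assumes a simplified subset of OpenStack's `network_data.json` containing only the `links` and `networks` arrays. Both the config drive and the metadata service publish this file under `openstack/latest/`. The `networkData` type and `macToNetworkID` helper are hypothetical. They cover only the network part of discovery, not PCI address lookup.

```golang
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// networkData holds the subset of OpenStack's network_data.json used here.
type networkData struct {
	Links []struct {
		ID         string `json:"id"`
		MACAddress string `json:"ethernet_mac_address"`
	} `json:"links"`
	Networks []struct {
		Link      string `json:"link"`
		NetworkID string `json:"network_id"`
	} `json:"networks"`
}

// macToNetworkID maps each link's lower-cased MAC address to the network ID
// of the network attached to that link. Networks whose link is unknown are skipped.
func macToNetworkID(raw []byte) (map[string]string, error) {
	var nd networkData
	if err := json.Unmarshal(raw, &nd); err != nil {
		return nil, fmt.Errorf("parsing network_data.json: %w", err)
	}
	linkMAC := make(map[string]string, len(nd.Links))
	for _, l := range nd.Links {
		linkMAC[l.ID] = strings.ToLower(l.MACAddress)
	}
	result := make(map[string]string)
	for _, n := range nd.Networks {
		if mac, ok := linkMAC[n.Link]; ok {
			result[mac] = n.NetworkID
		}
	}
	return result, nil
}

func main() {
	raw := []byte(`{
		"links": [{"id": "tap1", "ethernet_mac_address": "FA:16:3E:00:00:01"}],
		"networks": [{"link": "tap1", "network_id": "net-a"}]
	}`)
	m, err := macToNetworkID(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["fa:16:3e:00:00:01"]) // prints "net-a"
}
```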

#### Orchestrator Implementations

1. **Kubernetes Orchestrator (`pkg/orchestrator/kubernetes/`)**:
   - Simple implementation with minimal cluster-specific logic
   - No special drain handling (both drain hooks return `true`)
   - Vanilla Kubernetes flavor

2. **OpenShift Orchestrator (`pkg/orchestrator/openshift/`)**:
   - Complex drain handling with Machine Config Pool (MCP) management
   - Supports both regular OpenShift and HyperShift flavors
   - Manages MCP pausing during node operations

#### Platform Detection

Platform detection occurs in the daemon startup code based on:
- Node provider ID examination
- Environment variables
- Available metadata services

The provider ID check looks like this:

```golang
// Platform detection logic: match the node's provider ID against the known
// platform keys. If nothing matches, vars.PlatformType keeps its current
// value (by default the zero value, consts.Baremetal).
providerID := strings.ToLower(nodeInfo.Spec.ProviderID)
for key, pType := range vars.PlatformsMap {
	if strings.Contains(providerID, strings.ToLower(key)) {
		vars.PlatformType = pType
		break
	}
}
```
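Go randomizes map iteration order. If a provider ID ever matched more than one key in `vars.PlatformsMap`, the loop above could pick a different platform on different runs. The following runnable sketch of the same substring check makes two assumptions: keys are tried in sorted order, which makes the result deterministic, and no match means bare metal. `PlatformType`, its values, `platformsMap`, and `detectPlatform` are hypothetical stand-ins for the operator's types. The sample provider IDs follow the `openstack:///<instance-id>` and `aws:///<zone>/<instance-id>` forms set by the respective cloud providers.

```golang
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PlatformType is a stand-in for consts.PlatformTypes; Baremetal is the zero value.
type PlatformType int

const (
	Baremetal PlatformType = iota
	VirtualOpenStack
	VirtualAWS
)

// platformsMap is a stand-in for vars.PlatformsMap.
var platformsMap = map[string]PlatformType{
	"openstack": VirtualOpenStack,
	"aws":       VirtualAWS,
}

// detectPlatform returns the platform whose key appears (case-insensitively)
// in the node's provider ID. Keys are checked in sorted order so the result
// does not depend on map iteration order; no match means bare metal.
func detectPlatform(providerID string) PlatformType {
	id := strings.ToLower(providerID)
	keys := make([]string, 0, len(platformsMap))
	for k := range platformsMap {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if strings.Contains(id, strings.ToLower(k)) {
			return platformsMap[k]
		}
	}
	return Baremetal
}

func main() {
	fmt.Println(detectPlatform("openstack:///9f5a6c1e-0000-0000-0000-000000000000") == VirtualOpenStack)
	fmt.Println(detectPlatform("aws:///us-east-1a/i-0123456789abcdef0") == VirtualAWS)
	fmt.Println(detectPlatform("") == Baremetal)
}
```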

#### Factory Pattern

Both the platform and orchestrator packages use a factory function, `New`, that selects an implementation from the detected type:

```golang
// Platform factory (package platform)
func New(hostHelpers helper.HostHelpersInterface) (Interface, error) {
	switch vars.PlatformType {
	case consts.Baremetal:
		return baremetal.New(hostHelpers)
	case consts.VirtualOpenStack:
		return openstack.New(hostHelpers)
	default:
		// %v formats the iota-based PlatformTypes value even without a String method.
		return nil, fmt.Errorf("unknown platform type %v", vars.PlatformType)
	}
}

// Orchestrator factory (package orchestrator)
func New() (Interface, error) {
	switch vars.ClusterType {
	case consts.ClusterTypeOpenshift:
		return openshift.New()
	case consts.ClusterTypeKubernetes:
		return kubernetes.New()
	default:
		return nil, fmt.Errorf("unknown orchestration type: %s", vars.ClusterType)
	}
}
```

### Upgrade & Downgrade considerations

The abstraction layer is designed to be backward compatible. Existing configurations and behaviors are preserved, with the abstraction layer providing the same functionality through the new interface structure.

No user-facing API changes are required, and existing SR-IOV configurations will continue to work without modification.

### Test Plan

The implementation includes comprehensive unit tests for both platform and orchestrator abstractions:

1. **Platform Tests**: Test device discovery, plugin loading, and platform-specific behaviors for both bare metal and OpenStack platforms
2. **Orchestrator Tests**: Test cluster type detection, drain handling, and orchestrator-specific behaviors for both Kubernetes and OpenShift
3. **Integration Tests**: Ensure the abstractions work correctly with the existing daemon and operator logic
4. **Mock Interfaces**: Generated mock interfaces enable comprehensive unit testing of components that depend on platform and orchestrator abstractions

## Benefits for Adding New Platforms

### 1. Clear Separation of Concerns

The abstraction separates infrastructure-specific logic (platform) from Kubernetes distribution-specific logic (orchestrator), making it easier to reason about and implement support for new platforms.

### 2. Standardized Interface

New platforms only need to implement the well-defined `platform.Interface`, which includes:
- Device and bridge discovery methods
- Plugin selection logic
- Platform-specific initialization

### 3. Minimal Core Changes

Adding a new platform requires:
1. Creating a new package under `pkg/platform/<platform-name>/`
2. Implementing `platform.Interface`
3. Adding the platform to the factory function
4. Adding a platform type constant and platform detection logic

No changes to core operator logic, existing platforms, or user-facing APIs are required.

### 4. Plugin Architecture

The platform interface includes plugin selection methods, allowing each platform to:
- Choose appropriate vendor plugins
- Use platform-specific plugins (like the virtual plugin for OpenStack)
- Support different configuration modes (daemon vs. systemd)
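Per-platform plugin selection can be sketched as a small pure function. The sketch assumes that bare metal uses a generic main plugin plus one vendor plugin per PCI vendor ID found on the node (Intel `8086`, Mellanox `15b3`). Virtual platforms such as OpenStack always get the virtual plugin and no additional plugins. The plugin names, the `vendorPlugins` table, and `getPlugins` are hypothetical simplifications of `GetPlugins` that return names instead of `plugin.VendorPlugin` values.

```golang
package main

import (
	"fmt"
	"sort"
)

// vendorPlugins maps PCI vendor IDs to hypothetical vendor plugin names.
var vendorPlugins = map[string]string{
	"8086": "intel",
	"15b3": "mellanox",
}

// getPlugins returns the main plugin name and the additional plugin names
// for a node, given whether the platform is virtual and the PCI vendor IDs
// of the node's NICs.
func getPlugins(isVirtual bool, nicVendors []string) (string, []string) {
	if isVirtual {
		// Virtual platforms configure VFs through the virtual plugin only.
		return "virtual", nil
	}
	seen := make(map[string]bool)
	var extra []string
	for _, v := range nicVendors {
		if name, ok := vendorPlugins[v]; ok && !seen[name] {
			seen[name] = true
			extra = append(extra, name)
		}
	}
	// Sort so the result does not depend on the order the NICs were listed in.
	sort.Strings(extra)
	return "generic", extra
}

func main() {
	mainPlugin, extra := getPlugins(false, []string{"8086", "15b3", "8086", "1af4"})
	fmt.Println(mainPlugin, extra) // prints "generic [intel mellanox]"

	mainPlugin, extra = getPlugins(true, []string{"8086"})
	fmt.Println(mainPlugin, extra) // prints "virtual []"
}
```

Unknown vendor IDs (such as `1af4` above) are ignored, so a node with only unsupported NICs still gets the generic plugin and no extras.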

### 5. Independent Development and Testing

Each platform implementation is self-contained, enabling:
- Independent development of platform support
- Platform-specific unit tests
- Mock-based testing of platform interactions
- Easier debugging and maintenance

### Example: Adding a New Platform (AWS Implementation)

The AWS platform implementation demonstrates how to add support for a new cloud platform:

1. Create `pkg/platform/aws/aws.go`:
```golang
package aws

import (
	"fmt"

	sriovnetworkv1 "github.com/k8snetworkplumbingwg/sriov-network-operator/api/v1"
	"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/helper"
	plugin "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/plugins"
	virtualplugin "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/plugins/virtual"
)

// Aws implements platform.Interface for AWS EC2 instances.
type Aws struct {
	hostHelpers       helper.HostHelpersInterface
	loadedDevicesInfo sriovnetworkv1.InterfaceExts
}

func New(hostHelpers helper.HostHelpersInterface) (*Aws, error) {
	return &Aws{hostHelpers: hostHelpers}, nil
}

// GetPlugins always uses the virtual plugin; AWS needs no additional plugins.
func (a *Aws) GetPlugins(_ *sriovnetworkv1.SriovNetworkNodeState) (plugin.VendorPlugin, []plugin.VendorPlugin, error) {
	virtual, err := virtualplugin.NewVirtualPlugin(a.hostHelpers)
	if err != nil {
		return nil, nil, err
	}
	return virtual, []plugin.VendorPlugin{}, nil
}

func (a *Aws) DiscoverSriovDevices() ([]sriovnetworkv1.InterfaceExt, error) {
	// AWS-specific device discovery using the EC2 metadata service:
	// fetch MAC addresses and subnet IDs from the metadata service and
	// map the devices to the AWS network configuration (body elided).
	return nil, nil
}

// SystemdGetPlugin rejects systemd mode, which the AWS platform does not support.
func (a *Aws) SystemdGetPlugin(_ string) (plugin.VendorPlugin, error) {
	return nil, fmt.Errorf("aws platform not supported in systemd")
}

// Init, GetHostHelpers, and DiscoverBridges are omitted here.
```

2. Add to the platform factory in `pkg/platform/platform.go`:
```golang
func New(hostHelpers helper.HostHelpersInterface) (Interface, error) {
	switch vars.PlatformType {
	case consts.Baremetal:
		return baremetal.New(hostHelpers)
	case consts.VirtualOpenStack:
		return openstack.New(hostHelpers)
	case consts.VirtualAWS: // New addition
		return aws.New(hostHelpers)
	default:
		return nil, fmt.Errorf("unknown platform type %v", vars.PlatformType)
	}
}
```

3. Add the provider ID key used for platform detection in `pkg/vars/vars.go`:
```golang
// Keys are matched case-insensitively as substrings of the node's provider ID.
var PlatformsMap = map[string]consts.PlatformTypes{
	"openstack": consts.VirtualOpenStack,
	"aws":       consts.VirtualAWS, // New addition
}
```

4. Add the platform constant in `pkg/consts/platforms.go`:
```golang
const (
	Baremetal PlatformTypes = iota // zero value, kept when no platform key matches
	VirtualOpenStack
	VirtualAWS // New addition
)
```

Key features of the AWS implementation:
- **Metadata Service Integration**: Uses AWS EC2 metadata service to discover network configuration
- **Virtual Plugin Usage**: Leverages the existing virtual plugin for SR-IOV VF management
- **Subnet ID Mapping**: Maps network interfaces to AWS subnet IDs for proper network filtering
- **Comprehensive Testing**: Includes extensive unit tests with mocked HTTP calls
- **Error Handling**: Robust error handling for metadata service failures

This approach keeps platform-specific logic behind a single interface, so the rest of the SR-IOV Network Operator stays platform-agnostic and the code stays clean and maintainable.