| 1 | +# PriorityClass Support Design Proposal |
| 2 | + |
| 3 | +## Abstract |
| 4 | +This design document outlines the implementation of priority class name support for Velero components: the Velero server deployment, node agent daemonset, data mover pods, and maintenance jobs. Users will be able to specify a priority class name for each component to influence its scheduling and eviction behavior. |
| 5 | + |
| 6 | +## Background |
| 7 | +Kubernetes allows users to define priority classes, which can be used to influence the scheduling and eviction behavior of pods. Priority classes are defined as cluster-wide resources, and pods can reference them by name. When a pod is created, the priority admission controller uses the priority class name to populate the priority value for the pod. The scheduler then uses this priority value to determine the order in which pods are scheduled. |
| 8 | + |
| 9 | +Currently, Velero does not provide a way for users to specify a priority class name for its components. This can be problematic in clusters where resource contention is high, as Velero components may be evicted or not scheduled in a timely manner, potentially impacting backup and restore operations. |
| 10 | + |
| 11 | +## Goals |
| 12 | +- Add support for specifying priority class names for Velero components |
| 13 | +- Update the Velero CLI to accept a priority class name parameter for the server deployment |
| 14 | +- Update the Velero deployment, node agent daemonset, maintenance jobs, and data mover pods to use the specified priority class names |
| 15 | + |
| 16 | +## Non Goals |
| 17 | +- Creating or managing priority classes |
| 18 | +- Automatically determining the appropriate priority class for Velero components |
| 19 | + |
| 20 | +## High-Level Design |
| 21 | +The implementation will add a new field to the Velero options struct to store the priority class name for the server component. The Velero CLI will be updated to accept a new flag for the server deployment. For all other components (node agent daemonset, data mover pods, and maintenance jobs), priority class names will be configured through existing ConfigMap mechanisms (`node-agent-configmap` and `repo-maintenance-job-configmap`). The Velero deployment, node agent daemonset, maintenance jobs, and data mover pods will be updated to use their respective priority class names. |
| 22 | + |
| 23 | +## Detailed Design |
| 24 | + |
| 25 | +### CLI Changes |
| 26 | +A new flag will be added to the `velero install` command to specify the priority class name for the server deployment: |
| 27 | + |
| 28 | +```go |
| 29 | +flags.StringVar( |
| 30 | + &o.ServerPriorityClassName, |
| 31 | + "server-priority-class-name", |
| 32 | + o.ServerPriorityClassName, |
| 33 | + "Priority class name for the Velero server deployment. Optional.", |
| 34 | +) |
| 35 | +``` |
| 36 | + |
| 37 | +Note: Priority class names for node agent daemonset, data mover pods, and maintenance jobs will be configured through their respective ConfigMaps (`--node-agent-configmap` and `--repo-maintenance-job-configmap` flags). |
| 38 | + |
| 39 | +### Velero Options Changes |
| 40 | +The `VeleroOptions` struct in `pkg/install/resources.go` will be updated to include a new field for the server priority class name: |
| 41 | + |
| 42 | +```go |
| 43 | +type VeleroOptions struct { |
| 44 | + // ... existing fields ... |
| 45 | + ServerPriorityClassName string |
| 46 | +} |
| 47 | +``` |
| 48 | + |
| 49 | +### Deployment Changes |
| 50 | +The `podTemplateConfig` struct in `pkg/install/deployment.go` will be updated to include a new field for the priority class name: |
| 51 | + |
| 52 | +```go |
| 53 | +type podTemplateConfig struct { |
| 54 | + // ... existing fields ... |
| 55 | + priorityClassName string |
| 56 | +} |
| 57 | +``` |
| 58 | + |
| 59 | +A new function, `WithPriorityClassName`, will be added to set this field: |
| 60 | + |
| 61 | +```go |
| 62 | +func WithPriorityClassName(priorityClassName string) podTemplateOption { |
| 63 | + return func(c *podTemplateConfig) { |
| 64 | + c.priorityClassName = priorityClassName |
| 65 | + } |
| 66 | +} |
| 67 | +``` |
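The option can be exercised in isolation to confirm that an unset priority class stays empty. This is a minimal sketch, not Velero code: `podTemplateConfig` is trimmed to the one new field, `applyOptions` is a hypothetical stand-in for the option loop that `Deployment` is assumed to run, and `velero-critical` is an illustrative priority class name.

```go
package main

import "fmt"

// podTemplateConfig is trimmed to the single field this proposal adds.
type podTemplateConfig struct {
	priorityClassName string
}

// podTemplateOption mutates a podTemplateConfig.
type podTemplateOption func(*podTemplateConfig)

// WithPriorityClassName is copied from the proposal.
func WithPriorityClassName(priorityClassName string) podTemplateOption {
	return func(c *podTemplateConfig) {
		c.priorityClassName = priorityClassName
	}
}

// applyOptions is a hypothetical helper: it starts from a zero-value
// config and applies each option in order.
func applyOptions(opts ...podTemplateOption) *podTemplateConfig {
	c := &podTemplateConfig{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	// Without the option the field keeps its zero value, so the pod spec
	// would carry no priority class name.
	fmt.Printf("default: %q\n", applyOptions().priorityClassName)

	// "velero-critical" is an assumed name, not one defined by Velero.
	c := applyOptions(WithPriorityClassName("velero-critical"))
	fmt.Printf("with option: %q\n", c.priorityClassName)
}
```

Because the zero value is an empty string, installations that never pass the flag would produce the same pod template as before.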
| 68 | + |
| 69 | +The `Deployment` function will be updated to use the priority class name: |
| 70 | + |
| 71 | +```go |
| 72 | +deployment := &appsv1api.Deployment{ |
| 73 | + // ... existing fields ... |
| 74 | + Spec: appsv1api.DeploymentSpec{ |
| 75 | + // ... existing fields ... |
| 76 | + Template: corev1api.PodTemplateSpec{ |
| 77 | + // ... existing fields ... |
| 78 | + Spec: corev1api.PodSpec{ |
| 79 | + // ... existing fields ... |
| 80 | + PriorityClassName: c.priorityClassName, |
| 81 | + }, |
| 82 | + }, |
| 83 | + }, |
| 84 | +} |
| 85 | +``` |
| 86 | + |
| 87 | +### DaemonSet Changes |
| 88 | +The `DaemonSet` function will be updated to retrieve and use the priority class name from the node-agent-configmap: |
| 89 | + |
| 90 | +```go |
| 91 | +// Get priority class from node-agent-configmap if it exists |
| 92 | +priorityClassName := "" |
| 93 | +if nodeAgentConfig != nil && nodeAgentConfig.PriorityClassName != "" { |
| 94 | + priorityClassName = nodeAgentConfig.PriorityClassName |
| 95 | +} |
| 96 | + |
| 97 | +daemonSet := &appsv1api.DaemonSet{ |
| 98 | + // ... existing fields ... |
| 99 | + Spec: appsv1api.DaemonSetSpec{ |
| 100 | + // ... existing fields ... |
| 101 | + Template: corev1api.PodTemplateSpec{ |
| 102 | + // ... existing fields ... |
| 103 | + Spec: corev1api.PodSpec{ |
| 104 | + // ... existing fields ... |
| 105 | + PriorityClassName: priorityClassName, |
| 106 | + }, |
| 107 | + }, |
| 108 | + }, |
| 109 | +} |
| 110 | +``` |
| 111 | + |
| 112 | +### Maintenance Job Changes |
| 113 | +The `JobConfigs` struct in `pkg/repository/maintenance/maintenance.go` will be updated to include a field for the priority class name: |
| 114 | + |
| 115 | +```go |
| 116 | +type JobConfigs struct { |
| 117 | + // LoadAffinities is the config for repository maintenance job load affinity. |
| 118 | + LoadAffinities []*kube.LoadAffinity `json:"loadAffinity,omitempty"` |
| 119 | + |
| 120 | + // PodResources is the config for the CPU and memory resources setting. |
| 121 | + PodResources *kube.PodResources `json:"podResources,omitempty"` |
| 122 | + |
| 123 | + // PriorityClassName is the priority class name for the maintenance job pod |
| 124 | + PriorityClassName string `json:"priorityClassName,omitempty"` |
| 125 | +} |
| 126 | +``` |
| 127 | + |
| 128 | +The `buildJob` function will be updated to use the priority class name from the job configuration: |
| 129 | + |
| 130 | +```go |
| 131 | +func buildJob(cli client.Client, ctx context.Context, repo *velerov1api.BackupRepository, bslName string, config *JobConfigs, |
| 132 | + podResources kube.PodResources, logLevel logrus.Level, logFormat *logging.FormatFlag) (*batchv1.Job, error) { |
| 133 | + // ... existing code ... |
| 134 | + |
| 135 | + // Use the priority class name from the job configuration if available |
| 136 | + priorityClassName := "" |
| 137 | + if config != nil && config.PriorityClassName != "" { |
| 138 | + priorityClassName = config.PriorityClassName |
| 139 | + } |
| 140 | + |
| 141 | + // ... existing code ... |
| 142 | + |
| 143 | + job := &batchv1.Job{ |
| 144 | + // ... existing fields ... |
| 145 | + Spec: batchv1.JobSpec{ |
| 146 | + // ... existing fields ... |
| 147 | + Template: corev1api.PodTemplateSpec{ |
| 148 | + // ... existing fields ... |
| 149 | + Spec: corev1api.PodSpec{ |
| 150 | + // ... existing fields ... |
| 151 | + PriorityClassName: priorityClassName, |
| 152 | + }, |
| 153 | + }, |
| 154 | + }, |
| 155 | + } |
| 156 | + |
| 157 | + // ... existing code ... |
| 158 | +} |
| 159 | +``` |
| 160 | + |
| 161 | +Users will be able to configure the priority class name for maintenance jobs by creating the repository maintenance job ConfigMap before installation. For example: |
| 162 | + |
| 163 | +```bash |
| 164 | +# Create the ConfigMap before running velero install |
| 165 | +cat <<EOF | kubectl create configmap repo-maintenance-job-config -n velero --from-file=config.json=/dev/stdin |
| 166 | +{ |
| 167 | + "global": { |
| 168 | + "priorityClassName": "low-priority", |
| 169 | + "podResources": { |
| 170 | + "cpuRequest": "100m", |
| 171 | + "memoryRequest": "128Mi" |
| 172 | + } |
| 173 | + }, |
| 174 | + "namespace1-default-kopia": { |
| 175 | + "priorityClassName": "medium-priority" |
| 176 | + } |
| 177 | +} |
| 178 | +EOF |
| 179 | + |
| 180 | +# Then install Velero referencing this ConfigMap |
| 181 | +velero install --provider aws \ |
| 182 | + --repo-maintenance-job-configmap repo-maintenance-job-config \ |
| 183 | + # ... other flags |
| 184 | +``` |
| 185 | + |
| 186 | +The ConfigMap can be updated after installation to change priority classes for future maintenance jobs. |
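How the `global` and per-repository entries combine is not spelled out here. The sketch below assumes that a repository-specific entry takes precedence and that `global` is the fallback; `jobConfig` and `resolvePriorityClass` are hypothetical names used only for illustration, and only the `priorityClassName` field is modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobConfig models only the priorityClassName field of JobConfigs.
// Other keys in the JSON (for example podResources) are ignored.
type jobConfig struct {
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// resolvePriorityClass is a hypothetical helper. It assumes the
// repository-specific entry wins and "global" is used otherwise.
func resolvePriorityClass(raw []byte, repoKey string) (string, error) {
	var all map[string]jobConfig
	if err := json.Unmarshal(raw, &all); err != nil {
		return "", fmt.Errorf("parsing maintenance job config: %w", err)
	}
	if cfg, ok := all[repoKey]; ok && cfg.PriorityClassName != "" {
		return cfg.PriorityClassName, nil
	}
	// A missing "global" key yields the zero value, i.e. an empty name.
	return all["global"].PriorityClassName, nil
}

func main() {
	raw := []byte(`{
		"global": {"priorityClassName": "low-priority"},
		"namespace1-default-kopia": {"priorityClassName": "medium-priority"}
	}`)

	for _, key := range []string{"namespace1-default-kopia", "namespace2-default-kopia"} {
		name, err := resolvePriorityClass(raw, key)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %q\n", key, name)
	}
}
```

Under that assumption, the repository listed in the ConfigMap resolves to `medium-priority`, and any other repository falls back to `low-priority`.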
| 187 | + |
| 188 | +### Node Agent ConfigMap Changes |
| 189 | +We'll update the `Configs` struct in `pkg/nodeagent/node_agent.go` to include a field for the priority class name in the node-agent-configmap: |
| 190 | + |
| 191 | +```go |
| 192 | +type Configs struct { |
| 193 | + // ... existing fields ... |
| 194 | + |
| 195 | + // PriorityClassName is the priority class name for both the node agent daemonset |
| 196 | + // and the data mover pods it creates |
| 197 | + PriorityClassName string `json:"priorityClassName,omitempty"` |
| 198 | +} |
| 199 | +``` |
| 200 | + |
| 201 | +This will allow users to configure the priority class name for both the node agent daemonset and data mover pods through a single node-agent-configmap. For example: |
| 202 | + |
| 203 | +```bash |
| 204 | +# Create the ConfigMap before running velero install |
| 205 | +cat <<EOF | kubectl create configmap node-agent-config -n velero --from-file=config.json=/dev/stdin |
| 206 | +{ |
| 207 | + "priorityClassName": "low-priority", |
| 208 | + "loadAffinity": [ |
| 209 | + { |
| 210 | + "nodeSelector": { |
| 211 | + "matchLabels": { |
| 212 | + "node-role.kubernetes.io/worker": "true" |
| 213 | + } |
| 214 | + } |
| 215 | + } |
| 216 | + ] |
| 217 | +} |
| 218 | +EOF |
| 219 | + |
| 220 | +# Then install Velero referencing this ConfigMap |
| 221 | +velero install --provider aws \ |
| 222 | + --node-agent-configmap node-agent-config \ |
| 223 | + --use-node-agent \ |
| 224 | + # ... other flags |
| 225 | +``` |
| 226 | + |
| 227 | +The `createBackupPod` function in `pkg/exposer/csi_snapshot.go` will be updated to accept and use the priority class name: |
| 228 | + |
| 229 | +```go |
| 230 | +func (e *csiSnapshotExposer) createBackupPod( |
| 231 | + ctx context.Context, |
| 232 | + ownerObject corev1api.ObjectReference, |
| 233 | + backupPVC *corev1api.PersistentVolumeClaim, |
| 234 | + operationTimeout time.Duration, |
| 235 | + label map[string]string, |
| 236 | + annotation map[string]string, |
| 237 | + affinity *kube.LoadAffinity, |
| 238 | + resources corev1api.ResourceRequirements, |
| 239 | + backupPVCReadOnly bool, |
| 240 | + spcNoRelabeling bool, |
| 241 | + nodeOS string, |
| 242 | + priorityClassName string, // New parameter |
| 243 | +) (*corev1api.Pod, error) { |
| 244 | + // ... existing code ... |
| 245 | + |
| 246 | + pod := &corev1api.Pod{ |
| 247 | + // ... existing fields ... |
| 248 | + Spec: corev1api.PodSpec{ |
| 249 | + // ... existing fields ... |
| 250 | + PriorityClassName: priorityClassName, |
| 251 | + // ... existing fields ... |
| 252 | + }, |
| 253 | + } |
| 254 | + |
| 255 | + // ... existing code ... |
| 256 | +} |
| 257 | +``` |
| 258 | + |
| 259 | +The call to `createBackupPod` in the `Expose` method will be updated to pass the priority class name retrieved from the node-agent-configmap: |
| 260 | + |
| 261 | +```go |
| 262 | +priorityClassName, _ := veleroutil.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName) |
| 263 | +backupPod, err := e.createBackupPod( |
| 264 | + ctx, |
| 265 | + ownerObject, |
| 266 | + backupPVC, |
| 267 | + csiExposeParam.OperationTimeout, |
| 268 | + csiExposeParam.HostingPodLabels, |
| 269 | + csiExposeParam.HostingPodAnnotations, |
| 270 | + csiExposeParam.Affinity, |
| 271 | + csiExposeParam.Resources, |
| 272 | + backupPVCReadOnly, |
| 273 | + spcNoRelabeling, |
| 274 | + csiExposeParam.NodeOS, |
| 275 | + priorityClassName, // Priority class name from node-agent-configmap |
| 276 | +) |
| 277 | +``` |
| 278 | + |
| 279 | +A new function, `GetDataMoverPriorityClassName`, will be added to the `veleroutil` package to retrieve the priority class name for data mover pods: |
| 280 | + |
| 281 | +```go |
| 282 | +func GetDataMoverPriorityClassName(ctx context.Context, namespace string, kubeClient kubernetes.Interface, configName string) (string, error) { |
| 283 | + // Get from node-agent-configmap |
| 284 | + configs, err := nodeagent.GetConfigs(ctx, namespace, kubeClient, configName) |
| 285 | + if err == nil && configs != nil && configs.PriorityClassName != "" { |
| 286 | + return configs.PriorityClassName, nil |
| 287 | + } |
| 288 | + |
| 289 | + // Return empty string if not found in configmap |
| 290 | + return "", nil |
| 291 | +} |
| 292 | +``` |
| 293 | + |
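| 294 | +This function will get the priority class name from the node-agent-configmap. If the ConfigMap cannot be read or does not set a priority class name, the function returns an empty string and a nil error, so the pod is created without a priority class name. |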
| 295 | + |
| 296 | +## Alternatives Considered |
| 297 | + |
| 298 | +1. **Using a single flag for all components**: We could have used a single flag for all components, but this would not allow for different priority classes for different components. Since maintenance jobs and data movers typically require lower priority than the Velero server, separate per-component settings provide more flexibility. |
| 299 | + |
| 300 | +2. **Using a configuration file**: We could have added support for specifying the priority class names in a configuration file. However, this would have required additional changes to the Velero CLI and would have been more complex to implement. |
| 301 | + |
| 302 | +3. **Inheriting priority class from parent components**: We initially considered having maintenance jobs inherit their priority class from the Velero server, and data movers inherit from the node agent. However, this approach doesn't allow for the appropriate prioritization of different components based on their importance and resource requirements. |
| 303 | + |
| 304 | +## Security Considerations |
| 305 | + |
| 306 | +There are no security considerations for this feature. |
| 307 | + |
| 308 | +## Compatibility |
| 309 | + |
| 310 | +This feature is compatible with all Kubernetes versions that support priority classes. The PodPriority feature became stable in Kubernetes 1.14. For more information, see the [Kubernetes documentation on Pod Priority and Preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/). |
| 311 | + |
| 312 | +## Implementation |
| 313 | + |
| 314 | +The implementation will involve the following steps: |
| 315 | + |
| 316 | +1. Add the priority class name field for server to the `VeleroOptions` struct |
| 317 | +2. Add the priority class name field to the `podTemplateConfig` struct |
| 318 | +3. Add the `WithPriorityClassName` function for the server deployment |
| 319 | +4. Update the `Deployment` function to use the server priority class name |
| 320 | +5. Update the `DaemonSet` function to retrieve and use the priority class name from node-agent-configmap |
| 321 | +6. Update the `JobConfigs` struct to include `PriorityClassName` field |
| 322 | +7. Update the `buildJob` function in maintenance job to use the priority class name from JobConfigs |
| 323 | +8. Update the `Configs` struct in node agent to include `PriorityClassName` field for both daemonset and data mover pods |
| 324 | +9. Update the data mover pod creation to use the priority class name from node-agent-configmap |
| 325 | +10. Add the priority class name flag for server to the `velero install` command |
| 326 | +11. Update documentation to explain how to configure priority classes via ConfigMaps |
| 327 | + |
| 328 | +Note: Only the server deployment will have a CLI flag for priority class. All other components (node agent daemonset, data mover pods, and maintenance jobs) will use their respective ConfigMaps for priority class configuration. |
| 329 | + |
| 330 | +This approach ensures that different Velero components can use different priority class names based on their importance and resource requirements: |
| 331 | + |
| 332 | +1. The Velero server deployment can use a higher priority class to ensure it continues running even under resource pressure. |
| 333 | +2. The node agent daemonset can use a medium priority class. |
| 334 | +3. Maintenance jobs can use a lower priority class, since they can be delayed or preempted when resources are limited. |
| 335 | +4. Data mover pods can use a lower priority class, since they can also be delayed or preempted when resources are limited. |
| 336 | + |
| 337 | +### Implementation Considerations |
| 338 | + |
| 339 | +Priority class names for all components except the server deployment are configured through ConfigMaps that should be created before Velero installation but can be updated afterwards. This design leverages existing ConfigMap mechanisms: |
| 340 | + |
| 341 | +1. **Node Agent DaemonSet and Data Mover Pods**: Will use the node-agent-configmap (specified via the `--node-agent-configmap` flag). This single ConfigMap controls priority class for both the node agent daemonset itself and all data mover pods (including PVB and PVR) it creates. |
| 342 | + |
| 343 | +2. **Maintenance Jobs**: Will use the repository maintenance job ConfigMap (specified via the `--repo-maintenance-job-configmap` flag). Users should create this ConfigMap before running `velero install` with the desired priority class configuration. The ConfigMap can be updated after installation to change priority classes for future maintenance jobs. |
| 344 | + |
| 345 | +This approach has several advantages: |
| 346 | + |
| 347 | +- Leverages existing configuration mechanisms, minimizing new CLI flags |
| 348 | +- Provides a single point of configuration for related components (node agent and its pods) |
| 349 | +- Allows dynamic configuration updates without requiring Velero reinstallation |
| 350 | +- Provides flexibility to set different priority classes for different repositories |
| 351 | +- Maintains backward compatibility with existing installations |
| 352 | +- Enables administrators to set up priority classes during initial deployment |
| 353 | + |
| 354 | +The priority class name for data mover pods will be determined by checking the node-agent-configmap. This approach provides a centralized way to configure priority class names for all data mover pods. The same approach will be used for PVB (PodVolumeBackup) and PVR (PodVolumeRestore) pods, which will also retrieve their priority class name from the node-agent-configmap. |
| 355 | + |
| 356 | +For PVB and PVR pods specifically, the controllers will need to be updated to retrieve the priority class name from the node-agent-configmap and pass it to the pod creation functions. For example, in the PodVolumeBackup controller: |
| 357 | + |
| 358 | +```go |
| 359 | +// In pkg/controller/pod_volume_backup_controller.go |
| 360 | +priorityClassName, _ := veleroutil.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName) |
| 361 | + |
| 362 | +// Add priorityClassName to the pod spec |
| 363 | +pod := &corev1api.Pod{ |
| 364 | + // ... existing fields ... |
| 365 | + Spec: corev1api.PodSpec{ |
| 366 | + // ... existing fields ... |
| 367 | + PriorityClassName: priorityClassName, |
| 368 | + }, |
| 369 | +} |
| 370 | +``` |
| 371 | + |
| 372 | +Similarly, in the PodVolumeRestore controller: |
| 373 | + |
| 374 | +```go |
| 375 | +// In pkg/controller/pod_volume_restore_controller.go |
| 376 | +priorityClassName, _ := veleroutil.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName) |
| 377 | + |
| 378 | +// Add priorityClassName to the pod spec |
| 379 | +pod := &corev1api.Pod{ |
| 380 | + // ... existing fields ... |
| 381 | + Spec: corev1api.PodSpec{ |
| 382 | + // ... existing fields ... |
| 383 | + PriorityClassName: priorityClassName, |
| 384 | + }, |
| 385 | +} |
| 386 | +``` |
| 387 | + |
| 388 | +This ensures that all pods created by Velero (data movers, PVB, and PVR) use a consistent approach for priority class name configuration. |
| 389 | + |
| 390 | +## Open Issues |
| 391 | + |
| 392 | +None. |