A new function, `GetDataMoverPriorityClassName`, will be added to the `pkg/util/kube` package (in the same file as `ValidatePriorityClass`) to retrieve the priority class name for data mover pods:

```go
// In pkg/util/kube/priority_class.go

// GetDataMoverPriorityClassName retrieves the priority class name for data mover pods from the node-agent-configmap
```
This function will get the priority class name from the node-agent-configmap. If it's not found, it will return an empty string.
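
As a rough illustration of the lookup described above, the sketch below parses ConfigMap data and returns the priority class name, falling back to an empty string. It is not the Velero implementation: the `priorityClassName` JSON key, the `nodeAgentConfigs` type, and the lowercase function signature are assumptions made for this example, and the real function would fetch the ConfigMap through a Kubernetes client rather than receive a plain map.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeAgentConfigs is a hypothetical, trimmed-down view of the node-agent
// configuration. The JSON key name is an assumption for illustration.
type nodeAgentConfigs struct {
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// getDataMoverPriorityClassName scans the ConfigMap data (assumed to hold a
// single JSON document) and returns the configured priority class name.
// It returns "" when the data is empty, unparsable, or has no such field.
func getDataMoverPriorityClassName(data map[string]string) string {
	for _, raw := range data {
		var cfg nodeAgentConfigs
		if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
			continue // ignore entries that are not valid JSON
		}
		if cfg.PriorityClassName != "" {
			return cfg.PriorityClassName
		}
	}
	return ""
}

func main() {
	data := map[string]string{
		"node-agent-config.json": `{"priorityClassName": "velero-low"}`,
	}
	fmt.Printf("configured: %q\n", getDataMoverPriorityClassName(data))
	fmt.Printf("missing:    %q\n", getDataMoverPriorityClassName(nil))
}
```

Returning an empty string rather than an error keeps the caller simple: an empty name means the pod spec is left without a priority class.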

### Validation and Logging

To improve observability and help with troubleshooting, the implementation will include:

1. **Optional Priority Class Validation**: A helper function to check if a priority class exists in the cluster. This function will be added to the `pkg/util/kube` package alongside other Kubernetes utility functions:

```go
// In pkg/util/kube/priority_class.go

// ValidatePriorityClass checks if the specified priority class exists in the cluster
// Returns nil if the priority class exists or if priorityClassName is empty
// Returns a warning (not an error) if the priority class doesn't exist
			logger.Warnf("Priority class %q not found in cluster. Pod creation may fail if the priority class doesn't exist when pods are scheduled.", priorityClassName)
		} else {
			logger.WithError(err).Warnf("Failed to validate priority class %q", priorityClassName)
		}
	} else {
		logger.Infof("Validated priority class %q exists in cluster", priorityClassName)
	}
}
```

2. **Debug Logging**: Add debug logs when priority classes are applied:

```go
// In deployment creation
if c.priorityClassName != "" {
	logger.Debugf("Setting priority class %q for Velero server deployment", c.priorityClassName)
}

// In daemonset creation
if c.priorityClassName != "" {
	logger.Debugf("Setting priority class %q for node agent daemonset", c.priorityClassName)
}

// In maintenance job creation
if priorityClassName != "" {
	logger.Debugf("Setting priority class %q for maintenance job %s", priorityClassName, job.Name)
}

// In data mover pod creation
if priorityClassName != "" {
	logger.Debugf("Setting priority class %q for data mover pod %s", priorityClassName, pod.Name)
}
```

These validation and logging features will help administrators:

- Identify configuration issues early (validation warnings)
- Troubleshoot priority class application issues
- Verify that priority classes are being applied as expected

The `ValidatePriorityClass` function should be called at the following points:

1. **During `velero install`**: Validate the priority classes specified via CLI flags:
   - After parsing `--server-priority-class-name` flag
   - After parsing `--node-agent-priority-class-name` flag

2. **When reading from ConfigMaps**: Validate priority classes when loading configurations:
   - In `GetDataMoverPriorityClassName` when reading from node-agent-configmap
   - In maintenance job controller when reading from repo-maintenance-job-configmap

3. **During pod/job creation** (optional, for runtime validation):
   - Before creating data mover pods (PVB/PVR/CSI snapshot data movement)

Note: Since validation only logs warnings (not errors), it won't block operations if a priority class doesn't exist. This allows for scenarios where priority classes might be created after Velero installation.

## Alternatives Considered

1. **Using a single flag for all components**: We could have used a single flag for all components, but this would not allow for different priority classes for different components. Since maintenance jobs and data movers typically require lower priority than the Velero server, separate flags provide more flexibility.
@@ -323,8 +415,24 @@ The implementation will involve the following steps:
7. Update the `buildJob` function in maintenance job to use the priority class name from JobConfigs (global config only)
8. Update the `Configs` struct in node agent to include `PriorityClassName` field for data mover pods
9. Update the data mover pod creation to use the priority class name from node-agent-configmap
10. Update the PodVolumeBackup controller to retrieve and apply priority class name from node-agent-configmap
11. Update the PodVolumeRestore controller to retrieve and apply priority class name from node-agent-configmap
12. Add the `GetDataMoverPriorityClassName` utility function to retrieve priority class from configmap
13. Add the priority class name flags for server and node agent to the `velero install` command
14. Add unit tests for:
    - `WithPriorityClassName` function
    - `GetDataMoverPriorityClassName` function
    - Priority class application in deployment, daemonset, and job specs
15. Add integration tests to verify:
    - Priority class is correctly applied to all component pods
    - ConfigMap updates are reflected in new pods
    - Empty/missing priority class names are handled gracefully
16. Update user documentation to include:
    - How to configure priority classes for each component
    - Examples of creating ConfigMaps before installation
    - Expected priority class hierarchy recommendations
    - Troubleshooting guide for priority class issues
17. Update CLI documentation for new flags (`--server-priority-class-name` and `--node-agent-priority-class-name`)

Note: The server deployment and node agent daemonset will have CLI flags for priority class. Data mover pods and maintenance jobs will use their respective ConfigMaps for priority class configuration.
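
To make the flag side of this split concrete, here is a minimal sketch of parsing the two flags named above. It uses Go's standard `flag` package purely for illustration; the `installOptions` type and `parseInstallFlags` function are hypothetical names, and the actual `velero install` command may wire its flags differently.

```go
package main

import (
	"flag"
	"fmt"
)

// installOptions holds the two priority class names exposed as install flags
// (hypothetical type for this sketch).
type installOptions struct {
	serverPriorityClassName    string
	nodeAgentPriorityClassName string
}

// parseInstallFlags parses only the two priority class flags from args.
// Both default to "", meaning no priority class is set on the workload.
func parseInstallFlags(args []string) (installOptions, error) {
	var o installOptions
	fs := flag.NewFlagSet("install", flag.ContinueOnError)
	fs.StringVar(&o.serverPriorityClassName, "server-priority-class-name", "",
		"priority class name for the Velero server deployment")
	fs.StringVar(&o.nodeAgentPriorityClassName, "node-agent-priority-class-name", "",
		"priority class name for the node agent daemonset")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseInstallFlags([]string{
		"--server-priority-class-name=velero-critical",
		"--node-agent-priority-class-name=velero-standard",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("server=%q nodeAgent=%q\n", o.serverPriorityClassName, o.nodeAgentPriorityClassName)
}
```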
@@ -347,6 +455,84 @@ Priority class names are configured through different mechanisms:

4. **Maintenance Jobs**: Will use the repository maintenance job ConfigMap (specified via the `--repo-maintenance-job-configmap` flag). Users should create this ConfigMap before running `velero install` with the desired priority class configuration. The ConfigMap can be updated after installation to change priority classes for future maintenance jobs. While the ConfigMap structure supports per-repository configuration for resources and affinity, priority class is intentionally only read from the global configuration to ensure all maintenance jobs have the same priority.
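
The global-only rule for maintenance jobs can be sketched as below. This is an illustration, not the controller code: the `jobConfigs` type, the `priorityClassName` JSON key, the `global` key name, and the example repository key are assumptions for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobConfigs is a hypothetical, trimmed-down maintenance job configuration entry.
type jobConfigs struct {
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// globalKey is the entry assumed to hold the global configuration.
const globalKey = "global"

// maintenancePriorityClassName reads the priority class from the global entry
// only. Per-repository entries are deliberately ignored so that every
// maintenance job runs with the same priority.
func maintenancePriorityClassName(raw string) string {
	var all map[string]jobConfigs
	if err := json.Unmarshal([]byte(raw), &all); err != nil {
		return "" // unparsable configuration: fall back to no priority class
	}
	return all[globalKey].PriorityClassName
}

func main() {
	raw := `{
		"global": {"priorityClassName": "velero-low"},
		"example-repo": {"priorityClassName": "ignored"}
	}`
	fmt.Printf("%q\n", maintenancePriorityClassName(raw))
}
```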

#### ConfigMap Pre-Creation Guide

For components that use ConfigMaps for priority class configuration, the ConfigMaps must be created before running `velero install`. Here's the recommended workflow:

```bash
# Step 1: Create priority classes in your cluster (if not already existing)
kubectl apply -f - <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-critical
value: 100
globalDefault: false
description: "Critical priority for Velero server"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-standard
value: 50
globalDefault: false
description: "Standard priority for Velero node agent"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-low
value: 10
globalDefault: false
description: "Low priority for Velero data movers and maintenance jobs"
EOF

# Step 2: Create the namespace
kubectl create namespace velero

# Step 3: Create ConfigMaps for data movers and maintenance jobs
```
When configuring priority classes for Velero components, consider the following hierarchy based on component criticality:

1. **Velero Server (Highest Priority)**:
   - Example: `velero-critical` with value 100
   - Rationale: The server must remain running to coordinate backup/restore operations

2. **Node Agent DaemonSet (Medium Priority)**:
   - Example: `velero-standard` with value 50
   - Rationale: Node agents need to be available on nodes but are less critical than the server

3. **Data Mover Pods & Maintenance Jobs (Lower Priority)**:
   - Example: `velero-low` with value 10
   - Rationale: These are temporary workloads that can be delayed during resource contention

This hierarchy ensures that core Velero components remain operational even under resource pressure, while allowing less critical workloads to be preempted if necessary.

This approach has several advantages:

- Leverages existing configuration mechanisms, minimizing new CLI flags
@@ -362,7 +548,7 @@ For PVB and PVR pods specifically, the controllers will need to be updated to re

```go
// In pkg/controller/pod_volume_backup_controller.go
```
With the introduction of VGDP micro-services (as described in the VGDP micro-service design), data mover pods are created as dedicated pods for volume snapshot data movement. Because these VGDP-MS pods (backupPod/restorePod) inherit their configuration from the node-agent, they automatically use the priority class name specified in the node-agent-configmap.

This ensures that all pods created by Velero for data movement operations (CSI snapshot data movement, PVB, and PVR) use a consistent approach for priority class name configuration through the node-agent-configmap.