Add priority class support for Velero server and node-agent
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Update e2e tests to include PriorityClass label for testing
- Move priority class design document to Implemented folder
- Add changelog entry for velero-io#8883
🤖 Generated with [Claude Code](https://claude.ai/code)
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Add support for ConfigMap options in Velero server installation
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Implement centralized ConfigMap watching for node-agent controllers
This change introduces a centralized ConfigMap watching mechanism to eliminate
redundant ConfigMap watchers across controllers. Previously, each controller
(PodVolumeBackup, PodVolumeRestore, DataUpload, DataDownload) independently
watched the same ConfigMap, leading to inefficiency.
Key changes:
- Add ConfigProvider interface for centralized configuration management
- Implement nodeAgentConfigProvider with single ConfigMap watcher
- Update all controllers to use ConfigProvider instead of direct watching
- Add comprehensive unit tests for ConfigProvider implementation
- Add enhanced MockConfigProvider for testing
- Add E2E test for validating centralized watching behavior
- Remove redundant ConfigMap watching code from controllers
Benefits:
- Single source of truth for ConfigMap watching
- Reduced resource usage with one watcher instead of four
- Consistent configuration updates across all controllers
- Improved testability with centralized mocking
- Better separation of concerns
🤖 Generated with [Claude Code](https://claude.ai/code)
Fix ConfigProvider tests for empty ConfigMap name and informer behavior
- Handle empty ConfigMap name gracefully in NewNodeAgentConfigProvider
- Skip informer start when ConfigMap name is empty
- Update test expectations to handle informer Add events on startup
- Ensure tests pass with correct behavior for edge cases
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
This design document outlines the implementation of priority class name support for Velero components, including the Velero server deployment, node agent daemonset, and maintenance jobs. This feature allows users to specify a priority class name for Velero components, which can be used to influence the scheduling and eviction behavior of these components.
## Background
Kubernetes allows users to define priority classes, which can be used to influence the scheduling and eviction behavior of pods. Priority classes are defined as cluster-wide resources, and pods can reference them by name. When a pod is created, the priority admission controller uses the priority class name to populate the priority value for the pod. The scheduler then uses this priority value to determine the order in which pods are scheduled.
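For illustration, a PriorityClass is a cluster-wide resource; the name `velero-critical` and value 100 below match the example used later in this document:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-critical
value: 100
globalDefault: false
description: "Example priority class for the Velero server."
```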
Currently, Velero does not provide a way for users to specify a priority class name for its components. This can be problematic in clusters where resource contention is high, as Velero components may be evicted or not scheduled in a timely manner, potentially impacting backup and restore operations.
## Goals
- Add support for specifying priority class names for Velero components
- Update the Velero CLI to accept priority class name parameters for different components
- Update the Velero deployment, node agent daemonset, maintenance jobs, and data mover pods to use the specified priority class names
## Non Goals
- Creating or managing priority classes
- Automatically determining the appropriate priority class for Velero components
## High-Level Design
The implementation will add new fields to the Velero options struct to store the priority class names for the server deployment and node agent daemonset. The Velero CLI will be updated to accept new flags for these components. For data mover pods and maintenance jobs, priority class names will be configured through existing ConfigMap mechanisms (`node-agent-configmap` for data movers and `repo-maintenance-job-configmap` for maintenance jobs). The Velero deployment, node agent daemonset, maintenance jobs, and data mover pods will be updated to use their respective priority class names.
## Detailed Design
### CLI Changes
New flags will be added to the `velero install` command to specify priority class names for different components:
```go
// Sketch of the new flag wiring; flag names follow the install flags
// described in this design, and the option fields are defined below.
flags.StringVar(
	&o.ServerPriorityClassName,
	"server-priority-class-name",
	o.ServerPriorityClassName,
	"Priority class name for the Velero server deployment. Optional.",
)
flags.StringVar(
	&o.NodeAgentPriorityClassName,
	"node-agent-priority-class-name",
	o.NodeAgentPriorityClassName,
	"Priority class name for the node-agent daemonset. Optional.",
)
```
Note: Priority class names for data mover pods and maintenance jobs will be configured through their respective ConfigMaps (`--node-agent-configmap` for data movers and `--repo-maintenance-job-configmap` for maintenance jobs).
### Velero Options Changes
The `VeleroOptions` struct in `pkg/install/resources.go` will be updated to include new fields for priority class names:
```go
type VeleroOptions struct {
	// ... existing fields ...
	ServerPriorityClassName    string
	NodeAgentPriorityClassName string
}
```
### Deployment Changes
The `podTemplateConfig` struct in `pkg/install/deployment.go` will be updated to include a new field for the priority class name.
The ConfigMap can be updated after installation to change the priority class for future maintenance jobs. Note that only the "global" configuration is used for priority class - all maintenance jobs will use the same priority class regardless of which repository they are maintaining.
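As a sketch, the repo-maintenance-job-configmap with a `"global"` priority class could look like the following (the ConfigMap name and data key are illustrative; it is whatever `--repo-maintenance-job-configmap` references):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: repo-maintenance-job-config
  namespace: velero
data:
  config.json: |
    {
      "global": {
        "priorityClassName": "velero-maintenance"
      }
    }
```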
### Node Agent ConfigMap Changes
We'll update the `Configs` struct in `pkg/nodeagent/node_agent.go` to include a field for the priority class name in the node-agent-configmap:
```go
type Configs struct {
	// ... existing fields ...

	// PriorityClassName is the priority class name for data mover pods
	// created by the node-agent (sketch; exact tag and placement per implementation).
	PriorityClassName string `json:"priorityClassName,omitempty"`
}
```
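A node-agent-configmap carrying this field might then look like the following (the ConfigMap name and data key are illustrative; it is whatever `--node-agent-configmap` references):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-agent-config
  namespace: velero
data:
  config.json: |
    {
      "priorityClassName": "velero-standard"
    }
```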
These validation and logging features will help administrators:
- Identify configuration issues early (validation warnings)
- Troubleshoot priority class application issues
- Verify that priority classes are being applied as expected

The `ValidatePriorityClass` function should be called at the following points:
- Before creating maintenance jobs
Example usage:
```go
// During velero install
if o.ServerPriorityClassName != "" {
	// Sketch: argument list is illustrative.
	ValidatePriorityClass(ctx, kubeClient, o.ServerPriorityClassName)
}
```
When configuring priority classes for Velero components, consider the following hierarchy based on component criticality:

1. **Velero Server (Highest Priority)**:
   - Example: `velero-critical` with value 100
   - Rationale: The server must remain running to coordinate backup/restore operations
2. **Node Agent DaemonSet (Medium Priority)**:
   - Example: `velero-standard` with value 50
   - Rationale: Node agents need to be available on nodes but are less critical than the server
The priority class name for data mover pods will be determined by checking the node-agent-configmap. This approach provides a centralized way to configure priority class names for all data mover pods. The same approach will be used for PVB (PodVolumeBackup) and PVR (PodVolumeRestore) pods, which will also retrieve their priority class name from the node-agent-configmap.
For PVB and PVR pods specifically, the implementation follows this approach:
1. **Controller Initialization**: Both PodVolumeBackup and PodVolumeRestore controllers are updated to accept nodeAgentConfigMap and namespace parameters. During initialization, they retrieve the priority class name from the node-agent-configmap:
```go
// Sketch of the data mover pod creation path (surrounding function
// bodies are elided in this diff; names follow the fragments shown here).
e.log.Debugf("Setting priority class %q for data mover pod %s", priorityClassName, hostingPodName)

pod := &corev1api.Pod{
	// ... existing fields ...
	Spec: corev1api.PodSpec{
		// ... existing fields ...
		PriorityClassName: priorityClassName,
	},
}
```
4. **Controller Setup**: Both controllers' `setupExposeParam` functions are updated to include the priority class:
```go
return exposer.PodVolumeExposeParam{
	// ... existing fields ...

	// Priority class name for the data mover pod, retrieved from node-agent-configmap
	PriorityClassName: r.dataMovePriorityClass,
}
```
With the introduction of VGDP micro-services (as described in the VGDP micro-service design), the priority class name from the node-agent-configmap is applied to those pods as well.

This ensures that all pods created by Velero for data movement operations (CSI snapshot data movement, PVB, and PVR) use a consistent approach for priority class name configuration through the node-agent-configmap.
## ConfigMap Update Strategy
Different Velero controllers handle ConfigMap updates using different strategies based on their operational patterns.
### Centralized ConfigMap Watching
The node-agent server reads and parses the ConfigMap during initialization and passes configurations (like `podResources`, `loadAffinity`, and `priorityClassName`) directly to controllers as parameters.
Maintenance jobs, by contrast, read the repo-maintenance-job-configmap at job creation time rather than watching it, because:

- Maintenance jobs are created infrequently (every 7 days for Restic, 1 hour for Kopia)
- ConfigMap reads during job creation are acceptable performance-wise
- Each job gets the most current configuration without caching complexity
- Simpler implementation with no cache management or synchronization
### Implementation Approach
1. **Data Mover Controllers**: Receive configuration from the centralized provider
2. **Maintenance Job Controller**: Read fresh configuration from repo-maintenance-job-configmap at job creation time
3. ConfigMap changes are reflected in newly created pods/jobs
4. Use the centralized provider for efficiency and consistency across data mover controllers
### How Exposers Receive Configuration Updates
CSI Snapshot Exposer and Generic Restore Exposer do not directly watch or read ConfigMaps. Instead, they receive configuration through their parent controllers following this flow:
1. **ConfigMap Update Detection**: When the node-agent-configmap is updated, the centralized ConfigProvider detects the change through its Kubernetes informer.

2. **Controller Notification**: The ConfigProvider notifies all registered handlers asynchronously. Data mover controllers (DataUploadReconciler, DataDownloadReconciler, PodVolumeBackupReconciler, PodVolumeRestoreReconciler) have registered handlers that update their internal `dataMovePriorityClass` field.

3. **Configuration Propagation**: On the next reconciliation of a DataUpload/DataDownload/PodVolumeBackup/PodVolumeRestore resource:
   - The controller calls `setupExposeParam()` which includes the current `dataMovePriorityClass` value
   - For CSI operations: `CSISnapshotExposeParam.PriorityClassName` is set
   - For generic restore: `GenericRestoreExposeParam.PriorityClassName` is set
   - The controller passes these parameters to the exposer's `Expose()` method

4. **Pod Creation**: The exposer creates new pods with the updated priority class name. Existing pods retain their original priority class.
This design keeps exposers stateless and ensures:
- Exposers remain simple and focused on pod creation
- All configuration flows through controllers consistently
- No complex state synchronization between components
- Configuration updates are eventually consistent across all new pods