Commit c6202fd

PriorityClass Support Design Proposal
Design for #8869 Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
1 parent d5a2e7e commit c6202fd

1 file changed

Lines changed: 392 additions & 0 deletions

# PriorityClass Support Design Proposal

## Abstract
This design document outlines the implementation of priority class name support for Velero components, including the Velero server deployment, node agent daemonset, and maintenance jobs. This feature allows users to specify a priority class name for Velero components, which can be used to influence the scheduling and eviction behavior of these components.

## Background
Kubernetes allows users to define priority classes, which can be used to influence the scheduling and eviction behavior of pods. Priority classes are defined as cluster-wide resources, and pods can reference them by name. When a pod is created, the priority admission controller uses the priority class name to populate the priority value for the pod. The scheduler then uses this priority value to determine the order in which pods are scheduled.

Currently, Velero does not provide a way for users to specify a priority class name for its components. This can be problematic in clusters where resource contention is high, as Velero components may be evicted or not scheduled in a timely manner, potentially impacting backup and restore operations.

## Goals
- Add support for specifying priority class names for Velero components
- Update the Velero CLI to accept a priority class name for the server deployment, and extend the existing ConfigMaps to carry priority class names for the other components
- Update the Velero deployment, node agent daemonset, maintenance jobs, and data mover pods to use the specified priority class names

## Non Goals
- Creating or managing priority classes
- Automatically determining the appropriate priority class for Velero components

## High-Level Design
The implementation will add a new field to the Velero options struct to store the priority class name for the server component. The Velero CLI will be updated to accept a new flag for the server deployment. For all other components (node agent daemonset, data mover pods, and maintenance jobs), priority class names will be configured through existing ConfigMap mechanisms (`node-agent-configmap` and `repo-maintenance-job-configmap`). The Velero deployment, node agent daemonset, maintenance jobs, and data mover pods will be updated to use their respective priority class names.

## Detailed Design

### CLI Changes
A new flag will be added to the `velero install` command to specify the priority class name for the server deployment:

```go
flags.StringVar(
	&o.ServerPriorityClassName,
	"server-priority-class-name",
	o.ServerPriorityClassName,
	"Priority class name for the Velero server deployment. Optional.",
)
```

Note: Priority class names for node agent daemonset, data mover pods, and maintenance jobs will be configured through their respective ConfigMaps (`--node-agent-configmap` and `--repo-maintenance-job-configmap` flags).
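
To illustrate how the flag binds to the options field, here is a minimal, self-contained sketch. It uses Go's standard `flag` package as a stand-in for the flag library used by the Velero CLI, and the `installOptions` type, the `parseInstallArgs` helper, and the `velero-critical` class name are all hypothetical names chosen for this example.

```go
package main

import (
	"flag"
	"fmt"
)

// installOptions is a hypothetical, trimmed-down stand-in for the options
// struct that backs `velero install`.
type installOptions struct {
	ServerPriorityClassName string
}

// bindFlags registers the flag on the given flag set, mirroring the
// StringVar call above. The current field value doubles as the default.
func (o *installOptions) bindFlags(fs *flag.FlagSet) {
	fs.StringVar(
		&o.ServerPriorityClassName,
		"server-priority-class-name",
		o.ServerPriorityClassName,
		"Priority class name for the Velero server deployment. Optional.",
	)
}

// parseInstallArgs parses args into a fresh installOptions value.
func parseInstallArgs(args []string) (*installOptions, error) {
	o := &installOptions{}
	fs := flag.NewFlagSet("install", flag.ContinueOnError)
	o.bindFlags(fs)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return o, nil
}

func main() {
	o, err := parseInstallArgs([]string{"--server-priority-class-name", "velero-critical"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("server priority class: %q\n", o.ServerPriorityClassName)
}
```

Because the flag is optional, omitting it leaves the field as an empty string, which matches the behavior of installations that predate this feature.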

### Velero Options Changes
The `VeleroOptions` struct in `pkg/install/resources.go` will be updated to include a new field for the server priority class name:

```go
type VeleroOptions struct {
	// ... existing fields ...
	ServerPriorityClassName string
}
```

### Deployment Changes
The `podTemplateConfig` struct in `pkg/install/deployment.go` will be updated to include a new field for the priority class name:

```go
type podTemplateConfig struct {
	// ... existing fields ...
	priorityClassName string
}
```

A new function, `WithPriorityClassName`, will be added to set this field:

```go
func WithPriorityClassName(priorityClassName string) podTemplateOption {
	return func(c *podTemplateConfig) {
		c.priorityClassName = priorityClassName
	}
}
```

The `Deployment` function will be updated to use the priority class name:

```go
deployment := &appsv1api.Deployment{
	// ... existing fields ...
	Spec: appsv1api.DeploymentSpec{
		// ... existing fields ...
		Template: corev1api.PodTemplateSpec{
			// ... existing fields ...
			Spec: corev1api.PodSpec{
				// ... existing fields ...
				PriorityClassName: c.priorityClassName,
			},
		},
	},
}
```
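
The way the option flows from `WithPriorityClassName` into the pod spec can be sketched end to end as follows. This is a simplified illustration, not the actual install code: `podSpec` and `buildPodSpec` are hypothetical stand-ins for `corev1api.PodSpec` and the `Deployment` function, reduced to the single field of interest.

```go
package main

import "fmt"

// podSpec is a hypothetical stand-in for corev1api.PodSpec, reduced to the
// one field this sketch cares about.
type podSpec struct {
	PriorityClassName string
}

type podTemplateConfig struct {
	priorityClassName string
}

type podTemplateOption func(*podTemplateConfig)

// WithPriorityClassName returns an option that records the priority class name.
func WithPriorityClassName(priorityClassName string) podTemplateOption {
	return func(c *podTemplateConfig) {
		c.priorityClassName = priorityClassName
	}
}

// buildPodSpec applies the options in order and copies the collected value
// into the pod spec. It is a hypothetical helper for this example only.
func buildPodSpec(opts ...podTemplateOption) podSpec {
	c := &podTemplateConfig{}
	for _, opt := range opts {
		opt(c)
	}
	return podSpec{PriorityClassName: c.priorityClassName}
}

func main() {
	// No option supplied: the field stays empty, as in existing installations.
	fmt.Printf("%q\n", buildPodSpec().PriorityClassName)

	// Option supplied: the value is carried through to the pod spec.
	fmt.Printf("%q\n", buildPodSpec(WithPriorityClassName("velero-critical")).PriorityClassName)
}
```

The functional-option form keeps the change additive: callers that do not pass the option are unaffected.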

### DaemonSet Changes
The `DaemonSet` function will be updated to retrieve and use the priority class name from the node-agent-configmap:

```go
// Get priority class from node-agent-configmap if it exists
priorityClassName := ""
if nodeAgentConfig != nil && nodeAgentConfig.PriorityClassName != "" {
	priorityClassName = nodeAgentConfig.PriorityClassName
}

daemonSet := &appsv1api.DaemonSet{
	// ... existing fields ...
	Spec: appsv1api.DaemonSetSpec{
		// ... existing fields ...
		Template: corev1api.PodTemplateSpec{
			// ... existing fields ...
			Spec: corev1api.PodSpec{
				// ... existing fields ...
				PriorityClassName: priorityClassName,
			},
		},
	},
}
```

### Maintenance Job Changes
The `JobConfigs` struct in `pkg/repository/maintenance/maintenance.go` will be updated to include a field for the priority class name:

```go
type JobConfigs struct {
	// LoadAffinities is the config for repository maintenance job load affinity.
	LoadAffinities []*kube.LoadAffinity `json:"loadAffinity,omitempty"`

	// PodResources is the config for the CPU and memory resources setting.
	PodResources *kube.PodResources `json:"podResources,omitempty"`

	// PriorityClassName is the priority class name for the maintenance job pod
	PriorityClassName string `json:"priorityClassName,omitempty"`
}
```

The `buildJob` function will be updated to use the priority class name from the job configuration:

```go
func buildJob(cli client.Client, ctx context.Context, repo *velerov1api.BackupRepository, bslName string, config *JobConfigs,
	podResources kube.PodResources, logLevel logrus.Level, logFormat *logging.FormatFlag) (*batchv1.Job, error) {
	// ... existing code ...

	// Use the priority class name from the job configuration if available
	priorityClassName := ""
	if config != nil && config.PriorityClassName != "" {
		priorityClassName = config.PriorityClassName
	}

	// ... existing code ...

	job := &batchv1.Job{
		// ... existing fields ...
		Spec: batchv1.JobSpec{
			// ... existing fields ...
			Template: corev1api.PodTemplateSpec{
				// ... existing fields ...
				Spec: corev1api.PodSpec{
					// ... existing fields ...
					PriorityClassName: priorityClassName,
				},
			},
		},
	}

	// ... existing code ...
}
```

Users will be able to configure the priority class name for maintenance jobs by creating the repository maintenance job ConfigMap before installation. For example:

```bash
# Create the ConfigMap before running velero install
cat <<EOF | kubectl create configmap repo-maintenance-job-config -n velero --from-file=config.json=/dev/stdin
{
  "global": {
    "priorityClassName": "low-priority",
    "podResources": {
      "cpuRequest": "100m",
      "memoryRequest": "128Mi"
    }
  },
  "namespace1-default-kopia": {
    "priorityClassName": "medium-priority"
  }
}
EOF

# Then install Velero referencing this ConfigMap
velero install --provider aws \
  --repo-maintenance-job-configmap repo-maintenance-job-config \
  # ... other flags
```

The ConfigMap can be updated after installation to change priority classes for future maintenance jobs.
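
The example ConfigMap above has both a `global` entry and a repository-specific entry. The sketch below shows one plausible way such a document could be resolved to a single priority class name. It assumes that a repository-specific entry takes precedence over `global`; that precedence, along with the `jobConfigs` type and the `resolvePriorityClass` helper, is an assumption made for illustration and is not specified by this design.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobConfigs is a hypothetical, trimmed-down stand-in for JobConfigs that
// keeps only the new field. Unknown JSON keys such as podResources are
// ignored by encoding/json.
type jobConfigs struct {
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// exampleConfig mirrors the config.json content from the ConfigMap example.
const exampleConfig = `{
  "global": {
    "priorityClassName": "low-priority",
    "podResources": {"cpuRequest": "100m", "memoryRequest": "128Mi"}
  },
  "namespace1-default-kopia": {
    "priorityClassName": "medium-priority"
  }
}`

// resolvePriorityClass returns the priority class name for repoKey.
// Assumption: a non-empty repository-specific value wins over "global".
func resolvePriorityClass(raw []byte, repoKey string) (string, error) {
	var all map[string]jobConfigs
	if err := json.Unmarshal(raw, &all); err != nil {
		return "", err
	}
	if cfg, ok := all[repoKey]; ok && cfg.PriorityClassName != "" {
		return cfg.PriorityClassName, nil
	}
	// A missing "global" entry yields the zero value, i.e. an empty name.
	return all["global"].PriorityClassName, nil
}

func main() {
	for _, key := range []string{"namespace1-default-kopia", "some-other-repo"} {
		name, err := resolvePriorityClass([]byte(exampleConfig), key)
		if err != nil {
			fmt.Println("invalid config:", err)
			return
		}
		fmt.Printf("%s -> %q\n", key, name)
	}
}
```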

### Node Agent ConfigMap Changes
The `Configs` struct in `pkg/nodeagent/node_agent.go` will be updated to include a field for the priority class name in the node-agent-configmap:

```go
type Configs struct {
	// ... existing fields ...

	// PriorityClassName is the priority class name for both the node agent daemonset
	// and the data mover pods it creates
	PriorityClassName string `json:"priorityClassName,omitempty"`
}
```

This will allow users to configure the priority class name for both the node agent daemonset and data mover pods through a single node-agent-configmap. For example:

```bash
# Create the ConfigMap before running velero install
cat <<EOF | kubectl create configmap node-agent-config -n velero --from-file=config.json=/dev/stdin
{
  "priorityClassName": "low-priority",
  "loadAffinity": [
    {
      "nodeSelector": {
        "matchLabels": {
          "node-role.kubernetes.io/worker": "true"
        }
      }
    }
  ]
}
EOF

# Then install Velero referencing this ConfigMap
velero install --provider aws \
  --node-agent-configmap node-agent-config \
  --use-node-agent \
  # ... other flags
```
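
Since the new field is tagged `omitempty`, a node-agent-configmap written before this feature existed should still decode cleanly. The following sketch illustrates that property with a hypothetical, trimmed-down `configs` type; `parseConfigs` and `encodeConfigs` are helper names invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configs is a hypothetical, trimmed-down stand-in for nodeagent.Configs.
type configs struct {
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// parseConfigs decodes a config.json payload. Keys that the struct does not
// declare (for example loadAffinity here) are ignored by encoding/json.
func parseConfigs(raw string) (configs, error) {
	var c configs
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

// encodeConfigs serializes the config back to JSON.
func encodeConfigs(c configs) (string, error) {
	out, err := json.Marshal(c)
	return string(out), err
}

func main() {
	// A ConfigMap without the new key decodes to an empty priority class name.
	old, _ := parseConfigs(`{"loadAffinity": []}`)
	fmt.Printf("without key: %q\n", old.PriorityClassName)

	// A ConfigMap with the key carries the value through.
	updated, _ := parseConfigs(`{"priorityClassName": "low-priority"}`)
	fmt.Printf("with key: %q\n", updated.PriorityClassName)

	// omitempty keeps the key out of the serialized form when it is unset.
	encoded, _ := encodeConfigs(configs{})
	fmt.Println(encoded)
}
```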

The `createBackupPod` function in `pkg/exposer/csi_snapshot.go` will be updated to accept and use the priority class name:

```go
func (e *csiSnapshotExposer) createBackupPod(
	ctx context.Context,
	ownerObject corev1api.ObjectReference,
	backupPVC *corev1api.PersistentVolumeClaim,
	operationTimeout time.Duration,
	label map[string]string,
	annotation map[string]string,
	affinity *kube.LoadAffinity,
	resources corev1api.ResourceRequirements,
	backupPVCReadOnly bool,
	spcNoRelabeling bool,
	nodeOS string,
	priorityClassName string, // New parameter
) (*corev1api.Pod, error) {
	// ... existing code ...

	pod := &corev1api.Pod{
		// ... existing fields ...
		Spec: corev1api.PodSpec{
			// ... existing fields ...
			PriorityClassName: priorityClassName,
			// ... existing fields ...
		},
	}

	// ... existing code ...
}
```

The call to `createBackupPod` in the `Expose` method will be updated to pass the priority class name retrieved from the node-agent-configmap:

```go
priorityClassName, _ := veleroutil.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName)
backupPod, err := e.createBackupPod(
	ctx,
	ownerObject,
	backupPVC,
	csiExposeParam.OperationTimeout,
	csiExposeParam.HostingPodLabels,
	csiExposeParam.HostingPodAnnotations,
	csiExposeParam.Affinity,
	csiExposeParam.Resources,
	backupPVCReadOnly,
	spcNoRelabeling,
	csiExposeParam.NodeOS,
	priorityClassName, // Priority class name from node-agent-configmap
)
```

A new function, `GetDataMoverPriorityClassName`, will be added to the `veleroutil` package to retrieve the priority class name for data mover pods:

```go
func GetDataMoverPriorityClassName(ctx context.Context, namespace string, kubeClient kubernetes.Interface, configName string) (string, error) {
	// Get from node-agent-configmap
	configs, err := nodeagent.GetConfigs(ctx, namespace, kubeClient, configName)
	if err == nil && configs != nil && configs.PriorityClassName != "" {
		return configs.PriorityClassName, nil
	}

	// Return empty string if not found in configmap
	return "", nil
}
```

This function reads the priority class name from the node-agent-configmap. If the ConfigMap cannot be read or the field is unset, it returns an empty string, in which case the pod is created without a priority class name.
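
The fallback behavior described above can be exercised in isolation with a small sketch. Here the ConfigMap lookup is replaced by an injected function so the three outcomes (value present, lookup error, no config) can be shown without a cluster. The `configLoader` type and the `dataMoverPriorityClassName` helper are hypothetical and only mirror the logic of `GetDataMoverPriorityClassName`.

```go
package main

import (
	"errors"
	"fmt"
)

// configs is a hypothetical, trimmed-down stand-in for nodeagent.Configs.
type configs struct {
	PriorityClassName string
}

// configLoader is a hypothetical seam standing in for the ConfigMap lookup.
type configLoader func() (*configs, error)

// dataMoverPriorityClassName returns the configured name, or an empty string
// when the lookup fails, returns nothing, or the field is unset.
func dataMoverPriorityClassName(load configLoader) string {
	cfg, err := load()
	if err != nil || cfg == nil {
		return ""
	}
	return cfg.PriorityClassName
}

func main() {
	found := func() (*configs, error) {
		return &configs{PriorityClassName: "low-priority"}, nil
	}
	failing := func() (*configs, error) {
		return nil, errors.New("configmap not found")
	}
	missing := func() (*configs, error) {
		return nil, nil
	}

	fmt.Printf("found:   %q\n", dataMoverPriorityClassName(found))
	fmt.Printf("failing: %q\n", dataMoverPriorityClassName(failing))
	fmt.Printf("missing: %q\n", dataMoverPriorityClassName(missing))
}
```

Treating a failed lookup the same as an unset value means a missing or malformed ConfigMap does not block pod creation. With an empty name, Kubernetes assigns the pod the priority of the cluster's global default priority class if one exists, and zero otherwise.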

## Alternatives Considered

1. **Using a single flag for all components**: We could have used a single flag for all components, but this would not allow for different priority classes for different components. Since maintenance jobs and data movers typically require lower priority than the Velero server, separate per-component settings provide more flexibility.

2. **Using a configuration file**: We could have added support for specifying the priority class names in a configuration file. However, this would have required additional changes to the Velero CLI and would have been more complex to implement.

3. **Inheriting priority class from parent components**: We initially considered having maintenance jobs inherit their priority class from the Velero server, and data movers inherit from the node agent. However, this approach doesn't allow for the appropriate prioritization of different components based on their importance and resource requirements.

## Security Considerations

There are no security considerations for this feature.

## Compatibility

This feature is compatible with all Kubernetes versions that support priority classes. The PodPriority feature became stable in Kubernetes 1.14. For more information, see the [Kubernetes documentation on Pod Priority and Preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/).

## Implementation

The implementation will involve the following steps:

1. Add the priority class name field for server to the `VeleroOptions` struct
2. Add the priority class name field to the `podTemplateConfig` struct
3. Add the `WithPriorityClassName` function for the server deployment
4. Update the `Deployment` function to use the server priority class name
5. Update the `DaemonSet` function to retrieve and use the priority class name from node-agent-configmap
6. Update the `JobConfigs` struct to include `PriorityClassName` field
7. Update the `buildJob` function in maintenance job to use the priority class name from JobConfigs
8. Update the `Configs` struct in node agent to include `PriorityClassName` field for both daemonset and data mover pods
9. Update the data mover pod creation to use the priority class name from node-agent-configmap
10. Add the priority class name flag for server to the `velero install` command
11. Update documentation to explain how to configure priority classes via ConfigMaps

Note: Only the server deployment will have a CLI flag for priority class. All other components (node agent daemonset, data mover pods, and maintenance jobs) will use their respective ConfigMaps for priority class configuration.

This approach ensures that different Velero components can use different priority class names based on their importance and resource requirements:

1. The Velero server deployment can use a higher priority class to ensure it continues running even under resource pressure.
2. The node agent daemonset can use a medium priority class.
3. Maintenance jobs can use a lower priority class since they should not run when resources are limited.
4. Data mover pods can use a lower priority class since they should not run when resources are limited.

### Implementation Considerations

Priority class names for all components except the server deployment are configured through ConfigMaps that should be created before Velero installation but can be updated afterwards. This design leverages existing ConfigMap mechanisms:

1. **Node Agent DaemonSet and Data Mover Pods**: Will use the node-agent-configmap (specified via the `--node-agent-configmap` flag). This single ConfigMap controls priority class for both the node agent daemonset itself and all data mover pods (including PVB and PVR) it creates.

2. **Maintenance Jobs**: Will use the repository maintenance job ConfigMap (specified via the `--repo-maintenance-job-configmap` flag). Users should create this ConfigMap before running `velero install` with the desired priority class configuration. The ConfigMap can be updated after installation to change priority classes for future maintenance jobs.

This approach has several advantages:

- Leverages existing configuration mechanisms, minimizing new CLI flags
- Provides a single point of configuration for related components (node agent and its pods)
- Allows dynamic configuration updates without requiring Velero reinstallation
- Provides flexibility to set different priority classes for different repositories
- Maintains backward compatibility with existing installations
- Enables administrators to set up priority classes during initial deployment

The priority class name for data mover pods will be determined by checking the node-agent-configmap. This approach provides a centralized way to configure priority class names for all data mover pods. The same approach will be used for PVB (PodVolumeBackup) and PVR (PodVolumeRestore) pods, which will also retrieve their priority class name from the node-agent-configmap.

For PVB and PVR pods specifically, the controllers will need to be updated to retrieve the priority class name from the node-agent-configmap and pass it to the pod creation functions. For example, in the PodVolumeBackup controller:

```go
// In pkg/controller/pod_volume_backup_controller.go
priorityClassName, _ := veleroutil.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName)

// Add priorityClassName to the pod spec
pod := &corev1api.Pod{
	// ... existing fields ...
	Spec: corev1api.PodSpec{
		// ... existing fields ...
		PriorityClassName: priorityClassName,
	},
}
```

Similarly, in the PodVolumeRestore controller:

```go
// In pkg/controller/pod_volume_restore_controller.go
priorityClassName, _ := veleroutil.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName)

// Add priorityClassName to the pod spec
pod := &corev1api.Pod{
	// ... existing fields ...
	Spec: corev1api.PodSpec{
		// ... existing fields ...
		PriorityClassName: priorityClassName,
	},
}
```

This ensures that all pods created by Velero (data movers, PVB, and PVR) use a consistent approach for priority class name configuration.

## Open Issues

None.
