Skip to content

Implement PriorityClass Support#8883

Merged
blackpiglet merged 2 commits into
velero-io:mainfrom
kaovilai:implement8869
Aug 6, 2025
Merged

Implement PriorityClass Support#8883
blackpiglet merged 2 commits into
velero-io:mainfrom
kaovilai:implement8869

Conversation

@kaovilai
Copy link
Copy Markdown
Collaborator

@kaovilai kaovilai commented Apr 25, 2025

Add priority class support for Velero components

Summary

This PR implements priority class support for Velero components, allowing users to configure Kubernetes priority classes for the Velero server, node-agent, data mover pods, and maintenance jobs. This ensures proper pod scheduling and preemption behavior in resource-constrained environments.

Changes

Core Implementation

  • Added --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
  • Modified node-agent server to read priority class from ConfigMap once at startup (minimizing API calls)
  • Updated all data mover controllers to accept priority class as a string parameter
  • Added priority class validation with fallback behavior to prevent pod creation failures
  • Removed ConfigProvider infrastructure in favor of static configuration reading

Key Features

  • Validation with Fallback: ValidatePriorityClass now returns a boolean, allowing callers to handle missing priority classes gracefully
  • Single ConfigMap Read: Node-agent reads ConfigMap once at startup and passes configuration to controllers
  • Automatic Fallback: If a priority class doesn't exist, the system logs a warning and uses default scheduling

Configuration Flow

graph TB
    subgraph "Installation Time"
        CLI[velero install] --> |"--server-priority-class-name"| Server[Velero Server Deployment]
        CLI --> |"--node-agent-priority-class-name"| NodeAgent[Node Agent DaemonSet]
    end
    
    subgraph "ConfigMap Configuration"
        NAConfig[node-agent-configmap] --> |priorityClassName| DataMovers[Data Mover Pods]
        MJConfig[repo-maintenance-job-configmap] --> |global.priorityClassName| MaintJobs[Maintenance Jobs]
    end
    
    subgraph "Runtime Flow"
        NodeAgentServer[Node Agent Server Startup] --> |Read Once| NAConfig
        NodeAgentServer --> |Validate| ValidatePC{Priority Class Exists?}
        ValidatePC -->|Yes| PassToControllers[Pass to Controllers]
        ValidatePC -->|No| ClearPC[Clear Priority Class<br/>Log Warning]
        ClearPC --> PassToControllers
        PassToControllers --> PVBController[PVB Controller]
        PassToControllers --> PVRController[PVR Controller]
        PassToControllers --> DUController[DataUpload Controller]
        PassToControllers --> DDController[DataDownload Controller]
        
        PVBController --> PVBPods[PVB Pods]
        PVRController --> PVRPods[PVR Pods]
        DUController --> DUPods[DataUpload Pods]
        DDController --> DDPods[DataDownload Pods]
    end
    
    subgraph "Maintenance Jobs"
        MaintController[Maintenance Controller] --> |Read Fresh| MJConfig
        MaintController --> MaintJobs
    end
    
    style ValidatePC fill:#f9f,stroke:#333,stroke-width:2px
    style ClearPC fill:#ff9,stroke:#333,stroke-width:2px
Loading

Priority Class Hierarchy

Component Priority Configuration Method Purpose
Velero Server High (e.g., 100) CLI flag Must remain running to coordinate operations
Node Agent Medium (e.g., 50) CLI flag Needs to be available on nodes
Data Mover Pods Low (e.g., 10) ConfigMap Temporary workloads, can be delayed
Maintenance Jobs Low (e.g., 10) ConfigMap Background tasks, can be preempted

Example Usage

# Step 1: Create priority classes
kubectl apply -f - <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-critical
value: 100
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-standard
value: 50
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-low
value: 10
EOF

# Step 2: Create namespace
kubectl create namespace velero

# Step 3: Create ConfigMaps
kubectl create configmap node-agent-config -n velero --from-file=config.json=/dev/stdin <<EOF
{
    "priorityClassName": "velero-low"
}
EOF

kubectl create configmap repo-maintenance-job-config -n velero --from-file=config.json=/dev/stdin <<EOF
{
    "global": {
        "priorityClassName": "velero-low"
    }
}
EOF

# Step 4: Install Velero with priority classes
velero install \
    --provider aws \
    --server-priority-class-name velero-critical \
    --node-agent-priority-class-name velero-standard \
    --node-agent-configmap node-agent-config \
    --repo-maintenance-job-configmap repo-maintenance-job-config \
    --use-node-agent

Testing

  • Unit tests for ValidatePriorityClass and GetDataMoverPriorityClassName
  • Updated controller tests to work with new signatures
  • Verified fallback behavior when priority class doesn't exist

Documentation

  • Updated design document at design/Implemented/priority-class-name-support_design.md
  • Added examples and troubleshooting guide in design doc

Fixes #8869

Checklist

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 25, 2025

Codecov Report

❌ Patch coverage is 75.67568% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.72%. Comparing base (30ea894) to head (ae29030).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
pkg/cmd/cli/nodeagent/server.go 0.00% 17 Missing ⚠️
pkg/repository/maintenance/maintenance.go 61.90% 6 Missing and 2 partials ⚠️
pkg/util/kube/priority_class.go 66.66% 7 Missing ⚠️
pkg/exposer/pod_volume.go 50.00% 2 Missing and 1 partial ⚠️
pkg/controller/data_upload_controller.go 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8883      +/-   ##
==========================================
+ Coverage   59.30%   59.72%   +0.42%     
==========================================
  Files         380      381       +1     
  Lines       43784    43889     +105     
==========================================
+ Hits        25965    26213     +248     
+ Misses      16280    16130     -150     
- Partials     1539     1546       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kaovilai kaovilai force-pushed the implement8869 branch 10 times, most recently from 29af40a to 2f72032 Compare June 4, 2025 07:14
@reasonerjt reasonerjt requested a review from ywk253100 July 18, 2025 06:10
@kaovilai kaovilai force-pushed the implement8869 branch 4 times, most recently from 303ad7e to 5c9c01a Compare July 21, 2025 22:52
@kaovilai kaovilai force-pushed the implement8869 branch 6 times, most recently from 373fe4a to 9727a9e Compare July 23, 2025 13:08
@github-actions github-actions Bot added the Dependencies Pull requests that update a dependency file label Jul 23, 2025
kaovilai added a commit to kaovilai/velero that referenced this pull request Jul 25, 2025
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Update e2e tests to include PriorityClass label for testing
- Move priority class design document to Implemented folder
- Add changelog entry for velero-io#8883

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Add support for ConfigMap options in Velero server installation

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Implement centralized ConfigMap watching for node-agent controllers

This change introduces a centralized ConfigMap watching mechanism to eliminate
redundant ConfigMap watchers across controllers. Previously, each controller
(PodVolumeBackup, PodVolumeRestore, DataUpload, DataDownload) independently
watched the same ConfigMap, leading to inefficiency.

Key changes:
- Add ConfigProvider interface for centralized configuration management
- Implement nodeAgentConfigProvider with single ConfigMap watcher
- Update all controllers to use ConfigProvider instead of direct watching
- Add comprehensive unit tests for ConfigProvider implementation
- Add enhanced MockConfigProvider for testing
- Add E2E test for validating centralized watching behavior
- Remove redundant ConfigMap watching code from controllers

Benefits:
- Single source of truth for ConfigMap watching
- Reduced resource usage with one watcher instead of four
- Consistent configuration updates across all controllers
- Improved testability with centralized mocking
- Better separation of concerns

🤖 Generated with [Claude Code](https://claude.ai/code)

Fix ConfigProvider tests for empty ConfigMap name and informer behavior

- Handle empty ConfigMap name gracefully in NewNodeAgentConfigProvider
- Skip informer start when ConfigMap name is empty
- Update test expectations to handle informer Add events on startup
- Ensure tests pass with correct behavior for edge cases

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
kaovilai added a commit to kaovilai/velero that referenced this pull request Jul 25, 2025
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Update e2e tests to include PriorityClass label for testing
- Move priority class design document to Implemented folder
- Add changelog entry for velero-io#8883

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Add support for ConfigMap options in Velero server installation

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Implement centralized ConfigMap watching for node-agent controllers

This change introduces a centralized ConfigMap watching mechanism to eliminate
redundant ConfigMap watchers across controllers. Previously, each controller
(PodVolumeBackup, PodVolumeRestore, DataUpload, DataDownload) independently
watched the same ConfigMap, leading to inefficiency.

Key changes:
- Add ConfigProvider interface for centralized configuration management
- Implement nodeAgentConfigProvider with single ConfigMap watcher
- Update all controllers to use ConfigProvider instead of direct watching
- Add comprehensive unit tests for ConfigProvider implementation
- Add enhanced MockConfigProvider for testing
- Add E2E test for validating centralized watching behavior
- Remove redundant ConfigMap watching code from controllers

Benefits:
- Single source of truth for ConfigMap watching
- Reduced resource usage with one watcher instead of four
- Consistent configuration updates across all controllers
- Improved testability with centralized mocking
- Better separation of concerns

🤖 Generated with [Claude Code](https://claude.ai/code)

Fix ConfigProvider tests for empty ConfigMap name and informer behavior

- Handle empty ConfigMap name gracefully in NewNodeAgentConfigProvider
- Skip informer start when ConfigMap name is empty
- Update test expectations to handle informer Add events on startup
- Ensure tests pass with correct behavior for edge cases

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
kaovilai added a commit to kaovilai/velero that referenced this pull request Jul 25, 2025
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Update e2e tests to include PriorityClass label for testing
- Move priority class design document to Implemented folder
- Add changelog entry for velero-io#8883

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Add support for ConfigMap options in Velero server installation

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Implement centralized ConfigMap watching for node-agent controllers

This change introduces a centralized ConfigMap watching mechanism to eliminate
redundant ConfigMap watchers across controllers. Previously, each controller
(PodVolumeBackup, PodVolumeRestore, DataUpload, DataDownload) independently
watched the same ConfigMap, leading to inefficiency.

Key changes:
- Add ConfigProvider interface for centralized configuration management
- Implement nodeAgentConfigProvider with single ConfigMap watcher
- Update all controllers to use ConfigProvider instead of direct watching
- Add comprehensive unit tests for ConfigProvider implementation
- Add enhanced MockConfigProvider for testing
- Add E2E test for validating centralized watching behavior
- Remove redundant ConfigMap watching code from controllers

Benefits:
- Single source of truth for ConfigMap watching
- Reduced resource usage with one watcher instead of four
- Consistent configuration updates across all controllers
- Improved testability with centralized mocking
- Better separation of concerns

🤖 Generated with [Claude Code](https://claude.ai/code)

Fix ConfigProvider tests for empty ConfigMap name and informer behavior

- Handle empty ConfigMap name gracefully in NewNodeAgentConfigProvider
- Skip informer start when ConfigMap name is empty
- Update test expectations to handle informer Add events on startup
- Ensure tests pass with correct behavior for edge cases

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread test/util/k8s/priority_class.go Outdated
Comment thread pkg/controller/data_download_controller.go Outdated
Comment thread pkg/controller/data_upload_controller.go Outdated
Comment thread pkg/controller/pod_volume_backup_controller.go Outdated
Comment thread pkg/controller/pod_volume_restore_controller.go Outdated
Comment thread pkg/util/kube/priority_class.go
Comment thread pkg/cmd/cli/nodeagent/server.go Outdated
Comment thread design/Implemented/priority-class-name-support_design.md Outdated
Comment thread pkg/cmd/cli/install/install.go Outdated
Comment thread pkg/util/kube/priority_class.go Outdated
kaovilai added a commit to kaovilai/velero that referenced this pull request Jul 25, 2025
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Update e2e tests to include PriorityClass label for testing
- Move priority class design document to Implemented folder
- Add changelog entry for velero-io#8883

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Add support for ConfigMap options in Velero server installation

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Implement centralized ConfigMap watching for node-agent controllers

This change introduces a centralized ConfigMap watching mechanism to eliminate
redundant ConfigMap watchers across controllers. Previously, each controller
(PodVolumeBackup, PodVolumeRestore, DataUpload, DataDownload) independently
watched the same ConfigMap, leading to inefficiency.

Key changes:
- Add ConfigProvider interface for centralized configuration management
- Implement nodeAgentConfigProvider with single ConfigMap watcher
- Update all controllers to use ConfigProvider instead of direct watching
- Add comprehensive unit tests for ConfigProvider implementation
- Add enhanced MockConfigProvider for testing
- Add E2E test for validating centralized watching behavior
- Remove redundant ConfigMap watching code from controllers

Benefits:
- Single source of truth for ConfigMap watching
- Reduced resource usage with one watcher instead of four
- Consistent configuration updates across all controllers
- Improved testability with centralized mocking
- Better separation of concerns

🤖 Generated with [Claude Code](https://claude.ai/code)

Fix ConfigProvider tests for empty ConfigMap name and informer behavior

- Handle empty ConfigMap name gracefully in NewNodeAgentConfigProvider
- Skip informer start when ConfigMap name is empty
- Update test expectations to handle informer Add events on startup
- Ensure tests pass with correct behavior for edge cases

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread pkg/cmd/cli/install/install.go Outdated
Comment thread pkg/controller/data_download_controller.go Outdated
Comment thread pkg/controller/data_upload_controller.go Outdated
Comment thread pkg/controller/pod_volume_restore_controller.go Outdated
Comment thread pkg/controller/pod_volume_backup_controller.go Outdated
Comment thread pkg/nodeagent/node_agent.go Outdated
Comment thread pkg/util/kube/priority_class.go Outdated
Comment thread pkg/repository/maintenance/maintenance.go Outdated
Comment thread test/util/k8s/priority_class.go Outdated
Comment thread site/content/docs/main/data-movement-pod-resource-configuration.md
Comment thread site/content/docs/main/csi-snapshot-data-movement.md Outdated
Comment thread site/content/docs/main/csi-snapshot-data-movement.md Outdated
Comment thread site/content/docs/main/csi-snapshot-data-movement.md
Comment thread site/content/docs/main/data-movement-pod-resource-configuration.md
Comment thread site/content/docs/main/csi-snapshot-data-movement.md Outdated
@Lyndon-Li
Copy link
Copy Markdown
Contributor

@kaovilai
Sorry, I forgot to publish my comments yesterday. They are all about docs, please have another look.

Lyndon-Li
Lyndon-Li previously approved these changes Jul 30, 2025
ywk253100
ywk253100 previously approved these changes Aug 1, 2025
@reasonerjt
Copy link
Copy Markdown
Contributor

@kaovilai Please resolve the conflicts.

@kaovilai
Copy link
Copy Markdown
Collaborator Author

kaovilai commented Aug 4, 2025

rebasing

@Lyndon-Li
Copy link
Copy Markdown
Contributor

@kaovilai
There are conflicts, could you solve them?

@kaovilai
Copy link
Copy Markdown
Collaborator Author

kaovilai commented Aug 6, 2025

yes

kaovilai and others added 2 commits August 6, 2025 01:36
- Add --server-priority-class-name and --node-agent-priority-class-name flags to velero install command
- Configure data mover pods (PVB/PVR/DataUpload/DataDownload) to use priority class from node-agent-configmap
- Configure maintenance jobs to use priority class from repo-maintenance-job-configmap (global config only)
- Add priority class validation with ValidatePriorityClass and GetDataMoverPriorityClassName utilities
- Update e2e tests to include PriorityClass testing utilities
- Move priority class design document to Implemented folder
- Add comprehensive unit tests for all priority class implementations
- Update documentation for priority class configuration
- Add changelog entry for velero-io#8883

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

remove unused test utils

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

feat: add unit test for getting priority class name in maintenance jobs

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

doc update

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

feat: add priority class validation for repository maintenance jobs

- Add ValidatePriorityClassWithClient function to validate priority class existence
- Integrate validation in maintenance.go when creating maintenance jobs
- Update tests to cover the new validation functionality
- Return boolean from ValidatePriorityClass to allow fallback behavior

This ensures maintenance jobs don't fail due to non-existent priority classes,
following the same pattern used for data mover pods.

Addresses feedback from:
velero-io#8883 (comment)

Refs velero-io#8869

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

refactor: clean up priority class handling for data mover pods

- Fix comment in node_agent.go to clarify PriorityClassName is only for data mover pods
- Simplify server.go to use dataPathConfigs.PriorityClassName directly
- Remove redundant priority class logging from controllers as it's already logged during server startup
- Keep logging centralized in the node-agent server initialization

This reduces code duplication and clarifies the scope of priority class configuration.

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

refactor: remove GetDataMoverPriorityClassName from kube utilities

Remove GetDataMoverPriorityClassName function and its tests as priority
class is now read directly from dataPathConfigs instead of parsing from
ConfigMap. This simplifies the codebase by eliminating the need for
indirect ConfigMap parsing.

Refs velero-io#8869

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

refactor: remove priority class validation from install command

Remove priority class validation during install as it's redundant
since validation already occurs during server startup. Users cannot
see console logs during install, making the validation warnings
ineffective at this stage.

The validation remains in place during server and node-agent startup
where it's more appropriate and visible to users.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support configuring PriorityClass for Velero pods(server/node-agent/maintenence/data mover/etc.)

5 participants