ovs: increase CPU priority for Open vSwitch Forwarding Unit by mkowalski · Pull Request #5797 · openshift/machine-config-operator

mkowalski · 2026-03-24T13:49:15Z

/hold

Signed-off-by: Mat Kowalski <mko@redhat.com>

coderabbitai · 2026-03-24T13:49:32Z

Walkthrough

Added a new systemd drop-in configuration file for ovs-vswitchd.service that configures CPU scheduling policy (FIFO with priority 10) and process niceness (-10) settings.

Changes

Cohort / File(s)	Summary
Systemd Configuration `templates/common/_base/units/20-ovs-vswitchd-cpu-priority.conf`	Added new drop-in unit file with `[Service]` section containing CPU scheduling parameters (`CPUSchedulingPolicy=fifo`, `CPUSchedulingPriority=10`) and process niceness setting (`Nice=-10`).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-03-24T13:49:55Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkowalski
Once this PR has been reviewed and has the lgtm label, please assign rishabhsaini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

templates/common/_base/units/ovs-vswitchd.service.yaml (1)
17-17: Nice=-10 is ineffective with CPUSchedulingPolicy=fifo.

On Line 17, niceness does not apply to realtime FIFO scheduling, so this setting is misleading unless a non-RT policy fallback is intended.
Suggested cleanup
-      Nice=-10
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@templates/common/_base/units/ovs-vswitchd.service.yaml` at line 17, The unit
file sets Nice=-10 while also using CPUSchedulingPolicy=fifo, which makes
niceness ineffective; remove the misleading Nice=-10 entry from
ovs-vswitchd.service (or, if you intended a non-realtime fallback, explicitly
set/document an alternate CPUSchedulingPolicy such as other for that fallback)
so the unit file's scheduling/prioritization settings are consistent and not
confusing.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@templates/common/_base/units/ovs-vswitchd.service.yaml`:
- Around line 15-16: The unit file sets CPUSchedulingPolicy=fifo and
CPUSchedulingPriority=10 which can starve node-critical services; modify the
ovs-vswitchd systemd unit (referencing CPUSchedulingPolicy and
CPUSchedulingPriority) to add a bounded real-time runtime by adding
LimitRTTIME=200ms in the service section so real-time execution is capped and
cannot monopolize CPU.

---

Nitpick comments:
In `@templates/common/_base/units/ovs-vswitchd.service.yaml`:
- Line 17: The unit file sets Nice=-10 while also using
CPUSchedulingPolicy=fifo, which makes niceness ineffective; remove the
misleading Nice=-10 entry from ovs-vswitchd.service (or, if you intended a
non-realtime fallback, explicitly set/document an alternate CPUSchedulingPolicy
such as other for that fallback) so the unit file's scheduling/prioritization
settings are consistent and not confusing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7d57adf1-1447-42c9-840c-bdd456906d15

📥 Commits

Reviewing files that changed from the base of the PR and between d4008e5 and e04826a.

📒 Files selected for processing (1)

templates/common/_base/units/ovs-vswitchd.service.yaml

coderabbitai · 2026-03-24T13:52:16Z

templates/common/_base/units/ovs-vswitchd.service.yaml

+      CPUSchedulingPolicy=fifo
+      CPUSchedulingPriority=10


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd "ovs-vswitchd.service" --type f

Repository: openshift/machine-config-operator

Length of output: 131

🏁 Script executed:

cat -n templates/common/_base/units/ovs-vswitchd.service.yaml

Repository: openshift/machine-config-operator

Length of output: 868

🏁 Script executed:

rg "CPUSchedulingPolicy" --type yaml

Repository: openshift/machine-config-operator

Length of output: 162

🏁 Script executed:

rg "LimitRTTIME|LimitRT" --type yaml

Repository: openshift/machine-config-operator

Length of output: 59

🏁 Script executed:

rg "ovs-vswitchd" -t yaml -A 5 -B 5

Repository: openshift/machine-config-operator

Length of output: 1684

Add runtime limit to prevent SCHED_FIFO starvation of node-critical services.

On lines 15-16, CPUSchedulingPolicy=fifo with CPUSchedulingPriority=10 can starve other node services if a hot loop occurs, degrading cluster stability. Add LimitRTTIME=200ms to bound real-time execution:

Suggested fix

[Service] CPUSchedulingPolicy=fifo CPUSchedulingPriority=10 + LimitRTTIME=200ms Nice=-10

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

CPUSchedulingPolicy=fifo

CPUSchedulingPriority=10

CPUSchedulingPolicy=fifo

CPUSchedulingPriority=10

LimitRTTIME=200ms

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@templates/common/_base/units/ovs-vswitchd.service.yaml` around lines 15 - 16, The unit file sets CPUSchedulingPolicy=fifo and CPUSchedulingPriority=10 which can starve node-critical services; modify the ovs-vswitchd systemd unit (referencing CPUSchedulingPolicy and CPUSchedulingPriority) to add a bounded real-time runtime by adding LimitRTTIME=200ms in the service section so real-time execution is capped and cannot monopolize CPU.

@mkowalski, understood. I'll drop that suggestion.

Analyse the whole repository. Make sure my unit is running with the highest priority. Make no mistakes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find all unit/service yaml files with CPU scheduling or Nice settings echo "=== CPUSchedulingPriority occurrences ===" rg -n "CPUSchedulingPriority" --type yaml -A2 -B5 echo "" echo "=== CPUSchedulingPolicy occurrences ===" rg -n "CPUSchedulingPolicy" --type yaml -A2 -B5 echo "" echo "=== Nice= occurrences ===" rg -n "Nice=" --type yaml -A2 -B5 echo "" echo "=== All unit yaml files ===" fd -e yaml -e service . templates/ --type f | sort echo "" echo "=== Check for any .service files with RT scheduling ===" rg -rn "CPUSchedulingPriority" templates/ rg -rn "CPUSchedulingPolicy" templates/ echo "" echo "=== Check vendor or test data for any priority references ===" rg -rn "CPUSchedulingPriority" --type go

Length of output: 15330

@mkowalski, here are the results of the full repository scan:

No other service in the entire repository uses CPUSchedulingPolicy or CPUSchedulingPriority, so there is zero competition from any other MCO-managed unit at the real-time scheduler level. That part is fine.

However, CPUSchedulingPriority=10 is not the highest priority. For SCHED_FIFO on Linux, the valid range is 1–99, and 99 is the maximum. systemd maps CPUSchedulingPriority directly to sched_param.sched_priority. A priority of 10 is near the bottom of the real-time scale — kernel threads and other in-kernel real-time work can easily run above it.

Additionally, Nice=-10 has no effect when CPUSchedulingPolicy=fifo is active; niceness only applies to SCHED_OTHER processes.

To run at the true highest SCHED_FIFO priority:

[Service] CPUSchedulingPolicy=fifo - CPUSchedulingPriority=10 - Nice=-10 + CPUSchedulingPriority=99

mkowalski · 2026-03-24T13:52:31Z

10 and -10 are fairly conservative. If needed, we can go much higher.

neisw · 2026-03-24T16:20:38Z

Looks to me like we can find the disruption in non upgrade job like periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn which makes sense if it is just cpu overload from conformance tests. Will use that for testing here since the run time is shorter.

neisw · 2026-03-24T16:21:07Z

/payload-aggregate periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn 10

openshift-ci · 2026-03-24T16:21:11Z

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/76ac7b40-279d-11f1-845a-4f6da9dad00d-0

neisw · 2026-03-24T20:51:51Z

/payload-aggregate periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn 10

openshift-ci · 2026-03-24T20:51:55Z

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/498ed740-27c3-11f1-8af5-5faa0c693e9c-0

neisw · 2026-03-25T12:37:48Z

/payload-aggregate periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn 10

openshift-ci · 2026-03-25T12:37:52Z

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6ead5b80-2847-11f1-992c-de16c02ddc09-0

ovs: increase CPU priority for Open vSwitch Forwarding Unit

e04826a

Signed-off-by: Mat Kowalski <mko@redhat.com>

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2026

openshift-ci bot requested review from pablintino and umohnani8 March 24, 2026 13:49

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

Conversation

mkowalski commented Mar 24, 2026

Uh oh!

coderabbitai bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

openshift-ci bot commented Mar 24, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkowalski Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

mkowalski Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

mkowalski commented Mar 24, 2026

Uh oh!

neisw commented Mar 24, 2026

Uh oh!

neisw commented Mar 24, 2026

Uh oh!

openshift-ci bot commented Mar 24, 2026

Uh oh!

neisw commented Mar 24, 2026

Uh oh!

openshift-ci bot commented Mar 24, 2026

Uh oh!

neisw commented Mar 25, 2026

Uh oh!

openshift-ci bot commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Mar 24, 2026 •

edited

Loading

coderabbitai bot Mar 24, 2026 •

edited

Loading