Skip to content

Tuning of individual kernel threads#628

Open
adriaan42 wants to merge 2 commits into
redhat-performance:masterfrom
adriaan42:adriaan/kthreads
Open

Tuning of individual kernel threads#628
adriaan42 wants to merge 2 commits into
redhat-performance:masterfrom
adriaan42:adriaan/kthreads

Conversation

@adriaan42

@adriaan42 adriaan42 commented Apr 17, 2024

Copy link
Copy Markdown
Contributor

In combination with #596 and #580, this PR implements the third feature needed to dynamically tune all relevant aspects of a realtime application using a dedicated HW device (typically a NIC).

Two things might need some discussion:

  • In my implementation I kept the basic idea that one instace can cover a number of different groups of threads. That makes it easy to migrate from the current scheduler plugin, but means we still need _has_dynamic_options, which is marked as a hack in plugins/base.py. The alternative would be to have one plugin instance per "group", which would make the profiles much longer.
    group.ktimers=0:f:2:*:^\[ktimers
    would become something like
    [kthread_ktimers]
    type=kthread
    regex=^ktimers
    policy=fifo
    sched_prio=2
    affinity=*
  • I copied the approach of using perf to monitor for creation of new threads. That means that when running both the scheduler plugin and the kthread plugin, we'd have two threads doing the same thing. For my applications that's not a problem, because I no longer use the scheduler plugin at all:
    • scheduler handles three things: IRQ affinities, kernel threads, and userland threads
    • For IRQ affinities I can use the irq plugin
    • For kernel threads I can use kthread
    • For userland threads I use systemd and cgroupv2, and I don't want TuneD to touch them

Comment thread tuned/plugins/plugin_kthread.py Fixed
Comment thread tuned/plugins/plugin_kthread.py Fixed
Comment thread tuned/plugins/plugin_kthread.py
@yarda

yarda commented May 23, 2024

Copy link
Copy Markdown
Contributor

but means we still need _has_dynamic_options, which is marked as a hack in plugins/base.py.

It's OK for me, in long-term it's a candidate for rewrite/refactor, but there are other plugins using it as well. We will probably keep the idea and if we change the implementation, this could be then updated in all affected plugins the same way.

@yarda

yarda commented May 23, 2024

Copy link
Copy Markdown
Contributor

Regarding the cgroups, there is support for cgroups v1 in the scheduler plugin and we would also like to add support for the v2 for completeness. It could be useful for somebody.

It's OK if you are not using some plugin. We even wanted to add global configuration option allowing selective disablement of specific plugins in the stock profiles.

@adriaan42

Copy link
Copy Markdown
Contributor Author

Regarding the cgroups, there is support for cgroups v1 in the scheduler plugin and we would also like to add support for the v2 for completeness. It could be useful for somebody.

I found the whole cgroup topic to be rather tricky, because in modern systems, SystemD is the "cgroup manager", and it owns (by convention) the cgroup tree. So any creation of new cgroups should happen via SystemD, and can then use Delegation to create further sub-groups.

I've had some success with:

  • set AllowedCPUs on all the default slices (system.slice, user.slice, init.scope) to restrict all "normal" processes. This to some extent replaces the isolcpus= kernel option.
  • Create an isolated.slice using SystemD, with access to the desired CPUs, and then use Slice=isolated in my service file (or systemd-run --slice=isolated when launching from a shell) to gain access to the isolated CPUs.

But simply having TuneD move processes around seems like it could have unwanted side-effects, and should be handled with care...

Comment thread tuned/plugins/plugin_kthread.py Fixed
@adriaan42

Copy link
Copy Markdown
Contributor Author

Update:

@yarda did you already have a chance to review this PR?

@yarda

yarda commented Oct 14, 2024

Copy link
Copy Markdown
Contributor

Sorry for the delay, I am back on it.

The 5d28337 LGTM.

Regarding the kthread plugin is there a specific need for the proposed syntax:

group.ksoftirqd=0:f:2:*:^\[ksoftirqd

Wouldn't be better to use one group per instance? I.e. the same affinity/sched_opts setting for individual instance, e.g.:

[ksoftirqd]
type=kthread
devices_udev_regex=^\[ksoftirqd
setting=0:f:2:*

Or:

[ksoftirqd]
type=kthread
devices_udev_regex=^\[ksoftirqd
schedopts=SCHED_FIFO
affinity=*

Then for the priority the builtin instance priority option could be used. The affinity if unset could default to *.

You could also specify multiple regexes per instance:

[ksoftirqd_and_ksmd]
type=kthread
devices_udev_regex=^\[(ksoftirqd)|(ksmd)
schedopts=SCHED_FIFO

@adriaan42

Copy link
Copy Markdown
Contributor Author

Wouldn't be better to use one group per instance? I.e. the same affinity/sched_opts setting for individual instance, e.g.:

I thought about this when I wrote the plugin, and didn't like it because

  • I like the "default" of having only one instance per plugin in the profile, unless doing some specific optimizations. That instance has the name of the plugin, so type= is not needed. In the current profiles only few very specific cases need multiple instances (e.g. ThunderX).
  • The approach with multiple instances would make the common case (what's currently used in the realtime profiles) much more complex to express.

I find the current format in the scheduler plugin quite clear, except for minor points I've changed:

  • unify policy and priority into one option (f50 instead of f:50) because setting those separately makes no sense to me
  • remove the braces [] from thread names (the scheduler plugin adds them to identify kernel threads, but here we have only kernel threads, so no need for this)
[ksoftirqd]
type=kthread
devices_udev_regex=^\[ksoftirqd
setting=0:f:2:*

Also re-using the existing devices_udev_regex is not very clean, as we're not dealing with udev Devices here. So the setting we're using in the profiles would be "wrong", and internally we'd need to create pyudev.Device objects just to use the existing mechanism.

@yarda

yarda commented Nov 6, 2024

Copy link
Copy Markdown
Contributor

Here in one instance there can be multiple "devices" with multiple different tuning. I don't like that it breaks the used logic concept that in each instance there are "devices" with the same tuning. I am afraid that it could lead to multiple problems later (e.g. when moving "devices" between instances through the API).

But maybe it's just my personal preference, @zacikpa, @jmencak what's your opinion?

@zacikpa

zacikpa commented Nov 7, 2024

Copy link
Copy Markdown
Contributor

I'm personally fine with the implementation as it is now (since it's inspired by the groups.* implementation in the scheduler plugin), but I would not refer to the kthreads as devices anywhere in the code/comments. In other words, the new plugin would not support "devices" in the TuneD sense, similarly to the scheduler plugin.

@yarda

yarda commented Nov 7, 2024

Copy link
Copy Markdown
Contributor

No device plugin, like e.g. the sysfs plugin, this may work.

@adriaan42

Copy link
Copy Markdown
Contributor Author

No device plugin, like e.g. the sysfs plugin, this may work.

One of the main points of the new plugin is to allow dynamic changes of the tuning (through the instance_[create|destroy] dbus calls). That only works on device-plugins (based on hotplug.Plugin).

@zacikpa

zacikpa commented Nov 11, 2024

Copy link
Copy Markdown
Contributor

Huh, taking back what I said here, I say we should adjust instance_create and instance_destroy to work with any plugin, not just child classes of hotplug.Plugin.

IIUC, you can't and don't plan to use instance_acquire_devices with this plugin, am I right? (Because it does not support what TuneD calls "devices").

@adriaan42

Copy link
Copy Markdown
Contributor Author

Huh, taking back what I said here, I say we should adjust instance_create and instance_destroy to work with any plugin, not just child classes of hotplug.Plugin.

The change itself is probably simple, but I'm not sure what would happen for any of the existing non-hotplug Plugins if multiple instances were created. I expect those instances would just interfere with each other and break things.

IIUC, you can't and don't plan to use instance_acquire_devices with this plugin, am I right? (Because it does not support what TuneD calls "devices").

instance_acquire_devices could be used, but it does not make much sense in this case. (One could force the transfer of a "device" to an instance that does not match it, but then no tuning would be applied).

Treating kthreads as "devices" lets me reuse a lot of nice infrastructure. But it would of course be possible to do this all within the plugin.

@adriaan42

Copy link
Copy Markdown
Contributor Author

Huh, taking back what I said here, I say we should adjust instance_create and instance_destroy to work with any plugin, not just child classes of hotplug.Plugin.

The change itself is probably simple, but I'm not sure what would happen for any of the existing non-hotplug Plugins if multiple instances were created. I expect those instances would just interfere with each other and break things.

IIUC, you can't and don't plan to use instance_acquire_devices with this plugin, am I right? (Because it does not support what TuneD calls "devices").

instance_acquire_devices could be used, but it does not make much sense in this case. (One could force the transfer of a "device" to an instance that does not match it, but then no tuning would be applied).

Treating kthreads as "devices" lets me reuse a lot of nice infrastructure. But it would of course be possible to do this all within the plugin.

@zacikpa I just pushed a draft that bases the plugin on base.Plugin instead of hotplug.Plugin. The changes in controller.py probably need some more work.

@yarda

yarda commented Jan 18, 2025

Copy link
Copy Markdown
Contributor

@adriaan42 could you please squash the commits? Multiple commits each introducing separate logical change are OK, but multiple commits on the same file (e.g. plugin_kthread.py) reverting/changing things are making it a bit confusing.

Regarding the controller.py changes, I am afraid this could break something - this needs thorough testing at least. I am not brave enough to push it into the release now.

@zacikpa your opinion?

@yarda

yarda commented Jan 18, 2025

Copy link
Copy Markdown
Contributor

Or maybe split this PR to two PRs. The commit adding switch for disabling kthreads can be merged immediately and removing it from this PR could make rebasing easier.

@adriaan42

Copy link
Copy Markdown
Contributor Author

@adriaan42 could you please squash the commits? Multiple commits each introducing separate logical change are OK, but multiple commits on the same file (e.g. plugin_kthread.py) reverting/changing things are making it a bit confusing.

I pushed the changes as separate commit as part of the discussion, to show required changes and maybe get some feedback (confirmation that this might be the way to go before I spend effort to finalize the imlementation). Will post an update soon.

Regarding the controller.py changes, I am afraid this could break something - this needs thorough testing at least. I am not brave enough to push it into the release now.

I agree. I'm happy to look into better solutions and possible side-effects, if we agree that "allow dynamic instances of non-device plugins" is reasonable and has a chance to be integrated. Again, trying to avoid effort for something that might never get merged...

Or maybe split this PR to two PRs. The commit adding switch for disabling kthreads can be merged immediately and removing it from this PR could make rebasing easier.

Sure! -> #734

@adriaan42

Copy link
Copy Markdown
Contributor Author

Update:

@adriaan42

Copy link
Copy Markdown
Contributor Author

Update:

  • rebase following the latest release
  • implemented seamless transition of kthreads between plugin instances (similar to Seamless transfer of devices between plugin instances #750, but here the plugin needs to handle everything as part of "static" tuning, because we're no longer a "device" plugin)

@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 739758c9-cca4-439f-8972-47c6685a6fab

📥 Commits

Reviewing files that changed from the base of the PR and between 9a4895a and be3e6f6.

📒 Files selected for processing (1)
  • tuned/plugins/plugin_kthread.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tuned/plugins/plugin_kthread.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Kernel-thread tuning via per-group rules (name regex) to set CPU affinity and scheduler policy/priority.
    • Dynamic ownership and priority-aware transfer of threads across instances, preserving and restoring original settings.
    • Activation/deactivation behavior: immediate tuning on activate, rollback/restore on deactivate, and deferred acquisition from lower-priority instances.
    • Optional perf-backed monitoring to reduce races and runtime verification of applied settings.

Walkthrough

Adds a kthread plugin that discovers kernel threads, parses prioritized group. rules, assigns threads to plugin instances with priority-aware transfers, and applies/restores CPU affinity and scheduler policy/priority; optional python-perf integration provides delayed dynamic monitoring for newly spawned threads.

Changes

Kernel Thread Tuning Plugin

Layer / File(s) Summary
Module imports, feature gate, and exceptions
tuned/plugins/plugin_kthread.py
Detects python-perf availability; defines ThreadNoLongerExists and AffinityNotChangeable.
Per-thread and rule data containers
tuned/plugins/plugin_kthread.py
Defines KthreadInfo, SchedOpts, and GroupCmd for per-thread bookkeeping and parsed group rules (regex.fullmatch).
KthreadPlugin initialization and backend selection
tuned/plugins/plugin_kthread.py
Constructor initializes bookkeeping, selects SchedulerUtils backend, attempts perf setup, and exposes cleanup().
Plugin device/init and initial scan
tuned/plugins/plugin_kthread.py
Creates unassigned PID set and performs initial /proc scan; wires plugin-level add/remove routing to instances.
Instance init and group. rule parsing*
tuned/plugins/plugin_kthread.py
Instance tracking/state and parsing/validation of group.* options into ordered GroupCmd rules with compiled regex and parsed sched/affinity; acquires initial kthreads.
Rule matching and PID filtering
tuned/plugins/plugin_kthread.py
Finds first matching SchedOpts via regex.fullmatch(kthread.comm) and filters PID sets while removing vanished threads.
Instance add/remove behavior
tuned/plugins/plugin_kthread.py
Assigns matched PIDs to instances; when active applies tuning immediately (rollback on error), otherwise records assignment; removal restores tuned threads and prunes vanished PIDs.
Priority-based acquisition and transfer
tuned/plugins/plugin_kthread.py
Transfers preserve assigned vs processed semantics; acquisition prefers unassigned then steals from lower-priority instances while avoiding active-instance stealing when inactive; release returns PIDs appropriately.
Activation, apply/unapply lifecycle and verification
tuned/plugins/plugin_kthread.py
Activation rescans and may start perf monitor, applies tuning to assigned PIDs and marks them processed, then acquires deferred PIDs; verification checks processed PIDs; deactivation un-applies tuning and optionally fully restores originals, stopping perf when last instance deactivates.
/proc discovery and bookkeeping
tuned/plugins/plugin_kthread.py
Scans procfs for PF_KTHREAD threads, maintains KthreadInfo entries and unassigned PID set under lock, removes disappeared threads, and raises ThreadNoLongerExists for missing PIDs.
Perf-based dynamic monitoring
tuned/plugins/plugin_kthread.py
When python-perf is available, starts a monitor thread that delays handling of RECORD_COMM before adding threads and removes threads on RECORD_EXIT.
Low-level scheduling wrappers
tuned/plugins/plugin_kthread.py
SchedulerUtils wrappers translate ESRCH to ThreadNoLongerExists, detect non-changeable affinity and raise AffinityNotChangeable, and format SchedOpts for logging/verification.
Apply tuning and rollback
tuned/plugins/plugin_kthread.py
Capture original affinity/sched on first tuning, apply affinity and/or scheduler updates, record tuned flags, and rollback affinity if scheduler update fails after affinity change.
Change transitions and restoration
tuned/plugins/plugin_kthread.py
Subsequent tuning changes selectively restore affinity/sched when desired opts omit fields; full restore honors kthread.affinity_changeable and clears saved originals.
Verification
tuned/plugins/plugin_kthread.py
Verifies scheduler policy/priority always when specified; verifies affinity only when requested and affinity_changeable is truthy, normalizing cpulist formatting.

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Tuning of individual kernel threads' directly and clearly describes the main change: adding a new kthread plugin for tuning kernel threads.
Description check ✅ Passed The description is highly relevant, explaining the purpose of the PR, design decisions, and implementation considerations related to the kthread plugin.
Docstring Coverage ✅ Passed Docstring coverage is 88.37% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 331-353: The code reads and unions
other_instance._kthreads_assigned/_kthreads_processed and iterates those sets
without holding self._lock, causing concurrent-modification races; fix by
acquiring self._lock around every access/union/iteration of per-instance kthread
sets in _instance_release_kthreads and in the prior transfer loop (the block
using _get_matching_kthreads and calling _instance_transfer_kthread), e.g.
snapshot sets with set(...) while holding self._lock (using self._lock:), then
release the lock if you want to perform heavier work outside, or perform the
transfer while still holding the lock to prevent RuntimeError; ensure all
accesses to _kthreads_assigned, _kthreads_processed, _kthread_pids_unassigned
and calls to _instance_add_kthread/_instance_remove_kthread are protected by the
same self._lock.
- Around line 620-642: The method _apply_kthread_tuning currently assigns
kthread.sched_orig before performing changes, so if _set_affinity succeeds but
_set_schedopts later raises the PID becomes partially tuned and unrevertible; to
fix, capture current_affinity/current_policy/current_priority in local vars,
attempt to apply affinity and scheduling changes inside a try/except and only
set kthread.sched_orig = SchedOpts(...) and tuned_* flags after both operations
succeed, and on any exception revert any change already applied (call
_set_affinity(pid, original_affinity) and _set_schedopts(pid, original_policy,
original_priority) as needed), propagate the error, and ensure
affinity_changeable is set correctly based on whether revert succeeded; use the
existing helpers _set_affinity, _set_schedopts and _get_affinity/_get_schedopts
to locate where to implement this.
- Around line 137-142: When calling self._perf_setup() (guarded by have_perf)
wrap the call in a try/except to catch any exceptions (e.g., Exception as e)
raised during perf initialization and fall back to the non-perf path by calling
self._disable_perf(); also emit the same warning used when python-perf was not
found but include the caught exception details so the plugin degrades gracefully
instead of aborting. Target the call site where have_perf is checked and update
logic around self._perf_setup(), referencing the _perf_setup() and
_disable_perf() methods and the existing log.warning message.
- Around line 225-230: The current validation only checks SCHED_OTHER==0 and a
blanket 0..99 range; update the validation in the parsing/validation block that
uses variables policy and priority (in tuned/plugins/plugin_kthread.py) to use
policy-specific bounds: call self._scheduler_utils.get_priority_min(policy) and
get_priority_max(policy), ensure priority is between those bounds, and produce
policy-aware error messages (e.g., for SCHED_BATCH/SCHED_IDLE require 0, for
SCHED_FIFO/SCHED_RR disallow 0) instead of the generic 0..99 check so invalid
values (like b50, i1, f0) are rejected before _set_schedopts() is called.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 0573457f-0de3-4f29-99b2-088e885424ba

📥 Commits

Reviewing files that changed from the base of the PR and between 0eb28ac and a4c0fb6.

📒 Files selected for processing (1)
  • tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py
Comment thread tuned/plugins/plugin_kthread.py Outdated
Comment thread tuned/plugins/plugin_kthread.py Outdated
Comment thread tuned/plugins/plugin_kthread.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 348-356: When reassigning free_kthreads you must iterate other
instances in priority order instead of dict order; change the loop that
currently iterates self._instances.values() to iterate instances sorted by their
priority (highest first) and skip the releasing instance. Use each instance's
priority attribute (e.g., instance.priority or instance._priority) as the sort
key, then call self._get_matching_kthreads(...) and
self._instance_transfer_kthread(...) in that order so overlapping match sets go
to higher-priority instances first.
- Around line 139-144: The _perf_setup() failure path in KthreadPlugin.__init__
can leak partially initialized perf resources; update the except block so it
invokes self._perf_shutdown() (to close any evlist/evsel/fds created) before
setting the global have_perf = False and logging the warning, and ensure any
exceptions from _perf_shutdown() are caught/suppressed or logged to avoid
masking the original error; reference KthreadPlugin.__init__, _perf_setup(),
_perf_shutdown(), cleanup(), and the global have_perf.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 63ae5d08-6824-4185-addc-db8e9f3a8741

📥 Commits

Reviewing files that changed from the base of the PR and between a4c0fb6 and 0f50e25.

📒 Files selected for processing (1)
  • tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py
Comment thread tuned/plugins/plugin_kthread.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tuned/plugins/plugin_kthread.py (2)

292-303: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate per-thread failures so the perf monitor doesn't die.

_instance_add_kthread() only handles ThreadNoLongerExists. Any other _apply_kthread_tuning() failure bubbles through _kthread_internal_add() into _perf_monitor_thread(), which has no catch. One bad kthread can therefore terminate dynamic monitoring for all future kthreads until TuneD restarts.

Suggested fix
 		while not self._terminate.is_set():
 			# process new threads when their delay has passed
 			now = time.time()
 			while len(new_threads) > 0 and now >= new_threads[0].ts:
-				self._kthread_internal_add(new_threads.pop(0).tid)
+				tid = new_threads.pop(0).tid
+				try:
+					self._kthread_internal_add(tid)
+				except Exception as e:
+					log.error("Failed to process new kthread %d: %s" % (tid, e))
@@
 						if event.type == perf.RECORD_COMM:
 							new_threads.append(NewThread(now + thread_add_delay_s, event.tid))
 						elif event.type == perf.RECORD_EXIT:
-							self._kthread_internal_remove(event.tid)
+							try:
+								self._kthread_internal_remove(event.tid)
+							except Exception as e:
+								log.error("Failed to process kthread exit %d: %s" % (event.tid, e))

Also applies to: 537-583

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tuned/plugins/plugin_kthread.py` around lines 292 - 303,
_instance_add_kthread currently only catches ThreadNoLongerExists so any other
exception from _get_instance_sched_options or _apply_kthread_tuning will
propagate up into _kthread_internal_add and kill _perf_monitor_thread; wrap the
work that fetches options and applies tuning (the call(s) to
_get_instance_sched_options and _apply_kthread_tuning) in a broad try/except
that catches Exception (while keeping the existing ThreadNoLongerExists handling
separate), log the error, call _kthread_internal_remove(pid) to clean up, and
only add pid to instance._kthreads_processed on successful completion; apply the
identical defensive try/except pattern to the other similar block referenced
(the second occurrence around the 537–583 region) so one bad kthread cannot
terminate the perf monitor.

357-372: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear dead PIDs out of the unassigned pool.

_kthread_internal_remove() never discards the PID from _kthread_pids_unassigned, and _instance_release_kthreads() can add it back after removal. In a long-lived daemon that means dead PIDs accumulate in the unassigned set and get re-scanned on every later acquire/release cycle.

Suggested fix
 	def _instance_release_kthreads(self, instance):
@@
 			# the remaining ones go back to unassigned
 			for pid in free_kthreads:
 				self._instance_remove_kthread(instance, pid)
-				self._kthread_pids_unassigned.add(pid)
+				if pid in self._kthreads:
+					self._kthread_pids_unassigned.add(pid)
@@
 	def _kthread_internal_remove(self, pid):
@@
 		try:
 			with self._lock:
+				self._kthread_pids_unassigned.discard(pid)
 				del self._kthreads[pid]
 				self._plugin_remove_kthread(pid)

Also applies to: 467-477

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tuned/plugins/plugin_kthread.py` around lines 357 - 372, _dead_ PIDs are
never removed from _kthread_pids_unassigned so unassigned set grows; update the
removal logic so dead PIDs are purged: modify _kthread_internal_remove (or
ensure callers like _instance_remove_kthread and _instance_release_kthreads) to
also discard the pid from _kthread_pids_unassigned (and only add back to
_kthread_pids_unassigned if the pid is still alive/valid). Use the symbols
_kthread_internal_remove, _instance_remove_kthread, _instance_release_kthreads
and _kthread_pids_unassigned to locate the code and implement the check/remove
so dead pids are not left in the unassigned pool.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 142-149: The current fallback path swallows exceptions from
_perf_shutdown() after a failed _perf_setup(), forcing have_perf=False and
preventing cleanup() from retrying; change the except-pass behavior so that any
exception raised by _perf_shutdown() is not silently dropped—either re-raise the
shutdown exception (preferably chaining it to the original _perf_setup()
exception) or record it and raise a combined exception so callers see the
failure; apply the same change to the other similar blocks handling
_perf_setup/_perf_shutdown (the other occurrences around the same pattern) so
resource cleanup failures are propagated and cleanup() can retry.

---

Outside diff comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 292-303: _instance_add_kthread currently only catches
ThreadNoLongerExists so any other exception from _get_instance_sched_options or
_apply_kthread_tuning will propagate up into _kthread_internal_add and kill
_perf_monitor_thread; wrap the work that fetches options and applies tuning (the
call(s) to _get_instance_sched_options and _apply_kthread_tuning) in a broad
try/except that catches Exception (while keeping the existing
ThreadNoLongerExists handling separate), log the error, call
_kthread_internal_remove(pid) to clean up, and only add pid to
instance._kthreads_processed on successful completion; apply the identical
defensive try/except pattern to the other similar block referenced (the second
occurrence around the 537–583 region) so one bad kthread cannot terminate the
perf monitor.
- Around line 357-372: _dead_ PIDs are never removed from
_kthread_pids_unassigned so unassigned set grows; update the removal logic so
dead PIDs are purged: modify _kthread_internal_remove (or ensure callers like
_instance_remove_kthread and _instance_release_kthreads) to also discard the pid
from _kthread_pids_unassigned (and only add back to _kthread_pids_unassigned if
the pid is still alive/valid). Use the symbols _kthread_internal_remove,
_instance_remove_kthread, _instance_release_kthreads and
_kthread_pids_unassigned to locate the code and implement the check/remove so
dead pids are not left in the unassigned pool.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 1dff8677-9b1e-41be-87e2-933ea4f0cfa1

📥 Commits

Reviewing files that changed from the base of the PR and between 0f50e25 and 295cad9.

📒 Files selected for processing (1)
  • tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 292-305: The code in _instance_add_kthread currently mutates
ownership/bookkeeping before calling _apply_kthread_tuning, risking permanent
loss on non-ESRCH failures; change the flow so the PID is not removed from its
current owner nor from global bookkeeping until _apply_kthread_tuning(kthread,
opts) completes successfully, or ensure a rollback that re-inserts the PID back
into the previous instance/set and global maps if tuning throws; specifically
wrap the sequence using _kthread_get, _get_instance_sched_options and
_apply_kthread_tuning in a try/except where any exception triggers restoration
via re-adding the PID to the source instance and avoiding call to
_kthread_internal_remove, and apply the same atomic/rollback fix to the
analogous transfer paths that use the same helpers.
- Around line 179-185: The current kthread assignment logic (in
_plugin_add_kthread and in the transfer code paths using
_instance_transfer_kthread / _get_matching_kthreads / _instance_add_kthread)
chooses an owner purely by instance priority and can assign kthreads to inactive
instances; change the selection so that you first scan for matching active
instances and only if none exist fall back to matching inactive instances: in
_plugin_add_kthread iterate self._instances.values() and prefer instance objects
with an active flag (e.g., instance.is_active or instance.active) when calling
_get_matching_kthreads and _instance_add_kthread, and when implementing
_instance_transfer_kthread update the transfer logic to skip handing a PID to an
inactive instance (or park it in _kthreads_assigned/_kthread_pids_unassigned)
until that instance becomes active; apply the same active-first selection change
to the other matching/transfer blocks referenced (the blocks around the other
ranges).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: d18d0240-aefe-4540-b064-3a74479c5c3d

📥 Commits

Reviewing files that changed from the base of the PR and between 295cad9 and 8aeeb84.

📒 Files selected for processing (1)
  • tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py
Comment thread tuned/plugins/plugin_kthread.py Outdated
Signed-off-by: Adriaan Schmidt <adriaan.schmidt@siemens.com>
@adriaan42

Copy link
Copy Markdown
Contributor Author

@coderabbitai resume

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
✅ Action performed

Reviews resumed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 438-454: In _instance_unapply_static, avoid the TOCTOU by
computing whether shutdown is needed while holding self._lock: after
decrementing self._instance_count inside the with self._lock: block, set a local
flag (e.g., need_shutdown = (self._instance_count == 0)); then release the lock
and, if need_shutdown and have_perf, call self._perf_monitor_shutdown(). This
keeps the count/check atomic (refer to _instance_unapply_static,
self._instance_count, and _perf_monitor_shutdown) while avoiding holding the
lock during the actual shutdown call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 1f148f83-1207-4ee6-93d0-245687766a30

📥 Commits

Reviewing files that changed from the base of the PR and between 8aeeb84 and 9a4895a.

📒 Files selected for processing (1)
  • tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants