Tuning of individual kernel threads by adriaan42 · Pull Request #628 · redhat-performance/tuned

adriaan42 · 2024-04-17T14:57:51Z

In combination with #596 and #580, this PR implements the third feature needed to dynamically tune all relevant aspects of a realtime application using a dedicated HW device (typically a NIC).

Two things might need some discussion:

In my implementation I kept the basic idea that one instance can cover a number of different groups of threads. That makes it easy to migrate from the current scheduler plugin, but means we still need _has_dynamic_options, which is marked as a hack in plugins/base.py. The alternative would be to have one plugin instance per "group", which would make the profiles much longer.
```
group.ktimers=0:f:2:*:^\[ktimers
```
would become something like
```
[kthread_ktimers]
type=kthread
regex=^ktimers
policy=fifo
sched_prio=2
affinity=*
```
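For readers unfamiliar with the compact `group.*` form, here is a minimal sketch of how such a value could be split into its five fields. The field order follows the scheduler plugin's documented `rule_prio:sched:prio:affinity:regex` layout; the `GroupRule` type and `parse_group_rule` helper are hypothetical names for illustration, not code from this PR.

```python
import re
from typing import NamedTuple


class GroupRule(NamedTuple):
    """One parsed `group.<name>` value (field names are illustrative)."""
    rule_prio: int      # order in which rules are evaluated
    policy: str         # scheduler policy letter, e.g. "f" for SCHED_FIFO
    sched_prio: int     # scheduler priority
    affinity: str       # CPU list/mask, or "*" (assumed to mean "leave unchanged")
    regex: re.Pattern   # matched against the thread name


def parse_group_rule(value: str) -> GroupRule:
    # Split on the first four colons only, so the regex (last field) may contain ':'.
    parts = value.split(":", 4)
    if len(parts) != 5:
        raise ValueError("expected 5 colon-separated fields, got %d" % len(parts))
    rule_prio, policy, sched_prio, affinity, regex = parts
    return GroupRule(int(rule_prio), policy, int(sched_prio), affinity, re.compile(regex))


rule = parse_group_rule(r"0:f:2:*:^\[ktimers")
print(rule.policy, rule.sched_prio, rule.affinity)
print(bool(rule.regex.search("[ktimers/3]")))
```

Splitting with a maximum of four separators keeps any colon inside the regular expression intact, which is why the regex is the last field.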
I copied the approach of using perf to monitor for creation of new threads. That means that when running both the scheduler plugin and the kthread plugin, we'd have two threads doing the same thing. For my applications that's not a problem, because I no longer use the scheduler plugin at all:
- scheduler handles three things: IRQ affinities, kernel threads, and userland threads
- For IRQ affinities I can use the irq plugin
- For kernel threads I can use kthread
- For userland threads I use systemd and cgroupv2, and I don't want TuneD to touch them

yarda · 2024-05-23T19:55:59Z

but means we still need _has_dynamic_options, which is marked as a hack in plugins/base.py.

It's OK for me. In the long term it's a candidate for a rewrite/refactor, but other plugins use it as well. We will probably keep the idea, and if we change the implementation, all affected plugins could then be updated the same way.

yarda · 2024-05-23T20:01:35Z

Regarding the cgroups, there is support for cgroups v1 in the scheduler plugin and we would also like to add support for the v2 for completeness. It could be useful for somebody.

It's OK if you are not using some plugin. We even wanted to add a global configuration option allowing selective disabling of specific plugins in the stock profiles.

adriaan42 · 2024-05-24T10:59:12Z

Regarding the cgroups, there is support for cgroups v1 in the scheduler plugin and we would also like to add support for the v2 for completeness. It could be useful for somebody.

I found the whole cgroup topic to be rather tricky, because in modern systems, SystemD is the "cgroup manager", and it owns (by convention) the cgroup tree. So any creation of new cgroups should happen via SystemD, and can then use Delegation to create further sub-groups.

I've had some success with:

- Set AllowedCPUs on all the default slices (system.slice, user.slice, init.scope) to restrict all "normal" processes. This to some extent replaces the isolcpus= kernel option.
- Create an isolated.slice using SystemD, with access to the desired CPUs, and then use Slice=isolated in my service file (or systemd-run --slice=isolated when launching from a shell) to gain access to the isolated CPUs.
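The two steps above can be sketched as a small Python helper that renders the corresponding systemd unit fragments. `AllowedCPUs=` is a real systemd resource-control directive; the drop-in paths, file names, and the `render_units` function are assumptions made for illustration, and the CPU ranges are placeholders.

```python
def render_units(housekeeping_cpus: str, isolated_cpus: str) -> dict:
    """Return {path: content} for systemd units; paths and file names are illustrative."""
    units = {}
    # Restrict the default slices (and PID 1's scope) to the housekeeping CPUs.
    for unit, section in (("system.slice", "Slice"),
                          ("user.slice", "Slice"),
                          ("init.scope", "Scope")):
        path = "/etc/systemd/system/%s.d/50-allowed-cpus.conf" % unit
        units[path] = "[%s]\nAllowedCPUs=%s\n" % (section, housekeeping_cpus)
    # A dedicated slice that is allowed to run on the isolated CPUs.
    units["/etc/systemd/system/isolated.slice"] = (
        "[Unit]\nDescription=Slice with access to the isolated CPUs\n\n"
        "[Slice]\nAllowedCPUs=%s\n" % isolated_cpus
    )
    return units


for path, content in render_units("0-1", "2-7").items():
    print("# " + path)
    print(content)
```

A service would then opt in with `Slice=isolated.slice` in its `[Service]` section, as described above.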

But simply having TuneD move processes around seems like it could have unwanted side-effects, and should be handled with care...

adriaan42 · 2024-10-01T08:24:04Z

Update:

- rebased on latest main
- added check of perf events, according to the finding by @zacikpa in "scheduler: Do not assume that perf events have type attribute" (#691)

@yarda did you already have a chance to review this PR?

yarda · 2024-10-14T12:26:17Z

Sorry for the delay, I am back on it.

The 5d28337 LGTM.

Regarding the kthread plugin, is there a specific need for the proposed syntax:

group.ksoftirqd=0:f:2:*:^\[ksoftirqd

Wouldn't it be better to use one group per instance? I.e. the same affinity/sched_opts setting for each individual instance, e.g.:

[ksoftirqd]
type=kthread
devices_udev_regex=^\[ksoftirqd
setting=0:f:2:*

Or:

[ksoftirqd]
type=kthread
devices_udev_regex=^\[ksoftirqd
schedopts=SCHED_FIFO
affinity=*

The built-in instance priority option could then be used for the priority. The affinity, if unset, could default to *.

You could also specify multiple regexes per instance:

[ksoftirqd_and_ksmd]
type=kthread
devices_udev_regex=^\[(ksoftirqd)|(ksmd)
schedopts=SCHED_FIFO
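One detail worth checking in a combined pattern like the one above is alternation precedence: `|` binds more loosely than the `^\[` prefix, so the anchor only applies to the first alternative. The sketch below compares the pattern as written with a grouped variant, assuming the names are tested with `re.search` semantics (how the plugin actually matches is not stated here).

```python
import re

ungrouped = re.compile(r"^\[(ksoftirqd)|(ksmd)")   # pattern as written above
grouped = re.compile(r"^\[(ksoftirqd|ksmd)")       # anchor and '[' apply to both names

for name in ("[ksoftirqd/0]", "[ksmd]", "[not_ksmd_at_all]"):
    # Print whether each pattern finds a match anywhere in the name.
    print(name, bool(ungrouped.search(name)), bool(grouped.search(name)))
```

With the ungrouped form, any name that merely contains `ksmd` matches; the grouped form restricts both names to the start of the bracketed thread name.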

adriaan42 · 2024-10-16T07:59:15Z

Wouldn't it be better to use one group per instance? I.e. the same affinity/sched_opts setting for each individual instance, e.g.:

I thought about this when I wrote the plugin, and didn't like it because:

- I like the "default" of having only one instance per plugin in the profile, unless doing some specific optimizations. That instance has the name of the plugin, so type= is not needed. In the current profiles only a few very specific cases need multiple instances (e.g. ThunderX).
- The approach with multiple instances would make the common case (what's currently used in the realtime profiles) much more complex to express.

I find the current format in the scheduler plugin quite clear, except for minor points I've changed:

- unify policy and priority into one option (f50 instead of f:50) because setting those separately makes no sense to me
- remove the brackets [] from thread names (the scheduler plugin adds them to identify kernel threads, but here we have only kernel threads, so no need for this)
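The two changes listed above can be illustrated with a short sketch. The policy letters follow the scheduler plugin's group syntax; whether the kthread plugin accepts exactly this set is an assumption, and `parse_sched` and `strip_brackets` are hypothetical helper names.

```python
import re

# Policy letters as used by the scheduler plugin's group syntax (assumed to
# carry over to the kthread plugin).
POLICIES = {
    "f": "SCHED_FIFO",
    "r": "SCHED_RR",
    "o": "SCHED_OTHER",
    "b": "SCHED_BATCH",
    "i": "SCHED_IDLE",
}


def parse_sched(value: str):
    """Split a combined policy/priority token such as 'f50' into its parts."""
    m = re.fullmatch(r"([a-z])(\d+)", value)
    if m is None or m.group(1) not in POLICIES:
        raise ValueError("invalid scheduling option: %r" % value)
    return POLICIES[m.group(1)], int(m.group(2))


def strip_brackets(name: str) -> str:
    """Turn the scheduler plugin's '[ktimers/0]' style name into 'ktimers/0'."""
    if name.startswith("[") and name.endswith("]"):
        return name[1:-1]
    return name


print(parse_sched("f50"))
print(strip_brackets("[ktimers/0]"))
```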

[ksoftirqd]
type=kthread
devices_udev_regex=^\[ksoftirqd
setting=0:f:2:*

Also re-using the existing devices_udev_regex is not very clean, as we're not dealing with udev Devices here. So the setting we're using in the profiles would be "wrong", and internally we'd need to create pyudev.Device objects just to use the existing mechanism.

yarda · 2024-11-06T16:33:59Z

Here, one instance can contain multiple "devices" with different tuning. I don't like that this breaks the established concept that each instance holds "devices" with the same tuning. I am afraid that it could lead to multiple problems later (e.g. when moving "devices" between instances through the API).

But maybe it's just my personal preference, @zacikpa, @jmencak what's your opinion?

zacikpa · 2024-11-07T10:49:15Z

I'm personally fine with the implementation as it is now (since it's inspired by the groups.* implementation in the scheduler plugin), but I would not refer to the kthreads as devices anywhere in the code/comments. In other words, the new plugin would not support "devices" in the TuneD sense, similarly to the scheduler plugin.

yarda · 2024-11-07T11:35:37Z

As a non-device plugin, like e.g. the sysfs plugin, this may work.

adriaan42 · 2024-11-11T13:08:05Z

As a non-device plugin, like e.g. the sysfs plugin, this may work.

One of the main points of the new plugin is to allow dynamic changes of the tuning (through the instance_[create|destroy] dbus calls). That only works on device-plugins (based on hotplug.Plugin).

zacikpa · 2024-11-11T14:29:56Z

Huh, taking back what I said here, I say we should adjust instance_create and instance_destroy to work with any plugin, not just child classes of hotplug.Plugin.

IIUC, you can't and don't plan to use instance_acquire_devices with this plugin, am I right? (Because it does not support what TuneD calls "devices").

adriaan42 · 2024-11-14T07:30:10Z

Huh, taking back what I said here, I say we should adjust instance_create and instance_destroy to work with any plugin, not just child classes of hotplug.Plugin.

The change itself is probably simple, but I'm not sure what would happen for any of the existing non-hotplug Plugins if multiple instances were created. I expect those instances would just interfere with each other and break things.

IIUC, you can't and don't plan to use instance_acquire_devices with this plugin, am I right? (Because it does not support what TuneD calls "devices").

instance_acquire_devices could be used, but it does not make much sense in this case. (One could force the transfer of a "device" to an instance that does not match it, but then no tuning would be applied).

Treating kthreads as "devices" lets me reuse a lot of nice infrastructure. But it would of course be possible to do this all within the plugin.

adriaan42 · 2024-12-03T12:49:26Z

Huh, taking back what I said here, I say we should adjust instance_create and instance_destroy to work with any plugin, not just child classes of hotplug.Plugin.

The change itself is probably simple, but I'm not sure what would happen for any of the existing non-hotplug Plugins if multiple instances were created. I expect those instances would just interfere with each other and break things.

IIUC, you can't and don't plan to use instance_acquire_devices with this plugin, am I right? (Because it does not support what TuneD calls "devices").

instance_acquire_devices could be used, but it does not make much sense in this case. (One could force the transfer of a "device" to an instance that does not match it, but then no tuning would be applied).

Treating kthreads as "devices" lets me reuse a lot of nice infrastructure. But it would of course be possible to do this all within the plugin.

@zacikpa I just pushed a draft that bases the plugin on base.Plugin instead of hotplug.Plugin. The changes in controller.py probably need some more work.

yarda · 2025-01-18T18:50:06Z

@adriaan42 could you please squash the commits? Multiple commits each introducing a separate logical change are OK, but multiple commits on the same file (e.g. plugin_kthread.py) reverting/changing things make it a bit confusing.

Regarding the controller.py changes, I am afraid this could break something - this needs thorough testing at least. I am not brave enough to push it into the release now.

@zacikpa your opinion?

yarda · 2025-01-18T19:05:51Z

Or maybe split this PR into two PRs. The commit adding the switch for disabling kthreads can be merged immediately, and removing it from this PR could make rebasing easier.

adriaan42 · 2025-01-21T07:35:55Z

@adriaan42 could you please squash the commits? Multiple commits each introducing separate logical change are OK, but multiple commits on the same file (e.g. plugin_kthread.py) reverting/changing things are making it a bit confusing.

I pushed the changes as separate commits as part of the discussion, to show the required changes and maybe get some feedback (confirmation that this might be the way to go before I spend effort to finalize the implementation). Will post an update soon.

Regarding the controller.py changes, I am afraid this could break something - this needs thorough testing at least. I am not brave enough to push it into the release now.

I agree. I'm happy to look into better solutions and possible side-effects, if we agree that "allow dynamic instances of non-device plugins" is reasonable and has a chance to be integrated. Again, trying to avoid effort for something that might never get merged...

Or maybe split this PR to two PRs. The commit adding switch for disabling kthreads can be merged immediately and removing it from this PR could make rebasing easier.

Sure! -> #734

adriaan42 · 2025-01-21T15:43:10Z

Update:

- rebased on current master
- split off "plugin_scheduler: add switch to disable processing of kthreads" (#734) and "Allow dynamic instances of non-hotplug plugins" (#735)
- squashed plugin_kthread commits
- made perf optional, following "plugin_scheduler: make perf support optional" (#716)
- added code to delay tuning of newly created kthreads to deal with a race we're seeing with irq/* threads (both TuneD and the thread itself setting scheduling policy/priority).
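The delayed tuning mentioned in the last item can be pictured as a small time-ordered queue: new thread IDs are parked with a due time and only handed to the tuning code once that time has passed. This is a minimal sketch of the idea, not the plugin's code; `DelayedQueue` and its injectable `clock` parameter are hypothetical.

```python
import heapq
import time


class DelayedQueue:
    """Hold thread IDs until a settle delay has passed (hypothetical helper)."""

    def __init__(self, delay_s, clock=time.monotonic):
        self._delay_s = delay_s
        self._clock = clock
        self._heap = []  # entries are (due_time, tid)

    def push(self, tid):
        # Record when this thread becomes eligible for tuning.
        heapq.heappush(self._heap, (self._clock() + self._delay_s, tid))

    def pop_due(self):
        """Return the TIDs whose delay has expired, earliest due time first."""
        now = self._clock()
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due


# Drive the queue with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
queue = DelayedQueue(0.5, clock=lambda: fake_now[0])
queue.push(101)
print(queue.pop_due())
fake_now[0] = 1.0
print(queue.pop_due())
```

Waiting before tuning gives the new thread time to set its own policy and priority first, so TuneD's settings are applied last.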

adriaan42 · 2025-02-19T09:17:17Z

Update:

- rebased following the latest release
- implemented seamless transition of kthreads between plugin instances (similar to "Seamless transfer of devices between plugin instances" (#750), but here the plugin needs to handle everything as part of "static" tuning, because we're no longer a "device" plugin)

coderabbitai · 2026-06-05T13:23:56Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 739758c9-cca4-439f-8972-47c6685a6fab

📥 Commits

Reviewing files that changed from the base of the PR and between 9a4895a and be3e6f6.

📒 Files selected for processing (1)

tuned/plugins/plugin_kthread.py

🚧 Files skipped from review as they are similar to previous changes (1)

tuned/plugins/plugin_kthread.py

📝 Walkthrough

Summary by CodeRabbit

New Features
- Kernel-thread tuning via per-group rules (name regex) to set CPU affinity and scheduler policy/priority.
- Dynamic ownership and priority-aware transfer of threads across instances, preserving and restoring original settings.
- Activation/deactivation behavior: immediate tuning on activate, rollback/restore on deactivate, and deferred acquisition from lower-priority instances.
- Optional perf-backed monitoring to reduce races and runtime verification of applied settings.

Walkthrough

Adds a kthread plugin that discovers kernel threads, parses prioritized `group.*` rules, assigns threads to plugin instances with priority-aware transfers, and applies/restores CPU affinity and scheduler policy/priority; optional python-perf integration provides delayed dynamic monitoring for newly spawned threads.

Changes

Kernel Thread Tuning Plugin

| Layer / File(s) | Summary |
|---|---|
| Module imports, feature gate, and exceptions `tuned/plugins/plugin_kthread.py` | Detects python-perf availability; defines `ThreadNoLongerExists` and `AffinityNotChangeable`. |
| Per-thread and rule data containers `tuned/plugins/plugin_kthread.py` | Defines `KthreadInfo`, `SchedOpts`, and `GroupCmd` for per-thread bookkeeping and parsed group rules (`regex.fullmatch`). |
| KthreadPlugin initialization and backend selection `tuned/plugins/plugin_kthread.py` | Constructor initializes bookkeeping, selects SchedulerUtils backend, attempts perf setup, and exposes `cleanup()`. |
| Plugin device/init and initial scan `tuned/plugins/plugin_kthread.py` | Creates unassigned PID set and performs initial /proc scan; wires plugin-level add/remove routing to instances. |
| Instance init and `group.*` rule parsing `tuned/plugins/plugin_kthread.py` | Instance tracking/state and parsing/validation of `group.*` options into ordered `GroupCmd` rules with compiled regex and parsed sched/affinity; acquires initial kthreads. |
| Rule matching and PID filtering `tuned/plugins/plugin_kthread.py` | Finds first matching `SchedOpts` via `regex.fullmatch(kthread.comm)` and filters PID sets while removing vanished threads. |
| Instance add/remove behavior `tuned/plugins/plugin_kthread.py` | Assigns matched PIDs to instances; when active applies tuning immediately (rollback on error), otherwise records assignment; removal restores tuned threads and prunes vanished PIDs. |
| Priority-based acquisition and transfer `tuned/plugins/plugin_kthread.py` | Transfers preserve assigned vs processed semantics; acquisition prefers unassigned then steals from lower-priority instances while avoiding active-instance stealing when inactive; release returns PIDs appropriately. |
| Activation, apply/unapply lifecycle and verification `tuned/plugins/plugin_kthread.py` | Activation rescans and may start perf monitor, applies tuning to assigned PIDs and marks them processed, then acquires deferred PIDs; verification checks processed PIDs; deactivation un-applies tuning and optionally fully restores originals, stopping perf when last instance deactivates. |
| /proc discovery and bookkeeping `tuned/plugins/plugin_kthread.py` | Scans procfs for PF_KTHREAD threads, maintains `KthreadInfo` entries and unassigned PID set under lock, removes disappeared threads, and raises `ThreadNoLongerExists` for missing PIDs. |
| Perf-based dynamic monitoring `tuned/plugins/plugin_kthread.py` | When python-perf is available, starts a monitor thread that delays handling of `RECORD_COMM` before adding threads and removes threads on `RECORD_EXIT`. |
| Low-level scheduling wrappers `tuned/plugins/plugin_kthread.py` | SchedulerUtils wrappers translate ESRCH to `ThreadNoLongerExists`, detect non-changeable affinity and raise `AffinityNotChangeable`, and format SchedOpts for logging/verification. |
| Apply tuning and rollback `tuned/plugins/plugin_kthread.py` | Capture original affinity/sched on first tuning, apply affinity and/or scheduler updates, record tuned flags, and rollback affinity if scheduler update fails after affinity change. |
| Change transitions and restoration `tuned/plugins/plugin_kthread.py` | Subsequent tuning changes selectively restore affinity/sched when desired opts omit fields; full restore honors `kthread.affinity_changeable` and clears saved originals. |
| Verification `tuned/plugins/plugin_kthread.py` | Verifies scheduler policy/priority always when specified; verifies affinity only when requested and `affinity_changeable` is truthy, normalizing cpulist formatting. |

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Tuning of individual kernel threads' directly and clearly describes the main change: adding a new kthread plugin for tuning kernel threads. |
| Description check | ✅ Passed | The description is highly relevant, explaining the purpose of the PR, design decisions, and implementation considerations related to the kthread plugin. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.37% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 331-353: The code reads and unions
other_instance._kthreads_assigned/_kthreads_processed and iterates those sets
without holding self._lock, causing concurrent-modification races; fix by
acquiring self._lock around every access/union/iteration of per-instance kthread
sets in _instance_release_kthreads and in the prior transfer loop (the block
using _get_matching_kthreads and calling _instance_transfer_kthread), e.g.
snapshot sets with set(...) while holding self._lock (using self._lock:), then
release the lock if you want to perform heavier work outside, or perform the
transfer while still holding the lock to prevent RuntimeError; ensure all
accesses to _kthreads_assigned, _kthreads_processed, _kthread_pids_unassigned
and calls to _instance_add_kthread/_instance_remove_kthread are protected by the
same self._lock.
- Around line 620-642: The method _apply_kthread_tuning currently assigns
kthread.sched_orig before performing changes, so if _set_affinity succeeds but
_set_schedopts later raises the PID becomes partially tuned and unrevertible; to
fix, capture current_affinity/current_policy/current_priority in local vars,
attempt to apply affinity and scheduling changes inside a try/except and only
set kthread.sched_orig = SchedOpts(...) and tuned_* flags after both operations
succeed, and on any exception revert any change already applied (call
_set_affinity(pid, original_affinity) and _set_schedopts(pid, original_policy,
original_priority) as needed), propagate the error, and ensure
affinity_changeable is set correctly based on whether revert succeeded; use the
existing helpers _set_affinity, _set_schedopts and _get_affinity/_get_schedopts
to locate where to implement this.
- Around line 137-142: When calling self._perf_setup() (guarded by have_perf)
wrap the call in a try/except to catch any exceptions (e.g., Exception as e)
raised during perf initialization and fall back to the non-perf path by calling
self._disable_perf(); also emit the same warning used when python-perf was not
found but include the caught exception details so the plugin degrades gracefully
instead of aborting. Target the call site where have_perf is checked and update
logic around self._perf_setup(), referencing the _perf_setup() and
_disable_perf() methods and the existing log.warning message.
- Around line 225-230: The current validation only checks SCHED_OTHER==0 and a
blanket 0..99 range; update the validation in the parsing/validation block that
uses variables policy and priority (in tuned/plugins/plugin_kthread.py) to use
policy-specific bounds: call self._scheduler_utils.get_priority_min(policy) and
get_priority_max(policy), ensure priority is between those bounds, and produce
policy-aware error messages (e.g., for SCHED_BATCH/SCHED_IDLE require 0, for
SCHED_FIFO/SCHED_RR disallow 0) instead of the generic 0..99 check so invalid
values (like b50, i1, f0) are rejected before _set_schedopts() is called.
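The policy-specific bounds check from the last item can be sketched with Python's standard `os` module, which exposes the kernel's limits directly on Linux. The review comment refers to `self._scheduler_utils.get_priority_min/max`; this sketch deliberately uses `os.sched_get_priority_min/max` instead, and `validate_priority` is a hypothetical helper name.

```python
import os


def validate_priority(policy: int, priority: int) -> None:
    """Raise ValueError if `priority` is outside the kernel's range for `policy`."""
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    if not lo <= priority <= hi:
        raise ValueError(
            "priority %d out of range %d..%d for policy %d" % (priority, lo, hi, policy)
        )


# Accepted: a realtime priority for SCHED_FIFO, and 0 for SCHED_OTHER.
validate_priority(os.SCHED_FIFO, 50)
validate_priority(os.SCHED_OTHER, 0)

# Rejected: the examples named in the review (b50 and f0).
for policy, priority in ((os.SCHED_BATCH, 50), (os.SCHED_FIFO, 0)):
    try:
        validate_priority(policy, priority)
    except ValueError as e:
        print("rejected:", e)
```

Querying the kernel for the bounds avoids hard-coding a blanket 0..99 range that is only valid for some policies.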

📥 Commits

Reviewing files that changed from the base of the PR and between 0eb28ac and a4c0fb6.

📒 Files selected for processing (1)

tuned/plugins/plugin_kthread.py

coderabbitai

Actionable comments posted: 2

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 348-356: When reassigning free_kthreads you must iterate other
instances in priority order instead of dict order; change the loop that
currently iterates self._instances.values() to iterate instances sorted by their
priority (highest first) and skip the releasing instance. Use each instance's
priority attribute (e.g., instance.priority or instance._priority) as the sort
key, then call self._get_matching_kthreads(...) and
self._instance_transfer_kthread(...) in that order so overlapping match sets go
to higher-priority instances first.
- Around line 139-144: The _perf_setup() failure path in KthreadPlugin.__init__
can leak partially initialized perf resources; update the except block so it
invokes self._perf_shutdown() (to close any evlist/evsel/fds created) before
setting the global have_perf = False and logging the warning, and ensure any
exceptions from _perf_shutdown() are caught/suppressed or logged to avoid
masking the original error; reference KthreadPlugin.__init__, _perf_setup(),
_perf_shutdown(), cleanup(), and the global have_perf.
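The first finding, handing released threads to other instances in priority order rather than dict order, can be sketched as follows. Everything here is a stand-in: `Instance`, its `priority` attribute, and the prefix-based `matches` are hypothetical, and the sketch assumes that a smaller priority number means higher precedence.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Hypothetical stand-in for a plugin instance; only what the sketch needs.
    name: str
    priority: int
    patterns: tuple
    kthreads: set = field(default_factory=set)

    def matches(self, comm: str) -> bool:
        return any(comm.startswith(p) for p in self.patterns)


def reassign(free, instances, releasing):
    """Hand each released thread to the best-ranked other instance that matches it.

    `free` maps pid -> thread name. Returns the pids that nobody claimed.
    """
    # Assumption: lower number = higher precedence; the releasing instance is skipped.
    candidates = sorted((i for i in instances if i is not releasing),
                        key=lambda i: i.priority)
    unclaimed = set()
    for pid, comm in free.items():
        owner = next((i for i in candidates if i.matches(comm)), None)
        if owner is None:
            unclaimed.add(pid)
        else:
            owner.kthreads.add(pid)
    return unclaimed


a = Instance("a", 10, ("ksoftirqd",))
b = Instance("b", 5, ("k",))
r = Instance("r", 0, ("k",))
print(reassign({1: "ksoftirqd/0", 2: "kworker/1", 3: "rcu_preempt"}, [a, b, r], r))
print(b.kthreads, a.kthreads)
```

Sorting once up front makes the outcome independent of the order in which instances were created.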

📥 Commits

Reviewing files that changed from the base of the PR and between a4c0fb6 and 0f50e25.

📒 Files selected for processing (1)

tuned/plugins/plugin_kthread.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tuned/plugins/plugin_kthread.py (2)

292-303: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate per-thread failures so the perf monitor doesn't die.

_instance_add_kthread() only handles ThreadNoLongerExists. Any other _apply_kthread_tuning() failure bubbles through _kthread_internal_add() into _perf_monitor_thread(), which has no catch. One bad kthread can therefore terminate dynamic monitoring for all future kthreads until TuneD restarts.

Suggested fix

 		while not self._terminate.is_set():
 			# process new threads when their delay has passed
 			now = time.time()
 			while len(new_threads) > 0 and now >= new_threads[0].ts:
-				self._kthread_internal_add(new_threads.pop(0).tid)
+				tid = new_threads.pop(0).tid
+				try:
+					self._kthread_internal_add(tid)
+				except Exception as e:
+					log.error("Failed to process new kthread %d: %s" % (tid, e))
@@
 						if event.type == perf.RECORD_COMM:
 							new_threads.append(NewThread(now + thread_add_delay_s, event.tid))
 						elif event.type == perf.RECORD_EXIT:
-							self._kthread_internal_remove(event.tid)
+							try:
+								self._kthread_internal_remove(event.tid)
+							except Exception as e:
+								log.error("Failed to process kthread exit %d: %s" % (event.tid, e))

Also applies to: 537-583

In `@tuned/plugins/plugin_kthread.py` around lines 292 - 303,
_instance_add_kthread currently only catches ThreadNoLongerExists so any other
exception from _get_instance_sched_options or _apply_kthread_tuning will
propagate up into _kthread_internal_add and kill _perf_monitor_thread; wrap the
work that fetches options and applies tuning (the call(s) to
_get_instance_sched_options and _apply_kthread_tuning) in a broad try/except
that catches Exception (while keeping the existing ThreadNoLongerExists handling
separate), log the error, call _kthread_internal_remove(pid) to clean up, and
only add pid to instance._kthreads_processed on successful completion; apply the
identical defensive try/except pattern to the other similar block referenced
(the second occurrence around the 537–583 region) so one bad kthread cannot
terminate the perf monitor.

357-372: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear dead PIDs out of the unassigned pool.

_kthread_internal_remove() never discards the PID from _kthread_pids_unassigned, and _instance_release_kthreads() can add it back after removal. In a long-lived daemon that means dead PIDs accumulate in the unassigned set and get re-scanned on every later acquire/release cycle.

Suggested fix

 	def _instance_release_kthreads(self, instance):
@@
 			# the remaining ones go back to unassigned
 			for pid in free_kthreads:
 				self._instance_remove_kthread(instance, pid)
-				self._kthread_pids_unassigned.add(pid)
+				if pid in self._kthreads:
+					self._kthread_pids_unassigned.add(pid)
@@
 	def _kthread_internal_remove(self, pid):
@@
 		try:
 			with self._lock:
+				self._kthread_pids_unassigned.discard(pid)
 				del self._kthreads[pid]
 				self._plugin_remove_kthread(pid)

Also applies to: 467-477

In `@tuned/plugins/plugin_kthread.py` around lines 357 - 372, _dead_ PIDs are
never removed from _kthread_pids_unassigned so unassigned set grows; update the
removal logic so dead PIDs are purged: modify _kthread_internal_remove (or
ensure callers like _instance_remove_kthread and _instance_release_kthreads) to
also discard the pid from _kthread_pids_unassigned (and only add back to
_kthread_pids_unassigned if the pid is still alive/valid). Use the symbols
_kthread_internal_remove, _instance_remove_kthread, _instance_release_kthreads
and _kthread_pids_unassigned to locate the code and implement the check/remove
so dead pids are not left in the unassigned pool.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 142-149: The current fallback path swallows exceptions from
_perf_shutdown() after a failed _perf_setup(), forcing have_perf=False and
preventing cleanup() from retrying; change the except-pass behavior so that any
exception raised by _perf_shutdown() is not silently dropped—either re-raise the
shutdown exception (preferably chaining it to the original _perf_setup()
exception) or record it and raise a combined exception so callers see the
failure; apply the same change to the other similar blocks handling
_perf_setup/_perf_shutdown (the other occurrences around the same pattern) so
resource cleanup failures are propagated and cleanup() can retry.
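One way to avoid silently dropping a cleanup failure is Python's exception chaining. The sketch below is generic: `setup` and `shutdown` are placeholder callables standing in for `_perf_setup()` and `_perf_shutdown()`, and `setup_with_cleanup` is a hypothetical wrapper, not a function from the plugin.

```python
def setup_with_cleanup(setup, shutdown):
    """Run setup(); on failure run shutdown() and surface both errors."""
    try:
        setup()
    except Exception as setup_error:
        try:
            shutdown()
        except Exception as shutdown_error:
            # Chain the errors so neither is lost: __cause__ is the setup failure.
            raise shutdown_error from setup_error
        # Cleanup succeeded, so report the original setup failure unchanged.
        raise


def failing_setup():
    raise RuntimeError("perf setup failed")


def failing_shutdown():
    raise OSError("perf shutdown failed")


try:
    setup_with_cleanup(failing_setup, failing_shutdown)
except OSError as e:
    print("shutdown error:", e, "| caused by:", repr(e.__cause__))
```

A caller that wants to degrade gracefully can still catch the exception and fall back to the non-perf path; the difference is that the cleanup failure remains visible in the traceback.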

📥 Commits

Reviewing files that changed from the base of the PR and between 0f50e25 and 295cad9.

📒 Files selected for processing (1)

tuned/plugins/plugin_kthread.py

coderabbitai

Actionable comments posted: 2

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 292-305: The code in _instance_add_kthread currently mutates
ownership/bookkeeping before calling _apply_kthread_tuning, risking permanent
loss on non-ESRCH failures; change the flow so the PID is not removed from its
current owner nor from global bookkeeping until _apply_kthread_tuning(kthread,
opts) completes successfully, or ensure a rollback that re-inserts the PID back
into the previous instance/set and global maps if tuning throws; specifically
wrap the sequence using _kthread_get, _get_instance_sched_options and
_apply_kthread_tuning in a try/except where any exception triggers restoration
via re-adding the PID to the source instance and avoiding call to
_kthread_internal_remove, and apply the same atomic/rollback fix to the
analogous transfer paths that use the same helpers.
- Around line 179-185: The current kthread assignment logic (in
_plugin_add_kthread and in the transfer code paths using
_instance_transfer_kthread / _get_matching_kthreads / _instance_add_kthread)
chooses an owner purely by instance priority and can assign kthreads to inactive
instances; change the selection so that you first scan for matching active
instances and only if none exist fall back to matching inactive instances: in
_plugin_add_kthread iterate self._instances.values() and prefer instance objects
with an active flag (e.g., instance.is_active or instance.active) when calling
_get_matching_kthreads and _instance_add_kthread, and when implementing
_instance_transfer_kthread update the transfer logic to skip handing a PID to an
inactive instance (or park it in _kthreads_assigned/_kthread_pids_unassigned)
until that instance becomes active; apply the same active-first selection change
to the other matching/transfer blocks referenced (the blocks around the other
ranges).
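The two findings above can be sketched together in a few lines. This is a simplified model, not the plugin's code: `Instance`, `pick_owner`, and `transfer` are hypothetical names, ownership is modeled as a per-instance set of PIDs, a lower `priority` value is assumed to win, and a vanished thread (ESRCH) is assumed to surface as `ProcessLookupError`.

```python
class Instance:
    """Hypothetical stand-in for a plugin instance."""

    def __init__(self, name, priority, active, matcher):
        self.name = name
        self.priority = priority  # assumption: lower value wins
        self.active = active
        self.matcher = matcher    # callable: thread name -> bool
        self.pids = set()         # PIDs currently owned by this instance


def pick_owner(instances, comm):
    """Prefer matching active instances; fall back to inactive ones."""
    matching = [i for i in instances if i.matcher(comm)]
    # Sort key: active instances first, then by ascending priority value.
    matching.sort(key=lambda i: (not i.active, i.priority))
    return matching[0] if matching else None


def transfer(pid, source, target, apply_tuning):
    """Move pid from source to target; undo the move if tuning fails.

    Assumes `source` currently owns `pid`.
    """
    source.pids.discard(pid)
    target.pids.add(pid)
    try:
        apply_tuning(pid, target)
    except ProcessLookupError:
        # ESRCH: the thread is gone, so there is nothing to restore.
        target.pids.discard(pid)
        return False
    except Exception:
        # Any other failure: put the PID back so it never loses its owner.
        target.pids.discard(pid)
        source.pids.add(pid)
        raise
    return True
```

The alternative named in the review, deferring all bookkeeping changes until tuning has succeeded, would avoid the rollback branch entirely; the sketch shows the rollback variant because it maps more directly onto a flow that already mutates state first.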

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 295cad9 and 8aeeb84.

📒 Files selected for processing (1)

tuned/plugins/plugin_kthread.py

Signed-off-by: Adriaan Schmidt <adriaan.schmidt@siemens.com>

adriaan42 · 2026-06-08T08:09:29Z

@coderabbitai resume

coderabbitai · 2026-06-08T08:09:35Z

✅ Action performed

Reviews resumed.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tuned/plugins/plugin_kthread.py`:
- Around line 438-454: In _instance_unapply_static, avoid the TOCTOU by
computing whether shutdown is needed while holding self._lock: after
decrementing self._instance_count inside the with self._lock: block, set a local
flag (e.g., need_shutdown = (self._instance_count == 0)); then release the lock
and, if need_shutdown and have_perf, call self._perf_monitor_shutdown(). This
keeps the count/check atomic (refer to _instance_unapply_static,
self._instance_count, and _perf_monitor_shutdown) while avoiding holding the
lock during the actual shutdown call.
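A minimal sketch of the pattern described above, assuming a simple reference count guarded by a `threading.Lock`. The class `PerfMonitorOwner` and its `acquire`/`release` methods are hypothetical; only `_lock`, `_instance_count`, and the `need_shutdown` flag mirror names from the review comment, and the `have_perf` condition is omitted for brevity.

```python
import threading


class PerfMonitorOwner:
    """Hypothetical holder of a shared monitor, reference-counted by instance."""

    def __init__(self, shutdown):
        self._lock = threading.Lock()
        self._instance_count = 0
        self._shutdown = shutdown  # callable that stops the monitor

    def acquire(self):
        with self._lock:
            self._instance_count += 1

    def release(self):
        with self._lock:
            self._instance_count -= 1
            # Decide while the count cannot change under us.
            need_shutdown = self._instance_count == 0
        # The lock is released here, so a slow shutdown does not block others.
        if need_shutdown:
            self._shutdown()
```

Reading `_instance_count` again after leaving the `with` block would reintroduce the race, which is why the decision is captured in a local variable.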

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 8aeeb84 and 9a4895a.

📒 Files selected for processing (1)

tuned/plugins/plugin_kthread.py

adriaan42 mentioned this pull request Apr 29, 2024

Dynamic tuning of IRQs and kthreads #631

Open

adriaan42 force-pushed the adriaan/kthreads branch from c2c554a to 1c03a83 Compare May 22, 2024 07:01

github-advanced-security AI found potential problems May 23, 2024

Comment thread tuned/plugins/plugin_kthread.py Fixed

Comment thread tuned/plugins/plugin_kthread.py Fixed

Comment thread tuned/plugins/plugin_kthread.py

adriaan42 force-pushed the adriaan/kthreads branch from 1c03a83 to 826223f Compare May 24, 2024 10:43

adriaan42 force-pushed the adriaan/kthreads branch from 826223f to 262305b Compare June 18, 2024 05:24

adriaan42 force-pushed the adriaan/kthreads branch from 262305b to f50ea9e Compare July 2, 2024 14:25

adriaan42 requested a review from yarda July 2, 2024 14:25

adriaan42 force-pushed the adriaan/kthreads branch from f50ea9e to 55f965d Compare July 9, 2024 04:45

adriaan42 mentioned this pull request Jul 25, 2024

Device-independent, instance-specific tuning #662

Open

adriaan42 force-pushed the adriaan/kthreads branch from 55f965d to 3e91c01 Compare July 25, 2024 05:49

github-advanced-security AI found potential problems Jul 25, 2024

Comment thread tuned/plugins/plugin_kthread.py Fixed

adriaan42 force-pushed the adriaan/kthreads branch from 3e91c01 to 97b2e60 Compare October 1, 2024 08:06

adriaan42 force-pushed the adriaan/kthreads branch from 97b2e60 to 4c924b9 Compare December 3, 2024 12:42

adriaan42 mentioned this pull request Jan 21, 2025

plugin_scheduler: add switch to disable processing of kthreads #734

Merged

adriaan42 mentioned this pull request Jan 21, 2025

Allow dynamic instances of non-hotplug plugins #735

Open

adriaan42 force-pushed the adriaan/kthreads branch from 4c924b9 to 502b280 Compare January 21, 2025 15:37

adriaan42 force-pushed the adriaan/kthreads branch from 502b280 to a22a223 Compare February 19, 2025 08:01

adriaan42 force-pushed the adriaan/kthreads branch from a22a223 to 545a307 Compare May 21, 2025 05:03

adriaan42 force-pushed the adriaan/kthreads branch from 545a307 to f5eba3c Compare June 3, 2025 06:08

adriaan42 force-pushed the adriaan/kthreads branch from f5eba3c to b068785 Compare February 23, 2026 17:09

adriaan42 force-pushed the adriaan/kthreads branch from b068785 to a4c0fb6 Compare June 5, 2026 13:23

coderabbitai Bot reviewed Jun 5, 2026

Comment thread tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py Outdated

Comment thread tuned/plugins/plugin_kthread.py Outdated

Comment thread tuned/plugins/plugin_kthread.py Outdated

coderabbitai Bot reviewed Jun 8, 2026

Comment thread tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py

coderabbitai Bot reviewed Jun 8, 2026

Comment thread tuned/plugins/plugin_kthread.py

adriaan42 force-pushed the adriaan/kthreads branch from 295cad9 to 8aeeb84 Compare June 8, 2026 06:37

coderabbitai Bot reviewed Jun 8, 2026

Comment thread tuned/plugins/plugin_kthread.py

Comment thread tuned/plugins/plugin_kthread.py Outdated

feat: add plugin_kthread

9a4895a

Signed-off-by: Adriaan Schmidt <adriaan.schmidt@siemens.com>

adriaan42 force-pushed the adriaan/kthreads branch from 8aeeb84 to 9a4895a Compare June 8, 2026 08:02

coderabbitai Bot reviewed Jun 8, 2026

Comment thread tuned/plugins/plugin_kthread.py Outdated

fix: address review finding

be3e6f6
