Skip to content

Conversation

@ailiop-suse
Copy link

Recent kernel commit 1773572863c4 ("thermal: netlink: Add the commands and the events for the thresholds") merged in v6.13 added the "THERMAL_GENL_ATTR_TZ_PREV_TEMP" attribute in the middle of enum thermal_genl_attr and so all the following attrs are now offset by one position.

This causes irqbalance to segfault during thermal event handling as it encounters a null nlattr in the parsed attrs array in the position where it expects to find THERMAL_GENL_ATTR_CAPACITY (which is now returned in the next array position). The null is unchecked and passed to nla_len() which dereferences it to obtain the nla_len, and triggers the segfault.

Fix this by removing the hardcoded attrs and instead including the uapi header file. Also, check the parsed attr for null to prevent crashing in case of similar kernel updates.

Recent kernel commit 1773572863c4 ("thermal: netlink: Add the commands
and the events for the thresholds") merged in v6.13 added the
"THERMAL_GENL_ATTR_TZ_PREV_TEMP" attribute in the middle of enum
thermal_genl_attr and so all the following attrs are now offset by one
position.

This causes irqbalance to segfault during thermal event handling as it
encounters a null nlattr in the parsed attrs array in the position where
it expects to find THERMAL_GENL_ATTR_CAPACITY (which is now returned in
the next array position). The null is unchecked and passed to nla_len()
which dereferences it to obtain the nla_len, and triggers the segfault.

Fix this by removing the hardcoded attrs and instead including the uapi
header file. Also, check the parsed attr for null to prevent crashing
in case of similar kernel updates.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
@nhorman
Copy link
Member

nhorman commented Feb 16, 2025

um, isn't changing attribute values a kernel ABI break? That seems like it should be fixed in the kernel

@ailiop-suse
Copy link
Author

um, isn't changing attribute values a kernel ABI break? That seems like it should be fixed in the kernel

Indeed, and it appears that it has already been addressed in commit c195b9c6ab9c ("thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI header") which is already merged and part of v6.14-rc3.

Nevertheless, it seemed like a good idea to remove the hardcoded attrs and sync-up with the kernel uapi (for example all the CAPACITY attrs are misnamed, this has always been CAPABILITY in the kernel, and this was a bit confusing by itself while trying to determine the issue).

Considering though that 1773572863c4 made it out on a stable release (v6.13), perhaps it is desirable to keep the definitions local, since it would avoid other bugs in case e.g. irqbalance was compiled with v6.13 headers and then run on top of a v6.14 or later kernel. On the other hand, similar mishaps may happen and it will lead again to segfaults.

Would it make sense to keep the definitions local as they were, and just add the null check for the parsed attribute?

@ppwaskie
Copy link
Contributor

um, isn't changing attribute values a kernel ABI break? That seems like it should be fixed in the kernel

Indeed, and it appears that it has already been addressed in commit c195b9c6ab9c ("thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI header") which is already merged and part of v6.14-rc3.

Nevertheless, it seemed like a good idea to remove the hardcoded attrs and sync-up with the kernel uapi (for example all the CAPACITY attrs are misnamed, this has always been CAPABILITY in the kernel, and this was a bit confusing by itself while trying to determine the issue).

Considering though that 1773572863c4 made it out on a stable release (v6.13), perhaps it is desirable to keep the definitions local, since it would avoid other bugs in case e.g. irqbalance was compiled with v6.13 headers and then run on top of a v6.14 or later kernel. On the other hand, similar mishaps may happen and it will lead again to segfaults.

Would it make sense to keep the definitions local as they were, and just add the null check for the parsed attribute?

I'm a fan of assuming the ABI isn't broken, and check in the unlikely event you run into a brain-damaged kernel.

I think your suggestion of leaving the local definitions as-is, and adding the check for the unlikely scenario the null check can happen in, and log loudly that the kernel they're running on is broken. Then compatibility is maintained without adding additional complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants