
refactor(irq-tuning): improve robustness and fix some corner cases#1010

Merged
xu282934741 merged 5 commits into kubewharf:main from zhanghaoyu1986:dev/v0.5.34-verify
Nov 4, 2025

@zhanghaoyu1986
Collaborator

What type of PR is this?

Bug fixes/Enhancements

What this PR does / why we need it:

Fixes several corner cases in IRQ tuning and improves its robustness and versatility.


chore(irq-tuning): improve generality of getting the mapping between NIC
queue and IRQ

Signed-off-by: 张浩宇 <zhanghaoyu.zhy@bytedance.com>

fix(irq-tuning): fix ComparesHexBitmapStrings when the two strings have
different numbers of leading "0" characters.

Signed-off-by: 张浩宇 <zhanghaoyu.zhy@bytedance.com>
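
A minimal sketch of the leading-zero problem this commit describes, not the PR's actual implementation. The helper names `normalizeHexBitmap` and `equalHexBitmaps` are hypothetical. The sketch assumes masks in the comma-separated hexadecimal form that the kernel prints for `smp_affinity`, and treats two masks as equal when they match after separators and leading zeros are stripped.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHexBitmap is a hypothetical helper: it drops the "," group
// separators, lowercases the digits, and strips leading zeros so that
// "00000000,0000ff00" and "ff00" normalize to the same string.
func normalizeHexBitmap(s string) string {
	s = strings.TrimSpace(s)
	s = strings.ReplaceAll(s, ",", "")
	s = strings.ToLower(s)
	s = strings.TrimLeft(s, "0")
	if s == "" {
		// An all-zero mask normalizes to a single "0".
		return "0"
	}
	return s
}

// equalHexBitmaps reports whether two hex bitmap strings describe the
// same mask regardless of how many leading zeros each one carries.
func equalHexBitmaps(a, b string) bool {
	return normalizeHexBitmap(a) == normalizeHexBitmap(b)
}

func main() {
	fmt.Println(equalHexBitmaps("00000000,0000ff00", "ff00")) // true
	fmt.Println(equalHexBitmaps("0", "00000000"))             // true
	fmt.Println(equalHexBitmaps("f0", "0f"))                  // false
}
```

Comparing normalized strings avoids parsing masks that are wider than 64 bits into integers.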
…after irq affinity write

After setting the kernel's IRQ affinity, the kernel cannot guarantee
that an immediate subsequent read will return the latest value; in some
cases, it may return a stale value. This is because if
irq_do_set_affinity returns EBUSY, the kernel may invoke
irqd_set_move_pending to set the IRQD_SETAFFINITY_PENDING flag.
Subsequently, the next IRQ affinity write operation will store the newly
affinitized CPU mask in irq_desc->pending_mask.

At a later point, irq_move_masked_irq will copy irq_desc->pending_mask
to irq_desc->desc->irq_common_data.affinity and then clear the
IRQD_SETAFFINITY_PENDING flag within the interrupt context. The IRQ
affinity read function show_irq_affinity first reads
irq_desc->desc->irq_common_data.affinity and then checks whether the
IRQD_SETAFFINITY_PENDING flag is set. If the flag is set, it retrieves
the value from irq_desc->pending_mask.

If the QRM's IRQ affinity read syscall has already read the stale
irq_desc->desc->irq_common_data.affinity but not yet read
irq_desc->pending_mask, and the interrupt context's irq_move_masked_irq
clears the IRQD_SETAFFINITY_PENDING flag at this point, QRM will end up
reading a stale value.
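
The interleaving described above can be modelled in a few lines of single-threaded Go. This is a simplified illustration of the commit message's description, not kernel code: the type `irqDesc`, its fields, and the `between` hook are hypothetical stand-ins for the kernel's descriptor, the IRQD_SETAFFINITY_PENDING flag, and the moment at which the interrupt context runs.

```go
package main

import "fmt"

// irqDesc is a toy model of the state described in the commit message.
type irqDesc struct {
	affinity    string // models the committed affinity mask
	pendingMask string // models pending_mask
	pending     bool   // models IRQD_SETAFFINITY_PENDING
}

// moveMaskedIRQ models irq_move_masked_irq as described above: copy the
// pending mask into the committed affinity, then clear the flag.
func (d *irqDesc) moveMaskedIRQ() {
	if !d.pending {
		return
	}
	d.affinity = d.pendingMask
	d.pending = false
}

// showAffinity models the read path: load the committed affinity first,
// then check the pending flag. The between hook stands in for whatever
// runs between those two steps.
func (d *irqDesc) showAffinity(between func()) string {
	mask := d.affinity
	if between != nil {
		between()
	}
	if d.pending {
		mask = d.pendingMask
	}
	return mask
}

func main() {
	// No interleaving: the reader sees the flag and returns the pending mask.
	a := &irqDesc{affinity: "0-3", pendingMask: "4-7", pending: true}
	fmt.Println(a.showAffinity(nil)) // 4-7

	// Race: the move runs after the affinity load but before the flag check,
	// so the reader returns the old mask although the new one is committed.
	b := &irqDesc{affinity: "0-3", pendingMask: "4-7", pending: true}
	fmt.Println(b.showAffinity(b.moveMaskedIRQ)) // 0-3
	fmt.Println(b.affinity)                      // 4-7
}
```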

Even if the SetIrqAffinity function can obtain the latest affinity value
before returning, it still cannot ensure that this latest value
originates from irq_desc->desc->irq_common_data.affinity. If the value
is instead fetched from irq_desc->pending_mask, a subsequent nic.sync
may retrieve a stale value when a race condition occurs between the
interrupt context's irqd_set_move_pending and QRM's IRQ affinity read
operations.

Therefore, we choose to trust the kernel: if SetIrqAffinity returns no
error, the IRQ affinity is considered to have been set correctly, and
there is no need for a subsequent sync from the kernel to verify its
correctness. Instead, IRQ affinity will be synced from the kernel during
periodic syncNics operations, with the values read from the kernel
compared against those managed by QRM.

Signed-off-by: 张浩宇 <zhanghaoyu.zhy@bytedance.com>
chore(irq-tuning): ListNetNS ignores non-existent netns

Signed-off-by: 张浩宇 <zhanghaoyu.zhy@bytedance.com>
syncNics is concerned only with NICs that can be shared among multiple
containers. Therefore, when listing active uplink NICs (via
ListActiveUplinkNics), container NICs (specifically SRIOV container
NICs) should be filtered out.

Signed-off-by: 张浩宇 <zhanghaoyu.zhy@bytedance.com>
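
The filtering step can be pictured with the following sketch. The `nicInfo` type, its `SRIOVContainerNic` field, and `filterSharedNics` are hypothetical. How the PR actually identifies an SRIOV container NIC is not shown on this page.

```go
package main

import "fmt"

// nicInfo is a hypothetical, minimal description of an uplink NIC.
type nicInfo struct {
	Name              string
	SRIOVContainerNic bool // assumed marker for a NIC dedicated to one container
}

// filterSharedNics keeps only the NICs that can be shared among multiple
// containers, dropping those marked as SRIOV container NICs.
func filterSharedNics(nics []nicInfo) []nicInfo {
	shared := make([]nicInfo, 0, len(nics))
	for _, nic := range nics {
		if nic.SRIOVContainerNic {
			continue
		}
		shared = append(shared, nic)
	}
	return shared
}

func main() {
	nics := []nicInfo{
		{Name: "eth0"},
		{Name: "vf3", SRIOVContainerNic: true},
		{Name: "eth1"},
	}
	for _, nic := range filterSharedNics(nics) {
		fmt.Println(nic.Name)
	}
}
```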
@zhanghaoyu1986 added the workflow/need-review label (review: test succeeded, need to review) on Nov 3, 2025
@codecov

codecov Bot commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 26.34409% with 137 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.65%. Comparing base (85ac1d3) to head (499f48f).
⚠️ Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...amicpolicy/irqtuner/controller/controller_linux.go | 0.00% | 98 Missing ⚠️ |
| pkg/util/machine/network_linux.go | 56.97% | 31 Missing and 6 partials ⚠️ |
| pkg/util/machine/network.go | 0.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (26.34%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1010      +/-   ##
==========================================
+ Coverage   58.68%   59.65%   +0.97%     
==========================================
  Files         679      682       +3     
  Lines       77274    63929   -13345     
==========================================
- Hits        45347    38137    -7210     
+ Misses      27540    21385    -6155     
- Partials     4387     4407      +20     
| Flag | Coverage Δ |
|---|---|
| unittest | 59.65% <26.34%> (+0.97%) ⬆️ |

Comment thread pkg/util/machine/network.go
@zhanghaoyu1986 changed the title from "Dev/v0.5.34 verify" to "refactor(irq-tuning): improve robustness and fix some corner cases" on Nov 4, 2025
@xu282934741 merged commit 59975f5 into kubewharf:main on Nov 4, 2025
53 of 61 checks passed