Skip to content

reactor: Disable hot polling if wakeup granularity is too high #2715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 10, 2025

Conversation

StephanDollberg
Copy link
Contributor

We are seeing an issue where rand write performance under performs
massively (100x).

This is caused by the linux dio kernel thread/workqueue being starved
and hence aio write completitions aren't being served in a timely
manner.

This doesn't happen using "default" linux settings but only if
/proc/sys/kernel/sched_wakeup_granularity_ns or
(/sys/kernel/debug/sched/wakeup_granularity_ns on newer kernels) is
raised.

Specifically this effect can be observed on RHEL-8 as the tuned
version that ships with it sets this value to 15000000 but can
reproduced on any other system by just bumping that value.

This patch tries to detect this being the case and if so it will warn
and disable hot polling (both --poll-aio and --idle-poll-time-us)
which gives back the majority of the performance.

Note because this setting has moved to debug fs on newer kernels which
requires root rights to read it's actually not very likely that we will
be able to detect it on these. However, RHEL8 uses an older kernel and
is likely the major offender to run into this bug (we have had multiple
customers run into this).

Ref #2696

We are seeing an issue where rand write performance under performs
massively (100x).

This is caused by the linux dio kernel thread/workqueue being starved
and hence aio write completitions aren't being served in a timely
manner.

This doesn't happen using "default" linux settings but only if
`/proc/sys/kernel/sched_wakeup_granularity_ns` or
(`/sys/kernel/debug/sched/wakeup_granularity_ns` on newer kernels) is
raised.

Specifically this effect can be observed on RHEL-8 as the `tuned`
version that ships with it sets this value to 15000000 but can
reproduced on any other system by just bumping that value.

This patch tries to detect this being the case and if so it will warn
and disable hot polling (both `--poll-aio` and `--idle-poll-time-us`)
which gives back the majority of the performance.

Note because this setting has moved to debug fs on newer kernels which
requires root rights to read it's actually not very likely that we will
be able to detect it on these. However, RHEL8 uses an older kernel and
is likely the major offender to run into this bug (we have had multiple
customers run into this).

Ref scylladb#2696
// dio thread will be starved otherwise
// see https://github.com/scylladb/seastar/issues/2696
if (!reactor_cfg.no_poll_aio || reactor_cfg.max_poll_time != 0us) {
auto [wakeup_file, granularity] = wakeup_granularity();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think max_poll_time is relevant. If no_poll_aio is false (default) we enter the if and don't care about max_poll_time. If it's true, then we may still want to spin for a few microseconds before entering the kernel to avoid the expensive sleep path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the spin still takes some time and I think the default is 100usec on a virtualized environment. Not sure what's best. Since it's a bad case, anyway, maybe setting it to zero is the safer step.

Note with tuned (not tuned(8)) setting of 0.5ms scheduling granularity and 100us spins, we can spend a large fraction of time spinning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah as per my tests in the issue thread changing --idle-poll-time-us definitely does have an additional impact on top of --poll-aio but yes the latter is the more important one.

@avikivity avikivity merged commit 099cf61 into scylladb:master Apr 10, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants