
Fix accidental spinloop in connection accept logic #1979

Merged
rukai merged 2 commits into shotover:main from rukai:fix_accidental_spin_loop
Mar 1, 2026


rukai commented Feb 27, 2026

Problem

During implementation of the hot reload functionality, a spin loop was accidentally introduced into the connection accept loop.
This does not impact shotover's ability to function, as evidenced by all the tests continuing to pass, but it is a serious performance concern, because one CPU core is permanently consumed by the loop.

I discovered the problem because the connection IDs were jumping to ridiculously high values in the integration tests. Here you can see a connection ID of 12787950, which implies that 12787950 connections have been made during this test.

[Screenshot: integration test output showing connection ID 12787950]

In reality, a driver will only make around 10 connections during a lengthy test.


Cause

The loop is this one here:

The tokio::select! in this loop waits until either:

  • a connection has come in that we need to accept
  • a hot reload request has come in that needs to be processed.

However, when hot reload is disabled (the default configuration), the hot_reload_rx channel is closed, so the hot reload branch of the select is always immediately ready: receiving from a closed channel completes right away instead of waiting. When the hot reload branch triggers, nothing happens; the loop iteration ends and the loop starts again from the beginning.
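A minimal sketch of why a closed channel makes that branch fire immediately. Shotover's loop uses tokio::select! over a tokio UnboundedReceiver, whose recv() resolves to None once every sender has been dropped. To keep the sketch dependency-free, it assumes std::sync::mpsc::Receiver::recv_timeout as a stand-in, which likewise blocks only while at least one sender is still alive. The function name wait_for_hot_reload and the u32 message type are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the hot reload select branch (std mpsc replaces tokio here):
/// waits up to `timeout` and returns the outcome together with how long the wait took.
fn wait_for_hot_reload(
    rx: &Receiver<u32>,
    timeout: Duration,
) -> (Result<u32, RecvTimeoutError>, Duration) {
    let start = Instant::now();
    let outcome = rx.recv_timeout(timeout);
    (outcome, start.elapsed())
}

fn main() {
    let timeout = Duration::from_millis(200);

    // Sender still alive: nothing arrives, so the wait lasts for the timeout.
    let (tx, rx) = mpsc::channel::<u32>();
    let (outcome, waited) = wait_for_hot_reload(&rx, timeout);
    println!("sender alive:   {outcome:?} after {waited:?}");
    drop(tx);

    // Sender dropped (the hot-reload-disabled case): the wait ends immediately,
    // every time, so a loop built around it never sleeps.
    let (tx, rx) = mpsc::channel::<u32>();
    drop(tx);
    for _ in 0..3 {
        let (outcome, waited) = wait_for_hot_reload(&rx, timeout);
        println!("sender dropped: {outcome:?} after {waited:?}");
    }
}
```

In the real loop the closed-channel outcome is simply ignored, so the iteration ends and the next select is immediately ready again.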

Why is hot_reload_rx closed?

The other end of the hot_reload_rx is the hot_reload_tx, which is stored in Source:

```rust
pub struct Source {
    pub join_handle: JoinHandle<()>,
    pub hot_reload_tx: UnboundedSender<HotReloadListenerRequest>,
    pub gradual_shutdown_tx: UnboundedSender<GradualShutdownRequest>,
    pub name: String,
}
```

Source::into_join_handle consumes the Source, dropping every field except join_handle, which is returned. You can tell it consumes Source because the method receiver is self instead of &self; see: https://google.github.io/comprehensive-rust/methods-and-traits/methods.html

```rust
pub fn into_join_handle(self) -> JoinHandle<()> {
    self.join_handle
}
```

As a result, hot_reload_tx is dropped during shotover setup, closing the channel.
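The ownership behavior above can be reproduced with a simplified, hypothetical Source that keeps only two fields and uses std::thread::JoinHandle and std::sync::mpsc::Sender in place of tokio's types (an assumption made so the sketch is self-contained). Taking self by value moves join_handle out, and the remaining hot_reload_tx is dropped when the method returns, which the receiving side observes as a disconnected channel.

```rust
use std::sync::mpsc::{self, Sender, TryRecvError};
use std::thread::{self, JoinHandle};

/// Simplified, hypothetical mirror of shotover's Source: std types replace tokio's,
/// and only two of the four fields are kept.
#[allow(dead_code)] // hot_reload_tx is only held in this sketch, never read
struct Source {
    join_handle: JoinHandle<()>,
    hot_reload_tx: Sender<u32>,
}

impl Source {
    /// Takes `self` by value: `join_handle` is moved out and returned, and every
    /// other field (here, `hot_reload_tx`) is dropped when the function returns.
    fn into_join_handle(self) -> JoinHandle<()> {
        self.join_handle
    }
}

fn main() {
    let (hot_reload_tx, hot_reload_rx) = mpsc::channel::<u32>();
    let source = Source { join_handle: thread::spawn(|| {}), hot_reload_tx };

    // While `source` is alive, the channel is open but empty.
    assert_eq!(hot_reload_rx.try_recv(), Err(TryRecvError::Empty));

    let join_handle = source.into_join_handle();

    // The only sender went away with the consumed Source, so the channel is now closed.
    assert_eq!(hot_reload_rx.try_recv(), Err(TryRecvError::Disconnected));
    join_handle.join().unwrap();
}
```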

The fix

The fix is subtle, but the goal is to ensure that hot_reload_tx remains alive for as long as the listener background task is alive.
We do this by changing into_join_handle into an async method, which ensures that the rest of Source is kept alive for the lifetime of the listener task.

Technically, the std::mem::drop(self.hot_reload_tx); isn't required, since the drop would occur implicitly at the end of the function, but since this issue is subtle, I figured it's better to be explicit here.
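A blocking sketch of the idea behind the fix, under simplifying assumptions: std threads and channels stand in for tokio, listener is a hypothetical function that runs for a fixed time and counts wakeups caused by a closed channel, and wait is a hypothetical blocking analogue of the async method. It is not the code from this PR. The buggy shape drops hot_reload_tx right after spawning; the fixed shape waits for the listener first and only then drops the sender explicitly.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical listener task: for `run_for`, waits on the hot reload channel and
/// counts wakeups that only reported a closed channel (the spin in the bug).
fn listener(rx: Receiver<u32>, run_for: Duration) -> u64 {
    let deadline = Instant::now() + run_for;
    let mut closed_wakeups = 0;
    // Keep waiting until the deadline has passed.
    while let Some(remaining) = deadline.checked_duration_since(Instant::now()) {
        match rx.recv_timeout(remaining) {
            Ok(_request) => {} // a real hot reload request would be processed here
            Err(RecvTimeoutError::Timeout) => {} // idle channel: the thread actually slept
            Err(RecvTimeoutError::Disconnected) => closed_wakeups += 1, // returned instantly
        }
    }
    closed_wakeups
}

/// Simplified, hypothetical mirror of shotover's Source (std types instead of tokio;
/// the join handle returns the wakeup count for demonstration purposes).
struct Source {
    join_handle: JoinHandle<u64>,
    hot_reload_tx: Sender<u32>,
}

impl Source {
    fn spawn(run_for: Duration) -> Source {
        let (hot_reload_tx, rx) = mpsc::channel();
        let join_handle = thread::spawn(move || listener(rx, run_for));
        Source { join_handle, hot_reload_tx }
    }

    /// Buggy shape: consumes self, so hot_reload_tx is dropped as soon as this returns.
    fn into_join_handle(self) -> JoinHandle<u64> {
        self.join_handle
    }

    /// Fixed shape (blocking analogue of the async fix): hold on to the rest of self
    /// until the listener has finished, then drop the sender explicitly.
    fn wait(self) -> u64 {
        let closed_wakeups = self.join_handle.join().expect("listener panicked");
        std::mem::drop(self.hot_reload_tx);
        closed_wakeups
    }
}

fn main() {
    let run_for = Duration::from_millis(100);
    // Buggy shape: the sender is gone almost immediately, so the listener spins.
    let buggy = Source::spawn(run_for).into_join_handle().join().unwrap();
    // Fixed shape: the sender outlives the listener, so it only ever times out.
    let fixed = Source::spawn(run_for).wait();
    println!("closed-channel wakeups: buggy = {buggy}, fixed = {fixed}");
}
```

Joining inside wait is a design choice of this sketch; it only works because the hypothetical listener stops on its own after run_for rather than waiting for the channel to close.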


codspeed-hq bot commented Feb 27, 2026

Merging this PR will not alter performance

✅ 38 untouched benchmarks


Comparing rukai:fix_accidental_spin_loop (9b79cae) with main (7a7f863)



yallen-ic left a comment


LGTM

rukai merged commit e6ffac5 into shotover:main Mar 1, 2026
42 checks passed
rukai mentioned this pull request Mar 1, 2026