net: posix: fix pollable_fd_state leak on cross-shard connection forwarding by avikivity · Pull Request #3394 · scylladb/seastar

avikivity · 2026-05-07T13:15:38Z

Fix a memory leak of pollable_fd_state objects detected by AddressSanitizer in socket_test (36800 bytes in 100 allocations).

The leak has three contributing factors, all in the cross-shard connection forwarding path of posix_server_socket_impl::accept():

1. When a connection is forwarded to another shard, the raw file_desc is moved out of the pollable_fd via get_file_desc() and captured in the smp::submit_to lambda. However, the pollable_fd wrapper (holding an intrusive_ptr to the now-empty pollable_fd_state) was left to be destroyed implicitly by the coroutine frame at end of the loop iteration. In practice, the coroutine frame did not destroy the structured binding variable between iterations (observed with the proxy protocol path where I/O is performed on the fd before forwarding), causing the pollable_fd_state to be leaked. Fix this by explicitly calling fd.close() after extracting the raw file_desc.
2. posix_ap_server_socket_impl::move_connected_socket() unconditionally queues connections into the static conn_q when no acceptor is waiting. If the server_socket has already been destroyed (entry removed from ports), these queued connections are never consumed. Fix by checking ports.contains() before queuing, and closing the fd otherwise.
3. posix_ap_server_socket_impl's destructor only removed from ports but did not drain conn_q entries for its address. Connections queued between abort_accept() (which drains conn_q) and the destructor would be leaked. Fix by also erasing from conn_q in the destructor.
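
The second and third fixes can be illustrated with a small standalone model. This is only a sketch, not the Seastar code: `fake_fd` and `ap_server_socket_model` are hypothetical stand-ins, `ports` and `conn_q` are modelled with plain standard containers keyed by a string address, and `count()` is used in place of `contains()` so the sketch also builds as C++17.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a connected fd. `live` counts open instances,
// so a leak shows up as a non-zero counter.
struct fake_fd {
    static inline int live = 0;
    bool open = true;
    fake_fd() { ++live; }
    fake_fd(fake_fd&& o) noexcept : open(std::exchange(o.open, false)) {}
    fake_fd(const fake_fd&) = delete;
    ~fake_fd() { close(); }
    void close() { if (open) { open = false; --live; } }
};

// Hypothetical model of the listener bookkeeping described above.
struct ap_server_socket_model {
    static inline std::set<std::string> ports;                 // live listeners
    static inline std::multimap<std::string, fake_fd> conn_q;  // queued connections

    std::string addr;

    explicit ap_server_socket_model(std::string a) : addr(std::move(a)) {
        ports.insert(addr);
    }

    ~ap_server_socket_model() {
        ports.erase(addr);
        conn_q.erase(addr);  // fix 3: drop connections still queued for this address
    }

    // fix 2: queue only while a listener for `addr` exists, otherwise close.
    static void move_connected_socket(const std::string& addr, fake_fd fd) {
        if (ports.count(addr) > 0) {
            conn_q.emplace(addr, std::move(fd));
        } else {
            fd.close();
        }
    }
};

// A connection is queued while the listener exists; the listener is then destroyed.
int live_after_listener_destroyed() {
    {
        ap_server_socket_model listener("127.0.0.1:9000");
        ap_server_socket_model::move_connected_socket(listener.addr, fake_fd{});
    }
    return fake_fd::live;
}

// A connection arrives after the listener is already gone.
int live_after_late_forward() {
    ap_server_socket_model::move_connected_socket("127.0.0.1:9000", fake_fd{});
    return fake_fd::live;
}
```

In this model both helper functions return 0: the destructor drains the queue in the first case, and the late connection is closed rather than queued in the second.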


travisdowns · 2026-05-12T16:16:21Z

In practice, the coroutine frame did not destroy the structured binding variable between iterations (observed with the proxy protocol path where I/O is performed on the fd before forwarding)

Wait, isn't this a massive compiler bug? Are you saying fd dtor is not called at the closing bracket for the loop? What happens later? Is it ever called? Does the next co_await _lfd.accept(); re-use the space?

avikivity · 2026-05-13T18:42:16Z

In practice, the coroutine frame did not destroy the structured binding variable between iterations (observed with the proxy protocol path where I/O is performed on the fd before forwarding)

Wait, isn't this a massive compiler bug? Are you saying fd dtor is not called at the closing bracket for the loop? What happens later? Is it ever called? Does the next co_await _lfd.accept(); re-use the space?

It's indeed a massive bug.

I started a fruitless bisect, need to retry it.

travisdowns · 2026-05-13T19:32:42Z

It's indeed a massive bug.

I started a fruitless bisect, need to retry it.

Does this occur in clang or gcc?

avikivity · 2026-05-14T12:10:29Z

It's indeed a massive bug.
I started a fruitless bisect, need to retry it.

Does this occur in clang or gcc?

gcc, we'd know about such a clang bug much sooner.

travisdowns · 2026-05-14T15:36:27Z

gcc, we'd know about such a clang bug much sooner.

Right.

I guess I question the wisdom of doing one-off fixes like this one (actually I was unclear what was going on, I had to check that the dtor had the same effect as close()) given the magnitude of the issue, it just seems unsafe to use gcc in this state.

avikivity · 2026-05-14T16:10:18Z

gcc, we'd know about such a clang bug much sooner.

Right.

I guess I question the wisdom of doing one-off fixes like this one (actually I was unclear what was going on, I had to check that the dtor had the same effect as close()) given the magnitude of the issue, it just seems unsafe to use gcc in this state.

I'm against this fix too, this was before I realized it is likely a gcc bug.

avikivity · 2026-05-14T16:11:34Z

gcc, we'd know about such a clang bug much sooner.

Right.
I guess I question the wisdom of doing one-off fixes like this one (actually I was unclear what was going on, I had to check that the dtor had the same effect as close()) given the magnitude of the issue, it just seems unsafe to use gcc in this state.

I'm against this fix too, this was before I realized it is likely a gcc bug.

Or rather, the patch is actually good, just not for its stated purpose. Without it, the code can hang on to an fd for an unlimited amount of time, and it's better to drop the fd early.

avikivity · 2026-05-17T11:09:07Z

Likely cause: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124584

travisdowns · 2026-05-21T02:04:17Z

@avikivity ISTM gcc is simply unsuitable for seastar applications until this is fixed? just very hard to avoid that problem.

avikivity · 2026-05-22T05:47:52Z

@avikivity ISTM gcc is simply unsuitable for seastar applications until this is fixed? just very hard to avoid that problem.

Coroutines, gcc, structured bindings - pick any two

travisdowns · 2026-05-22T16:19:54Z

Coroutines, gcc, structured bindings - pick any two

Coroutines, structured bindings ... every time :)

avikivity · 2026-05-23T12:25:26Z

Coroutines, gcc, structured bindings - pick any two

Coroutines, structured bindings ... every time :)

I'm with you here

avikivity marked this pull request as draft May 13, 2026 18:42

travisdowns mentioned this pull request May 21, 2026: github: revamp test matrix (allpairs, containers, arm) #3412 (Merged)

travisdowns mentioned this pull request May 26, 2026: gcc 15+ miscompiles structured bindings in coroutine loops (tracking upstream gcc PR 124584) #3431 (Open, 1 task)

avikivity wants to merge 1 commit into master from fix-accept-smp
