Ins 38005 kafka close waits#1971
Merged
tharinduamila-insta merged 11 commits intoshotover:mainfrom Feb 9, 2026
Merged
Conversation
rukai
reviewed
Feb 4, 2026
539445b to
30d27c2
Compare
rukai
reviewed
Feb 4, 2026
…ader detects an error
… when reader detects an error" This reverts commit 595eaeb.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical issue where Shotover was leaking CLOSE_WAIT connections when Kafka brokers closed idle connections. The fix ensures the transform chain is notified when reader errors occur, allowing proper cleanup of broken connections.
Changes:
- Modified connection handling to notify the transform chain when reader errors are detected
- Updated Kafka sink cluster logic to cleanup connections without pending requests immediately
- Added test infrastructure to verify no CLOSE_WAIT connections leak
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| shotover/src/connection.rs | Added notification to transform chain when reader task detects connection closure |
| shotover/src/transforms/kafka/sink_cluster/mod.rs | Removed guard preventing receive on connections without pending requests and added logic to cleanup broken connections |
| test-helpers/src/connection/kafka/java.rs | Added Drop implementations to properly close Java Kafka clients |
| shotover-proxy/tests/kafka_int_tests/test_cases.rs | Added helper functions to detect CLOSE_WAIT connections and integrated check into broker idle timeout test |
| shotover-proxy/tests/kafka_int_tests/mod.rs | Updated test to pass Shotover PID and adjusted error message matcher |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
rukai
reviewed
Feb 8, 2026
rukai
approved these changes
Feb 8, 2026
Contributor
rukai
left a comment
There was a problem hiding this comment.
This PR is great, really good investigation work on this one!
I ran the new test locally with:
- the
connection.rsfix reverted - the
sink_cluster/mod.rsfix reverted
And both cases the test failed, indicating that the test is doing its job and that both fixes are required.
I appreciate the updated internal documentation.
yallen-ic
approved these changes
Feb 9, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The Problem
When a Kafka broker closed a connection and Shotover wasn't actively polling that connection (no pending requests), the writer_task would remain blocked on out_rx.recv().await indefinitely, keeping the TCP write half open. This caused CLOSE-WAIT state connections to accumulate.
Additionally it was also identified by @rukai that we were not sending a notification to the transform chain when reader errors are met.
We have added the notification behaviour to awaken the transform chain in shotover::connection::spawn_read_write_tasks and also updated the definition of shotover::transforms::kafka::sink_cluster::KafkaSinkCluster::recv_responses to cleanup the broken connections regardless pending messages state