Skip to content

Ins 38005 kafka close waits#1971

Merged
tharinduamila-insta merged 11 commits intoshotover:mainfrom
tharinduamila-insta:INS-38005-kafka-close-waits
Feb 9, 2026
Merged

Ins 38005 kafka close waits#1971
tharinduamila-insta merged 11 commits intoshotover:mainfrom
tharinduamila-insta:INS-38005-kafka-close-waits

Conversation

@tharinduamila-insta
Copy link
Copy Markdown
Collaborator

@tharinduamila-insta tharinduamila-insta commented Feb 2, 2026

The Problem
When a Kafka broker closed a connection and Shotover wasn't actively polling that connection (no pending requests), the writer_task would remain blocked on out_rx.recv().await indefinitely, keeping the TCP write half open. This caused CLOSE-WAIT state connections to accumulate.

Additionally it was also identified by @rukai that we were not sending a notification to the transform chain when reader errors are met.
We have added the notification behaviour to awaken the transform chain in shotover::connection::spawn_read_write_tasks and also updated the definition of shotover::transforms::kafka::sink_cluster::KafkaSinkCluster::recv_responses to cleanup the broken connections regardless pending messages state

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Feb 2, 2026

Merging this PR will not alter performance

✅ 38 untouched benchmarks


Comparing tharinduamila-insta:INS-38005-kafka-close-waits (9105443) with main (8172c11)

Open in CodSpeed

@tharinduamila-insta tharinduamila-insta force-pushed the INS-38005-kafka-close-waits branch from 539445b to 30d27c2 Compare February 4, 2026 02:53
@tharinduamila-insta tharinduamila-insta self-assigned this Feb 4, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical issue where Shotover was leaking CLOSE_WAIT connections when Kafka brokers closed idle connections. The fix ensures the transform chain is notified when reader errors occur, allowing proper cleanup of broken connections.

Changes:

  • Modified connection handling to notify the transform chain when reader errors are detected
  • Updated Kafka sink cluster logic to cleanup connections without pending requests immediately
  • Added test infrastructure to verify no CLOSE_WAIT connections leak

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
shotover/src/connection.rs Added notification to transform chain when reader task detects connection closure
shotover/src/transforms/kafka/sink_cluster/mod.rs Removed guard preventing receive on connections without pending requests and added logic to cleanup broken connections
test-helpers/src/connection/kafka/java.rs Added Drop implementations to properly close Java Kafka clients
shotover-proxy/tests/kafka_int_tests/test_cases.rs Added helper functions to detect CLOSE_WAIT connections and integrated check into broker idle timeout test
shotover-proxy/tests/kafka_int_tests/mod.rs Updated test to pass Shotover PID and adjusted error message matcher

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@rukai rukai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is great, really good investigation work on this one!

I ran the new test locally with:

  • the connection.rs fix reverted
  • the sink_cluster/mod.rs fix reverted
    And both cases the test failed, indicating that the test is doing its job and that both fixes are required.

I appreciate the updated internal documentation.

@tharinduamila-insta tharinduamila-insta enabled auto-merge (squash) February 9, 2026 00:49
@tharinduamila-insta tharinduamila-insta merged commit 7780d97 into shotover:main Feb 9, 2026
42 checks passed
@tharinduamila-insta tharinduamila-insta deleted the INS-38005-kafka-close-waits branch February 9, 2026 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants