
fix: close connections leaked during pool shutdown (#1831) #1833

Open

Laotree wants to merge 3 commits into ClickHouse:main from Laotree:fix-1831-conn-leak-on-close

Conversation

Contributor

Laotree commented Apr 17, 2026

Two bugs allowed TCP connections to escape cleanup when Close() was called:

  1. clickhouse.Close() drained the idle pool before marking it closed, leaving a window where release() could Put() connections back. Fixed by storing closed=true first, then draining.

  2. connPool.Put() silently dropped connections when the pool was already closed instead of closing them, leaking the underlying TCP connection. Fixed by calling conn.close() before returning.


Checklist

  • Unit and integration tests covering the common scenarios were added


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kavirajk
Contributor

Addresses issue #1831.

@github-actions

Summary

This PR fixes two distinct TCP connection-leak bugs in the pool shutdown path.

  • Bug 1: clickhouse.Close() stored closed=true after draining the idle pool, leaving a window where a concurrent release() could see closed=false and re-add a connection to an already-closed pool, which the old Put() then silently dropped without closing.

  • Bug 2: connPool.Put() returned without calling conn.close() when the pool was already closed, leaking the underlying TCP connection.

Both fixes are individually correct, and together they provide layered defence: Fix 1 shrinks the race window to nearly zero, and Fix 2 ensures any connection that slips through is still closed. The approach is sound.

Should fix

  • Missing tests/issues/ regression file. Per project convention (CLAUDE.md): "Regression tests for bug fixes go in tests/issues/ named after the issue number: issue_1234_test.go." The unit-test update in conn_pool_test.go validates Fix 2 well, but there is no tests/issues/issue_1831_test.go. Even a lightweight integration test — open a native Conn, grab connections, call Close() concurrently with active release() calls, and verify clean teardown with -race — would pin the regression and satisfy the convention.

  • Fix 1 (ordering change in Close()) has no test coverage. The race between Close() and release() is not exercised anywhere. The Go race detector (go test -race) would catch it if a test exercised the concurrent path. Consider adding a goroutine-based test that calls release() while Close() is in flight.

Nits

  • nit: In conn_pool.go:97, conn.close() is called while holding i.mu. This is consistent with drainPool(), but holding the pool mutex across a network syscall (TCP FIN) can stall other callers of Put()/Get() for the duration. Not introduced by this PR, but worth a follow-up to unlock before closing.

Verdict

Approve — the two-line logic fix is correct and the test update verifies the previously-silently-dropped connection is now closed. The should-fix items (regression file in tests/issues/ and a race-exercising test) are process gaps rather than correctness blockers, and can be addressed in a follow-up if the maintainers prefer.
