executor/join: fix HashJoinV2 Close hang on cancelled queries#68478
executor/join: fix HashJoinV2 Close hang on cancelled queries#68478joechenrh wants to merge 1 commit into
Conversation
HashJoinV2Exec.Close serially drains buildFinished, joinResultCh and each probeResultChs[i] via channel.Clear. Each drain waits for the producer goroutine to close that channel, which only happens once the producer exits. Three send sites missed the closeCh signal and could keep a producer alive forever after close(closeCh) fired: 1. createTasks (hash_join_v2.go:1228, 1251) only selected on doneCh while sending to a buildTaskCh whose buffer == Concurrency. Under a large build (e.g. TPC-DS) the channel filled while workers were busy, and Close could not unblock the send. 2. BuildWorkerV2.buildHashTable looped over taskCh with no closeCh short-circuit, so a slow per-task build kept the worker alive through every queued task before the deferred close on buildTaskCh fired. 3. fetchProbeSideChunks dropped its result into probeSideResource.dest with no closeCh case. Probe workers honor closeCh on their receive select and may exit before draining, leaving the send to block on a probeResultChs[i] that nobody will close. Add a closeCh branch at each send (and an early-exit check at the top of buildHashTable's loop) so Close can interrupt build and probe producers reliably. Tests: - close_cancel_test.go::TestBuildWorkerExitsOnCloseCh is a focused regression for fix (2): with closeCh pre-closed it exits without touching hashTableContext, which would nil-deref without the check. - close_cancel_test.go::TestHashJoinV2CloseUnblocksBuildPhase and TestHashJoinV2CloseUnblocksProbeFetcher cover fixes (1)+(2) and (3) end-to-end via two new test-only failpoints (slowBuildTask and holdProbeFetcherBeforeSend) and assert Close returns within a bounded time even when producers would otherwise hang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @joechenrh. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What problem does this PR solve?
Issue Number: ref #TODO (will follow up with an issue link)
Problem Summary:
When a query that uses hash join v2 is killed or its connection is dropped while the build/probe goroutines are still running,
HashJoinV2Exec.Closecan hang forever. Pprof showsCloseparked insidechannel.Clear(thefor range chdrain helper), waiting for a producer-side channel to be closed.The root cause is that three send sites in the build/probe pipeline did not honor
closeCh. WhenCloseruns:close(closeCh)fires.Closeserially drainsbuildFinished,joinResultCh,probeChkResourceCh, and eachprobeResultChs[i]viachannel.Clear. Each drain only completes once the corresponding channel is closed.RunWithRecoverdefers — but the producer goroutines themselves were stuck on a send that did not select oncloseCh. SoClosenever observes the close signal and parks indefinitely.The three sites:
createTasks(pkg/executor/join/hash_join_v2.go): select onbuildTaskCh <- taskonly watcheddoneCh. In TPC-DS-scale builds the channel (buffer sizeConcurrency) filled while workers were busy, so the send blocked indefinitely afterclose(closeCh).BuildWorkerV2.buildHashTable(pkg/executor/join/hash_join_v2.go):for task := range taskChhad nocloseChshort-circuit. A slow per-task build kept each worker alive through every queued task, delayingrange's exit until the producer's deferredclose(buildTaskCh)ran.fetchProbeSideChunks(pkg/executor/join/hash_join_base.go): the finalprobeSideResource.dest <- probeSideResulthad nocloseChcase. Probe workers do honorcloseChon their receive select and exit eagerly, so the fetcher's send could deadlock on aprobeResultChs[i]whose buffer (size 1) was full and whose consumer had already gone away.What changed and how does it work?
For each of the three send sites, add a
case <-closeChbranch (and, for the build worker, acloseChshort-circuit at the top of the per-task loop) so that producers exit promptly afterCloseis called. The existing sites that already honoredcloseCh(e.g.fetchBuildSideRows,controlWorkersForRestore,getProbeSideResource,wait4BuildSide) follow the same pattern; this PR brings the remaining three in line.No behavioral change on the happy path — the new branches only fire after
Closeruns.Check List
Tests
Added
pkg/executor/join/close_cancel_test.go:TestBuildWorkerExitsOnCloseCh— focused unit test for the build-worker fix. Pre-closescloseChand queues several invalid tasks; with the fix the worker returns before touchinghashTableContext(which would nil-deref without it).TestHashJoinV2CloseUnblocksBuildPhase— drives a 50 K-row hash join with a newslowBuildTaskfailpoint set to 1 s per task; assertsClosereturns within 2.5 s instead of waiting for every queued task.TestHashJoinV2CloseUnblocksProbeFetcher— drives a probe pass withholdProbeFetcherBeforeSendset to 800 ms; assertsClosereturns within 2.5 s instead of stalling onchannel.Clear(probeResultChs[i]).Two new failpoints (
slowBuildTask,holdProbeFetcherBeforeSend) are test-only and no-op when not enabled, matching the surroundingslowWorkers/buildHashTablePanic/createTasksPanicstyle.Local validation (
./tools/check/failpoint-go-test.shequivalent flow):TestBuildWorkerExitsOnCloseCh,TestHashJoinV2CloseUnblocks*— PASSTestKillDuringBuild,TestKillDuringProbe,TestCreateTasksPanic,TestBuildHashTablePanic,TestSplitPartitionPanic,TestProcessOneProbeChunkPanic— PASSSide effects
Documentation
Release note
```release-note
Fix a deadlock where killing or dropping a connection running a hash-join-v2 query could hang in Close because some build/probe goroutines did not honor the executor's cancel signal.
```