[Fix](fragment) avoid query-ctx map clear self-deadlock when stop FragmentMgr#62954
Open
linrrzqqq wants to merge 2 commits intoapache:masterfrom
Open
[Fix](fragment) avoid query-ctx map clear self-deadlock when stop FragmentMgr#62954linrrzqqq wants to merge 2 commits intoapache:masterfrom
linrrzqqq wants to merge 2 commits intoapache:masterfrom
Conversation
Collaborator
Author
|
run buildall |
1 similar comment
Collaborator
|
run buildall |
Collaborator
Author
|
/review |
Contributor
There was a problem hiding this comment.
Automated code review summary:
No blocking issues found in the reviewed PR diff. Existing inline review context was empty, and the user focus file provided no additional focus points.
Critical checkpoint conclusions:
- Goal and test: the PR fixes the shutdown self-deadlock where
_query_ctx_map_delay_delete.clear()can release the lastQueryContextreference while holding the shard mutex, and the added death test covers that reentrant destructor path. - Scope: the change is small and focused on moving value destruction outside the map shard lock.
- Concurrency: the lock scope is reduced; the reentrant
QueryContext::~QueryContext()->FragmentMgr::remove_query_context()-> delay-delete-maperase()path no longer attempts to reacquire the same shard mutex while destruction is in progress. No new lock ordering issue was identified. - Lifecycle: the non-intuitive lifecycle is the
QueryContextdestructor callback intoFragmentMgr; the new swap-then-clear approach handles that lifecycle safely for the affected map. - Configuration, compatibility, persistence, and FE/BE protocol: not applicable; no config, storage format, edit log, or RPC/thrift compatibility changes were introduced.
- Parallel code paths:
ConcurrentContextMap::clear()is generic, so the reduced lock scope applies consistently to the existing clear users; no missed equivalent clear path was found in this PR. - Tests: the PR adds a BE death test for the regression. I did not run the full BE UT locally in this review runner; GitHub/TeamCity status checks are the source of verification.
- Observability: no new observability appears necessary for this shutdown deadlock fix.
- Performance: the extra local
flat_hash_mapper shard is bounded by the existing shard count and avoids holding the mutex during potentially expensive destructors, which is preferable on this path.
Focus points: no additional user-provided review focus was present.
Collaborator
Author
|
run buildall |
Mryange
approved these changes
Apr 30, 2026
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Previously
ConcurrentContextMap::clear()destroyed map values while holding the shard_mutex. For_query_ctx_map_delay_delete, releasing the last QueryContext reference can runQueryContext::~QueryContext(), which callsFragmentMgr::remove_query_context()and re-enters the same delay-delete mapthrough
erase(). This can throw std::system_error with "Resource deadlock avoided" during BE shutdown.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)