Fix inner hits + aggregations concurrency bug #128036

benchaplin · 2025-05-12T19:52:23Z

There's a concurrency bug that occurs when running aggregations on inner hits. It can result in one of three exceptions:

java.lang.IllegalStateException: Error retrieving path
java.lang.NullPointerException: Cannot invoke "java.util.Map.get(Object)" because "this.preloadedStoredFieldValues" is null
java.lang.AssertionError: invalid decRef call: already closed

The underlying issue is that InnerHitSubContext is not thread-safe, yet a single instance is shared across the threads that search individual leaf slices during an aggregation. Specifically, the race occurs when multiple threads concurrently set and read the InnerHitSubContext.rootId and InnerHitSubContext.rootSource fields.
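To make the hazard concrete, here is a minimal sketch that assumes a simplified stand-in for InnerHitSubContext. The names `SharedStateRace`, `SharedSubContext`, `setRoot`, and `runInterleaving` are hypothetical and are not Elasticsearch APIs. They only mimic one instance holding plain mutable `rootId`/`rootSource` fields. Two latches force the bad interleaving deterministically. Thread A writes its root, thread B overwrites it, and thread A then reads B's values.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical illustration: one unsynchronized context shared by two "slice" threads. */
public class SharedStateRace {

    /** Simplified stand-in for a sub-context that caches per-document state in plain fields. */
    static final class SharedSubContext {
        // Mutable, unsynchronized state: only safe if a single thread uses this instance.
        private String rootId;
        private String rootSource;

        void setRoot(String id, String source) {
            this.rootId = id;
            this.rootSource = source;
        }

        String rootId() { return rootId; }
        String rootSource() { return rootSource; }
    }

    /** Forces A-write, B-write, A-read and returns what thread A observed. */
    public static String[] runInterleaving() {
        SharedSubContext shared = new SharedSubContext(); // one instance, two threads
        CountDownLatch aWrote = new CountDownLatch(1);
        CountDownLatch bWrote = new CountDownLatch(1);
        String[] seenByA = new String[2];

        Thread a = new Thread(() -> {
            shared.setRoot("doc-a", "{\"from\":\"a\"}");
            aWrote.countDown();
            try {
                bWrote.await(); // wait until B has overwritten the shared fields
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            seenByA[0] = shared.rootId();
            seenByA[1] = shared.rootSource();
        });
        Thread b = new Thread(() -> {
            try {
                aWrote.await(); // wait until A has written its own root
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            shared.setRoot("doc-b", "{\"from\":\"b\"}");
            bWrote.countDown();
        });

        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return seenByA;
    }

    public static void main(String[] args) {
        String[] seen = runInterleaving();
        // Thread A set "doc-a" but reads B's root: it now works against another document.
        System.out.println(seen[0] + " " + seen[1]); // prints: doc-b {"from":"b"}
    }
}
```

In a real search, the interleaving is not forced. That is why the failure shows up only intermittently and can surface as different exceptions depending on which field is stale.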

The tests I've added to TopHitsIT reproduce the issue: if you paste them into main and run them a few times, you should see one of the exceptions.

I've fixed this by forking the InnerHitSubContext instances, similar to what was done in #106990. Because SearchExecutionContext is sometimes accessed through InnerHitSubContext, I also had to make sure the forked SearchExecutionContext is used in those cases.
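Here is a sketch of the forking idea, again using hypothetical stand-ins rather than the real classes. `ForkPerSlice`, `ForkableSubContext`, `fork`, `ExecutionContext`, and `runSlices` are illustrative names only. The copy-constructor shape loosely follows the commit history ("Switch back to copy pattern, with search execution context override"), but the actual Elasticsearch signatures may differ. Each slice task forks its own instance before touching per-document state. The fork returns the execution context it was given, not the parent's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: fork a sub-context per slice so mutable state is thread-confined. */
public class ForkPerSlice {

    /** Stand-in for a per-thread execution context (e.g. a forked SearchExecutionContext). */
    static final class ExecutionContext {
        final String owner;

        ExecutionContext(String owner) { this.owner = owner; }
    }

    /** Stand-in sub-context: immutable configuration plus mutable per-document state. */
    static final class ForkableSubContext {
        private final String name;                        // immutable config, safe to share
        private final ExecutionContext executionContext;  // fixed per instance
        private String rootId;                            // mutable, must not be shared

        ForkableSubContext(String name, ExecutionContext executionContext) {
            this.name = name;
            this.executionContext = executionContext;
        }

        // Copy constructor: reuse the parent's config, start with fresh mutable state,
        // and bind the fork to the caller's own execution context.
        private ForkableSubContext(ForkableSubContext parent, ExecutionContext forkedContext) {
            this(parent.name, forkedContext);
        }

        ForkableSubContext fork(ExecutionContext forkedContext) {
            return new ForkableSubContext(this, forkedContext);
        }

        // Callers get the fork's context, never the shared parent's.
        ExecutionContext getExecutionContext() { return executionContext; }

        void setRootId(String id) { this.rootId = id; }

        String rootId() { return rootId; }
    }

    /** Runs one task per slice, each on its own fork; returns how many saw foreign state. */
    public static int runSlices(int slices) {
        ForkableSubContext shared = new ForkableSubContext("inner_hits", new ExecutionContext("parent"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < slices; i++) {
                final String sliceId = "slice-" + i;
                results.add(pool.submit(() -> {
                    ForkableSubContext mine = shared.fork(new ExecutionContext(sliceId));
                    mine.setRootId(sliceId);
                    Thread.yield(); // invite interleaving with other slices
                    // A private fork means neither field can be swapped underneath this task.
                    return sliceId.equals(mine.rootId())
                        && sliceId.equals(mine.getExecutionContext().owner);
                }));
            }
            int mismatches = 0;
            for (Future<Boolean> f : results) {
                if (!f.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + runSlices(100)); // prints "mismatches: 0"
    }
}
```

Forking trades a small per-slice allocation for not having to synchronize the hot fetch path. It also explains why the execution context has to be overridden: a fork that still handed out the parent's shared context would reintroduce the sharing.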

elasticsearchmachine · 2025-05-12T19:52:47Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine · 2025-05-12T19:52:48Z

Hi @benchaplin, I've created a changelog YAML for you.

...er/src/internalClusterTest/java/org/elasticsearch/search/aggregations/metrics/TopHitsIT.java

javanna

LGTM great work

Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419

javanna · 2025-06-03T07:01:40Z

Hey @benchaplin I think it makes sense to backport this fix to 9.0 as well. Thoughts?

benchaplin · 2025-06-03T13:28:35Z

> Hey @benchaplin I think it makes sense to backport this fix to 9.0 as well. Thoughts?

Ah, agreed. Thanks for catching this, I got a little mixed up with versions.

benchaplin added 18 commits April 28, 2025 18:32

Test cloning InnerHitContext

88d0afa

Actually use inner hits

0ed0ec1

testing

6373273

Switch to copy constructor pattern

f8f64a6

Override getSearchExecutionContext for each copied InnerHitSubContext

bbbd6cc

Stop using copy pattern

39d29ba

Merge branch 'main' into inner_hits_aggs_concurrency_bug

47d989f

remove test setup

d9491b9

Fix tests

31a735a

iter

4d84198

Merge remote-tracking branch 'upstream/main'

edf2f2b

Add repro IT

fa0e97b

Merge remote-tracking branch 'upstream/main'

f6c4d73

Merge branch 'main' into inner_hits_aggs_concurrency_bug

89b4c08

Clean up

dcb2882

Clean up

61e05ba

Clean up

d3803a7

Switch back to copy pattern, with search execution context override

6ca8524

benchaplin requested a review from javanna May 12, 2025 19:52

benchaplin added the >bug, Team:Search Foundations, :Search Foundations/Search, v8.19.0, and v9.1.0 labels May 12, 2025

Update docs/changelog/128036.yaml

9c046b1

javanna reviewed May 19, 2025

...er/src/internalClusterTest/java/org/elasticsearch/search/aggregations/metrics/TopHitsIT.java (resolved)

Merge branch 'main' into inner_hits_aggs_concurrency_bug

ff7d042

benchaplin force-pushed the inner_hits_aggs_concurrency_bug branch from 3647f68 to ff7d042 May 21, 2025 21:33

Add IT to test JoinFieldInnerHitSubContext

6e9b6f4

benchaplin requested a review from javanna June 2, 2025 13:20

javanna approved these changes Jun 2, 2025

benchaplin merged commit 13bce60 into elastic:main Jun 2, 2025
18 checks passed

benchaplin added the auto-backport label Jun 2, 2025

elasticsearchmachine pushed a commit that referenced this pull request Jun 2, 2025

Fix inner hits + aggregations concurrency bug (#128036) (#128785)

2a2adcc

Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to #122419

elasticsearchmachine pushed a commit that referenced this pull request Jun 3, 2025

Fix inner hits + aggregations concurrency bug (#128036) (#128830)

0c3a1bc

Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to #122419