
@atris commented Aug 28, 2025

Summary

Implement streaming search on the coordinator, emitting early partial results from the query phase with optional scoring. The change introduces request flags and a mode selector, integrates streaming into the existing SearchAction path, and adds a reproducible TTFB benchmark. When streaming is not used, behavior is unchanged.

Motivation

• Reduce time-to-first-byte (TTFB) at the coordinator by not waiting for all shards to complete the query phase before starting fetch-eligible work.
• Provide mode-specific controls for batching and scoring, with safe defaults.
• Keep backward compatibility on the transport wire and preserve REST semantics.

Design and Scope

• Request flags and mode
  • SearchRequest gains version-gated fields (V_3_3_0): streamingScoring (boolean) and streamingSearchMode (string).
  • REST: stream=true enables streaming; the optional stream_scoring_mode and streaming_mode parameters select the behavior.
  • No change to default behavior; streaming is opt-in.
• Coordinator streaming
  • Streaming is integrated into TransportSearchAction (SearchAction); no separate transport action is required.
  • SearchPhaseController.newSearchPhaseResults(...) returns either the existing QueryPhaseResultConsumer or a StreamQueryPhaseResultConsumer, depending on the request mode.
  • StreamQueryPhaseResultConsumer controls the partial reduce cadence via mode-specific multipliers and emits TopDocs-aware partials to the progress listener.
• Partial reduce notifications
  • SearchProgressListener gains a TopDocs-aware hook with a compatibility fallback:
    • onPartialReduceWithTopDocs(…) defaults to onPartialReduce(…).
    • notifyPartialReduceWithTopDocs(…) invokes the hook safely.
  • Existing listeners are unaffected.
• Query execution
  • For streaming queries, QueryPhase routes to streaming collector contexts based on StreamingSearchMode:
    • NO_SCORING: unsorted documents, fastest emission.
    • SCORED_UNSORTED: scored documents without sorting.
    • SCORED_SORTED: scored and sorted via Lucene’s top-N collectors.
    • CONFIDENCE_BASED: early emission guided by simple Hoeffding-style bounds.
  • Collector batch size is bounded and read via SearchContext.getStreamingBatchSize(); partial batches are emitted to the stream channel as they become available.
• Transport integration
  • Both the classic and stream transport handlers are registered:
    • Classic: SearchTransportService.registerRequestHandler(…).
    • Stream (if available): StreamSearchTransportService.registerStreamRequestHandler(…).
  • The streaming transport path is selected only for streaming requests, and the thread pools used are chosen accordingly.
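The listener fallback described above can be sketched as follows. This is a minimal, hypothetical model: the type names, the simplified parameters (shard ids and doc ids instead of the real shard lists, TopDocs, and aggregations), and the boolean return value of the notifier are assumptions for illustration. Only the default-method fallback and the exception-safe notify wrapper mirror the description.

```java
import java.util.List;

// Hypothetical, simplified model of the TopDocs-aware progress hook.
// Parameter types are stand-ins for the real shard lists, TopDocs, and aggregations.
interface ProgressListenerSketch {
    // Pre-existing hook: listeners written before the change only override this one.
    default void onPartialReduce(List<Integer> shards, int reducePhase) {
    }

    // New hook: unless overridden, it falls back to the existing hook,
    // so older listeners keep receiving partial reduce notifications.
    default void onPartialReduceWithTopDocs(List<Integer> shards, List<String> topDocIds, int reducePhase) {
        onPartialReduce(shards, reducePhase);
    }
}

final class ProgressNotifierSketch {
    private ProgressNotifierSketch() {
    }

    // "Safe" invocation: a failing listener must not fail the search itself.
    // Returns false if the listener threw (a real implementation would log the failure).
    static boolean notifyPartialReduceWithTopDocs(
        ProgressListenerSketch listener, List<Integer> shards, List<String> topDocIds, int reducePhase) {
        try {
            listener.onPartialReduceWithTopDocs(shards, topDocIds, reducePhase);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

Using default methods keeps existing implementations source- and binary-compatible: a listener that predates the new hook never has to change.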

Settings and Controls

• Dynamic, node-scoped cluster settings for streaming are added in StreamingSearchSettings. Examples:
  • search.streaming.batch_size
  • Mode-specific reduce multipliers, emission interval, and minimum document thresholds
  • Circuit breaker and limits for buffering in streaming code paths
• Defaults are conservative. The feature remains opt-in via request flags; settings do not change behavior unless the request is streaming.
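A dynamic, bounded setting such as search.streaming.batch_size behaves roughly like the holder below. This is a sketch under stated assumptions: the class name and the MIN, MAX, and DEFAULT values are illustrative and not taken from the PR, and in OpenSearch the validation and update wiring would come from the settings framework rather than hand-written code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for a dynamic, bounded setting like search.streaming.batch_size.
// MIN, MAX, and DEFAULT are illustrative assumptions, not the PR's actual values.
final class StreamingBatchSizeSketch {
    static final int MIN = 1;
    static final int MAX = 10_000;
    static final int DEFAULT = 100;

    private final AtomicInteger batchSize = new AtomicInteger(DEFAULT);

    // Invoked when the setting is updated at runtime. Out-of-range values are rejected
    // and the previous value is kept, so a bad update cannot disable batching.
    void update(int newValue) {
        if (newValue < MIN || newValue > MAX) {
            throw new IllegalArgumentException(
                "batch_size must be in [" + MIN + ", " + MAX + "], got " + newValue);
        }
        batchSize.set(newValue);
    }

    // Read per streaming request; non-streaming requests never consult it.
    int current() {
        return batchSize.get();
    }
}
```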

Wire Compatibility and API

• Transport wire BWC
  • New SearchRequest and ShardSearchRequest fields are gated by Version.V_3_3_0 on read/write. Older peers neither write nor read these fields.
• Public API
  • No breaking changes to REST endpoints.
  • SearchProgressListener adds new methods with safe defaults; existing code continues to compile and run.
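The version gating can be illustrated with plain java.io streams. In this hypothetical sketch, versions are modelled as ints, the class and its single "query" placeholder field are invented, and the encoding of the optional mode string is an assumption. The point is only that writer and reader apply the same version check, so an older peer never receives bytes it cannot parse.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical model of version-gated transport fields. OpenSearch uses its own
// Version, StreamInput, and StreamOutput types; plain ints and java.io stand in here.
final class VersionGatedRequestSketch {
    static final int V_3_3_0 = 30300; // illustrative version id for 3.3.0

    boolean streamingScoring;      // defaults to false when not sent
    String streamingSearchMode;    // defaults to null when not sent

    void writeTo(DataOutputStream out, int peerVersion) throws IOException {
        out.writeUTF("query"); // placeholder for fields every version understands
        if (peerVersion >= V_3_3_0) {
            // Only peers on 3.3.0 or later know about the streaming fields.
            out.writeBoolean(streamingScoring);
            out.writeBoolean(streamingSearchMode != null);
            if (streamingSearchMode != null) {
                out.writeUTF(streamingSearchMode);
            }
        }
    }

    static VersionGatedRequestSketch readFrom(DataInputStream in, int peerVersion) throws IOException {
        VersionGatedRequestSketch r = new VersionGatedRequestSketch();
        in.readUTF(); // placeholder field
        if (peerVersion >= V_3_3_0) {
            r.streamingScoring = in.readBoolean();
            r.streamingSearchMode = in.readBoolean() ? in.readUTF() : null;
        }
        // For older peers the streaming fields keep their defaults (false / null).
        return r;
    }
}
```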

Tests and Benchmark

• Unit tests:
  • Stream consumer batch sizing and dynamic settings effects.
  • Hoeffding bounds behavior.
• Integration tests:
  • Basic streaming search workflows.
  • Streaming aggregations with and without sub-aggregations.
  • Mode coverage (NO_SCORING, SCORED_UNSORTED, SCORED_SORTED, CONFIDENCE_BASED).
• Benchmark:
  • StreamingPerformanceBenchmarkTests: measures coordinator-side TTFB (time to first partial reduce) vs. classic full reduce for a large query.
  • Logger-only reporting; no REST streaming is introduced.
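The benchmark's metric, time to the first partial reduce versus time to the final reduce, can be recorded with a small helper like the one below. The class and method names are hypothetical and not taken from StreamingPerformanceBenchmarkTests. The clock is injected so the logic can be tested deterministically, and the sketch is not thread-safe.

```java
import java.util.function.LongSupplier;

// Hypothetical recorder: TTFB = elapsed time until the first partial reduce,
// compared with the elapsed time until the final (full) reduce. Not thread-safe.
final class TtfbRecorderSketch {
    private final LongSupplier clock;
    private final long startNanos;
    private long firstPartialNanos = -1; // -1 means no partial reduce observed yet
    private long finalReduceNanos = -1;  // -1 means the final reduce has not happened

    TtfbRecorderSketch(LongSupplier clock) {
        this.clock = clock;
        this.startNanos = clock.getAsLong();
    }

    // Only the first partial reduce counts toward TTFB.
    void onPartialReduce() {
        if (firstPartialNanos < 0) {
            firstPartialNanos = clock.getAsLong() - startNanos;
        }
    }

    void onFinalReduce() {
        finalReduceNanos = clock.getAsLong() - startNanos;
    }

    long ttfbNanos() {
        return firstPartialNanos;
    }

    long fullReduceNanos() {
        return finalReduceNanos;
    }
}
```

In a real run the clock would simply be `System::nanoTime`.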

Non-Goals / Limitations

• This change does not implement HTTP/REST streaming of partial responses.
• The SearchResponse partial/sequence metadata used internally by the streaming listener is not serialized on the wire and does not alter REST payloads.
• Confidence-based mode uses a conservative and simple bound; it is adequate for early gating but not a full ranking stability analysis.
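For reference, the standard two-sided Hoeffding bound for n independent samples in [a, b] gives a half-width eps = (b - a) * sqrt(ln(2 / delta) / (2n)) at failure probability delta. The sketch below computes it. The canEmit gate and its tolerance parameter are a hypothetical illustration of "early gating", not the PR's actual emission rule, and treating per-document scores as independent bounded samples is itself an assumption.

```java
// Hoeffding's inequality for n independent samples bounded in [a, b]:
//   P(|sampleMean - trueMean| >= eps) <= 2 * exp(-2 * n * eps^2 / (b - a)^2)
// Solving for eps at failure probability delta gives halfWidth below.
final class HoeffdingSketch {
    private HoeffdingSketch() {
    }

    static double halfWidth(long n, double range, double delta) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        if (delta <= 0 || delta >= 1) {
            throw new IllegalArgumentException("delta must be in (0, 1)");
        }
        return range * Math.sqrt(Math.log(2.0 / delta) / (2.0 * n));
    }

    // Hypothetical gate: emit once the estimate's half-width is within tolerance.
    static boolean canEmit(long n, double range, double delta, double tolerance) {
        return halfWidth(n, range, delta) <= tolerance;
    }
}
```

With range 1 and delta = 0.05, the half-width is about 0.136 after 100 samples and about 0.0136 after 10,000; quadrupling the sample count halves it, which is why such a bound is conservative for small batches.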

Backward Compatibility and Risk

• Default behavior unchanged unless streaming flags are provided.
• Wire BWC ensured via version gating; JApiCmp passes.
• Aggregation partial reductions are unaffected; for TopDocs partials we call the new TopDocs-aware hook, otherwise we continue to notify via the existing method.

Operational Notes

• Streaming is disabled by default and must be explicitly requested with stream=true (REST) or by setting SearchRequest flags programmatically.
• Mode selection allows tuning for latency vs. coordination cost.
• Dynamic settings enable safe runtime tuning if necessary.

If reviewers prefer, I can split the settings and the confidence-based collector into a follow-up to further reduce the initial surface.


❌ Gradle check result for 5554606: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@atris closed this Aug 29, 2025
@atris reopened this Aug 29, 2025

❌ Gradle check result for 5554606: FAILURE


@atris closed this Aug 29, 2025
@atris reopened this Aug 29, 2025

❌ Gradle check result for 5554606: FAILURE


  Introduces streaming search infrastructure that enables progressive emission
  of search results with three configurable scoring modes. The implementation
  extends the existing streaming transport layer to support partial result
  computation at the coordinator level.

  Scoring modes:
  - NO_SCORING: Immediate result emission without confidence requirements
  - CONFIDENCE_BASED: Statistical emission using Hoeffding inequality bounds
  - FULL_SCORING: Complete scoring before result emission

  The implementation leverages OpenSearch's inter-node streaming capabilities
  to reduce query latency through early result emission. Partial reductions
  are triggered based on the selected scoring mode, with results accumulated
  at the coordinator before final response generation.

  Key changes:
  - Add HoeffdingBounds for statistical confidence calculation
  - Extend QueryPhaseResultConsumer to support streaming reduction
  - Add StreamingScoringCollector wrapping TopScoreDocCollector
  - Integrate streaming scorer selection in QueryPhase
  - Add REST parameter stream_scoring_mode for mode selection
  - Include streaming metadata in SearchResponse

  The current implementation operates within architectural constraints where
  streaming is limited to inter-node communication. Client-facing streaming
  will be addressed in a follow-up contribution.

  Addresses opensearch-project#18725

Signed-off-by: Atri Sharma <[email protected]>

❌ Gradle check result for 6a4d92e: FAILURE


@atris closed this Sep 27, 2025
github-project-automation bot moved this from In-Review to Done in Performance Roadmap Sep 27, 2025
@atris reopened this Sep 27, 2025
github-project-automation bot moved this from Done to In Progress in Performance Roadmap Sep 27, 2025

❌ Gradle check result for 6a4d92e: FAILURE


Signed-off-by: Atri Sharma <[email protected]>

❌ Gradle check result for df7ad7b: FAILURE



github-actions bot commented Oct 9, 2025

❌ Gradle check result for c084b56: FAILURE


Signed-off-by: Atri Sharma <[email protected]>

github-actions bot commented Oct 9, 2025

❌ Gradle check result for ad9c30d: FAILURE


@atris closed this Oct 9, 2025
github-project-automation bot moved this from In Progress to Done in Performance Roadmap Oct 9, 2025
@atris reopened this Oct 9, 2025
github-project-automation bot moved this from Done to In Progress in Performance Roadmap Oct 9, 2025

github-actions bot commented Oct 9, 2025

❌ Gradle check result for ad9c30d: null


Signed-off-by: Atri Sharma <[email protected]>
@atris requested a review from peternied as a code owner October 10, 2025 19:41

❌ Gradle check result for b4b16b0: null


rishabhmaurya commented Oct 11, 2025

Thank you @atris for working on and owning this, as it is a big feature.

I have some very high-level comments:

  1. This is a huge PR and a bit hard to review; we need to think about splitting it. Maybe start with the first requirement mentioned in [RFC] New search streaming API #18725 (comment), i.e. where this streaming search will be most useful, and start with FlushMode as PER_SEGMENT to simplify it further.
  2. Did you get a chance to collect some benchmark numbers? This feature is mainly judged from two angles: 1) TTFB and 2) overall resource consumption per query when fetching all pages, compared to the traditional approach. Without these numbers it would be hard to justify this effort. Theoretically it should help on both, but some initial numbers while you work on cleaning up and refactoring the rest of the PR would be much appreciated.

Once you split the PR, it would be great to clearly describe the changes introduced, to make it easier for others to pitch in.

We should open an RFC for client support too, as this feature is not useful without it.

* Streaming search parameters for configuring progressive result emission.
* These parameters control how and when intermediate results are sent.
*/
public class StreamingSearchParameters implements Writeable, ToXContent {

We can get rid of most of these params if we split this PR and start with the simple case, i.e. no scoring or sorting.

keepAlive = in.readOptionalTimeValue();
originalIndices = OriginalIndices.readOriginalIndices(in);
assert keepAlive == null || readerId != null : "readerId: " + readerId + " keepAlive: " + keepAlive;
// Read streaming fields - gated on version for BWC

We already have isStreamingSearch(); why are we introducing a new one here?

import org.apache.lucene.search.TopDocs;
import org.opensearch.common.annotation.ExperimentalApi;

/**

Let's move it to the next PR.

? (Exception) exception.getCause()
: new OpenSearchException(exception.getCause());
}
if (isStreamSearch && logger.isTraceEnabled()) {

You don't have to check logger.isTraceEnabled().


currentBatch.add(scoreDoc);
if (currentBatch.size() >= batchSize) {
emitCurrentBatch(false);
@rishabhmaurya commented Oct 11, 2025


Can you check how we currently emit batches for the aggregation cases? Let's try to be consistent with that; if we want to improve the emission logic, let's do it for both.
We also introduced FlushMode, which currently defaults to PER_SEGMENT. Maybe we can start with PER_SEGMENT and later make it intra-segment, i.e. based on batch size. Let me know what you think.
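The two flushing strategies discussed in this thread can be compared in a small, self-contained sketch. FlushModeSketch, DocBatcherSketch, and the BATCH_SIZE mode name are hypothetical (only PER_SEGMENT is named in the thread), and flushing leftover documents at every segment boundary in both modes is an assumed design choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical local enum; not the OpenSearch FlushMode type.
enum FlushModeSketch { PER_SEGMENT, BATCH_SIZE }

final class DocBatcherSketch {
    private final FlushModeSketch mode;
    private final int batchSize;
    private final Consumer<List<Integer>> sink; // e.g. sends a partial result to the stream channel
    private List<Integer> currentBatch = new ArrayList<>();

    DocBatcherSketch(FlushModeSketch mode, int batchSize, Consumer<List<Integer>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.mode = mode;
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void collect(int doc) {
        currentBatch.add(doc);
        // Intra-segment flushing only happens in BATCH_SIZE mode.
        if (mode == FlushModeSketch.BATCH_SIZE && currentBatch.size() >= batchSize) {
            emit();
        }
    }

    // Called at the end of each segment; both modes flush whatever is buffered.
    void finishSegment() {
        emit();
    }

    private void emit() {
        if (currentBatch.isEmpty()) {
            return;
        }
        sink.accept(currentBatch);
        currentBatch = new ArrayList<>(); // hand off the old list; never mutate an emitted batch
    }
}
```

With PER_SEGMENT, a segment of five documents produces one batch; with BATCH_SIZE and a batch size of 2, the same segment produces batches [0, 1], [2, 3], and [4].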


❌ Gradle check result for cbf228d: FAILURE
