Summary
This proposal introduces a search pipeline response processor in the Learning to Rank (LTR) plugin that enables reranking search results using a remote ML model (e.g., hosted on Amazon SageMaker) with LTR-computed feature vectors as input. The processor bridges LTR's feature logging capabilities with OpenSearch ML Commons' remote model connector framework.
Related issues:
- [FEATURE] Support remote inference on LTR plugin #27 — Support remote inference on LTR plugin
Motivation
The current customer workflow
A common production pattern among OpenSearch LTR users is:
1. Define feature sets in the LTR plugin (BM25 scores, term statistics, field matches, etc.)
2. Execute a search query with LTR feature logging enabled (`_ltrlog`)
3. Export the logged feature vectors out of OpenSearch to the application layer
4. Combine them with external features (user interaction data, session context, time-based signals)
5. Call a remote model on SageMaker from the application layer
6. Re-sort the results in the application layer
Steps 3 through 6 happen entirely outside OpenSearch. This introduces:
- Application complexity — ranking logic is split across the search layer and the application layer
- Additional latency — extra network round-trips: OpenSearch → Application → SageMaker → Application
- Operational burden — more moving parts to monitor, debug, and maintain
- Architectural rigidity — server-side personalization and ranking experiments require application deployments
What this feature enables
Collapsing the above workflow into a single search request:
- Define feature sets in the LTR plugin
- Execute a search query → LTR logs features → pipeline processor sends feature vectors to a remote model via ML Commons → results return reranked
The ranking logic moves into the search layer, where it belongs.
Value Proposition
Remote models vs. local LTR models
The LTR plugin currently supports local model execution via RankLib (LambdaMART, MART, Random Forests, Linear Regression) and XGBoost. These models run in-process on the OpenSearch node with no network overhead.
Remote model inference via SageMaker is not a replacement for local models. It serves a fundamentally different set of requirements:
| Dimension | Local LTR Models | Remote Models (SageMaker) |
|---|---|---|
| Latency | Sub-millisecond per-document scoring, no network hop | Network round-trip to SageMaker endpoint (typically 10-50ms per batch) |
| Model flexibility | Limited to RankLib and XGBoost model formats | Any model architecture: deep learning, gradient boosted ensembles, custom models, LLM-based rankers |
| Feature set | Lucene-computable features only (query-document signals) | Can incorporate arbitrary external features (user history, session data, real-time signals) alongside LTR features |
| Model size | Constrained by OpenSearch heap and circuit breaker limits | No size constraints; model runs on dedicated ML infrastructure |
| MLOps | Model binary stored in `.ltrstore` index, manual updates | Centralized model registry, A/B testing, automated retraining, monitoring via SageMaker |
| Best for | Low-latency ranking with well-defined query-document features | Complex ranking with rich feature sets and advanced model architectures |
Key insight: Customers choosing SageMaker over local LTR models are typically not making a latency trade-off. They need capabilities that local models cannot provide — richer feature sets, larger models, and MLOps infrastructure. The two approaches are complementary, not competing.
LTR feature-based reranking vs. existing OpenSearch rerankers
OpenSearch already supports reranking via search pipeline response processors (introduced in 2.12), including cross-encoder reranking with remote models on SageMaker. The question is: what does LTR add?
| Dimension | Existing Rerank Processor | LTR Feature-Based Reranking |
|---|---|---|
| Input to model | Raw query text + document text | Structured, numeric feature vectors computed by LTR |
| Feature engineering | None — the model receives raw text and must learn relevance from scratch | Rich, domain-specific features: BM25 per-field, term statistics, custom script features, derived features |
| Model paradigm | Cross-encoders and neural rerankers that operate on text pairs | Traditional ML models (XGBoost, LightGBM, linear models) and neural models that operate on feature vectors |
| Interpretability | Opaque — model produces a score from text | Transparent — each feature value is logged and inspectable |
| Latency profile | Depends on model inference time (cross-encoders can be expensive) | Feature computation is fast (Lucene-native); remote inference is one batched call |
| Training workflow | Requires labeled query-document pairs with raw text | Uses feature vectors exported from LTR logging, standard tabular ML training |
These are different paradigms, not competing ones. Cross-encoder reranking is ideal when you want a neural model to assess query-document relevance from raw text. LTR feature-based reranking is ideal when you have domain-specific features, need interpretability, or want to combine search signals with external features in a traditional ML model.
A customer might even use both: LTR feature-based reranking for a first-pass rerank with a fast model, followed by a cross-encoder for a final rerank on the top-k results.
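Used together, the two stages could be chained in a single search pipeline. The following is a sketch only: the `rerank`/`ml_opensearch` block uses the existing OpenSearch rerank processor syntax, `ltr_remote_rerank` is the processor proposed here, and the model IDs are placeholders.

```json
PUT /_search/pipeline/two-stage-rerank
{
  "response_processors": [
    {
      "ltr_remote_rerank": {
        "model_id": "<fast-tabular-model-id>",
        "ltr_log_name": "main_log",
        "feature_mapping": { "title_bm25": 0, "body_bm25": 1 }
      }
    },
    {
      "rerank": {
        "ml_opensearch": { "model_id": "<cross-encoder-model-id>" },
        "context": { "document_fields": ["title"] }
      }
    }
  ]
}
```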
Note: OpenSearch also has an `ml_inference` search response processor (since 2.16) that can call ML Commons from a search pipeline. However, that processor operates on raw document fields — it does not understand LTR's `_ltrlog` feature vector format. The processor proposed here is purpose-built to parse `_ltrlog`, apply feature mappings, and batch feature vectors for efficient remote inference.
Architecture Options Considered
Option A: Build the processor in ML Commons
ML Commons already owns the rerank processor and the remote model connector framework. This option would extend the existing rerank processor (or add a new variant) to read structured feature vectors from _ltrlog fields in search results.
Pros:
- Close to existing rerank infrastructure
- ML Commons already has the client code for remote model connectors
- No new plugin dependency graph changes
Cons:
- ML Commons would need to understand and parse LTR's `_ltrlog` output format, creating conceptual coupling between the two plugins
- If the `_ltrlog` format changes in the LTR plugin, ML Commons would break
- The ML Commons team would need to maintain awareness of LTR's data contract
- Violates separation of concerns: ML Commons becomes aware of a specific plugin's internal data format
Option B: Build the processor in the LTR plugin, call ML Commons client (Recommended)
The LTR plugin registers a new search pipeline response processor. This processor reads _ltrlog from its own logging system, formats the feature vectors into model input, and calls ML Commons' predict API via its Java client to get scores from the remote model.
Pros:
- LTR owns both sides of its data — it produces the feature vectors and knows how to consume them
- ML Commons stays generic — it serves predictions via the predict API without any knowledge of LTR internals
- Changes to the `_ltrlog` format are internal to one plugin — no cross-plugin breakage
- LTR becomes a more complete end-to-end ranking solution — feature engineering, local scoring, and remote scoring all in one plugin
- Clean dependency direction: LTR depends on the ML Commons client (stable, public API), not the reverse
Cons:
- LTR plugin takes a compile-time dependency on the ML Commons client
- LTR plugin does not currently register any search pipeline processors, so this introduces a new extension pattern for the plugin
Recommendation
Option B is the recommended approach. The core principle is that the producer of the data should own the transformation. LTR creates _ltrlog, so LTR should be the component that knows how to read it, reshape it, and hand it off to a generic prediction service. ML Commons' role is to be that generic service — "give me input, get back scores" — which is exactly what its predict API already does.
The dependency on the ML Commons client is acceptable. ML Commons is a core OpenSearch plugin with a stable client API, and this is a runtime integration, not a deep architectural coupling.
Proposed Design
High-level flow
Search Request
|
v
Query Phase — normal retrieval (BM25, etc.)
|
v
Rescore Phase — sltr query with feature logging enabled
| (computes LTR feature vectors, attaches _ltrlog to hits)
|
v
Response Phase
|
v
LtrRemoteRerankProcessor (new)
| 1. Read _ltrlog feature vectors from each hit
| 2. Optionally merge with additional document field values
| 3. Read query-level context from ext.ltr_rerank_context (user/session features)
| 4. Batch all feature vectors into a single model input payload
| 5. Call ML Commons predict API (routes to SageMaker via connector)
| 6. Receive scores for all documents
| 7. Re-sort hits by remote model scores
| 8. Optionally remove _ltrlog from the response
|
v
Reranked Results returned to client
Three feature sources
The processor assembles a feature vector from three sources, covering the feature types common in production ranking systems:
| Feature Type | Source | Per-document? | Example |
|---|---|---|---|
| Query-document features | `_ltrlog` (LTR feature logging) | Yes | title_bm25, body_match, term_stat |
| Document features | Stored fields on each hit | Yes | popularity, category, price |
| Query/user features | `ext.ltr_rerank_context` in the search request | No (same for all docs) | user_segment, time_of_day, user_click_rate |
Query/user features are passed by the application at search time — they represent user preferences, session state, or external signals not stored in the index. The processor appends them to every document's feature vector.
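As an illustrative sketch of how the three sources combine into one vector per document (the real processor would be Java; the function and field names here are hypothetical, reusing the example values from the table above):

```python
# Sketch: assemble one document's feature vector from the three sources.
# Order: [LTR features by feature_mapping index] + [doc fields] + [query context].
def assemble_vector(ltr_log, doc_fields, query_context,
                    feature_mapping, document_fields, query_context_fields):
    # 1. Query-document features from _ltrlog, placed at their mapped indices
    vector = [0.0] * len(feature_mapping)
    for entry in ltr_log:  # e.g. {"name": "title_bm25", "value": 3.2}
        idx = feature_mapping.get(entry["name"])
        if idx is not None:
            vector[idx] = float(entry["value"])
    # 2. Per-document fields (stored fields / doc values)
    vector += [float(doc_fields[f]) for f in document_fields]
    # 3. Per-query context, identical for every document in the batch
    vector += [float(query_context[f]) for f in query_context_fields]
    return vector

vec = assemble_vector(
    ltr_log=[{"name": "title_bm25", "value": 3.2},
             {"name": "body_bm25", "value": 1.1},
             {"name": "title_match", "value": 1.0}],
    doc_fields={"category": 7, "popularity": 0.42},
    query_context={"user_segment": 2, "time_of_day": 14.5, "user_click_rate": 0.73},
    feature_mapping={"title_bm25": 0, "body_bm25": 1, "title_match": 2},
    document_fields=["category", "popularity"],
    query_context_fields=["user_segment", "time_of_day", "user_click_rate"],
)
# vec == [3.2, 1.1, 1.0, 7.0, 0.42, 2.0, 14.5, 0.73]
```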
Search pipeline configuration (example)
PUT /_search/pipeline/ltr-remote-rerank
{
"response_processors": [
{
"ltr_remote_rerank": {
"model_id": "<ml-commons-remote-model-id>",
"ltr_log_name": "main_log",
"feature_mapping": {
"title_bm25": 0,
"body_bm25": 1,
"title_match": 2
},
"remove_ltr_log": true,
"context": {
"document_fields": ["category", "popularity"],
"query_context_fields": ["user_segment", "time_of_day", "user_click_rate"]
}
}
}
]
}
Search request (example)
POST /my-index/_search?search_pipeline=ltr-remote-rerank
{
"query": { "match": { "title": "wireless headphones" } },
"rescore": {
"window_size": 50,
"query": {
"rescore_query": {
"sltr": {
"_name": "logged_features",
"featureset": "my_feature_set",
"params": { "keywords": "wireless headphones" }
}
},
"query_weight": 0,
"rescore_query_weight": 0
}
},
"ext": {
"ltr_log": {
"log_specs": [
{ "name": "main_log", "rescore_index": 0 }
]
},
"ltr_rerank_context": {
"user_segment": 2,
"time_of_day": 14.5,
"user_click_rate": 0.73
}
}
}
Key implementation considerations
- **Batching is critical.** The processor must send all documents' feature vectors in a single predict call to the remote model. Per-document remote calls would be unacceptably slow.
- **Feature ordering contract.** The remote model expects features in a specific order. The processor must map LTR feature names to the model's expected feature indices. The `feature_mapping` configuration handles this.
- **External feature augmentation.** The processor supports three sources of features:
  - LTR features (per-document): Query-document signals from `_ltrlog`.
  - Document fields (per-document): Read from stored fields / doc values on each hit. Configured via `context.document_fields`. Note: these must be stored fields or doc values, not `_source`-only fields.
  - Query/user context (per-query): Read from `ext.ltr_rerank_context` in the search request. Same for every document in the batch. Configured via `context.query_context_fields`.
- **Error handling and fallback.** If the remote model is unavailable or slow, the processor should support configurable behavior: fail open (return results in original order), fail closed (return an error), or timeout with fallback. A configurable timeout on the ML Commons predict call is essential.
- **Batch size limits.** SageMaker synchronous inference has a payload size limit (typically 6 MB). The processor should enforce a configurable maximum batch size and return a clear error if the rescore window exceeds it.
- **Observability.** The processor should emit metrics: remote call latency, batch size, error rates. These should integrate with LTR's existing stats framework (`LTRStats`).
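The batching, size-limit, and fail-open considerations can be sketched together. This is an illustrative Python sketch with hypothetical names; the actual processor would be implemented in Java.

```python
# Sketch: one batched predict call with a payload-size guard and fail-open fallback.
import json

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # SageMaker sync inference limit (~6 MB)

def rerank(hits, vectors, predict_fn, fail_open=True):
    payload = json.dumps({"inputs": vectors})
    if len(payload.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("rescore window exceeds maximum batch payload size")
    try:
        scores = predict_fn(payload)  # single batched remote call
    except Exception:
        if fail_open:
            return hits               # fail open: keep the original order
        raise                         # fail closed: surface the error
    # Re-sort hits by remote model scores, descending
    return [h for _, h in sorted(zip(scores, hits), key=lambda p: -p[0])]

hits = ["doc_a", "doc_b", "doc_c"]
vectors = [[3.2, 1.1], [0.4, 2.0], [1.9, 0.7]]
reranked = rerank(hits, vectors, predict_fn=lambda p: [0.2, 0.9, 0.5])
# reranked == ["doc_b", "doc_c", "doc_a"]
```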
Scope and Non-Goals
In scope
- Search pipeline response processor that reads `_ltrlog` and calls the ML Commons predict API
- Batched inference for all documents in a single call
- Configurable feature mapping (LTR feature name → model feature index)
- Optional inclusion of document fields as additional per-document features
- Query-level context features passed via `ext.ltr_rerank_context` in the search request
- Configurable timeout and fail-open/fail-closed behavior
- Integration with LTR's stats framework
Not in scope (future work)
- Modifying the LTR plugin's core `LtrRanker` scoring path for remote inference
- Training workflow integration (model training remains external)
- Multi-stage reranking orchestration (combining LTR rerank with cross-encoder rerank)
- Custom pre/post-processing of model input/output beyond feature mapping
- Caching of remote model scores
Known Limitations
- **Pagination interaction.** The rescore `window_size` determines how many documents get feature vectors. Documents outside the rescore window are not reranked by the remote model. This is consistent with how LTR rescoring works today, but users should be aware that `from`/`size` pagination beyond the window will not reflect remote-model ordering.
- **Numeric features only.** The current design supports numeric feature values only (floats). Categorical features must be encoded as numbers by the application before passing them in `ltr_rerank_context`.
- **Rescore phase used for feature computation.** The `sltr` rescore query with zero weights is a logging-only pass. This is the established LTR logging pattern, but it means the rescore phase is used for side effects rather than scoring.
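Because only numeric values are supported, the application must encode categorical context itself before placing it in `ext.ltr_rerank_context`. A minimal sketch, with hypothetical segment names; the model must of course be trained with the same encoding:

```python
# Sketch: encode a categorical query-context value as a number before
# sending it in ext.ltr_rerank_context (segment names are hypothetical).
USER_SEGMENTS = {"anonymous": 0, "registered": 1, "premium": 2}

def build_rerank_context(segment, hour, click_rate):
    return {
        "user_segment": USER_SEGMENTS[segment],  # categorical -> ordinal code
        "time_of_day": float(hour),
        "user_click_rate": float(click_rate),
    }

ctx = build_rerank_context("premium", 14.5, 0.73)
# ctx == {"user_segment": 2, "time_of_day": 14.5, "user_click_rate": 0.73}
```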
Open Questions
- **How should we handle the rescore weight interaction?** When `sltr` is used in the rescore phase for logging, the rescore weights are typically set to 0 (logging only). The processor reranks after the rescore phase. We should document this interaction clearly.
- **Should the processor support reading `_ltrlog` from nested/inner hits?** The current logging system supports inner hits in some contexts. Remote reranking of inner hits adds complexity.
Technical Feasibility
We have validated the following aspects of this proposal against the actual ML Commons and OpenSearch codebases:
- `RemoteInferenceInputDataSet` accepts `Map<String, String>` with no size limits on values. JSON-serialized feature matrices (~20KB for typical workloads) are well within the 100MB default JSON size limit.
- Connector template substitution uses Apache Commons `StringSubstitutor` for simple string replacement. JSON array strings in parameters survive substitution intact — ML Commons explicitly skips escaping for values that are already valid JSON.
- The LTR plugin (targeting OpenSearch 3.6) can add `SearchPipelinePlugin` to its interface list and register a `SearchResponseProcessor`. The neural-search plugin provides an exact reference implementation of this pattern.
- `_ltrlog` `DocumentField`s set during the fetch sub-phase are retained on `SearchHit` objects through the response phase, confirmed by existing integration tests.
- `SearchRequest.source().ext()` is accessible in a `SearchResponseProcessor`, confirmed by neural-search's `QueryContextSourceFetcher`, which reads ext builders in the response processor context.
References
- [FEATURE] Support remote inference on LTR plugin #27 — Support remote inference on LTR plugin
- LTR plugin digest #26 — LTR plugin digest
- OpenSearch ML Commons — Remote Model Connectors
- OpenSearch Search Pipelines — Rerank Processor
- OpenSearch Reranking with SageMaker