[RFC] Remote Model Reranking with LTR Feature Vectors via ML Commons #291

@rithin-pullela-aws

Description

Summary

This proposal introduces a search pipeline response processor in the Learning to Rank (LTR) plugin that enables reranking search results using a remote ML model (e.g., hosted on Amazon SageMaker) with LTR-computed feature vectors as input. The processor bridges LTR's feature logging capabilities with OpenSearch ML Commons' remote model connector framework.

Related issues:


Motivation

The current customer workflow

A common production pattern among OpenSearch LTR users is:

  1. Define feature sets in the LTR plugin (BM25 scores, term statistics, field matches, etc.)
  2. Execute a search query with LTR feature logging enabled (_ltrlog)
  3. Export the logged feature vectors out of OpenSearch to the application layer
  4. Combine them with external features (user interaction data, session context, time-based signals)
  5. Call a remote model on SageMaker from the application layer
  6. Re-sort the results in the application layer

Steps 3 through 6 happen entirely outside OpenSearch. This introduces:

  • Application complexity — ranking logic is split across the search layer and the application layer
  • Additional latency — extra network round-trips: OpenSearch → Application → SageMaker → Application
  • Operational burden — more moving parts to monitor, debug, and maintain
  • Architectural rigidity — server-side personalization and ranking experiments require application deployments

What this feature enables

Collapsing the above workflow into a single search request:

  1. Define feature sets in the LTR plugin
  2. Execute a search query → LTR logs features → pipeline processor sends feature vectors to a remote model via ML Commons → results return reranked

The ranking logic moves into the search layer, where it belongs.


Value Proposition

Remote models vs. local LTR models

The LTR plugin currently supports local model execution via RankLib (LambdaMART, MART, Random Forests, Linear Regression) and XGBoost. These models run in-process on the OpenSearch node with no network overhead.

Remote model inference via SageMaker is not a replacement for local models. It serves a fundamentally different set of requirements:

| Dimension | Local LTR Models | Remote Models (SageMaker) |
|---|---|---|
| Latency | Sub-millisecond per-document scoring, no network hop | Network round-trip to SageMaker endpoint (typically 10-50 ms per batch) |
| Model flexibility | Limited to RankLib and XGBoost model formats | Any model architecture: deep learning, gradient-boosted ensembles, custom models, LLM-based rankers |
| Feature set | Lucene-computable features only (query-document signals) | Can incorporate arbitrary external features (user history, session data, real-time signals) alongside LTR features |
| Model size | Constrained by OpenSearch heap and circuit-breaker limits | No size constraints; model runs on dedicated ML infrastructure |
| MLOps | Model binary stored in the .ltrstore index, manual updates | Centralized model registry, A/B testing, automated retraining, monitoring via SageMaker |
| Best for | Low-latency ranking with well-defined query-document features | Complex ranking with rich feature sets and advanced model architectures |

Key insight: Customers choosing SageMaker over local LTR models are typically not making a latency trade-off. They need capabilities that local models cannot provide — richer feature sets, larger models, and MLOps infrastructure. The two approaches are complementary, not competing.

LTR feature-based reranking vs. existing OpenSearch rerankers

OpenSearch already supports reranking via search pipeline response processors (introduced in 2.12), including cross-encoder reranking with remote models on SageMaker. The question is: what does LTR add?

| Dimension | Existing Rerank Processor | LTR Feature-Based Reranking |
|---|---|---|
| Input to model | Raw query text + document text | Structured, numeric feature vectors computed by LTR |
| Feature engineering | None — the model receives raw text and must learn relevance from scratch | Rich, domain-specific features: BM25 per-field, term statistics, custom script features, derived features |
| Model paradigm | Cross-encoders and neural rerankers that operate on text pairs | Traditional ML models (XGBoost, LightGBM, linear models) and neural models that operate on feature vectors |
| Interpretability | Opaque — model produces a score from text | Transparent — each feature value is logged and inspectable |
| Latency profile | Depends on model inference time (cross-encoders can be expensive) | Feature computation is fast (Lucene-native); remote inference is one batched call |
| Training workflow | Requires labeled query-document pairs with raw text | Uses feature vectors exported from LTR logging, standard tabular ML training |

These are different paradigms, not competing ones. Cross-encoder reranking is ideal when you want a neural model to assess query-document relevance from raw text. LTR feature-based reranking is ideal when you have domain-specific features, need interpretability, or want to combine search signals with external features in a traditional ML model.

A customer might even use both: LTR feature-based reranking for a first-pass rerank with a fast model, followed by a cross-encoder for a final rerank on the top-k results.

Note: OpenSearch also has an ml_inference search response processor (since 2.16) that can call ML Commons from a search pipeline. However, that processor operates on raw document fields — it does not understand LTR's _ltrlog feature vector format. The processor proposed here is purpose-built to parse _ltrlog, apply feature mappings, and batch feature vectors for efficient remote inference.


Architecture Options Considered

Option A: Build the processor in ML Commons

ML Commons already owns the rerank processor and the remote model connector framework. This option would extend the existing rerank processor (or add a new variant) to read structured feature vectors from _ltrlog fields in search results.

Pros:

  • Close to existing rerank infrastructure
  • ML Commons already has the client code for remote model connectors
  • No new plugin dependency graph changes

Cons:

  • ML Commons would need to understand and parse LTR's _ltrlog output format, creating conceptual coupling between the two plugins
  • If the _ltrlog format changes in the LTR plugin, ML Commons would break
  • ML Commons team would need to maintain awareness of LTR's data contract
  • Violates separation of concerns: ML Commons becomes aware of a specific plugin's internal data format

Option B: Build the processor in the LTR plugin, call ML Commons client (Recommended)

The LTR plugin registers a new search pipeline response processor. This processor reads _ltrlog from its own logging system, formats the feature vectors into model input, and calls ML Commons' predict API via its Java client to get scores from the remote model.

Pros:

  • LTR owns both sides of its data — it produces the feature vectors and knows how to consume them
  • ML Commons stays generic — it serves predictions via the predict API without any knowledge of LTR internals
  • Changes to _ltrlog format are internal to one plugin — no cross-plugin breakage
  • LTR becomes a more complete end-to-end ranking solution — feature engineering, local scoring, and remote scoring all in one plugin
  • Clean dependency direction: LTR depends on ML Commons client (stable, public API), not the reverse

Cons:

  • LTR plugin takes a compile-time dependency on the ML Commons client
  • LTR plugin does not currently register any search pipeline processors, so this introduces a new extension pattern for the plugin

Recommendation

Option B is the recommended approach. The core principle is that the producer of the data should own the transformation. LTR creates _ltrlog, so LTR should be the component that knows how to read it, reshape it, and hand it off to a generic prediction service. ML Commons' role is to be that generic service — "give me input, get back scores" — which is exactly what its predict API already does.

The dependency on the ML Commons client is acceptable. ML Commons is a core OpenSearch plugin with a stable client API, and this is a runtime integration, not a deep architectural coupling.


Proposed Design

High-level flow

Search Request
  |
  v
Query Phase — normal retrieval (BM25, etc.)
  |
  v
Rescore Phase — sltr query with feature logging enabled
  |               (computes LTR feature vectors, attaches _ltrlog to hits)
  |
  v
Response Phase
  |
  v
LtrRemoteRerankProcessor (new)
  |  1. Read _ltrlog feature vectors from each hit
  |  2. Optionally merge with additional document field values
  |  3. Read query-level context from ext.ltr_rerank_context (user/session features)
  |  4. Batch all feature vectors into a single model input payload
  |  5. Call ML Commons predict API (routes to SageMaker via connector)
  |  6. Receive scores for all documents
  |  7. Re-sort hits by remote model scores
  |  8. Optionally remove _ltrlog from the response
  |
  v
Reranked Results returned to client
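The response-phase steps above can be sketched in code. This is a minimal Python sketch for illustration only: the real processor would be implemented in Java, and the dict shapes, field names, and `predict_fn` callback here are assumptions, not the plugin's actual API.

```python
def process_response(hits, config, query_context, predict_fn):
    """Illustrative sketch of the LtrRemoteRerankProcessor response phase."""
    # Steps 1-3: assemble one feature vector per hit from the three sources.
    vectors = []
    for hit in hits:
        ltr_features = hit["fields"]["_ltrlog"][0][config["ltr_log_name"]]
        vec = [0.0] * len(config["feature_mapping"])
        for entry in ltr_features:  # entries like {"name": ..., "value": ...}
            idx = config["feature_mapping"].get(entry["name"])
            if idx is not None:
                vec[idx] = entry["value"]
        # Hypothetical per-document field lookup and query-level context append.
        vec += [hit["fields"][f] for f in config.get("document_fields", [])]
        vec += [query_context[f] for f in config.get("query_context_fields", [])]
        vectors.append(vec)

    # Steps 4-6: one batched predict call for the whole window
    # (in the real design this routes to SageMaker via ML Commons).
    scores = predict_fn(vectors)

    # Step 7: re-sort hits by the remote model's scores, highest first.
    reranked = [h for _, h in sorted(zip(scores, hits), key=lambda p: -p[0])]

    # Step 8: optionally strip the logged features from the response.
    if config.get("remove_ltr_log"):
        for hit in reranked:
            hit["fields"].pop("_ltrlog", None)
    return reranked
```

The important structural point the sketch shows is that there is exactly one remote call per search response, regardless of window size.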

Three feature sources

The processor assembles a feature vector from three sources, covering the feature types common in production ranking systems:

| Feature type | Source | Per-document? | Example |
|---|---|---|---|
| Query-document features | _ltrlog (LTR feature logging) | Yes | title_bm25, body_match, term_stat |
| Document features | Stored fields on each hit | Yes | popularity, category, price |
| Query/user features | ext.ltr_rerank_context in the search request | No (same for all docs) | user_segment, time_of_day, user_click_rate |

Query/user features are passed by the application at search time — they represent user preferences, session state, or external signals not stored in the index. The processor appends them to every document's feature vector.
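As a concrete illustration, a batched model input for two documents might look like the following. The shape and the `instances` key are hypothetical: the actual payload format is defined by the ML Commons connector's request template for the target endpoint.

```json
{
  "instances": [
    [12.4, 8.1, 1.0, 230, 3, 2, 14.5, 0.73],
    [9.7, 6.3, 0.0, 118, 1, 2, 14.5, 0.73]
  ]
}
```

In this example, the first three values per row are LTR features (from _ltrlog), the next two are document fields, and the last three are query/user context values, repeated identically for every document in the batch.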

Search pipeline configuration (example)

PUT /_search/pipeline/ltr-remote-rerank
{
  "response_processors": [
    {
      "ltr_remote_rerank": {
        "model_id": "<ml-commons-remote-model-id>",
        "ltr_log_name": "main_log",
        "feature_mapping": {
          "title_bm25": 0,
          "body_bm25": 1,
          "title_match": 2
        },
        "remove_ltr_log": true,
        "context": {
          "document_fields": ["category", "popularity"],
          "query_context_fields": ["user_segment", "time_of_day", "user_click_rate"]
        }
      }
    }
  ]
}

Search request (example)

POST /my-index/_search?search_pipeline=ltr-remote-rerank
{
  "query": { "match": { "title": "wireless headphones" } },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "sltr": {
          "_name": "logged_features",
          "featureset": "my_feature_set",
          "params": { "keywords": "wireless headphones" }
        }
      },
      "query_weight": 0,
      "rescore_query_weight": 0
    }
  },
  "ext": {
    "ltr_log": {
      "log_specs": [
        { "name": "main_log", "rescore_index": 0 }
      ]
    },
    "ltr_rerank_context": {
      "user_segment": 2,
      "time_of_day": 14.5,
      "user_click_rate": 0.73
    }
  }
}

Key implementation considerations

  1. Batching is critical. The processor must send all documents' feature vectors in a single predict call to the remote model. Per-document remote calls would be unacceptably slow.

  2. Feature ordering contract. The remote model expects features in a specific order. The processor must map LTR feature names to the model's expected feature indices. The feature_mapping configuration handles this.

  3. External feature augmentation. The processor supports three sources of features:

    • LTR features (per-document): query-document signals from _ltrlog.
    • Document fields (per-document): read from stored fields / doc values on each hit. Configured via context.document_fields. Note: these must be stored fields or doc values, not _source-only fields.
    • Query/user context (per-query): read from ext.ltr_rerank_context in the search request; the same for every document in the batch. Configured via context.query_context_fields.

  4. Error handling and fallback. If the remote model is unavailable or slow, the processor should support configurable behavior: fail open (return results in original order), fail closed (return an error), or timeout with fallback. A configurable timeout on the ML Commons predict call is essential.

  5. Batch size limits. SageMaker synchronous inference has a payload size limit (typically 6MB). The processor should enforce a configurable maximum batch size and return a clear error if the rescore window exceeds it.

  6. Observability. The processor should emit metrics: remote call latency, batch size, error rates. These should integrate with LTR's existing stats framework (LTRStats).
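Considerations 1, 4, and 5 interact in one place: the single batched call. A minimal Python sketch of that interaction (illustrative only; the limit value, function names, and fail-open signaling are assumptions, not the proposed configuration surface):

```python
import json

# SageMaker synchronous inference payload limit is about 6 MB (assumed default).
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024

def predict_with_fallback(vectors, predict_fn, fail_open=True,
                          max_bytes=MAX_PAYLOAD_BYTES):
    """Sketch: enforce a batch size limit, make one remote call, fall back on error."""
    payload = json.dumps({"instances": vectors})
    if len(payload.encode("utf-8")) > max_bytes:
        # Batch size limit exceeded: fail with a clear, actionable error.
        raise ValueError("rescore window produces a payload above the batch "
                         "size limit; reduce window_size or raise the limit")
    try:
        return predict_fn(vectors)  # single remote call for the whole batch
    except Exception:
        if fail_open:
            return None  # caller keeps the original ordering
        raise
```

Returning `None` here stands in for "leave the hits untouched"; in the real processor, fail-open would simply skip the re-sort step.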


Scope and Non-Goals

In scope

  • Search pipeline response processor that reads _ltrlog and calls ML Commons predict API
  • Batched inference for all documents in a single call
  • Configurable feature mapping (LTR feature name → model feature index)
  • Optional inclusion of document fields as additional per-document features
  • Query-level context features passed via ext.ltr_rerank_context in the search request
  • Configurable timeout and fail-open/fail-closed behavior
  • Integration with LTR's stats framework

Not in scope (future work)

  • Modifying the LTR plugin's core LtrRanker scoring path for remote inference
  • Training workflow integration (model training remains external)
  • Multi-stage reranking orchestration (combining LTR rerank with cross-encoder rerank)
  • Custom pre/post-processing of model input/output beyond feature mapping
  • Caching of remote model scores

Known Limitations

  1. Pagination interaction. The rescore window_size determines how many documents get feature vectors. Documents outside the rescore window are not reranked by the remote model. This is consistent with how LTR rescoring works today, but users should be aware that from/size pagination beyond the window will not reflect remote-model ordering.

  2. Numeric features only. The current design supports numeric feature values only (floats). Categorical features must be encoded as numbers by the application before passing them in ltr_rerank_context.

  3. Rescore phase used for feature computation. The sltr rescore query with zero weights is a logging-only pass. This is the established LTR logging pattern, but it means the rescore phase is used for side effects rather than scoring.
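Limitation 2 implies that categorical encoding is the application's responsibility. A hypothetical application-side sketch (the segment mapping and field names are invented for illustration):

```python
# Categorical values must be mapped to numbers before being sent in
# ext.ltr_rerank_context; this mapping must match the one used at training time.
USER_SEGMENTS = {"anonymous": 0, "registered": 1, "premium": 2}

def build_rerank_context(user_segment, hour_of_day, click_rate):
    """Build a numeric-only ltr_rerank_context payload (ordinal encoding)."""
    return {
        "user_segment": float(USER_SEGMENTS[user_segment]),
        "time_of_day": float(hour_of_day),
        "user_click_rate": float(click_rate),
    }
```

One-hot encoding works the same way: the application would emit one numeric context field per category value.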


Open Questions

  1. How should we handle the rescore weight interaction? When sltr is used in the rescore phase for logging, the rescore weights are typically set to 0 (logging only). The processor reranks after the rescore phase. We should document this interaction clearly.

  2. Should the processor support reading _ltrlog from nested/inner hits? The current logging system supports inner hits in some contexts. Remote reranking of inner hits adds complexity.


Technical Feasibility

We have validated the following aspects of this proposal against the actual ML Commons and OpenSearch codebases:

  • RemoteInferenceInputDataSet accepts Map<String, String> with no size limits on values. JSON-serialized feature matrices (~20KB for typical workloads) are well within the 100MB default JSON size limit.
  • Connector template substitution uses Apache Commons StringSubstitutor for simple string replacement. JSON array strings in parameters survive substitution intact — ML Commons explicitly skips escaping for values that are already valid JSON.
  • The LTR plugin (targeting OpenSearch 3.6) can add SearchPipelinePlugin to its interface list and register a SearchResponseProcessor. The neural-search plugin provides a direct reference implementation of this pattern.
  • _ltrlog DocumentFields set during the fetch sub-phase are retained on SearchHit objects through the response phase, confirmed by existing integration tests.
  • SearchRequest.source().ext() is accessible in a SearchResponseProcessor, confirmed by neural-search's QueryContextSourceFetcher which reads ext builders in the response processor context.
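The "~20KB for typical workloads" estimate above is easy to sanity-check. A quick Python sketch (the 50 x 20 shape is an assumed typical rescore window and feature count, matching the examples in this proposal):

```python
import json
import random

random.seed(0)
# 50 documents x 20 float features, serialized as a JSON matrix, stays in
# the tens of kilobytes -- far below the 100 MB default JSON size limit.
matrix = [[round(random.uniform(0, 100), 4) for _ in range(20)]
          for _ in range(50)]
payload = json.dumps({"instances": matrix})
size_kb = len(payload.encode("utf-8")) / 1024
assert size_kb < 100
```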
