[RFC] Remote Model Reranking with LTR Feature Vectors via ML Commons #291

@rithin-pullela-aws

Description

Summary

This proposal introduces a search pipeline response processor in the Learning to Rank (LTR) plugin that enables reranking search results using a remote ML model (e.g., hosted on Amazon SageMaker) with LTR-computed feature vectors as input. The processor bridges LTR's feature logging capabilities with OpenSearch ML Commons' remote model connector framework.

Related issues:


Motivation

The current customer workflow

A common production pattern among OpenSearch LTR users is:

  1. Define feature sets in the LTR plugin (BM25 scores, term statistics, field matches, etc.)
  2. Execute a search query with LTR feature logging enabled (_ltrlog)
  3. Export the logged feature vectors out of OpenSearch to the application layer
  4. Combine them with external features (user interaction data, session context, time-based signals)
  5. Call a remote model on SageMaker from the application layer
  6. Re-sort the results in the application layer

Steps 3 through 6 happen entirely outside OpenSearch. This introduces:

  • Application complexity — ranking logic is split across the search layer and the application layer
  • Additional latency — extra network round-trips: OpenSearch → Application → SageMaker → Application
  • Operational burden — more moving parts to monitor, debug, and maintain
  • Architectural rigidity — server-side personalization and ranking experiments require application deployments

What this feature enables

Collapsing the above workflow into a single search request:

  1. Define feature sets in the LTR plugin
  2. Execute a search query → LTR logs features → pipeline processor sends feature vectors to a remote model via ML Commons → results return reranked

The ranking logic moves into the search layer, where it belongs.


Value Proposition

Remote models vs. local LTR models

The LTR plugin currently supports local model execution via RankLib (LambdaMART, MART, Random Forests, Linear Regression) and XGBoost. These models run in-process on the OpenSearch node with no network overhead.

Remote model inference via SageMaker is not a replacement for local models. It serves a fundamentally different set of requirements:

| Dimension | Local LTR Models | Remote Models (SageMaker) |
|---|---|---|
| Latency | Sub-millisecond per-document scoring, no network hop | Network round-trip to SageMaker endpoint (typically 10-50 ms per batch) |
| Model flexibility | Limited to RankLib and XGBoost model formats | Any model architecture: deep learning, gradient-boosted ensembles, custom models, LLM-based rankers |
| Feature set | Lucene-computable features only (query-document signals) | Can incorporate arbitrary external features (user history, session data, real-time signals) alongside LTR features |
| Model size | Constrained by OpenSearch heap and circuit-breaker limits | No size constraints; model runs on dedicated ML infrastructure |
| MLOps | Model binary stored in the .ltrstore index, manual updates | Centralized model registry, A/B testing, automated retraining, monitoring via SageMaker |
| Best for | Low-latency ranking with well-defined query-document features | Complex ranking with rich feature sets and advanced model architectures |

Key insight: Customers choosing SageMaker over local LTR models are typically not making a latency trade-off. They need capabilities that local models cannot provide — richer feature sets, larger models, and MLOps infrastructure. The two approaches are complementary, not competing.

LTR feature-based reranking vs. existing OpenSearch rerankers

OpenSearch already supports reranking via search pipeline response processors (introduced in 2.12), including cross-encoder reranking with remote models on SageMaker. The question is: what does LTR add?

| Dimension | Existing Rerank Processor | LTR Feature-Based Reranking |
|---|---|---|
| Input to model | Raw query text + document text | Structured, numeric feature vectors computed by LTR |
| Feature engineering | None — the model receives raw text and must learn relevance from scratch | Rich, domain-specific features: BM25 per-field, term statistics, custom script features, derived features |
| Model paradigm | Cross-encoders and neural rerankers that operate on text pairs | Traditional ML models (XGBoost, LightGBM, linear models) and neural models that operate on feature vectors |
| Interpretability | Opaque — model produces a score from text | Transparent — each feature value is logged and inspectable |
| Latency profile | Depends on model inference time (cross-encoders can be expensive) | Feature computation is fast (Lucene-native); remote inference is one batched call |
| Training workflow | Requires labeled query-document pairs with raw text | Uses feature vectors exported from LTR logging, standard tabular ML training |

These are different paradigms, not competing ones. Cross-encoder reranking is ideal when you want a neural model to assess query-document relevance from raw text. LTR feature-based reranking is ideal when you have domain-specific features, need interpretability, or want to combine search signals with external features in a traditional ML model.

A customer might even use both: LTR feature-based reranking for a first-pass rerank with a fast model, followed by a cross-encoder for a final rerank on the top-k results.

Note: OpenSearch also has an ml_inference search response processor (since 2.16) that can call ML Commons from a search pipeline. However, that processor operates on raw document fields — it does not understand LTR's _ltrlog feature vector format. The processor proposed here is purpose-built to parse _ltrlog, apply feature mappings, and batch feature vectors for efficient remote inference.


Architecture Options Considered

Option A: Build the processor in ML Commons

ML Commons already owns the rerank processor and the remote model connector framework. This option would extend the existing rerank processor (or add a new variant) to read structured feature vectors from _ltrlog fields in search results.

Pros:

  • Close to existing rerank infrastructure
  • ML Commons already has the client code for remote model connectors
  • No new plugin dependency graph changes

Cons:

  • ML Commons would need to understand and parse LTR's _ltrlog output format, creating conceptual coupling between the two plugins
  • If the _ltrlog format changes in the LTR plugin, ML Commons would break
  • ML Commons team would need to maintain awareness of LTR's data contract
  • Violates separation of concerns: ML Commons becomes aware of a specific plugin's internal data format

Option B: Build the processor in the LTR plugin, call ML Commons client (Recommended)

The LTR plugin registers a new search pipeline response processor. This processor reads _ltrlog from its own logging system, formats the feature vectors into model input, and calls ML Commons' predict API via its Java client to get scores from the remote model.

Pros:

  • LTR owns both sides of its data — it produces the feature vectors and knows how to consume them
  • ML Commons stays generic — it serves predictions via the predict API without any knowledge of LTR internals
  • Changes to _ltrlog format are internal to one plugin — no cross-plugin breakage
  • LTR becomes a more complete end-to-end ranking solution — feature engineering, local scoring, and remote scoring all in one plugin
  • Clean dependency direction: LTR depends on ML Commons client (stable, public API), not the reverse

Cons:

  • LTR plugin takes a compile-time dependency on the ML Commons client
  • LTR plugin does not currently register any search pipeline processors, so this introduces a new extension pattern for the plugin

Recommendation

Option B is the recommended approach. The core principle is that the producer of the data should own the transformation. LTR creates _ltrlog, so LTR should be the component that knows how to read it, reshape it, and hand it off to a generic prediction service. ML Commons' role is to be that generic service — "give me input, get back scores" — which is exactly what its predict API already does.

The dependency on the ML Commons client is acceptable. ML Commons is a core OpenSearch plugin with a stable client API, and this is a runtime integration, not a deep architectural coupling.


Proposed Design

High-level flow

Search Request
  |
  v
Query Phase — normal retrieval (BM25, etc.)
  |
  v
Rescore Phase — sltr query with feature logging enabled
  |               (computes LTR feature vectors, attaches _ltrlog to hits)
  |
  v
Response Phase
  |
  v
LtrRemoteRerankProcessor (new)
  |  1. Read _ltrlog feature vectors from each hit
  |  2. Optionally merge with additional document field values
  |  3. Read query-level context from ext.ltr_rerank_context (user/session features)
  |  4. Batch all feature vectors into a single model input payload
  |  5. Call ML Commons predict API (routes to SageMaker via connector)
  |  6. Receive scores for all documents
  |  7. Re-sort hits by remote model scores
  |  8. Optionally remove _ltrlog from the response
  |
  v
Reranked Results returned to client
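The response-phase steps above can be sketched in code. This is a minimal Python sketch for illustration only: the real processor would be implemented in Java, and the dict shapes, field names, and `predict_fn` callback here are assumptions, not the plugin's actual API.

```python
def process_response(hits, config, query_context, predict_fn):
    """Illustrative sketch of the LtrRemoteRerankProcessor response phase."""
    # Steps 1-3: assemble one feature vector per hit from the three sources.
    vectors = []
    for hit in hits:
        ltr_features = hit["fields"]["_ltrlog"][0][config["ltr_log_name"]]
        vec = [0.0] * len(config["feature_mapping"])
        for entry in ltr_features:  # entries like {"name": ..., "value": ...}
            idx = config["feature_mapping"].get(entry["name"])
            if idx is not None:
                vec[idx] = entry["value"]
        # Hypothetical per-document field lookup and query-level context append.
        vec += [hit["fields"][f] for f in config.get("document_fields", [])]
        vec += [query_context[f] for f in config.get("query_context_fields", [])]
        vectors.append(vec)

    # Steps 4-6: one batched predict call for the whole window
    # (in the real design this routes to SageMaker via ML Commons).
    scores = predict_fn(vectors)

    # Step 7: re-sort hits by the remote model's scores, highest first.
    reranked = [h for _, h in sorted(zip(scores, hits), key=lambda p: -p[0])]

    # Step 8: optionally strip the logged features from the response.
    if config.get("remove_ltr_log"):
        for hit in reranked:
            hit["fields"].pop("_ltrlog", None)
    return reranked
```

The important structural point the sketch shows is that there is exactly one remote call per search response, regardless of window size.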

Three feature sources

The processor assembles a feature vector from three sources, covering the feature types common in production ranking systems:

| Feature type | Source | Per-document? | Example |
|---|---|---|---|
| Query-document features | _ltrlog (LTR feature logging) | Yes | title_bm25, body_match, term_stat |
| Document features | Stored fields on each hit | Yes | popularity, category, price |
| Query/user features | ext.ltr_rerank_context in the search request | No (same for all docs) | user_segment, time_of_day, user_click_rate |

Query/user features are passed by the application at search time — they represent user preferences, session state, or external signals not stored in the index. The processor appends them to every document's feature vector.
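As a concrete illustration, a batched model input for two documents might look like the following. The shape and the `instances` key are hypothetical: the actual payload format is defined by the ML Commons connector's request template for the target endpoint.

```json
{
  "instances": [
    [12.4, 8.1, 1.0, 230, 3, 2, 14.5, 0.73],
    [9.7, 6.3, 0.0, 118, 1, 2, 14.5, 0.73]
  ]
}
```

In this example, the first three values per row are LTR features (from _ltrlog), the next two are document fields, and the last three are query/user context values, repeated identically for every document in the batch.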

Search pipeline configuration (example)

PUT /_search/pipeline/ltr-remote-rerank
{
  "response_processors": [
    {
      "ltr_remote_rerank": {
        "model_id": "<ml-commons-remote-model-id>",
        "ltr_log_name": "main_log",
        "feature_mapping": {
          "title_bm25": 0,
          "body_bm25": 1,
          "title_match": 2
        },
        "remove_ltr_log": true,
        "context": {
          "document_fields": ["category", "popularity"],
          "query_context_fields": ["user_segment", "time_of_day", "user_click_rate"]
        }
      }
    }
  ]
}

Search request (example)

POST /my-index/_search?search_pipeline=ltr-remote-rerank
{
  "query": { "match": { "title": "wireless headphones" } },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "sltr": {
          "_name": "logged_features",
          "featureset": "my_feature_set",
          "params": { "keywords": "wireless headphones" }
        }
      },
      "query_weight": 0,
      "rescore_query_weight": 0
    }
  },
  "ext": {
    "ltr_log": {
      "log_specs": [
        { "name": "main_log", "rescore_index": 0 }
      ]
    },
    "ltr_rerank_context": {
      "user_segment": 2,
      "time_of_day": 14.5,
      "user_click_rate": 0.73
    }
  }
}

Key implementation considerations

  1. Batching is critical. The processor must send all documents' feature vectors in a single predict call to the remote model. Per-document remote calls would be unacceptably slow.

  2. Feature ordering contract. The remote model expects features in a specific order. The processor must map LTR feature names to the model's expected feature indices. The feature_mapping configuration handles this.

  3. External feature augmentation. The processor supports three sources of features:

    • LTR features (per-document): query-document signals from _ltrlog.
    • Document fields (per-document): read from stored fields / doc values on each hit. Configured via context.document_fields. Note: these must be stored fields or doc values, not _source-only fields.
    • Query/user context (per-query): read from ext.ltr_rerank_context in the search request; the same for every document in the batch. Configured via context.query_context_fields.

  4. Error handling and fallback. If the remote model is unavailable or slow, the processor should support configurable behavior: fail open (return results in original order), fail closed (return an error), or timeout with fallback. A configurable timeout on the ML Commons predict call is essential.

  5. Batch size limits. SageMaker synchronous inference has a payload size limit (typically 6MB). The processor should enforce a configurable maximum batch size and return a clear error if the rescore window exceeds it.

  6. Observability. The processor should emit metrics: remote call latency, batch size, error rates. These should integrate with LTR's existing stats framework (LTRStats).
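Considerations 1, 4, and 5 interact in one place: the single batched call. A minimal Python sketch of that interaction (illustrative only; the limit value, function names, and fail-open signaling are assumptions, not the proposed configuration surface):

```python
import json

# SageMaker synchronous inference payload limit is about 6 MB (assumed default).
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024

def predict_with_fallback(vectors, predict_fn, fail_open=True,
                          max_bytes=MAX_PAYLOAD_BYTES):
    """Sketch: enforce a batch size limit, make one remote call, fall back on error."""
    payload = json.dumps({"instances": vectors})
    if len(payload.encode("utf-8")) > max_bytes:
        # Batch size limit exceeded: fail with a clear, actionable error.
        raise ValueError("rescore window produces a payload above the batch "
                         "size limit; reduce window_size or raise the limit")
    try:
        return predict_fn(vectors)  # single remote call for the whole batch
    except Exception:
        if fail_open:
            return None  # caller keeps the original ordering
        raise
```

Returning `None` here stands in for "leave the hits untouched"; in the real processor, fail-open would simply skip the re-sort step.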


Scope and Non-Goals

In scope

  • Search pipeline response processor that reads _ltrlog and calls ML Commons predict API
  • Batched inference for all documents in a single call
  • Configurable feature mapping (LTR feature name → model feature index)
  • Optional inclusion of document fields as additional per-document features
  • Query-level context features passed via ext.ltr_rerank_context in the search request
  • Configurable timeout and fail-open/fail-closed behavior
  • Integration with LTR's stats framework

Not in scope (future work)

  • Modifying the LTR plugin's core LtrRanker scoring path for remote inference
  • Training workflow integration (model training remains external)
  • Multi-stage reranking orchestration (combining LTR rerank with cross-encoder rerank)
  • Custom pre/post-processing of model input/output beyond feature mapping
  • Caching of remote model scores

Known Limitations

  1. Pagination interaction. The rescore window_size determines how many documents get feature vectors. Documents outside the rescore window are not reranked by the remote model. This is consistent with how LTR rescoring works today, but users should be aware that from/size pagination beyond the window will not reflect remote-model ordering.

  2. Numeric features only. The current design supports numeric feature values only (floats). Categorical features must be encoded as numbers by the application before passing them in ltr_rerank_context.

  3. Rescore phase used for feature computation. The sltr rescore query with zero weights is a logging-only pass. This is the established LTR logging pattern, but it means the rescore phase is used for side effects rather than scoring.
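Limitation 2 implies that categorical encoding is the application's responsibility. A hypothetical application-side sketch (the segment mapping and field names are invented for illustration):

```python
# Categorical values must be mapped to numbers before being sent in
# ext.ltr_rerank_context; this mapping must match the one used at training time.
USER_SEGMENTS = {"anonymous": 0, "registered": 1, "premium": 2}

def build_rerank_context(user_segment, hour_of_day, click_rate):
    """Build a numeric-only ltr_rerank_context payload (ordinal encoding)."""
    return {
        "user_segment": float(USER_SEGMENTS[user_segment]),
        "time_of_day": float(hour_of_day),
        "user_click_rate": float(click_rate),
    }
```

One-hot encoding works the same way: the application would emit one numeric context field per category value.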


Open Questions

  1. How should we handle the rescore weight interaction? When sltr is used in the rescore phase for logging, the rescore weights are typically set to 0 (logging only). The processor reranks after the rescore phase. We should document this interaction clearly.

  2. Should the processor support reading _ltrlog from nested/inner hits? The current logging system supports inner hits in some contexts. Remote reranking of inner hits adds complexity.


Technical Feasibility

We have validated the following aspects of this proposal against the actual ML Commons and OpenSearch codebases:

  • RemoteInferenceInputDataSet accepts Map<String, String> with no size limits on values. JSON-serialized feature matrices (~20KB for typical workloads) are well within the 100MB default JSON size limit.
  • Connector template substitution uses Apache Commons StringSubstitutor for simple string replacement. JSON array strings in parameters survive substitution intact — ML Commons explicitly skips escaping for values that are already valid JSON.
  • The LTR plugin (targeting OpenSearch 3.6) can add SearchPipelinePlugin to its interface list and register a SearchResponseProcessor. The neural-search plugin provides a direct reference implementation of this pattern.
  • _ltrlog DocumentFields set during the fetch sub-phase are retained on SearchHit objects through the response phase, confirmed by existing integration tests.
  • SearchRequest.source().ext() is accessible in a SearchResponseProcessor, confirmed by neural-search's QueryContextSourceFetcher which reads ext builders in the response processor context.
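The "~20KB for typical workloads" estimate above is easy to sanity-check. A quick Python sketch (the 50 x 20 shape is an assumed typical rescore window and feature count, matching the examples in this proposal):

```python
import json
import random

random.seed(0)
# 50 documents x 20 float features, serialized as a JSON matrix, stays in
# the tens of kilobytes -- far below the 100 MB default JSON size limit.
matrix = [[round(random.uniform(0, 100), 4) for _ in range(20)]
          for _ in range(50)]
payload = json.dumps({"instances": matrix})
size_kb = len(payload.encode("utf-8")) / 1024
assert size_kb < 100
```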
