
[RFC] HLD for Dynamic Weight Optimization for Hybrid Search #223

@martin-gaievski

Description


Dynamic Weight Optimization for Hybrid Search

Introduction

Hybrid search combines multiple query types, such as keyword and neural search, to improve search relevance. In 2.11, the team released the hybrid query as part of the neural-search plugin. The main responsibility of the hybrid query is to return the scores of multiple sub-queries, normalized and combined into a single score.

The main way of improving the relevance of hybrid search results is through sub-query weights. By assigning a greater or smaller coefficient to the lexical and semantic sub-queries, we can increase or decrease their respective contribution to the final combined document score. Initially, identifying weights is the user's responsibility. The Search Relevance Workbench introduced a straightforward approach for identifying suitable hybrid search parameters by trying out a hard-coded set of alternatives against a predefined query set and hybrid search configuration.

Problem Statement

With the hybrid experiment, users can identify optimal weights that are best on average for the whole set of documents and queries. However, this approach produces exactly one parameter combination as “the best” one for all queries. With this parameter combination, some queries will benefit (their search quality metrics improve) and other queries will not (their search quality metrics decrease), as described in the “Hybrid Search Optimization” blog post.

This RFC proposes a framework for dynamic weight optimization that can predict query-specific weights. The initial implementation establishes the foundation for this capability, with the expectation that future iterations will improve prediction accuracy through enhanced features and models.

Requirements

Dynamic hybrid search optimization is a search relevance tuning operation for advanced users that requires machine learning knowledge, specifically around feature selection and engineering processes and model training.

Functional Requirements

  • the system should predict weights for the lexical and semantic parts of a hybrid query at a per-query level
  • weight prediction is based on the query text; we add the limitation that this text must be identical across all sub-queries
  • fall back to pre-configured global optimization weights when dynamic weights are not available
  • the framework shall support extensible feature engineering for future enhancements

Non functional requirements

  • minimize added latency: the target is <10ms of additional latency for weight prediction, measured as the increase in 95th percentile query latency
  • provide clear extension points for improved models in future versions

Out of Scope

  • details of how scores are normalized and combined for the hybrid query; we use existing OpenSearch techniques
  • details of training the ML models involved in weight prediction; we care about them as building blocks for our solution and treat them as black boxes

Current State

Currently, weights for score combination are predicted globally for the whole dataset, based on the average relevance metrics from the test query set.

Image

Solution Overview

To achieve the best results in terms of relevance we propose to use ML techniques for weight prediction. Such prediction has the same requirements as any ML-powered application:

  • Feature selection and feature engineering
  • Model training and serving
  • Model inference

This proposal introduces a foundational framework for ML-based weight prediction in hybrid search. The framework supports:

  • extensible feature engineering - starting with basic query features
  • pluggable model architecture - beginning with linear regression
  • scalable inference pipeline - designed for future enhancements

Version 1 Focus: Establish the core framework and processing pipeline, providing a baseline for future improvements rather than optimizing for immediate performance gains.

The following diagram shows the high-level flow needed to train the weight prediction model and use it in hybrid queries.

Image

The following main components are part of the proposed framework.

Feature engineering pipeline
The framework provides a standardized feature extraction system for hybrid queries (a sketch of a possible extension interface follows the list):

  • query-level features: Length, token count, presence of numbers/special characters
  • result-level features: BM25 scores, semantic similarity scores, result counts
  • extensibility: Clear interfaces for adding domain-specific features in future versions
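
Below is a minimal sketch of what such an extension point could look like. The interface and method names are illustrative assumptions, not an existing neural-search API.

import java.util.Map;

/**
 * Hypothetical extension point for feature engineering (names are assumptions,
 * not an existing neural-search API). Each extractor contributes a set of
 * named numeric features computed from the query text.
 */
public interface QueryFeatureExtractor {

    /** Unique name of the feature group, e.g. "basic_query_features". */
    String name();

    /** Computes feature name -> value pairs for the given query text. */
    Map<String, Double> extract(String queryText);
}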

Model integration architecture
Version 1 supports embedded linear regression models with:

  • co-located processing: Models execute within the search pipeline for minimal latency
  • cluster state storage: Model parameters stored as lightweight cluster metadata
  • fallback mechanism: Automatic reversion to static weights when prediction fails

Weight application system
Predicted weights integrate with existing hybrid search components:

  • normalization processor enhancement: Accepts dynamic weights alongside static configuration
  • score combination: Applies predicted weights during the arithmetic mean calculation (see the formula sketch after this list)
  • query DSL compatibility: Works with existing hybrid query syntax
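
For reference, a sketch of the weighted arithmetic mean combination that the predicted weights feed into, shown for two sub-queries and assuming the weights are normalized to sum to one:

\text{score}(d) = w_{\text{lex}} \cdot \tilde{s}_{\text{lex}}(d) + w_{\text{sem}} \cdot \tilde{s}_{\text{sem}}(d), \qquad w_{\text{lex}} + w_{\text{sem}} = 1

where \tilde{s} denotes the normalized sub-query scores (e.g., min-max normalized).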

Option 1: Embedded scorer [Recommended]

The model is trained inside OpenSearch. Trained model parameters are stored separately and the model logic is implemented as Java code. The combination of this logic and the model parameters is what we call the embedded scorer.

Other components of this solution:

  • the normalization processor accepts weights for combination as dynamic parameters
  • a categorization mechanism is needed for query type: lexical/semantic
  • a predefined query template ensures the query text is the same between sub-queries

The following query features are used for model training (a sketch of extracting the basic features follows the list):

  • Basic features
    • query length
    • token count
    • has numbers (boolean)
    • has special characters (boolean)
  • Lexical search result features
    • number of results for the lexical query.
    • maximum title score: maximum score of the titles of the retrieved top 10 documents. The scores are BM25 scores calculated individually per result set. That means that the BM25 score is not calculated on the whole index but only on the retrieved subset for the query, making the scores more comparable to each other and less prone to outliers that could result from high IDF values for very rare query terms.
    • sum of the title scores of the top 10 documents, again calculated per result set. We use the sum of the scores (and no average value) as an aggregate to measure how relevant all retrieved top 10 titles are. BM25 scores are not normalized, so using the sum instead of the average seemed reasonable.
  • Neural search result features
    • maximum semantic score of the retrieved top 10 documents. This is the score we receive for a neural query based on the query’s similarity to the title.
    • average semantic score: In contrast to BM25 scores, the semantic scores are normalized and in the range of 0 to 1. Using the average score seems more reasonable than attempting to calculate the sum.
  • Other, less common domain-specific features; evaluation is needed to determine whether these features are effective and can be collected from the dataset: currency, size, SKU, is question, is medical acronym, has citation, is stock ticker, has price
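
As an illustration, the basic query features could be computed along these lines (a minimal sketch; the class name and feature keys are assumptions):

import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of extracting the basic query features listed above. */
public final class BasicQueryFeatures {

    public static Map<String, Double> extract(String queryText) {
        Map<String, Double> features = new LinkedHashMap<>();
        String trimmed = queryText == null ? "" : queryText.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");

        features.put("query_length", (double) trimmed.length());
        features.put("token_count", (double) tokens.length);
        features.put("has_numbers", trimmed.matches(".*\\d.*") ? 1.0 : 0.0);
        // anything that is not a letter, digit or whitespace counts as "special"
        features.put("has_special_characters", trimmed.matches(".*[^\\p{L}\\p{N}\\s].*") ? 1.0 : 0.0);
        return features;
    }
}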

High level workflow

  • Upload data set [Core OpenSearch]
  • Add user queries, OpenSearch DSL query and judgments to search relevance workbench. They are stored as query set, search configuration and judgment ratings. [Search Relevance Workbench]
  • Train the hybrid query weights prediction model, passing the query set, search configuration and judgment ratings ids. Model metadata is stored as part of the search relevance internal index (or possibly kept in memory). This should be exportable as a JSON-like document [Search Relevance Workbench]
  • Import the model parameters and store them as part of the cluster state. Create the embedded scorer using those stored parameters. [Core OpenSearch]
  • At query time, identify whether the query needs dynamic optimization, call the embedded scorer (reading model parameters) and apply the weights to the hybrid query [Neural search]
Image

Pros:

  • fast due to co-location with neural-search plugin code; no transport calls or de/serialization
  • no model deployment and connector needed, which is important for managed cloud environments with limited extensibility
  • depending on how model parameters are stored, they can be manually editable with no need to retrain the model

Cons:

  • limited set of query features is supported (only features defined as part of model training)
  • limited model types are supported due to the complexity of implementing model logic internally (essentially linear regression)
  • a separate persistence mechanism is needed to store the data extracted from the model
  • more error prone compared to using pre-trained models, because a new component is needed that receives the model and performs the calculations
  • the categorization mechanism for query type (lexical/semantic) is limited
  • limitations on query text variability (text must be the same between sub-queries)

Option 2: External simple model

Similar to Option 1, except the weight prediction model is accessed via ml-commons. The model can be simple, like linear regression deployed locally, or a larger LLM hosted remotely.

Image

Pros:

  • flexibility: virtually any model type is supported
  • less error prone: no extra steps of converting the model to Java code (no embedded scorer) or storing model parameters in OpenSearch
  • simpler implementation, greater reuse of existing components

Cons:

  • extra latency due to remote predict calls to the model
  • limited set of query features is supported (only features defined as part of model training)
  • the categorization mechanism for query type (lexical/semantic) is limited
  • limitations on query text variability (text must be the same between sub-queries)
  • extra setup is needed for the model connector
  • may not work in restricted deployment environments due to external model hosting requirements

Option 3: External LLM

This option takes the next step compared to Option 2: instead of a simple model trained on an exact dataset using query features, we can use an LLM and send it the whole query text.

Image

Other components of this solution:

  • the normalization processor accepts weights for combination as dynamic parameters
  • a predefined query template ensures the query text is the same between sub-queries
  • a prompt for the LLM

Pros:

  • flexibility: virtually any model type is supported
  • simplest option: no need to train a model, no extra steps of converting the model to Java code (no embedded scorer), and no need to store model details in OpenSearch
  • less dependent on query text features
  • potentially we can predict which techniques provide the best relevance

Cons:

  • extra latency due to remote predict calls to the model, presumably higher than in Option 2 (100+ ms)
  • potentially limited throughput; the model can throttle requests due to high resource utilization
  • limitations on query text variability (text must be the same between sub-queries)
  • extra setup is needed for the model connector
  • may not work in restricted deployment environments due to external model hosting requirements

Solution Comparison

The solutions offer a tradeoff between flexibility and performance.

Solutions for dynamic optimizer - comparison table

| Criteria | Option 1: Embedded Scorer (Recommended) | Option 2: External Simple Model | Option 3: External LLM |
|---|---|---|---|
| Performance characteristics | | | |
| Latency | Low - Co-located with neural-search plugin | Medium - Network calls required | High - LLM inference time plus network overhead |
| Throughput | High | Medium - Limited by external service | Low - Potential throttling from LLM service |
| Resource utilization | Low - Minimal overhead | Medium | High - LLMs require significant resources |
| Implementation | | | |
| Complexity | Medium - Need to convert models to Java code | Low - Uses standard model interfaces | Low - Uses standard LLM APIs |
| Model types supported | Limited - Primarily linear regression | High - Any supported model type | High - LLMs with prompt engineering |
| Feature engineering effort | High - Careful feature selection needed | High - Same as Option 1 | Low - LLM can process raw queries |
| Operational considerations | | | |
| Managed environment compatibility | Yes | Limited - Depends on connector | Limited - Depends on connector |
| External dependencies | None | Required - Model hosting service | Required - LLM API service |
| Model management | Complex - Need persistence mechanism | Simple - Managed externally | Simple - Managed externally |
| Infrastructure requirements | Minimal | Moderate - Model hosting | High - LLM infrastructure |
| Capabilities | | | |
| Model sophistication | Basic | Moderate | Advanced |
| Adaptability to query variations | Limited | Limited | High - LLMs handle text variations well |
| Contextualization | Low | Low | High - Can understand query intent |
| Feature utilization | Limited to engineered features | Limited to engineered features | Can extract features from raw text |
| Constraints | | | |
| Query text consistency requirements | High - Text must be same between sub-queries | High - Text must be same between sub-queries | Medium - More tolerant of variations |
| Setup complexity | Low | Medium - Requires connector setup | High - LLM integration and prompt engineering |
| Maintainability | Medium - Need to update embedded code | High - External model updates are seamless | High - LLM updates managed by provider |
| Error handling complexity | High - Internal errors harder to debug | Medium | Medium |

Based on how well each solution fits the criteria categories, we arrive at the following recommendations:

Option 1 (Embedded Scorer) is recommended for most use cases due to:

  • best performance characteristics with minimal latency
  • no external dependencies making it compatible with all deployment scenarios including managed cloud environments
  • simplest operational deployment

Option 2 can be considered when:

  • more sophisticated models beyond linear regression are required
  • external model management infrastructure already exists
  • performance is not the primary concern

Option 3 can be considered when:

  • query variations are significant
  • deep understanding of query semantics is required
  • performance can be traded for higher accuracy
  • external LLM infrastructure is already in place

Key Design Decisions

All of the following decisions apply to the recommended solution option (Option 1).

  1. How model data is stored

We can use the cluster state. Model metadata is relatively small (a few KB; the linear regression model for the ESCI dataset was 880 bytes). This storage survives node crashes and cluster restarts, and the data can be retrieved and tweaked by the user if needed. A sketch of what the stored parameters could look like is shown below.
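
The following is a sketch of the model parameters that could be kept in cluster state; the structure and field names are illustrative assumptions.

import java.util.Map;

/**
 * Sketch of model parameters stored in cluster state (structure and field
 * names are assumptions for illustration). For a linear regression model this
 * is just an intercept plus one coefficient per feature, which keeps the
 * payload well under a few KB.
 */
public record WeightPredictionModel(
        String modelId,
        String modelType,                 // e.g. "linear_regression"
        double intercept,
        Map<String, Double> coefficients  // feature name -> coefficient
) {}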

  2. How to identify the sub-query class

We can create a registry of queries and their corresponding types, e.g. match → lexical, neural/knn → semantic, etc. In the case of a compound or complex query we skip dynamic optimization and fall back to static weights. Another option to explore is registering a type for each query class and resolving it with the visitor pattern. A minimal registry sketch is shown below.
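
A minimal sketch of such a registry, assuming a simple name-to-type map (class, enum and query names beyond match/neural/knn are illustrative assumptions):

import java.util.Map;

/** Minimal sketch of a query-type registry (names are assumptions). */
public final class SubQueryTypeRegistry {

    public enum SubQueryType { LEXICAL, SEMANTIC, UNKNOWN }

    // query DSL name -> sub-query class; compound/unknown types fall back to static weights
    private static final Map<String, SubQueryType> REGISTRY = Map.of(
            "match", SubQueryType.LEXICAL,
            "match_phrase", SubQueryType.LEXICAL,
            "multi_match", SubQueryType.LEXICAL,
            "neural", SubQueryType.SEMANTIC,
            "knn", SubQueryType.SEMANTIC
    );

    public static SubQueryType typeOf(String queryName) {
        return REGISTRY.getOrDefault(queryName, SubQueryType.UNKNOWN);
    }
}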

  3. How to extract the query text from the OpenSearch query DSL

A registry of query types with the keys from which the query text can be extracted. We fall back to static weights, or fail, if there are multiple different query texts or an unknown query type.

  4. How to compare relevance metrics during model training

We rely on user-provided judgments for the dataset and queries. Any document-query pair that does not have a judgment rating is considered irrelevant (effectively a judgment rating of 0.0). If judgments are missing, the user can generate them using the Search Relevance Workbench and its LLM judgment generation functionality.

Open Questions

Which simple ML model is most effective?

For the initial version we need to pick one model type that:

  • is relatively simple to convert into Java code
  • provides the most relevant results for random/general datasets

During the POC the following models were tested using the ESCI dataset:

  • linear regression
  • logistic regression
  • gradient boosting
  • random forest (tested for comparison, will be hard to convert to Java)

The following table summarizes the data collected from that POC (a training sketch follows the table):

| Model Type | Accuracy (NDCG@10) | Training Time | Inference Latency | Interpretability | Implementation Complexity | POC Suitability |
|---|---|---|---|---|---|---|
| Linear Regression | 0.82 | <1 sec | <5ms | High | Simple | Excellent |
| Random Forest | 0.87 | 5-10 sec | 15-20ms | Medium | Moderate | Good |
| Neural Network | 0.89 | 30-60 sec | 25-30ms | Low | Complex | Poor |
| XGBoost | 0.88 | 10-15 sec | 10-15ms | Low-Medium | Moderate | Fair |
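
For the linear regression case, training could look roughly like the following sketch. It assumes Apache Commons Math as the fitting library and a training set of per-query feature vectors paired with the lexical weight that scored best for that query; both are assumptions, since the RFC treats model training as a black box.

import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

/**
 * Sketch of training the weight-prediction model (assumes Apache Commons Math;
 * the RFC treats training as a black box, so this is illustrative only).
 */
public final class WeightModelTrainer {

    /**
     * @param features one row of query features per training query
     * @param bestLexicalWeight per query, the lexical weight that scored best
     *                          during the hybrid optimizer experiment
     * @return model parameters: index 0 is the intercept, the rest are coefficients
     */
    public static double[] train(double[][] features, double[] bestLexicalWeight) {
        OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
        regression.newSampleData(bestLexicalWeight, features);
        return regression.estimateRegressionParameters();
    }
}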

Short Term/Mid Term/Long Term implementation

In the short term we can start with the Option 1 implementation, where the model is stored locally in the cluster. A few other scoping decisions make sense for the short-term implementation:

  • Use only basic query features (only those that can be extracted from the query text itself)
  • The model type for the embedded scorer is fixed; the exact type will be identified based on benchmark data
  • Judgment ratings (aka ground truth) are provided by the user; we rely on the quality of those judgments

In the mid/long term we will add Option 2 as an additional mode of dynamic optimization. This should increase the variety of supported models for advanced users. Such a change should be backward compatible, but will have limited support in Serverless. More features planned for later phases:

  • complex query features for embedded scorer model

Potential Issues

Known limitations and Future extensions

With the recommended solution option, the following limitations can be assumed:

  • support for limited model types trained on query features:
    • Linear Regression
    • Logistic Regression
    • Polynomial Regression
    • Ridge/Lasso Regression
    • Simple Decision Trees

Solution LLD

Frontend

We need a new screen in the Search Relevance Workbench to start model training. The user needs to input the following information:

  • index (existing OpenSearch index with ingested data)
  • ids for the following entities, which need to be imported beforehand
    • user queries
    • search configuration
    • judgments
  • any model-related information (may not be needed if we go with the simplest form of using a single model type)

The optimal way is to re-use the existing Hybrid Search Optimizer Experiment screen. We can add an “Optimization mode” section with two mutually exclusive options: “Global”, which is what we have today and will be selected by default, and a new “Dynamic” mode.

Following are mocks for the new UI.

This is the Hybrid Search Optimizer Experiment initial screen; the “Global” optimization mode is pre-selected.

Image

This is how screen changes when user selects Dynamic mode for optimization:

Image

Backend

In the Search Relevance Workbench backend we need to add the following components:

  • modify the Experiment API in the Search Relevance Workbench for training the model. This is already an async API, which is a perfect fit because model training can be long running (~10 mins for the linear regression model used in the POC) and a synchronous call would most likely time out. Model parameters are stored in the cluster metadata at the end of training. We keep a minimal record in the Experiment index to allow the user to monitor training progress.
  • a new search processor that identifies whether an incoming query is a hybrid query with the dynamic optimization flag; in that case it extracts query features and calls the embedded scorer to predict weights. Those weights are set in the pipeline context
  • modifications to the existing normalization processor: it needs to read the predicted weights and apply them during score normalization and combination. If for some reason that cannot be done, the system falls back to the static weights provided as part of the pipeline.

Following are details for each of these initial-version items.

Model training

In the Search Relevance Workbench backend we use the existing experiments API.

For a simple case in the initial version we can use a simplified format, omitting parameters that have only one possible value:

PUT /_plugins/search_relevance/experiments
{
    "querySetId": "{{query_set_id}}",
    "searchConfigurationList": ["{{hybrid_search_config_id}}"],
    "size": 10,
    "judgmentList": ["{{judgment_list_id_1}}"],
    "type": "HYBRID_OPTIMIZER", 
    "optimizationMode": "dynamic"
}
| Parameter name | Type | Description | Default value |
|---|---|---|---|
| optimizationMode | keyword | defines the experiment type, allowed values: global, dynamic | global |

Sample response

{
    "experimentId": "{{experimentId}}",
    "modelId": "{{generatedModelId}}",
    "status": "CREATED"
}

To effectively run model training we need to do the following steps:

  • split the training workload into reasonably small tasks
  • run a few tasks in parallel and schedule the rest using a task queue
  • keep draining that task queue until all tasks are executed
  • finalize training results
  • reduce the model training results into a form that can be saved into the cluster state

We use the existing scheduling framework in the Search Relevance Workbench to schedule the smaller training tasks and keep an in-memory queue of pending tasks; a minimal sketch of this pattern is shown below.
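
The sketch below only illustrates the queue-draining pattern with bounded parallelism; the real implementation would go through the Search Relevance Workbench scheduling framework, and the class name is an assumption.

import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch of draining an in-memory queue of training sub-tasks with
 * bounded parallelism (illustrative only).
 */
public final class TrainingTaskRunner {

    public static void runAll(Queue<Runnable> pendingTasks, int parallelism) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(parallelism);
        Runnable task;
        while ((task = pendingTasks.poll()) != null) {
            executor.submit(task); // failed sub-tasks would be retried or counted (see open questions below)
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.HOURS); // wait for all sub-tasks, then finalize/reduce results
    }
}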

Based on the existing mapping, the needed extension is minimal. The model id can be stored as part of the experiment “results” structure.

{
  "properties": {
    "id": { "type": "keyword" },
    "timestamp": { "type": "date", "format": "strict_date_time" },
    "type": { "type": "keyword" },
    "status": { "type": "keyword" },
    "querySetId": { "type": "keyword" },
    "searchConfigurationList": { "type": "keyword" },
    "judgmentList": { "type": "keyword" },
    "size": {"type": "keyword"},
    "results": { "type": "object", "dynamic": false },
    "optimizationMode": { "type": "keyword" }
  }
}

Questions for later versions

  • effective retry strategies for failed training sub-tasks (exponential backoff with limited retries)
  • keep a count of failed training sub-tasks; if the number crosses a critical threshold, cancel training and mark the whole process as failed

Embedded Scorer

This component is responsible for loading model parameters and spinning up a Java representation of the model. It can be implemented as part of a phase results processor with the following responsibilities (a minimal inference sketch follows the list):

  • identify if incoming query is a hybrid query
  • read model parameters from cluster state
  • extract features from the incoming hybrid query
  • predict weights based on extracted features and model parameters
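
A minimal sketch of the inference step for a linear model, reusing the illustrative WeightPredictionModel type from the earlier cluster-state sketch (all names are assumptions); it clamps the predicted lexical weight to [0, 1] and falls back to the static weights if anything goes wrong:

import java.util.Map;

/** Illustrative embedded scorer for a linear model (names are assumptions). */
public final class EmbeddedScorer {

    /**
     * @param model model parameters read from cluster state
     * @param features extracted query features
     * @param staticWeights fallback weights from the pipeline configuration
     * @return [lexicalWeight, semanticWeight] summing to 1.0
     */
    public static double[] predictWeights(WeightPredictionModel model,
                                          Map<String, Double> features,
                                          double[] staticWeights) {
        try {
            double lexical = model.intercept();
            for (Map.Entry<String, Double> coefficient : model.coefficients().entrySet()) {
                lexical += coefficient.getValue() * features.getOrDefault(coefficient.getKey(), 0.0);
            }
            lexical = Math.max(0.0, Math.min(1.0, lexical)); // clamp to a valid weight
            return new double[] { lexical, 1.0 - lexical };
        } catch (RuntimeException e) {
            return staticWeights; // fallback mechanism: revert to static weights on any failure
        }
    }
}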

Dynamic query weights in normalization processor

The existing normalization processor needs the following changes:

  • if predicted weights are present, identify the type of each sub-query (lexical vs semantic vs generic)
  • pass predicted weights to scores combiner, where they are applied to normalized scores and final document score is calculated

For both components we can utilize the existing normalization processor. The only interface change needed is adding a model id for weight prediction. The following request example shows a hybrid query with an inline search pipeline definition:

{
    "query": {
        "hybrid": {
            "queries": [
                { "match": { ... } },
                { "neural": { ... } }
            ]
        }
    },
    "search_pipeline": {
        "description": "Hybrid search with ML-based weight optimization",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {
                        "technique": "min_max"
                    },
                    "combination": {
                        "technique": "arithmetic_mean"
                    },
                    "weight_prediction": {
                        "model_id": "{{model_id}}"
                    }
                }
            }
        ]
    }
}

To identify the query class (lexical/semantic) we can prepare a map of query types. A weaker alternative is to request this information from the user (not preferred, as it relies on the user's expertise and good intentions).

Backward Compatibility

This is a new feature, so there are no major BWC concerns. The only potential point of concern is optimizationMode in the experiments API: if this field is not provided, the experiment is treated as a global optimization.

We assume that for this feature the following areas in the Search Relevance Workbench and Neural Search remain stable:

  • query set
  • search configuration
  • judgment ratings
  • normalization processor

Security

The main area of concern is the APIs, since that is where we accept user input. The initial scope is limited in terms of the information we accept with a request; it is mainly ids of existing system entities and text information such as a model id or model description. The impact of malicious input for those parameters can be minimized by following best practices and adding strict validation, such as limiting string length and checking that system entities with the provided ids exist.
Access control for the new API will be the same as for other existing APIs in the Search Relevance Workbench.

Benchmarking

The quality of predictions can be evaluated using existing tools for checking relevance metrics; they are based on the BEIR datasets and the corresponding evaluation tools in their repository. The team can use a customized version of those tools: https://github.com/martin-gaievski/info-retrieval-test/tree/dynamic_optimizer_feature_eng_esci_dataset. As a dataset for evaluation we recommend the ESCI dataset (Amazon product search): https://github.com/amazon-science/esci-data.

At a high level, we run the search workload using globally predicted weights and compare the results with those based on dynamically predicted weights. We use the main relevance metrics to compare model effectiveness: NDCG, Recall, Precision, MAP. The standard NDCG@10 definition is sketched below for reference.
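
For reference, NDCG@10 (the headline metric in the POC comparison above) follows the standard definition, shown here with the linear-gain variant of DCG (the exponential-gain variant 2^{rel_i} - 1 is also common):

\mathrm{DCG@10} = \sum_{i=1}^{10} \frac{rel_i}{\log_2(i + 1)}, \qquad \mathrm{NDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}

where rel_i is the judgment rating of the document at rank i and IDCG@10 is the DCG@10 of the ideal (judgment-sorted) ranking.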

References

Feedback Required

Feature engineering priorities: what query and result features would be most valuable for your use cases?

We've identified basic query features (length, token count, special characters) and search result features (BM25 scores, semantic similarities) for the initial framework. However, different domains likely benefit from different feature sets.

  • What domain-specific features have you found effective for search relevance?
  • Are there query characteristics (e.g., intent classification, entity recognition) that significantly impact optimal weight selection in your applications?
  • How do you balance feature richness against inference latency requirements?

Query text consistency requirements: is the requirement for identical query text across sub-queries too restrictive for your hybrid search implementations?

Our current design requires that all sub-queries (lexical, semantic, etc.) use identical query text to enable consistent feature extraction. This simplifies the initial framework but may limit real-world applicability.

  • Do your hybrid queries typically use the same text across sub-queries, or do you often modify text for different query types?
  • Would support for query text variations (with more complex feature extraction) be worth the added implementation complexity?
  • Are there alternative approaches to feature extraction that could handle query text differences while maintaining prediction accuracy?
