
[BUG] Serialization issues when used with ML pipelines #289

@sstults

Description

What is the bug?

A JSON serialization error occurs whenever the ext param is included in a search request that also uses an ml_inference search pipeline processor.

How can one reproduce the bug?

Run a search with an inline search pipeline, like this:

POST test_index/_search
{
  "size": 5,
  "query": {
    "bool": {
      "must": { "multi_match": { "query": "test", "fields": ["title^4", "summary^2"] } },
      "filter": [
        {
          "sltr": {
            "_name": "logged_features",
            "featureset": "test_JN_ltr_features",
            "params": {
              "keywords": "test query",
              "query_vector": [<big vector>]
            }
          }
        }
      ]
    }
  },
  "search_pipeline": {
    "request_processors": [
      {
        "ml_inference": {
          "model_id": <model id>,
          "model_input": """{ "parameters": { "input": [ "${input_map.query_text}" ] } }""",
          "input_map": [
            { "query_text": "query.bool.filter[0].sltr.params.keywords" }
          ],
          "output_map": [
            { "query.bool.filter[0].sltr.params.query_vector": "$.inference_results[0].output[0].data" }
          ],
          "ignore_failure": true
        }
      }
    ]
  },
  "ext": {
    "ltr_log": {
      "log_specs": { "name": "ltr_features", "named_query": "logged_features" }
    }
  },
  "_source": ["id", "title"]
}

Results in this error:

{
  "error": {
    "root_cause": [
      {
        "type": "exception",
        "reason": "com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)"
      }
    ],
    "type": "exception",
    "reason": "com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)",
    "caused_by": {
      "type": "json_generation_exception",
      "reason": "Can not start an object, expecting field name (context: Object)",
      "suppressed": [
        {
          "type": "illegal_state_exception",
          "reason": "Failed to close the XContentBuilder",
          "caused_by": {
            "type": "i_o_exception",
            "reason": "Unclosed object or array found"
          }
        }
      ]
    }
  },
  "status": 500
}

What is the expected behavior?

The search should succeed and return hits with the LTR feature values logged for the logged_features named query, as it does when the ml_inference request processor is omitted.


Do you have any additional context?

An AI analysis of the issue:

This Is a Bug in the LTR Plugin
When OpenSearch's search pipeline re-serializes the ext section, it appears to iterate over the extensions and call toXContent in a context where a field name is expected (inside an object). LoggingSearchExtBuilder immediately tries to start an object (a value) instead, which causes the JSON generation error.

Why It Works Without the Request Processor

Without the ml_inference processor, OpenSearch doesn't need to re-serialize the search source: it parses the request once and executes it. With the processor, it must:

  1. Parse the request
  2. Serialize it so the processor can modify the mapped paths
  3. Re-parse the modified request

The re-serialization goes through a different code path, and it is that path which exposes the bug.
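
As a rough sketch of the shape of that round trip, here is the same parse-modify-serialize-reparse cycle in plain Jackson (purely illustrative; the real processor operates on OpenSearch's internal request objects, and the names below are assumptions):

  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.databind.node.ArrayNode;
  import com.fasterxml.jackson.databind.node.ObjectNode;

  public class RoundTripSketch {
      public static void main(String[] args) throws Exception {
          ObjectMapper mapper = new ObjectMapper();
          String raw = "{\"query\":{\"bool\":{\"filter\":[{\"sltr\":{\"params\":{\"keywords\":\"test query\"}}}]}}}";

          // 1. Parse the request
          ObjectNode request = (ObjectNode) mapper.readTree(raw);

          // 2. Write the model output into the mapped path (as output_map does)
          ObjectNode params = (ObjectNode) request.at("/query/bool/filter/0/sltr/params");
          ArrayNode vector = params.putArray("query_vector");
          vector.add(0.1).add(0.2);

          // ... then serialize the modified request ...
          String rewritten = mapper.writeValueAsString(request);

          // 3. ... and re-parse it. In OpenSearch, the serialize step is where
          // LoggingSearchExtBuilder.toXContent is invoked and fails.
          mapper.readTree(rewritten);
      }
  }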

The fix would be to modify LoggingSearchExtBuilder.toXContent to include its own field name:

  @Override
  public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
      builder.startObject(NAME);  // writes the "ltr_log" field name before opening the object
      builder.field(LOG_SPECS.getPreferredName(), logSpecs);
      return builder.endObject();
  }
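
One caveat: startObject(NAME) writes the field name itself, so this sketch assumes the serializing caller does not also emit an ltr_log key before delegating to the builder; if it does, the fix would instead belong in the pipeline code that iterates the ext builders.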

Since the plugin ships prepackaged (so you can't easily patch it yourself), here are workarounds you can use within your query/workflow:

Option 1: Two-Step Query (Recommended)

Split into two API calls: first get the embedding, then search with logging. The sketch below uses Python with the requests library against a hypothetical local endpoint (adjust host, auth, and TLS for your cluster):

  import requests

  OPENSEARCH = "http://localhost:9200"  # assumed endpoint; add auth as needed

  # Step 1: get the embedding from the deployed model
  embedding_response = requests.post(
      f"{OPENSEARCH}/_plugins/_ml/models/F7qwB5sB_J7OHdiaVl_R/_predict",
      json={"parameters": {"input": ["ai"]}},
  ).json()
  query_vector = embedding_response["inference_results"][0]["output"][0]["data"]

  # Step 2: search with LTR logging (no request processor needed)
  search_response = requests.post(
      f"{OPENSEARCH}/gartner-askgartner-20251210-v2/_search",
      json={
          "size": 5,
          "query": {
              "bool": {
                  "must": {"multi_match": {"query": "ai", "fields": ["title^4"]}},
                  "filter": [{
                      "sltr": {
                          "_name": "logged_features",
                          "featureset": "test_JN_ltr_features_ag",
                          "params": {"query_vector": query_vector, "keywords": "ai"},
                      }
                  }],
              }
          },
          "ext": {"ltr_log": {"log_specs": {"name": "logged_features", "named_query": "logged_features"}}},
          "_source": ["resId", "title", "authorNames", "vendorPrimary"],
      },
  ).json()

Option 2: Use Request Processor Without Logging

If you only need feature logging during training/evaluation (not in production), use the pipeline for production queries and the two-step approach above when you need feature logs.

Option 3: Response Processor for Logging (if available)

If OpenSearch ML supports a response processor that can extract feature scores, you might be able to log features after the query executes rather than using ltr_log. However, this depends on your OpenSearch version and the processors available to you.

Option 1 is the most reliable: it avoids the serialization bug entirely by never mixing the request processor with ltr_log. The overhead of an extra API call is minimal compared to the cost of the search itself.
