From b2aa8bf53c0c5afdd803f79cca54e3482afde0a4 Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Wed, 1 Oct 2025 17:22:51 -0700 Subject: [PATCH 01/12] feat: add details about tool output processors Signed-off-by: Pavan Yekbote --- _ml-commons-plugin/agents-tools/index.md | 2 + .../agents-tools/output-processors.md | 302 ++++++++++++++++++ 2 files changed, 304 insertions(+) create mode 100644 _ml-commons-plugin/agents-tools/output-processors.md diff --git a/_ml-commons-plugin/agents-tools/index.md b/_ml-commons-plugin/agents-tools/index.md index e0625bde53f..db8c4c96da4 100644 --- a/_ml-commons-plugin/agents-tools/index.md +++ b/_ml-commons-plugin/agents-tools/index.md @@ -18,3 +18,5 @@ An _agent_ orchestrates and runs ML models and tools. For a list of supported ag A _tool_ performs a set of specific tasks. Some examples of tools are the [`VectorDBTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/vector-db-tool/), which supports vector search, and the [`ListIndexTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/list-index-tool/), which executes the List Indices API. For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/). +You can modify and transform tool outputs using [output processors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). Output processors allow you to chain multiple data transformations that execute sequentially on any tool's output. 
+ diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/agents-tools/output-processors.md new file mode 100644 index 00000000000..a900b78be7f --- /dev/null +++ b/_ml-commons-plugin/agents-tools/output-processors.md @@ -0,0 +1,302 @@ +--- +layout: default +title: Output processors +parent: Agents and tools +grand_parent: ML Commons APIs +nav_order: 30 +--- + +# Output processors +**Introduced 3.3** +{: .label .label-purple } + +Output processors allow you to modify and transform the output of any tool before it's returned to the agent or user. You can chain multiple output processors together to create complex data transformation pipelines that execute sequentially. + +## Overview + +Output processors provide a powerful way to: + +- **Transform data formats**: Convert between different data structures (strings, JSON, arrays) +- **Extract specific information**: Use JSONPath or regex patterns to pull out relevant data +- **Clean and filter content**: Remove unwanted fields or apply formatting rules +- **Standardize outputs**: Ensure consistent data formats across different tools + +Each tool can have multiple output processors that execute in the order they are defined. The output of one processor becomes the input for the next processor in the chain. + +## Configuration + +Add output processors to any tool by including an `output_processors` array in the tool's `parameters` section during agent registeration: + +Example: +```json +{ + "type": "ToolName", + "parameters": { + "output_processors": [ + { + "type": "processor_type", + "parameter1": "value1", + "parameter2": "value2" + }, + { + "type": "another_processor_type", + "parameter": "value" + } + ] + } +} +``` + +### Sequential execution + +Output processors execute in the order they appear in the array. 
Each processor receives the output from the previous processor (or the original tool output for the first processor): + +``` +Tool Output → Processor 1 → Processor 2 → Processor 3 → Final Output +``` + +### Complete example + +**Step 1: Register a flow agent with output processors** + +```json +POST /_plugins/_ml/agents/_register +{ + "name": "Index Summary Agent", + "type": "flow", + "description": "Agent that provides clean index summaries", + "tools": [ + { + "type": "ListIndexTool", + "parameters": { + "output_processors": [ + { + "type": "regex_replace", + "pattern": "^.*?\n", + "replacement": "" + }, + { + "type": "regex_capture", + "pattern": "(\\d+,\\w+,\\w+,([^,]+))" + } + ] + } + } + ] +} +``` + +**Step 2: Execute the agent** + +Using the `agent_id` returned in the previous step: + +```json +POST /_plugins/_ml/agents/{agent_id}/_execute +{ + "parameters": { + "question": "List the indices" + } +} +``` + +**Without output processors, the raw ListIndexTool would return:** +``` +row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.store.size +1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0,1,0,12.7kb,12.7kb +2,green,open,.plugins-ml-memory-message,6qVpepfRSCi9bQF_As_t2A,1,0,7,0,53kb,53kb +3,green,open,.plugins-ml-memory-meta,LqP3QMaURNKYDZ9p8dTq3Q,1,0,2,0,44.8kb,44.8kb +``` + +**With output processors, the agent returns:** +``` +1,green,open,.plugins-ml-model-group +2,green,open,.plugins-ml-memory-message +3,green,open,.plugins-ml-memory-meta +``` + +The output processors transform the verbose CSV output into a clean, readable format by: +1. **`regex_replace`**: Removing the CSV header row +2. **`regex_capture`**: Extracting only essential information (row number, health, status, and index name) + +## Supported Output Processor Types + +### to_string + +Converts the input to a JSON string representation. + +**Parameters:** +- `escape_json` (boolean, optional): Whether to escape JSON characters. 
Default: `false` + +**Configuration:** +```json +{ + "type": "to_string", + "escape_json": true +} +``` + +**Input/Output Example:** +``` +Input: {"name": "test", "value": 123} +Output: "{\"name\":\"test\",\"value\":123}" +``` + +### regex_replace + +Replaces text using regular expression patterns. + +**Parameters:** +- `pattern` (string, required): Regular expression pattern to match +- `replacement` (string, optional): Replacement text. Default: `""` +- `replace_all` (boolean, optional): Whether to replace all matches or just the first. Default: `true` + +**Configuration:** +```json +{ + "type": "regex_replace", + "pattern": "ERROR", + "replacement": "WARNING", + "replace_all": true +} +``` + +**Input/Output Example:** +``` +Input: "ERROR: Connection failed. ERROR: Timeout occurred." +Output: "WARNING: Connection failed. WARNING: Timeout occurred." +``` + +### jsonpath_filter + +Extracts data using JSONPath expressions. + +**Parameters:** +- `path` (string, required): JSONPath expression to extract data +- `default` (any, optional): Default value if path is not found + +**Configuration:** +```json +{ + "type": "jsonpath_filter", + "path": "$.data.items[*].name", + "default": [] +} +``` + +**Input/Output Example:** +``` +Input: {"data": {"items": [{"name": "item1"}, {"name": "item2"}]}} +Output: ["item1", "item2"] +``` + +### extract_json + +Extracts JSON objects or arrays from text strings. + +**Parameters:** +- `extract_type` (string, optional): Type of JSON to extract - `"object"`, `"array"`, or `"auto"`. Default: `"auto"` +- `default` (any, optional): Default value if JSON extraction fails + +**Configuration:** +```json +{ + "type": "extract_json", + "extract_type": "object", + "default": {} +} +``` + +**Input/Output Example:** +``` +Input: "The result is: {\"status\": \"success\", \"count\": 5} - processing complete" +Output: {"status": "success", "count": 5} +``` + +### regex_capture + +Captures specific groups from regex matches. 
+ +**Parameters:** +- `pattern` (string, required): Regular expression pattern with capture groups +- `groups` (string or array, optional): Group numbers to capture. Can be a single number like `"1"` or array like `"[1, 2, 4]"`. Default: `"1"` + +**Configuration:** +```json +{ + "type": "regex_capture", + "pattern": "(\\d+),(\\w+),(\\w+),([^,]+)", + "groups": "[1, 4]" +} +``` + +**Input/Output Example:** +``` +Input: "1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0" +Output: ["1", ".plugins-ml-model-group"] +``` + +### remove_jsonpath + +Removes fields from JSON objects using JSONPath. + +**Parameters:** +- `path` (string, required): JSONPath expression identifying fields to remove + +**Configuration:** +```json +{ + "type": "remove_jsonpath", + "path": "$.sensitive_data" +} +``` + +**Input/Output Example:** +``` +Input: {"name": "user1", "sensitive_data": "secret", "public_info": "visible"} +Output: {"name": "user1", "public_info": "visible"} +``` + +### conditional + +Applies different processor chains based on conditions. 
+ +**Parameters:** +- `path` (string, optional): JSONPath to extract value for condition evaluation +- `routes` (array, required): Array of condition-processor mappings +- `default` (array, optional): Default processors if no conditions match + +**Supported conditions:** +- Exact value match: `"value"` +- Numeric comparisons: `">10"`, `"<5"`, `">="`, `"<="`, `"==5"` +- Existence checks: `"exists"`, `"null"`, `"not_exists"` +- Regex matching: `"regex:pattern"` +- Contains text: `"contains:substring"` + +**Configuration:** +```json +{ + "type": "conditional", + "path": "$.status", + "routes": [ + { + "green": [ + {"type": "regex_replace", "pattern": "status", "replacement": "healthy"} + ] + }, + { + "red": [ + {"type": "regex_replace", "pattern": "status", "replacement": "unhealthy"} + ] + } + ], + "default": [ + {"type": "regex_replace", "pattern": "status", "replacement": "unknown"} + ] +} +``` + +**Input/Output Example:** +``` +Input: {"index": "test-index", "status": "green", "docs": 100} +Output: {"index": "test-index", "healthy": "green", "docs": 100} +``` \ No newline at end of file From a7a17d5f1d1443a92a5085948837d487c5cabf6c Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Wed, 1 Oct 2025 19:49:54 -0700 Subject: [PATCH 02/12] address comments and move sections Signed-off-by: Pavan Yekbote --- .../agents-tools/output-processors.md | 233 ++++++++---------- 1 file changed, 107 insertions(+), 126 deletions(-) diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/agents-tools/output-processors.md index a900b78be7f..a1a38125e9e 100644 --- a/_ml-commons-plugin/agents-tools/output-processors.md +++ b/_ml-commons-plugin/agents-tools/output-processors.md @@ -23,30 +23,6 @@ Output processors provide a powerful way to: Each tool can have multiple output processors that execute in the order they are defined. The output of one processor becomes the input for the next processor in the chain. 
-## Configuration - -Add output processors to any tool by including an `output_processors` array in the tool's `parameters` section during agent registeration: - -Example: -```json -{ - "type": "ToolName", - "parameters": { - "output_processors": [ - { - "type": "processor_type", - "parameter1": "value1", - "parameter2": "value2" - }, - { - "type": "another_processor_type", - "parameter": "value" - } - ] - } -} -``` - ### Sequential execution Output processors execute in the order they appear in the array. Each processor receives the output from the previous processor (or the original tool output for the first processor): @@ -55,68 +31,11 @@ Output processors execute in the order they appear in the array. Each processor Tool Output → Processor 1 → Processor 2 → Processor 3 → Final Output ``` -### Complete example - -**Step 1: Register a flow agent with output processors** - -```json -POST /_plugins/_ml/agents/_register -{ - "name": "Index Summary Agent", - "type": "flow", - "description": "Agent that provides clean index summaries", - "tools": [ - { - "type": "ListIndexTool", - "parameters": { - "output_processors": [ - { - "type": "regex_replace", - "pattern": "^.*?\n", - "replacement": "" - }, - { - "type": "regex_capture", - "pattern": "(\\d+,\\w+,\\w+,([^,]+))" - } - ] - } - } - ] -} -``` - -**Step 2: Execute the agent** - -Using the `agent_id` returned in the previous step: +## Configuration -```json -POST /_plugins/_ml/agents/{agent_id}/_execute -{ - "parameters": { - "question": "List the indices" - } -} -``` +Add output processors to any tool by including an `output_processors` array in the tool's `parameters` section during agent registeration. 
-**Without output processors, the raw ListIndexTool would return:** -``` -row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.store.size -1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0,1,0,12.7kb,12.7kb -2,green,open,.plugins-ml-memory-message,6qVpepfRSCi9bQF_As_t2A,1,0,7,0,53kb,53kb -3,green,open,.plugins-ml-memory-meta,LqP3QMaURNKYDZ9p8dTq3Q,1,0,2,0,44.8kb,44.8kb -``` - -**With output processors, the agent returns:** -``` -1,green,open,.plugins-ml-model-group -2,green,open,.plugins-ml-memory-message -3,green,open,.plugins-ml-memory-meta -``` - -The output processors transform the verbose CSV output into a clean, readable format by: -1. **`regex_replace`**: Removing the CSV header row -2. **`regex_capture`**: Extracting only essential information (row number, health, status, and index name) +For a complete example, see [Example usage with agents](#example-usage-with-agents). ## Supported Output Processor Types @@ -127,7 +46,7 @@ Converts the input to a JSON string representation. **Parameters:** - `escape_json` (boolean, optional): Whether to escape JSON characters. Default: `false` -**Configuration:** +**Example Configuration:** ```json { "type": "to_string", @@ -135,7 +54,7 @@ Converts the input to a JSON string representation. } ``` -**Input/Output Example:** +**Example Input/Output:** ``` Input: {"name": "test", "value": 123} Output: "{\"name\":\"test\",\"value\":123}" @@ -143,27 +62,49 @@ Output: "{\"name\":\"test\",\"value\":123}" ### regex_replace -Replaces text using regular expression patterns. +Replaces text using regular expression patterns. For regex syntax details, see [OpenSearch regex syntax](https://docs.opensearch.org/latest/query-dsl/regex-syntax/). **Parameters:** - `pattern` (string, required): Regular expression pattern to match - `replacement` (string, optional): Replacement text. Default: `""` - `replace_all` (boolean, optional): Whether to replace all matches or just the first. 
Default: `true` -**Configuration:** +**Example Configuration:** ```json { "type": "regex_replace", - "pattern": "ERROR", - "replacement": "WARNING", - "replace_all": true + "pattern": "^.*?\n", + "replacement": "" +} +``` + +**Example Input/Output:** +``` +Input: "row,health,status,index\n1,green,open,.plugins-ml-model\n2,red,closed,test-index" +Output: "1,green,open,.plugins-ml-model\n2,red,closed,test-index" +``` + +### regex_capture + +Captures specific groups from regex matches. For regex syntax details, see [OpenSearch regex syntax](https://docs.opensearch.org/latest/query-dsl/regex-syntax/). + +**Parameters:** +- `pattern` (string, required): Regular expression pattern with capture groups +- `groups` (string or array, optional): Group numbers to capture. Can be a single number like `"1"` or array like `"[1, 2, 4]"`. Default: `"1"` + +**Example Configuration:** +```json +{ + "type": "regex_capture", + "pattern": "(\\d+),(\\w+),(\\w+),([^,]+)", + "groups": "[1, 4]" } ``` -**Input/Output Example:** +**Example Input/Output:** ``` -Input: "ERROR: Connection failed. ERROR: Timeout occurred." -Output: "WARNING: Connection failed. WARNING: Timeout occurred." +Input: "1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0" +Output: ["1", ".plugins-ml-model-group"] ``` ### jsonpath_filter @@ -174,7 +115,7 @@ Extracts data using JSONPath expressions. - `path` (string, required): JSONPath expression to extract data - `default` (any, optional): Default value if path is not found -**Configuration:** +**Example Configuration:** ```json { "type": "jsonpath_filter", @@ -183,7 +124,7 @@ Extracts data using JSONPath expressions. } ``` -**Input/Output Example:** +**Example Input/Output:** ``` Input: {"data": {"items": [{"name": "item1"}, {"name": "item2"}]}} Output: ["item1", "item2"] @@ -197,7 +138,7 @@ Extracts JSON objects or arrays from text strings. - `extract_type` (string, optional): Type of JSON to extract - `"object"`, `"array"`, or `"auto"`. 
Default: `"auto"` - `default` (any, optional): Default value if JSON extraction fails -**Configuration:** +**Example Configuration:** ```json { "type": "extract_json", @@ -206,35 +147,12 @@ Extracts JSON objects or arrays from text strings. } ``` -**Input/Output Example:** +**Example Input/Output:** ``` Input: "The result is: {\"status\": \"success\", \"count\": 5} - processing complete" Output: {"status": "success", "count": 5} ``` -### regex_capture - -Captures specific groups from regex matches. - -**Parameters:** -- `pattern` (string, required): Regular expression pattern with capture groups -- `groups` (string or array, optional): Group numbers to capture. Can be a single number like `"1"` or array like `"[1, 2, 4]"`. Default: `"1"` - -**Configuration:** -```json -{ - "type": "regex_capture", - "pattern": "(\\d+),(\\w+),(\\w+),([^,]+)", - "groups": "[1, 4]" -} -``` - -**Input/Output Example:** -``` -Input: "1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0" -Output: ["1", ".plugins-ml-model-group"] -``` - ### remove_jsonpath Removes fields from JSON objects using JSONPath. @@ -242,7 +160,7 @@ Removes fields from JSON objects using JSONPath. **Parameters:** - `path` (string, required): JSONPath expression identifying fields to remove -**Configuration:** +**Example Configuration:** ```json { "type": "remove_jsonpath", @@ -250,7 +168,7 @@ Removes fields from JSON objects using JSONPath. } ``` -**Input/Output Example:** +**Example Input/Output:** ``` Input: {"name": "user1", "sensitive_data": "secret", "public_info": "visible"} Output: {"name": "user1", "public_info": "visible"} @@ -272,7 +190,7 @@ Applies different processor chains based on conditions. - Regex matching: `"regex:pattern"` - Contains text: `"contains:substring"` -**Configuration:** +**Example Configuration:** ```json { "type": "conditional", @@ -295,8 +213,71 @@ Applies different processor chains based on conditions. 
} ``` -**Input/Output Example:** +**Example Input/Output:** ``` Input: {"index": "test-index", "status": "green", "docs": 100} Output: {"index": "test-index", "healthy": "green", "docs": 100} -``` \ No newline at end of file +``` + +### Example usage with agents + +**Step 1: Register a flow agent with output processors** + +```json +POST /_plugins/_ml/agents/_register +{ + "name": "Index Summary Agent", + "type": "flow", + "description": "Agent that provides clean index summaries", + "tools": [ + { + "type": "ListIndexTool", + "parameters": { + "output_processors": [ + { + "type": "regex_replace", + "pattern": "^.*?\n", + "replacement": "" + }, + { + "type": "regex_capture", + "pattern": "(\\d+,\\w+,\\w+,([^,]+))" + } + ] + } + } + ] +} +``` + +**Step 2: Execute the agent** + +Using the `agent_id` returned in the previous step: + +```json +POST /_plugins/_ml/agents/{agent_id}/_execute +{ + "parameters": { + "question": "List the indices" + } +} +``` + +**Without output processors, the raw ListIndexTool would return:** +``` +row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.store.size +1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0,1,0,12.7kb,12.7kb +2,green,open,.plugins-ml-memory-message,6qVpepfRSCi9bQF_As_t2A,1,0,7,0,53kb,53kb +3,green,open,.plugins-ml-memory-meta,LqP3QMaURNKYDZ9p8dTq3Q,1,0,2,0,44.8kb,44.8kb +``` + +**With output processors, the agent returns:** +``` +1,green,open,.plugins-ml-model-group +2,green,open,.plugins-ml-memory-message +3,green,open,.plugins-ml-memory-meta +``` + +The output processors transform the verbose CSV output into a clean, readable format by: +1. **`regex_replace`**: Removing the CSV header row +2. 
**`regex_capture`**: Extracting only essential information (row number, health, status, and index name) From 1b3bc4201749d79db268165b8723291585606394 Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Mon, 6 Oct 2025 17:54:16 -0700 Subject: [PATCH 03/12] add doc about 2 more processors Signed-off-by: Pavan Yekbote --- .../agents-tools/output-processors.md | 81 ++++++++++++++++++- 1 file changed, 80 insertions(+), 1 deletion(-) diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/agents-tools/output-processors.md index a1a38125e9e..f0bcd9af12b 100644 --- a/_ml-commons-plugin/agents-tools/output-processors.md +++ b/_ml-commons-plugin/agents-tools/output-processors.md @@ -37,7 +37,9 @@ Add output processors to any tool by including an `output_processors` array in t For a complete example, see [Example usage with agents](#example-usage-with-agents). -## Supported Output Processor Types +## Supported output processor types + +The following processors are available for transforming tool outputs: ### to_string @@ -219,6 +221,83 @@ Input: {"index": "test-index", "status": "green", "docs": 100} Output: {"index": "test-index", "healthy": "green", "docs": 100} ``` +### process_and_set + +Applies a chain of processors to the input and sets the result at a specified JSONPath location. 
+ +**Parameters:** +- `path` (string, required): JSONPath expression specifying where to set the processed result +- `processors` (array, required): List of processor configurations to apply sequentially + +**Path behavior:** +- If the path exists, it will be updated with the processed value +- If the path doesn't exist, attempts to create it (works for simple nested fields) +- Parent path must exist for new field creation to succeed + +**Example Configuration:** +```json +{ + "type": "process_and_set", + "path": "$.summary.clean_name", + "processors": [ + { + "type": "to_string" + }, + { + "type": "regex_replace", + "pattern": "[^a-zA-Z0-9]", + "replacement": "_" + } + ] +} +``` + +**Example Input/Output:** +``` +Input: {"name": "Test Index!", "status": "active"} +Output: {"name": "Test Index!", "status": "active", "summary": {"clean_name": "Test_Index_"}} +``` + +### set_field + +Sets a field to a specified static value or copies a value from another field. + +**Parameters:** +- `path` (string, required): JSONPath expression specifying where to set the value +- `value` (any, conditionally required): Static value to set. Either `value` or `source_path` must be provided +- `source_path` (string, conditionally required): JSONPath to copy value from. Either `value` or `source_path` must be provided +- `default` (any, optional): Default value when `source_path` doesn't exist. 
Only used with `source_path` + +**Path behavior:** +- If the path exists, it will be updated with the new value +- If the path doesn't exist, attempts to create it (works for simple nested fields) +- Parent path must exist for new field creation to succeed + +**Example Configuration (static value):** +```json +{ + "type": "set_field", + "path": "$.metadata.processed_at", + "value": "2024-03-15T10:30:00Z" +} +``` + +**Example Configuration (copy field):** +```json +{ + "type": "set_field", + "path": "$.userId", + "source_path": "$.user.id", + "default": "unknown" +} +``` + +**Example Input/Output:** +``` +Input: {"user": {"id": 123}, "name": "John"} +Output: {"user": {"id": 123}, "name": "John", "userId": 123, "metadata": {"processed_at": "2024-03-15T10:30:00Z"}} +``` + ### Example usage with agents **Step 1: Register a flow agent with output processors** From b3d218e24e49e3570d0fb1b218c3063277f54e14 Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Wed, 8 Oct 2025 12:20:10 -0700 Subject: [PATCH 04/12] add table Signed-off-by: Pavan Yekbote --- .../agents-tools/output-processors.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/agents-tools/output-processors.md index f0bcd9af12b..5a6b2546e42 100644 --- a/_ml-commons-plugin/agents-tools/output-processors.md +++ b/_ml-commons-plugin/agents-tools/output-processors.md @@ -39,7 +39,19 @@ For a complete example, see [Example usage with agents](#example-usage-with-agen ## Supported output processor types -The following processors are available for transforming tool outputs: +The following table lists all supported output processors. + +Processor | Description +:--- | :--- +[`to_string`](#to_string) | Converts the input to a JSON string representation. +[`regex_replace`](#regex_replace) | Replaces text using regular expression patterns. +[`regex_capture`](#regex_capture) | Captures specific groups from regex matches. 
+[`jsonpath_filter`](#jsonpath_filter) | Extracts data using JSONPath expressions. +[`extract_json`](#extract_json) | Extracts JSON objects or arrays from text strings. +[`remove_jsonpath`](#remove_jsonpath) | Removes fields from JSON objects using JSONPath. +[`conditional`](#conditional) | Applies different processor chains based on conditions. +[`process_and_set`](#process_and_set) | Applies a chain of processors to the input and sets the result at a specified JSONPath location. +[`set_field`](#set_field) | Sets a field to a specified static value or copies a value from another field. ### to_string From 4b7a4d2df0ceb452ad9faed30a32fc502614565d Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Wed, 8 Oct 2025 12:25:12 -0700 Subject: [PATCH 05/12] address style concerns Signed-off-by: Pavan Yekbote --- _ml-commons-plugin/agents-tools/output-processors.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/agents-tools/output-processors.md index 5a6b2546e42..738afb69886 100644 --- a/_ml-commons-plugin/agents-tools/output-processors.md +++ b/_ml-commons-plugin/agents-tools/output-processors.md @@ -33,7 +33,7 @@ Tool Output → Processor 1 → Processor 2 → Processor 3 → Final Output ## Configuration -Add output processors to any tool by including an `output_processors` array in the tool's `parameters` section during agent registeration. +Add output processors to any tool by including an `output_processors` array in the tool's `parameters` section during agent registration. For a complete example, see [Example usage with agents](#example-usage-with-agents). @@ -58,7 +58,7 @@ Processor | Description Converts the input to a JSON string representation. **Parameters:** -- `escape_json` (boolean, optional): Whether to escape JSON characters. Default: `false` +- `escape_json` (Boolean, optional): Whether to escape JSON characters. 
Default: `false` **Example Configuration:** ```json @@ -76,12 +76,12 @@ Output: "{\"name\":\"test\",\"value\":123}" ### regex_replace -Replaces text using regular expression patterns. For regex syntax details, see [OpenSearch regex syntax](https://docs.opensearch.org/latest/query-dsl/regex-syntax/). +Replaces text using regular expression patterns. For regex syntax details, see [OpenSearch regex syntax]({{site.url}}{{site.baseurl}}/query-dsl/regex-syntax/). **Parameters:** - `pattern` (string, required): Regular expression pattern to match - `replacement` (string, optional): Replacement text. Default: `""` -- `replace_all` (boolean, optional): Whether to replace all matches or just the first. Default: `true` +- `replace_all` (Boolean, optional): Whether to replace all matches or only the first. Default: `true` **Example Configuration:** ```json @@ -100,7 +100,7 @@ Output: "1,green,open,.plugins-ml-model\n2,red,closed,test-index" ### regex_capture -Captures specific groups from regex matches. For regex syntax details, see [OpenSearch regex syntax](https://docs.opensearch.org/latest/query-dsl/regex-syntax/). +Captures specific groups from regex matches. For regex syntax details, see [OpenSearch regex syntax]({{site.url}}{{site.baseurl}}/query-dsl/regex-syntax/). 
**Parameters:** - `pattern` (string, required): Regular expression pattern with capture groups From be406fbce4c9498e582e5ce07cbe023ad22a952a Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Fri, 10 Oct 2025 13:26:57 -0700 Subject: [PATCH 06/12] feat: change to processor chain, add details about models and add to api spec Signed-off-by: Pavan Yekbote --- .../agents-tools/output-processors.md | 181 ++++++++++++++++-- .../api/agent-apis/register-agent.md | 1 + .../api/train-predict/predict.md | 12 ++ 3 files changed, 176 insertions(+), 18 deletions(-) diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/agents-tools/output-processors.md index 738afb69886..dc1c45973af 100644 --- a/_ml-commons-plugin/agents-tools/output-processors.md +++ b/_ml-commons-plugin/agents-tools/output-processors.md @@ -1,45 +1,42 @@ --- layout: default -title: Output processors -parent: Agents and tools -grand_parent: ML Commons APIs +title: Processor Chain +parent: Machine learning nav_order: 30 --- -# Output processors +# Processor Chain **Introduced 3.3** {: .label .label-purple } -Output processors allow you to modify and transform the output of any tool before it's returned to the agent or user. You can chain multiple output processors together to create complex data transformation pipelines that execute sequentially. +Processor chains enable flexible data transformation pipelines that can process both input and output data. Chain multiple processors together to create sequential transformations where each processor's output becomes the next processor's input. 
## Overview -Output processors provide a powerful way to: +Processors provide a powerful way to: - **Transform data formats**: Convert between different data structures (strings, JSON, arrays) - **Extract specific information**: Use JSONPath or regex patterns to pull out relevant data - **Clean and filter content**: Remove unwanted fields or apply formatting rules -- **Standardize outputs**: Ensure consistent data formats across different tools - -Each tool can have multiple output processors that execute in the order they are defined. The output of one processor becomes the input for the next processor in the chain. +- **Standardize data**: Ensure consistent data formats across different components ### Sequential execution -Output processors execute in the order they appear in the array. Each processor receives the output from the previous processor (or the original tool output for the first processor): - -``` -Tool Output → Processor 1 → Processor 2 → Processor 3 → Final Output -``` +Processors execute in the order they appear in the array. Each processor receives the output from the previous processor. ## Configuration -Add output processors to any tool by including an `output_processors` array in the tool's `parameters` section during agent registration. +Processors can be configured in different contexts: -For a complete example, see [Example usage with agents](#example-usage-with-agents). +- **Tool outputs**: Add an `output_processors` array in the tool's `parameters` section +- **Model outputs**: Add an `output_processors` array in the model's `parameters` section during a `_predict` call +- **Model inputs**: Add an `input_processors` array in the model's `parameters` section of a `_predict` call -## Supported output processor types +For complete examples, see [Example usage with agents](#example-usage-with-agents) and [Example usage with models](#example-usage-with-models). -The following table lists all supported output processors. 
+## Supported processor types + +The following table lists all supported processors. Processor | Description :--- | :--- @@ -372,3 +369,151 @@ row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.stor The output processors transform the verbose CSV output into a clean, readable format by: 1. **`regex_replace`**: Removing the CSV header row 2. **`regex_capture`**: Extracting only essential information (row number, health, status, and index name) + +## Example usage with models + +The following examples demonstrate how to use processor chains with models during prediction calls. + +### Input processors example + +This example shows how to modify model input using `input_processors` to replace text before processing: + +```json +POST _plugins/_ml/models/{model_id}/_predict +{ + "parameters": { + "system_prompt": "You are a helpful assistant.", + "prompt": "Can you summarize Prince Hamlet of William Shakespeare in around 100 words?", + "input_processors": [ + { + "type": "regex_replace", + "pattern": "100", + "replacement": "20" + } + ] + } +} +``` + +In this example, the `regex_replace` processor modifies the prompt before it's sent to the model, changing "100 words" to "20 words". 
+ +### Output processors example + +This example shows how to process model output using `output_processors` to extract and format JSON data: + +```json +POST _plugins/_ml/models/{model_id}/_predict +{ + "parameters": { + "messages": [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": "${parameters.system_prompt}" + } + ] + }, + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Can you convert this into a json object: user name is Bob, he likes swimming" + } + ] + } + ], + "system_prompt": "You are a helpful assistant", + "output_processors": [ + { + "type": "jsonpath_filter", + "path": "$.choices[0].message.content" + }, + { + "type": "extract_json", + "extract_type": "auto" + } + ] + } +} +``` + +In this example, the output processors: +1. Extract the content from the model response using JSONPath +2. Parse and extract the JSON object from the text response + +**Without output processors, the raw response would be:** +```json +{ + "inference_results": [ + { + "output": [ + { + "name": "response", + "dataAsMap": { + "id": "test-id", + "object": "chat.completion", + "created": 1.759580469E9, + "model": "gpt-4o-mini-2024-07-18", + "choices": [ + { + "index": 0.0, + "message": { + "role": "assistant", + "content": "Sure! 
Here is the information you provided converted into a JSON object:\n\n```json\n{\n \"user\": {\n \"name\": \"Bob\",\n \"likes\": \"swimming\"\n }\n}\n```", + "refusal": null, + "annotations": [] + }, + "logprobs": null, + "finish_reason": "stop" + } + ], + "usage": { + "prompt_tokens": 33.0, + "completion_tokens": 42.0, + "total_tokens": 75.0, + "prompt_tokens_details": { + "cached_tokens": 0.0, + "audio_tokens": 0.0 + }, + "completion_tokens_details": { + "reasoning_tokens": 0.0, + "audio_tokens": 0.0, + "accepted_prediction_tokens": 0.0, + "rejected_prediction_tokens": 0.0 + } + }, + "service_tier": "default", + "system_fingerprint": "test-fingerprint" + } + } + ], + "status_code": 200 + } + ] +} +``` + +**With output processors, the response becomes:** +```json +{ + "inference_results": [ + { + "output": [ + { + "name": "response", + "dataAsMap": { + "user": { + "name": "Bob", + "likes": "swimming" + } + } + } + ], + "status_code": 200 + } + ] +} +``` diff --git a/_ml-commons-plugin/api/agent-apis/register-agent.md b/_ml-commons-plugin/api/agent-apis/register-agent.md index f4d2e766da6..3c8017e97cc 100644 --- a/_ml-commons-plugin/api/agent-apis/register-agent.md +++ b/_ml-commons-plugin/api/agent-apis/register-agent.md @@ -60,6 +60,7 @@ Field | Data type | Required/Optional | Description `name`| String | Optional | The tool name. The tool name defaults to the `type` parameter value. If you need to include multiple tools of the same type in an agent, specify different names for the tools. | `description`| String | Optional | The tool description. Defaults to a built-in description for the specified type. | `parameters` | Object | Optional | The parameters for this tool. The parameters are highly dependent on the tool type. You can find information about specific tool types in [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/). +`parameters.output_processors` | Array | Optional | A list of processors to transform the tool's output. 
For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). `attributes.input_schema` | Object | Optional | The expected input format for this tool defined as a [JSON schema](https://json-schema.org/). Used to define the structure the LLM should follow when calling the tool. `attributes.strict` | Boolean | Optional | Whether function calling reliably adheres to the input schema or not. diff --git a/_ml-commons-plugin/api/train-predict/predict.md b/_ml-commons-plugin/api/train-predict/predict.md index cc738d984e8..50d16a0aea6 100644 --- a/_ml-commons-plugin/api/train-predict/predict.md +++ b/_ml-commons-plugin/api/train-predict/predict.md @@ -18,6 +18,18 @@ For information about user access for this API, see [Model access control consid POST /_plugins/_ml/_predict// ``` +## Request body fields + +The following table lists the available request fields. + +Field | Data type | Required/Optional | Description +:--- | :--- | :--- | :--- +`parameters` | Object | Optional | Model-specific parameters for prediction. +`parameters.input_processors` | Array | Optional | A list of processors to transform the input data before sending it to the model. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). +`parameters.output_processors` | Array | Optional | A list of processors to transform the model's output data. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). + +For remote models, the actual input fields depend on the model's connector configuration. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). 
+ ## Example request ```json From 37c61e68c3de56e5d5f70440aa5fbfde8fef8c18 Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Fri, 10 Oct 2025 13:51:21 -0700 Subject: [PATCH 07/12] fix links and page order Signed-off-by: Pavan Yekbote --- _ml-commons-plugin/agents-tools/index.md | 2 +- _ml-commons-plugin/api/agent-apis/register-agent.md | 2 +- _ml-commons-plugin/api/train-predict/predict.md | 4 ++-- .../{agents-tools/output-processors.md => processor-chain.md} | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) rename _ml-commons-plugin/{agents-tools/output-processors.md => processor-chain.md} (99%) diff --git a/_ml-commons-plugin/agents-tools/index.md b/_ml-commons-plugin/agents-tools/index.md index db8c4c96da4..7d7d8bf3ce5 100644 --- a/_ml-commons-plugin/agents-tools/index.md +++ b/_ml-commons-plugin/agents-tools/index.md @@ -18,5 +18,5 @@ An _agent_ orchestrates and runs ML models and tools. For a list of supported ag A _tool_ performs a set of specific tasks. Some examples of tools are the [`VectorDBTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/vector-db-tool/), which supports vector search, and the [`ListIndexTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/list-index-tool/), which executes the List Indices API. For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/). -You can modify and transform tool outputs using [output processors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). Output processors allow you to chain multiple data transformations that execute sequentially on any tool's output. +You can modify and transform tool outputs using [processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). 
diff --git a/_ml-commons-plugin/api/agent-apis/register-agent.md b/_ml-commons-plugin/api/agent-apis/register-agent.md index 3c8017e97cc..3978ab3d360 100644 --- a/_ml-commons-plugin/api/agent-apis/register-agent.md +++ b/_ml-commons-plugin/api/agent-apis/register-agent.md @@ -60,7 +60,7 @@ Field | Data type | Required/Optional | Description `name`| String | Optional | The tool name. The tool name defaults to the `type` parameter value. If you need to include multiple tools of the same type in an agent, specify different names for the tools. | `description`| String | Optional | The tool description. Defaults to a built-in description for the specified type. | `parameters` | Object | Optional | The parameters for this tool. The parameters are highly dependent on the tool type. You can find information about specific tool types in [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/). -`parameters.output_processors` | Array | Optional | A list of processors to transform the tool's output. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). +`parameters.output_processors` | Array | Optional | A list of processors to transform the tool's output. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). `attributes.input_schema` | Object | Optional | The expected input format for this tool defined as a [JSON schema](https://json-schema.org/). Used to define the structure the LLM should follow when calling the tool. `attributes.strict` | Boolean | Optional | Whether function calling reliably adheres to the input schema or not. 
diff --git a/_ml-commons-plugin/api/train-predict/predict.md b/_ml-commons-plugin/api/train-predict/predict.md index 50d16a0aea6..a2fc248f010 100644 --- a/_ml-commons-plugin/api/train-predict/predict.md +++ b/_ml-commons-plugin/api/train-predict/predict.md @@ -25,8 +25,8 @@ The following table lists the available request fields. Field | Data type | Required/Optional | Description :--- | :--- | :--- | :--- `parameters` | Object | Optional | Model-specific parameters for prediction. -`parameters.input_processors` | Array | Optional | A list of processors to transform the input data before sending it to the model. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). -`parameters.output_processors` | Array | Optional | A list of processors to transform the model's output data. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/output-processors/). +`parameters.input_processors` | Array | Optional | A list of processors to transform the input data before sending it to the model. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). +`parameters.output_processors` | Array | Optional | A list of processors to transform the model's output data. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). For remote models, the actual input fields depend on the model's connector configuration. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). 
diff --git a/_ml-commons-plugin/agents-tools/output-processors.md b/_ml-commons-plugin/processor-chain.md similarity index 99% rename from _ml-commons-plugin/agents-tools/output-processors.md rename to _ml-commons-plugin/processor-chain.md index dc1c45973af..967dae3d45b 100644 --- a/_ml-commons-plugin/agents-tools/output-processors.md +++ b/_ml-commons-plugin/processor-chain.md @@ -1,8 +1,8 @@ --- layout: default title: Processor Chain -parent: Machine learning -nav_order: 30 +has_children: false +nav_order: 65 --- # Processor Chain From 5df18a3ef61cc6280ec803790bbdd8bf724cda70 Mon Sep 17 00:00:00 2001 From: Pavan Yekbote Date: Fri, 10 Oct 2025 15:25:34 -0700 Subject: [PATCH 08/12] fix: add new processor and fix regex link Signed-off-by: Pavan Yekbote --- _ml-commons-plugin/processor-chain.md | 58 ++++++++++++++++++++++++--- 1 file changed, 52 insertions(+), 6 deletions(-) diff --git a/_ml-commons-plugin/processor-chain.md b/_ml-commons-plugin/processor-chain.md index 967dae3d45b..8b87686ffd1 100644 --- a/_ml-commons-plugin/processor-chain.md +++ b/_ml-commons-plugin/processor-chain.md @@ -5,7 +5,7 @@ has_children: false nav_order: 65 --- -# Processor Chain +# Processor chain **Introduced 3.3** {: .label .label-purple } @@ -29,7 +29,7 @@ Processors execute in the order they appear in the array. Each processor receive Processors can be configured in different contexts: - **Tool outputs**: Add an `output_processors` array in the tool's `parameters` section -- **Model ouputs**: Add an `ouput_processors` array in the model's `parameters` section during a `_predict` call +- **Model outputs**: Add an `ouput_processors` array in the model's `parameters` section during a `_predict` call - **Model inputs**: Add an `input_processors` array in the model's `parameters` section of a `_predict` call For complete examples, see [Example usage with agents](#example-usage-with-agents) and [Example usage with models](#example-usage-with-models). 
@@ -49,6 +49,7 @@ Processor | Description [`conditional`](#conditional) | Applies different processor chains based on conditions. [`process_and_set`](#process_and_set) | Applies a chain of processors to the input and sets the result at a specified JSONPath location. [`set_field`](#set_field) | Sets a field to a specified static value or copies a value from another field. +[`for_each`](#for_each) | Iterates through array elements and applies a chain of processors to each element. ### to_string @@ -73,7 +74,7 @@ Output: "{\"name\":\"test\",\"value\":123}" ### regex_replace -Replaces text using regular expression patterns. For regex syntax details, see [OpenSearch regex syntax]({{site.url}}{{site.baseurl}}/query-dsl/regex-syntax/). +Replaces text using regular expression patterns. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). **Parameters:** - `pattern` (string, required): Regular expression pattern to match @@ -97,7 +98,7 @@ Output: "1,green,open,.plugins-ml-model\n2,red,closed,test-index" ### regex_capture -Captures specific groups from regex matches. For regex syntax details, see [OpenSearch regex syntax]({{site.url}}{{site.baseurl}}/query-dsl/regex-syntax/). +Captures specific groups from regex matches. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). **Parameters:** - `pattern` (string, required): Regular expression pattern with capture groups @@ -169,13 +170,13 @@ Output: {"status": "success", "count": 5} Removes fields from JSON objects using JSONPath. 
**Parameters:** -- `path` (string, required): JSONPath expression identifying fields to remove +- `paths` (array, required): Array of JSONPath expressions identifying fields to remove **Example Configuration:** ```json { "type": "remove_jsonpath", - "path": "$.sensitive_data" + "paths": "[$.sensitive_data]" } ``` @@ -307,6 +308,51 @@ Input: {"user": {"id": 123}, "name": "John"} Output: {"user": {"id": 123}, "name": "John", "userId": 123, "metadata": {"processed_at": "2024-03-15T10:30:00Z"}} ``` +### for_each + +Iterates through array elements and applies a chain of processors to each element. Useful for transforming array elements uniformly, such as adding missing fields, filtering content, or normalizing data structures. + +**Parameters:** +- `path` (string, required): JSONPath expression pointing to the array to iterate over. Must use `[*]` notation for array elements +- `processors` (array, required): List of processor configurations to apply to each array element + +**Behavior:** +- Each element is processed independently with the configured processor chain +- The output of the processor chain replaces the original element +- If the path doesn't exist or doesn't point to an array, returns input unchanged +- If processing an element fails, the original element is kept + +**Example Configuration:** +```json +{ + "type": "for_each", + "path": "$.items[*]", + "processors": [ + { + "type": "set_field", + "path": "$.processed", + "value": true + } + ] +} +``` + +**Example Input/Output:** +``` +Input: { + "items": [ + {"name": "item1", "value": 10}, + {"name": "item2", "value": 20} + ] +} +Output: { + "items": [ + {"name": "item1", "value": 10, "processed": true}, + {"name": "item2", "value": 20, "processed": true} + ] +} +``` + ### Example usage with agents **Step 1: Register a flow agent with output processors** From 65978620030e09332da682e01d36f88949b8e733 Mon Sep 17 00:00:00 2001 From: Nathan Bower Date: Mon, 13 Oct 2025 11:15:26 -0400 Subject: [PATCH 09/12] Apply 
suggestions from code review Signed-off-by: Nathan Bower --- _ml-commons-plugin/agents-tools/index.md | 2 +- .../api/agent-apis/register-agent.md | 2 +- .../api/train-predict/predict.md | 4 +- _ml-commons-plugin/processor-chain.md | 178 +++++++++--------- 4 files changed, 93 insertions(+), 93 deletions(-) diff --git a/_ml-commons-plugin/agents-tools/index.md b/_ml-commons-plugin/agents-tools/index.md index 7d7d8bf3ce5..a3ceed6ff37 100644 --- a/_ml-commons-plugin/agents-tools/index.md +++ b/_ml-commons-plugin/agents-tools/index.md @@ -18,5 +18,5 @@ An _agent_ orchestrates and runs ML models and tools. For a list of supported ag A _tool_ performs a set of specific tasks. Some examples of tools are the [`VectorDBTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/vector-db-tool/), which supports vector search, and the [`ListIndexTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/list-index-tool/), which executes the List Indices API. For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/). -You can modify and transform tool outputs using [processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). +You can modify and transform tool outputs using a [processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). diff --git a/_ml-commons-plugin/api/agent-apis/register-agent.md b/_ml-commons-plugin/api/agent-apis/register-agent.md index 3978ab3d360..c8685ca6449 100644 --- a/_ml-commons-plugin/api/agent-apis/register-agent.md +++ b/_ml-commons-plugin/api/agent-apis/register-agent.md @@ -60,7 +60,7 @@ Field | Data type | Required/Optional | Description `name`| String | Optional | The tool name. The tool name defaults to the `type` parameter value. If you need to include multiple tools of the same type in an agent, specify different names for the tools. | `description`| String | Optional | The tool description. 
Defaults to a built-in description for the specified type. | `parameters` | Object | Optional | The parameters for this tool. The parameters are highly dependent on the tool type. You can find information about specific tool types in [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/). -`parameters.output_processors` | Array | Optional | A list of processors to transform the tool's output. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). +`parameters.output_processors` | Array | Optional | A list of processors used to transform the tool's output. For more information, see [Processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). `attributes.input_schema` | Object | Optional | The expected input format for this tool defined as a [JSON schema](https://json-schema.org/). Used to define the structure the LLM should follow when calling the tool. `attributes.strict` | Boolean | Optional | Whether function calling reliably adheres to the input schema or not. diff --git a/_ml-commons-plugin/api/train-predict/predict.md b/_ml-commons-plugin/api/train-predict/predict.md index a2fc248f010..d3a1c21b80f 100644 --- a/_ml-commons-plugin/api/train-predict/predict.md +++ b/_ml-commons-plugin/api/train-predict/predict.md @@ -25,8 +25,8 @@ The following table lists the available request fields. Field | Data type | Required/Optional | Description :--- | :--- | :--- | :--- `parameters` | Object | Optional | Model-specific parameters for prediction. -`parameters.input_processors` | Array | Optional | A list of processors to transform the input data before sending it to the model. For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). -`parameters.output_processors` | Array | Optional | A list of processors to transform the model's output data. 
For more information, see [Processor Chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). +`parameters.input_processors` | Array | Optional | A list of processors used to transform the input data before sending it to the model. For more information, see [Processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). +`parameters.output_processors` | Array | Optional | A list of processors used to transform the model's output data. For more information, see [Processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). For remote models, the actual input fields depend on the model's connector configuration. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). diff --git a/_ml-commons-plugin/processor-chain.md b/_ml-commons-plugin/processor-chain.md index 8b87686ffd1..74212357655 100644 --- a/_ml-commons-plugin/processor-chain.md +++ b/_ml-commons-plugin/processor-chain.md @@ -1,6 +1,6 @@ --- layout: default -title: Processor Chain +title: Processor chain has_children: false nav_order: 65 --- @@ -15,22 +15,22 @@ Processor chains enable flexible data transformation pipelines that can process Processors provide a powerful way to: -- **Transform data formats**: Convert between different data structures (strings, JSON, arrays) -- **Extract specific information**: Use JSONPath or regex patterns to pull out relevant data -- **Clean and filter content**: Remove unwanted fields or apply formatting rules -- **Standardize data**: Ensure consistent data formats across different components +- **Transform data formats**: Convert between different data structures (strings, JSON, arrays). +- **Extract specific information**: Use JSONPath or regex patterns to extract relevant data. +- **Clean and filter content**: Remove unwanted fields or apply formatting rules. +- **Standardize data**: Ensure consistent data formats across different components. 
### Sequential execution -Processors execute in the order they appear in the array. Each processor receives the output from the previous processor. +Processors execute in the order in which they appear in the array. Each processor receives the output from the previous processor. ## Configuration Processors can be configured in different contexts: -- **Tool outputs**: Add an `output_processors` array in the tool's `parameters` section -- **Model outputs**: Add an `ouput_processors` array in the model's `parameters` section during a `_predict` call -- **Model inputs**: Add an `input_processors` array in the model's `parameters` section of a `_predict` call +- **Tool outputs**: Add an `output_processors` array in the tool's `parameters` section. +- **Model outputs**: Add an `output_processors` array in the model's `parameters` section during a `_predict` call. +- **Model inputs**: Add an `input_processors` array in the model's `parameters` section of a `_predict` call. For complete examples, see [Example usage with agents](#example-usage-with-agents) and [Example usage with models](#example-usage-with-models). @@ -55,10 +55,10 @@ Processor | Description Converts the input to a JSON string representation. -**Parameters:** -- `escape_json` (Boolean, optional): Whether to escape JSON characters. Default: `false` +**Parameters**: +- `escape_json` (Boolean, optional): Whether to escape JSON characters. Default is `false`. -**Example Configuration:** +**Example configuration**: ```json { "type": "to_string", @@ -66,7 +66,7 @@ Converts the input to a JSON string representation. } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: {"name": "test", "value": 123} Output: "{\"name\":\"test\",\"value\":123}" @@ -76,12 +76,12 @@ Output: "{\"name\":\"test\",\"value\":123}" Replaces text using regular expression patterns. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). 
-**Parameters:** -- `pattern` (string, required): Regular expression pattern to match -- `replacement` (string, optional): Replacement text. Default: `""` -- `replace_all` (Boolean, optional): Whether to replace all matches or only the first. Default: `true` +**Parameters**: +- `pattern` (string, required): A regular expression pattern to match. +- `replacement` (string, optional): Replacement text. Default is `""`. +- `replace_all` (Boolean, optional): Whether to replace all matches or only the first. Default is `true`. -**Example Configuration:** +**Example configuration**: ```json { "type": "regex_replace", @@ -90,7 +90,7 @@ Replaces text using regular expression patterns. For regex syntax details, see [ } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: "row,health,status,index\n1,green,open,.plugins-ml-model\n2,red,closed,test-index" Output: "1,green,open,.plugins-ml-model\n2,red,closed,test-index" @@ -100,11 +100,11 @@ Output: "1,green,open,.plugins-ml-model\n2,red,closed,test-index" Captures specific groups from regex matches. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). -**Parameters:** -- `pattern` (string, required): Regular expression pattern with capture groups -- `groups` (string or array, optional): Group numbers to capture. Can be a single number like `"1"` or array like `"[1, 2, 4]"`. Default: `"1"` +**Parameters**: +- `pattern` (string, required): A regular expression pattern with capture groups. +- `groups` (string or array, optional): Group numbers to capture. Can be a single number like `"1"` or array like `"[1, 2, 4]"`. Default is `"1"`. -**Example Configuration:** +**Example configuration**: ```json { "type": "regex_capture", @@ -113,7 +113,7 @@ Captures specific groups from regex matches. 
For regex syntax details, see [Java } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: "1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0" Output: ["1", ".plugins-ml-model-group"] @@ -123,11 +123,11 @@ Output: ["1", ".plugins-ml-model-group"] Extracts data using JSONPath expressions. -**Parameters:** -- `path` (string, required): JSONPath expression to extract data -- `default` (any, optional): Default value if path is not found +**Parameters**: +- `path` (string, required): The JSONPath expression used to extract data. +- `default` (any, optional): The default value if the path is not found. -**Example Configuration:** +**Example configuration**: ```json { "type": "jsonpath_filter", @@ -136,7 +136,7 @@ Extracts data using JSONPath expressions. } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: {"data": {"items": [{"name": "item1"}, {"name": "item2"}]}} Output: ["item1", "item2"] @@ -146,11 +146,11 @@ Output: ["item1", "item2"] Extracts JSON objects or arrays from text strings. -**Parameters:** -- `extract_type` (string, optional): Type of JSON to extract - `"object"`, `"array"`, or `"auto"`. Default: `"auto"` -- `default` (any, optional): Default value if JSON extraction fails +**Parameters**: +- `extract_type` (string, optional): The type of JSON to extract: `"object"`, `"array"`, or `"auto"`. Default is `"auto"`. +- `default` (any, optional): The default value if JSON extraction fails. -**Example Configuration:** +**Example configuration**: ```json { "type": "extract_json", @@ -159,7 +159,7 @@ Extracts JSON objects or arrays from text strings. } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: "The result is: {\"status\": \"success\", \"count\": 5} - processing complete" Output: {"status": "success", "count": 5} @@ -169,10 +169,10 @@ Output: {"status": "success", "count": 5} Removes fields from JSON objects using JSONPath. 
-**Parameters:** -- `paths` (array, required): Array of JSONPath expressions identifying fields to remove +**Parameters**: +- `paths` (array, required): An array of JSONPath expressions identifying fields to remove. -**Example Configuration:** +**Example configuration**: ```json { "type": "remove_jsonpath", @@ -180,7 +180,7 @@ Removes fields from JSON objects using JSONPath. } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: {"name": "user1", "sensitive_data": "secret", "public_info": "visible"} Output: {"name": "user1", "public_info": "visible"} @@ -190,19 +190,19 @@ Output: {"name": "user1", "public_info": "visible"} Applies different processor chains based on conditions. -**Parameters:** -- `path` (string, optional): JSONPath to extract value for condition evaluation -- `routes` (array, required): Array of condition-processor mappings -- `default` (array, optional): Default processors if no conditions match +**Parameters**: +- `path` (string, optional): The JSONPath expression used to extract the value for condition evaluation. +- `routes` (array, required): An array of condition-processor mappings. +- `default` (array, optional): The default processors if no conditions match. -**Supported conditions:** +**Supported conditions**: - Exact value match: `"value"` -- Numeric comparisons: `">10"`, `"<5"`, `">=", `"<="`, `"==5"` +- Numeric comparisons: `">10"`, `"<5"`, `">="`, `"<="`, `"==5"` - Existence checks: `"exists"`, `"null"`, `"not_exists"` - Regex matching: `"regex:pattern"` - Contains text: `"contains:substring"` -**Example Configuration:** +**Example configuration**: ```json { "type": "conditional", @@ -225,7 +225,7 @@ Applies different processor chains based on conditions. 
} ``` -**Example Input/Output:** +**Example input/output**: ``` Input: {"index": "test-index", "status": "green", "docs": 100} Output: {"index": "test-index", "healthy": "green", "docs": 100} @@ -235,16 +235,16 @@ Output: {"index": "test-index", "healthy": "green", "docs": 100} Applies a chain of processors to the input and sets the result at a specified JSONPath location. -**Parameters:** -- `path` (string, required): JSONPath expression specifying where to set the processed result -- `processors` (array, required): List of processor configurations to apply sequentially +**Parameters**: +- `path` (string, required): The JSONPath expression specifying where to set the processed result. +- `processors` (array, required): A list of processor configurations to apply sequentially. -**Path behavior:** -- If the path exists, it will be updated with the processed value -- If the path doesn't exist, attempts to create it (works for simple nested fields) -- Parent path must exist for new field creation to succeed +**Path behavior**: +- If the path exists, it will be updated with the processed value. +- If the path doesn't exist, attempts to create it (works for simple nested fields). +- A parent path must exist for new field creation to succeed. -**Example Configuration:** +**Example configuration**: ```json { "type": "process_and_set", @@ -262,7 +262,7 @@ Applies a chain of processors to the input and sets the result at a specified JS } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: {"name": "Test Index!", "status": "active"} Output: {"name": "Test Index!", "status": "active", "summary": {"clean_name": "Test_Index_"}} @@ -272,18 +272,18 @@ Output: {"name": "Test Index!", "status": "active", "summary": {"clean_name": "T Sets a field to a specified static value or copies a value from another field. 
-**Parameters:** -- `path` (string, required): JSONPath expression specifying where to set the value -- `value` (any, conditionally required): Static value to set. Either `value` or `source_path` must be provided -- `source_path` (string, conditionally required): JSONPath to copy value from. Either `value` or `source_path` must be provided -- `default` (any, optional): Default value when `source_path` doesn't exist. Only used with `source_path` +**Parameters**: +- `path` (string, required): The JSONPath expression specifying where to set the value. +- `value` (any, conditionally required): The static value to set. Either `value` or `source_path` must be provided. +- `source_path` (string, conditionally required): The JSONPath expression to copy the value from. Either `value` or `source_path` must be provided. +- `default` (any, optional): The default value when `source_path` doesn't exist. Only used with `source_path`. **Path behavior:** -- If the path exists, it will be updated with the new value -- If the path doesn't exist, attempts to create it (works for simple nested fields) -- Parent path must exist for new field creation to succeed +- If the path exists, it will be updated with the new value. +- If the path doesn't exist, attempts to create it (works for simple nested fields). +- A parent path must exist for new field creation to succeed. -**Example Configuration (static value):** +**Example configuration (static value)**: ```json { "type": "set_field", @@ -292,7 +292,7 @@ Sets a field to a specified static value or copies a value from another field. } ``` -**Example Configuration (copy field):** +**Example configuration (copy field)**: ```json { "type": "set_field", @@ -302,7 +302,7 @@ Sets a field to a specified static value or copies a value from another field. 
} ``` -**Example Input/Output:** +**Example input/output**: ``` Input: {"user": {"id": 123}, "name": "John"} Output: {"user": {"id": 123}, "name": "John", "userId": 123, "metadata": {"processed_at": "2024-03-15T10:30:00Z"}} @@ -310,19 +310,19 @@ Output: {"user": {"id": 123}, "name": "John", "userId": 123, "metadata": {"proce ### for_each -Iterates through array elements and applies a chain of processors to each element. Useful for transforming array elements uniformly, such as adding missing fields, filtering content, or normalizing data structures. +Iterates through array elements and applies a chain of processors to each element. Useful for transforming array elements uniformly, such as when adding missing fields, filtering content, or normalizing data structures. -**Parameters:** -- `path` (string, required): JSONPath expression pointing to the array to iterate over. Must use `[*]` notation for array elements -- `processors` (array, required): List of processor configurations to apply to each array element +**Parameters**: +- `path` (string, required): The JSONPath expression pointing to the array to iterate over. Must use `[*]` notation for array elements. +- `processors` (array, required): A list of processor configurations to apply to each array element. -**Behavior:** -- Each element is processed independently with the configured processor chain -- The output of the processor chain replaces the original element -- If the path doesn't exist or doesn't point to an array, returns input unchanged -- If processing an element fails, the original element is kept +**Behavior**: +- Each element is processed independently using the configured processor chain. +- The output of the processor chain replaces the original element. +- If the path doesn't exist or doesn't point to an array, the input is returned unchanged. +- If the processing of an element fails, the original element is kept. 
-**Example Configuration:** +**Example configuration**: ```json { "type": "for_each", @@ -337,7 +337,7 @@ Iterates through array elements and applies a chain of processors to each elemen } ``` -**Example Input/Output:** +**Example input/output**: ``` Input: { "items": [ @@ -397,7 +397,7 @@ POST /_plugins/_ml/agents/{agent_id}/_execute } ``` -**Without output processors, the raw ListIndexTool would return:** +Without output processors, the raw `ListIndexTool` would return the following: ``` row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.store.size 1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0,1,0,12.7kb,12.7kb @@ -405,7 +405,7 @@ row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.stor 3,green,open,.plugins-ml-memory-meta,LqP3QMaURNKYDZ9p8dTq3Q,1,0,2,0,44.8kb,44.8kb ``` -**With output processors, the agent returns:** +With output processors, the agent returns the following: ``` 1,green,open,.plugins-ml-model-group 2,green,open,.plugins-ml-memory-message @@ -413,16 +413,16 @@ row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.stor ``` The output processors transform the verbose CSV output into a clean, readable format by: -1. **`regex_replace`**: Removing the CSV header row -2. **`regex_capture`**: Extracting only essential information (row number, health, status, and index name) +1. **`regex_replace`**: Removing the CSV header row. +2. **`regex_capture`**: Extracting only essential information (row number, health, status, and index name). ## Example usage with models The following examples demonstrate how to use processor chains with models during prediction calls. 
-### Input processors example +### Example: Input processors -This example shows how to modify model input using `input_processors` to replace text before processing: +This example shows you how to modify model input using `input_processors` to replace text before processing: ```json POST _plugins/_ml/models/{model_id}/_predict @@ -443,9 +443,9 @@ POST _plugins/_ml/models/{model_id}/_predict In this example, the `regex_replace` processor modifies the prompt before it's sent to the model, changing "100 words" to "20 words". -### Output processors example +### Example: Output processors -This example shows how to process model output using `output_processors` to extract and format JSON data: +This example shows you how to process model output using `output_processors` to extract and format JSON data: ```json POST _plugins/_ml/models/{model_id}/_predict @@ -487,10 +487,10 @@ POST _plugins/_ml/models/{model_id}/_predict ``` In this example, the output processors: -1. Extract the content from the model response using JSONPath -2. Parse and extract the JSON object from the text response +1. Extract the content from the model response using JSONPath. +2. Parse and extract the JSON object from the text response. 
-**Without output processors, the raw response would be:** +Without output processors, the raw response would be the following: ```json { "inference_results": [ @@ -542,7 +542,7 @@ In this example, the output processors: } ``` -**With output processors, the response becomes:** +With output processors, the response becomes the following: ```json { "inference_results": [ From 0645d526ec46433fd181408c6ae8d67659cdae01 Mon Sep 17 00:00:00 2001 From: Nathan Bower Date: Mon, 13 Oct 2025 13:57:02 -0400 Subject: [PATCH 10/12] Apply suggestions from code review Signed-off-by: Nathan Bower --- _ml-commons-plugin/processor-chain.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_ml-commons-plugin/processor-chain.md b/_ml-commons-plugin/processor-chain.md index 74212357655..46d110284af 100644 --- a/_ml-commons-plugin/processor-chain.md +++ b/_ml-commons-plugin/processor-chain.md @@ -241,7 +241,7 @@ Applies a chain of processors to the input and sets the result at a specified JS **Path behavior**: - If the path exists, it will be updated with the processed value. -- If the path doesn't exist, attempts to create it (works for simple nested fields). +- If the path doesn't exist, the processor chain attempts to create it (works for simple nested fields). - A parent path must exist for new field creation to succeed. **Example configuration**: @@ -280,7 +280,7 @@ Sets a field to a specified static value or copies a value from another field. **Path behavior:** - If the path exists, it will be updated with the new value. -- If the path doesn't exist, attempts to create it (works for simple nested fields). +- If the path doesn't exist, the processor chain attempts to create it (works for simple nested fields). - A parent path must exist for new field creation to succeed. 
**Example configuration (static value)**: From d1be395a5254a68bf8732c4c91a50c5d020de4b1 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Mon, 13 Oct 2025 14:16:05 -0400 Subject: [PATCH 11/12] Doc review Signed-off-by: Fanit Kolchina --- .../api/train-predict/predict.md | 2 +- _ml-commons-plugin/opensearch-assistant.md | 2 +- _ml-commons-plugin/processor-chain.md | 444 +++++++++++------- 3 files changed, 272 insertions(+), 176 deletions(-) diff --git a/_ml-commons-plugin/api/train-predict/predict.md b/_ml-commons-plugin/api/train-predict/predict.md index d3a1c21b80f..7974b108f1f 100644 --- a/_ml-commons-plugin/api/train-predict/predict.md +++ b/_ml-commons-plugin/api/train-predict/predict.md @@ -28,7 +28,7 @@ Field | Data type | Required/Optional | Description `parameters.input_processors` | Array | Optional | A list of processors used to transform the input data before sending it to the model. For more information, see [Processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). `parameters.output_processors` | Array | Optional | A list of processors used to transform the model's output data. For more information, see [Processor chain]({{site.url}}{{site.baseurl}}/ml-commons-plugin/processor-chain/). -For remote models, the actual input fields depend on the model's connector configuration. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). +For externally hosted models, the actual input fields depend on the model's connector configuration. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). 
## Example request diff --git a/_ml-commons-plugin/opensearch-assistant.md b/_ml-commons-plugin/opensearch-assistant.md index 0a058d73a02..bc030358f29 100644 --- a/_ml-commons-plugin/opensearch-assistant.md +++ b/_ml-commons-plugin/opensearch-assistant.md @@ -3,7 +3,7 @@ layout: default title: OpenSearch Assistant Toolkit has_children: false has_toc: false -nav_order: 28 +nav_order: 70 --- # OpenSearch Assistant Toolkit diff --git a/_ml-commons-plugin/processor-chain.md b/_ml-commons-plugin/processor-chain.md index 74212357655..d70c134e54a 100644 --- a/_ml-commons-plugin/processor-chain.md +++ b/_ml-commons-plugin/processor-chain.md @@ -1,35 +1,40 @@ --- layout: default -title: Processor chain +title: Processor chains has_children: false -nav_order: 65 +nav_order: 50 --- -# Processor chain +# Processor chains **Introduced 3.3** {: .label .label-purple } Processor chains enable flexible data transformation pipelines that can process both input and output data. Chain multiple processors together to create sequential transformations where each processor's output becomes the next processor's input. -## Overview - -Processors provide a powerful way to: +Processors provide a way to: - **Transform data formats**: Convert between different data structures (strings, JSON, arrays). - **Extract specific information**: Use JSONPath or regex patterns to extract relevant data. - **Clean and filter content**: Remove unwanted fields or apply formatting rules. - **Standardize data**: Ensure consistent data formats across different components. -### Sequential execution - Processors execute in the order in which they appear in the array. Each processor receives the output from the previous processor. +{: .note} + +Processor chains are specifically designed for ML workflows and differ from processors in ingest and search pipelines: + +- [**Ingest pipelines**]({{site.url}}{{site.baseurl}}/ingest-pipelines/): Transform documents during indexing into OpenSearch. 
+- [**Search pipelines**]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/): Transform queries and search results during search operations. +- **Processor chains**: Transform data within ML Commons workflows (agent tools, model inputs/outputs). + +Processor chains provide specialized data transformation capabilities tailored for AI/ML use cases, such as cleaning model responses, extracting structured data from LLM outputs, and preparing inputs for model inference. ## Configuration Processors can be configured in different contexts: - **Tool outputs**: Add an `output_processors` array in the tool's `parameters` section. -- **Model outputs**: Add an `ouput_processors` array in the model's `parameters` section during a `_predict` call. +- **Model outputs**: Add an `output_processors` array in the model's `parameters` section during a `_predict` call. - **Model inputs**: Add an `input_processors` array in the model's `parameters` section of a `_predict` call. For complete examples, see [Example usage with agents](#example-usage-with-agents) and [Example usage with models](#example-usage-with-models). @@ -40,83 +45,154 @@ The following table lists all supported processors. Processor | Description :--- | :--- -[`to_string`](#to_string) | Converts the input to a JSON string representation. -[`regex_replace`](#regex_replace) | Replaces text using regular expression patterns. -[`regex_capture`](#regex_capture) | Captures specific groups from regex matches. -[`jsonpath_filter`](#jsonpath_filter) | Extracts data using JSONPath expressions. -[`extract_json`](#extract_json) | Extracts JSON objects or arrays from text strings. -[`remove_jsonpath`](#remove_jsonpath) | Removes fields from JSON objects using JSONPath. [`conditional`](#conditional) | Applies different processor chains based on conditions. +[`extract_json`](#extract_json) | Extracts JSON objects or arrays from text strings. 
+[`for_each`](#for_each) | Iterates through array elements and applies a chain of processors to each element. +[`jsonpath_filter`](#jsonpath_filter) | Extracts data using JSONPath expressions. [`process_and_set`](#process_and_set) | Applies a chain of processors to the input and sets the result at a specified JSONPath location. +[`regex_capture`](#regex_capture) | Captures specific groups from regex matches. +[`regex_replace`](#regex_replace) | Replaces text using regular expression patterns. +[`remove_jsonpath`](#remove_jsonpath) | Removes fields from JSON objects using JSONPath. [`set_field`](#set_field) | Sets a field to a specified static value or copies a value from another field. -[`for_each`](#for_each) | Iterates through array elements and applies a chain of processors to each element. +[`to_string`](#to_string) | Converts the input to a JSON string representation. -### to_string +### conditional -Converts the input to a JSON string representation. +Applies different processor chains based on conditions. **Parameters**: -- `escape_json` (Boolean, optional): Whether to escape JSON characters. Default is `false`. + +- `path` (string, optional): The JSONPath expression used to extract the value for condition evaluation. +- `routes` (array, required): An array of condition-processor mappings. +- `default` (array, optional): The default processors if no conditions match. 
+ +**Supported conditions**: + +- Exact value match: `"value"` +- Numeric comparisons: `">10"`, `"<5"`, `">="`, `"<="`, `"==5"` +- Existence checks: `"exists"`, `"null"`, `"not_exists"` +- Regex matching: `"regex:pattern"` +- Contains text: `"contains:substring"` **Example configuration**: + ```json { - "type": "to_string", - "escape_json": true + "type": "conditional", + "path": "$.status", + "routes": [ + { + "green": [ + {"type": "regex_replace", "pattern": "status", "replacement": "healthy"} + ] + }, + { + "red": [ + {"type": "regex_replace", "pattern": "status", "replacement": "unhealthy"} + ] + } + ], + "default": [ + {"type": "regex_replace", "pattern": "status", "replacement": "unknown"} + ] } ``` -**Example input/output**: +**Example input**: + +```json +{"index": "test-index", "status": "green", "docs": 100} ``` -Input: {"name": "test", "value": 123} -Output: "{\"name\":\"test\",\"value\":123}" + +**Example output**: + +```json +{"index": "test-index", "healthy": "green", "docs": 100} ``` -### regex_replace +### extract_json -Replaces text using regular expression patterns. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). +Extracts JSON objects or arrays from text strings. **Parameters**: -- `pattern` (string, required): A regular expression pattern to match. -- `replacement` (string, optional): Replacement text. Default is `""`. -- `replace_all` (Boolean, optional): Whether to replace all matches or only the first. Default is `true`. + +- `extract_type` (string, optional): The type of JSON to extract: `"object"`, `"array"`, or `"auto"`. Default is `"auto"`. +- `default` (any, optional): The default value if JSON extraction fails. 
**Example configuration**: + ```json { - "type": "regex_replace", - "pattern": "^.*?\n", - "replacement": "" + "type": "extract_json", + "extract_type": "object", + "default": {} } ``` -**Example input/output**: +**Example input**: + +```json +"The result is: {\"status\": \"success\", \"count\": 5} - processing complete" ``` -Input: "row,health,status,index\n1,green,open,.plugins-ml-model\n2,red,closed,test-index" -Output: "1,green,open,.plugins-ml-model\n2,red,closed,test-index" + +**Example output**: + +```json +{"status": "success", "count": 5} ``` -### regex_capture +### for_each -Captures specific groups from regex matches. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). +Iterates through array elements and applies a chain of processors to each element. Useful for transforming array elements uniformly, such as when adding missing fields, filtering content, or normalizing data structures. **Parameters**: -- `pattern` (string, required): A regular expression pattern with capture groups. -- `groups` (string or array, optional): Group numbers to capture. Can be a single number like `"1"` or array like `"[1, 2, 4]"`. Default is `"1"`. + +- `path` (string, required): The JSONPath expression pointing to the array to iterate over. Must use `[*]` notation for array elements. +- `processors` (array, required): A list of processor configurations to apply to each array element. + +**Behavior**: + +- Each element is processed independently using the configured processor chain. +- The output of the processor chain replaces the original element. +- If the path doesn't exist or doesn't point to an array, the input is returned unchanged. +- If the processing of an element fails, the original element is kept. 
**Example configuration**: + ```json { - "type": "regex_capture", - "pattern": "(\\d+),(\\w+),(\\w+),([^,]+)", - "groups": "[1, 4]" + "type": "for_each", + "path": "$.items[*]", + "processors": [ + { + "type": "set_field", + "path": "$.processed", + "value": true + } + ] } ``` -**Example input/output**: +**Example input**: + +```json +{ + "items": [ + {"name": "item1", "value": 10}, + {"name": "item2", "value": 20} + ] +} ``` -Input: "1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0" -Output: ["1", ".plugins-ml-model-group"] + +**Example output**: + +```json +{ + "items": [ + {"name": "item1", "value": 10, "processed": true}, + {"name": "item2", "value": 20, "processed": true} + ] +} ``` ### jsonpath_filter @@ -124,10 +200,12 @@ Output: ["1", ".plugins-ml-model-group"] Extracts data using JSONPath expressions. **Parameters**: + - `path` (string, required): The JSONPath expression used to extract data. - `default` (any, optional): The default value if the path is not found. **Example configuration**: + ```json { "type": "jsonpath_filter", @@ -136,136 +214,153 @@ Extracts data using JSONPath expressions. } ``` -**Example input/output**: +**Example input**: + +```json +{"data": {"items": [{"name": "item1"}, {"name": "item2"}]}} ``` -Input: {"data": {"items": [{"name": "item1"}, {"name": "item2"}]}} -Output: ["item1", "item2"] + +**Example output**: + +```json +["item1", "item2"] ``` -### extract_json +### process_and_set -Extracts JSON objects or arrays from text strings. +Applies a chain of processors to the input and sets the result at a specified JSONPath location. **Parameters**: -- `extract_type` (string, optional): The type of JSON to extract: `"object"`, `"array"`, or `"auto"`. Default is `"auto"`. -- `default` (any, optional): The default value if JSON extraction fails. + +- `path` (string, required): The JSONPath expression specifying where to set the processed result. 
+- `processors` (array, required): A list of processor configurations to apply sequentially. + +**Path behavior**: + +- If the path exists, it will be updated with the processed value. +- If the path doesn't exist, attempts to create it (works for simple nested fields). +- A parent path must exist for new field creation to succeed. **Example configuration**: + ```json { - "type": "extract_json", - "extract_type": "object", - "default": {} + "type": "process_and_set", + "path": "$.summary.clean_name", + "processors": [ + { + "type": "to_string" + }, + { + "type": "regex_replace", + "pattern": "[^a-zA-Z0-9]", + "replacement": "_" + } + ] } ``` -**Example input/output**: +**Example input**: + +```json +{"name": "Test Index!", "status": "active"} ``` -Input: "The result is: {\"status\": \"success\", \"count\": 5} - processing complete" -Output: {"status": "success", "count": 5} + +**Example output**: + +```json +{"name": "Test Index!", "status": "active", "summary": {"clean_name": "Test_Index_"}} ``` -### remove_jsonpath +### regex_capture -Removes fields from JSON objects using JSONPath. +Captures specific groups from regex matches. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). **Parameters**: -- `paths` (array, required): An array of JSONPath expressions identifying fields to remove. + +- `pattern` (string, required): A regular expression pattern with capture groups. +- `groups` (string or array, optional): Group numbers to capture. Can be a single number like `"1"` or array like `"[1, 2, 4]"`. Default is `"1"`. 
**Example configuration**: + ```json { - "type": "remove_jsonpath", - "paths": "[$.sensitive_data]" + "type": "regex_capture", + "pattern": "(\\d+),(\\w+),(\\w+),([^,]+)", + "groups": "[1, 4]" } ``` -**Example input/output**: +**Example input**: + +```json +"1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0" ``` -Input: {"name": "user1", "sensitive_data": "secret", "public_info": "visible"} -Output: {"name": "user1", "public_info": "visible"} + +**Example output**: + +```json +["1", ".plugins-ml-model-group"] ``` -### conditional +### regex_replace -Applies different processor chains based on conditions. +Replaces text using regular expression patterns. For regex syntax details, see [Java regex syntax](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). **Parameters**: -- `path` (string, optional): The JSONPath expression used to extract the value for condition evaluation. -- `routes` (array, required): An array of condition-processor mappings. -- `default` (array, optional): The default processors if no conditions match. - -**Supported conditions**: -- Exact value match: `"value"` -- Numeric comparisons: `">10"`, `"<5"`, `">="`, `"<="`, `"==5"` -- Existence checks: `"exists"`, `"null"`, `"not_exists"` -- Regex matching: `"regex:pattern"` -- Contains text: `"contains:substring"` +- `pattern` (string, required): A regular expression pattern to match. +- `replacement` (string, optional): Replacement text. Default is `""`. +- `replace_all` (Boolean, optional): Whether to replace all matches or only the first. Default is `true`. 
**Example configuration**: + ```json { - "type": "conditional", - "path": "$.status", - "routes": [ - { - "green": [ - {"type": "regex_replace", "pattern": "status", "replacement": "healthy"} - ] - }, - { - "red": [ - {"type": "regex_replace", "pattern": "status", "replacement": "unhealthy"} - ] - } - ], - "default": [ - {"type": "regex_replace", "pattern": "status", "replacement": "unknown"} - ] + "type": "regex_replace", + "pattern": "^.*?\n", + "replacement": "" } ``` -**Example input/output**: +**Example input**: + +```json +"row,health,status,index\n1,green,open,.plugins-ml-model\n2,red,closed,test-index" ``` -Input: {"index": "test-index", "status": "green", "docs": 100} -Output: {"index": "test-index", "healthy": "green", "docs": 100} + +**Example output**: + +```json +"1,green,open,.plugins-ml-model\n2,red,closed,test-index" ``` -### process_and_set +### remove_jsonpath -Applies a chain of processors to the input and sets the result at a specified JSONPath location. +Removes fields from JSON objects using JSONPath. **Parameters**: -- `path` (string, required): The JSONPath expression specifying where to set the processed result. -- `processors` (array, required): A list of processor configurations to apply sequentially. -**Path behavior**: -- If the path exists, it will be updated with the processed value. -- If the path doesn't exist, attempts to create it (works for simple nested fields). -- A parent path must exist for new field creation to succeed. +- `paths` (array, required): An array of JSONPath expressions identifying fields to remove. 
**Example configuration**: + ```json { - "type": "process_and_set", - "path": "$.summary.clean_name", - "processors": [ - { - "type": "to_string" - }, - { - "type": "regex_replace", - "pattern": "[^a-zA-Z0-9]", - "replacement": "_" - } - ] + "type": "remove_jsonpath", + "paths": "[$.sensitive_data]" } ``` -**Example input/output**: +**Example input**: + +```json +{"name": "user1", "sensitive_data": "secret", "public_info": "visible"} ``` -Input: {"name": "Test Index!", "status": "active"} -Output: {"name": "Test Index!", "status": "active", "summary": {"clean_name": "Test_Index_"}} + +**Example output**: + +```json +{"name": "user1", "public_info": "visible"} ``` ### set_field @@ -273,17 +368,20 @@ Output: {"name": "Test Index!", "status": "active", "summary": {"clean_name": "T Sets a field to a specified static value or copies a value from another field. **Parameters**: + - `path` (string, required): The JSONPath expression specifying where to set the value. - `value` (any, conditionally required): The static value to set. Either `value` or `source_path` must be provided. - `source_path` (string, conditionally required): The JSONPath expression to copy the value from. Either `value` or `source_path` must be provided. - `default` (any, optional): The default value when `source_path` doesn't exist. Only used with `source_path`. -**Path behavior:** +**Path behavior**: + - If the path exists, it will be updated with the new value. - If the path doesn't exist, attempts to create it (works for simple nested fields). - A parent path must exist for new field creation to succeed. **Example configuration (static value)**: + ```json { "type": "set_field", @@ -293,6 +391,7 @@ Sets a field to a specified static value or copies a value from another field. ``` **Example configuration (copy field)**: + ```json { "type": "set_field", @@ -302,60 +401,52 @@ Sets a field to a specified static value or copies a value from another field. 
} ``` -**Example input/output**: +**Example input**: + +```json +{"user": {"id": 123}, "name": "John"} ``` -Input: {"user": {"id": 123}, "name": "John"} -Output: {"user": {"id": 123}, "name": "John", "userId": 123, "metadata": {"processed_at": "2024-03-15T10:30:00Z"}} + +**Example output**: + +```json +{"user": {"id": 123}, "name": "John", "userId": 123, "metadata": {"processed_at": "2024-03-15T10:30:00Z"}} ``` -### for_each +### to_string -Iterates through array elements and applies a chain of processors to each element. Useful for transforming array elements uniformly, such as when adding missing fields, filtering content, or normalizing data structures. +Converts the input to a JSON string representation. **Parameters**: -- `path` (string, required): The JSONPath expression pointing to the array to iterate over. Must use `[*]` notation for array elements. -- `processors` (array, required): A list of processor configurations to apply to each array element. -**Behavior**: -- Each element is processed independently using the configured processor chain. -- The output of the processor chain replaces the original element. -- If the path doesn't exist or doesn't point to an array, the input is returned unchanged. -- If the processing of an element fails, the original element is kept. +- `escape_json` (Boolean, optional): Whether to escape JSON characters. Default is `false`. 
**Example configuration**: + ```json { - "type": "for_each", - "path": "$.items[*]", - "processors": [ - { - "type": "set_field", - "path": "$.processed", - "value": true - } - ] + "type": "to_string", + "escape_json": true } ``` -**Example input/output**: +**Example input**: + +```json +{"name": "test", "value": 123} ``` -Input: { - "items": [ - {"name": "item1", "value": 10}, - {"name": "item2", "value": 20} - ] -} -Output: { - "items": [ - {"name": "item1", "value": 10, "processed": true}, - {"name": "item2", "value": 20, "processed": true} - ] -} + +**Example output**: + +```json +"{\"name\":\"test\",\"value\":123}" ``` -### Example usage with agents +## Example usage with agents -**Step 1: Register a flow agent with output processors** +The following example demonstrates using processor chains with agents. + +### Step 1: Register a flow agent with output processors ```json POST /_plugins/_ml/agents/_register @@ -383,8 +474,9 @@ POST /_plugins/_ml/agents/_register ] } ``` +{% include copy-curl.html %} -**Step 2: Execute the agent** +### Step 2: Execute the agent Using the `agent_id` returned in the previous step: @@ -396,29 +488,33 @@ POST /_plugins/_ml/agents/{agent_id}/_execute } } ``` +{% include copy-curl.html %} -Without output processors, the raw `ListIndexTool` would return the following: -``` +Without output processors, the raw `ListIndexTool` returns verbose CSV output with headers and extra columns: + +```cs row,health,status,index,uuid,pri,rep,docs.count,docs.deleted,store.size,pri.store.size 1,green,open,.plugins-ml-model-group,DCJHJc7pQ6Gid02PaSeXBQ,1,0,1,0,12.7kb,12.7kb 2,green,open,.plugins-ml-memory-message,6qVpepfRSCi9bQF_As_t2A,1,0,7,0,53kb,53kb 3,green,open,.plugins-ml-memory-meta,LqP3QMaURNKYDZ9p8dTq3Q,1,0,2,0,44.8kb,44.8kb ``` -With output processors, the agent returns the following: -``` +The output processors transform the verbose CSV output into a clean, readable format by: + +1. **`regex_replace`**: Removing the CSV header row. +2. 
**`regex_capture`**: Extracting only essential information (row number, health, status, and index name). + +With output processors, the agent returns clean, formatted data with only essential index information: + +```cs 1,green,open,.plugins-ml-model-group 2,green,open,.plugins-ml-memory-message 3,green,open,.plugins-ml-memory-meta ``` -The output processors transform the verbose CSV output into a clean, readable format by: -1. **`regex_replace`**: Removing the CSV header row. -2. **`regex_capture`**: Extracting only essential information (row number, health, status, and index name). - ## Example usage with models -The following examples demonstrate how to use processor chains with models during prediction calls. +The following examples demonstrate how to use processor chains with models during Predict API calls. ### Example: Input processors @@ -440,12 +536,13 @@ POST _plugins/_ml/models/{model_id}/_predict } } ``` +{% include copy-curl.html %} In this example, the `regex_replace` processor modifies the prompt before it's sent to the model, changing "100 words" to "20 words". ### Example: Output processors -This example shows you how to process model output using `output_processors` to extract and format JSON data: +This example shows you how to process model output using `output_processors` to extract and format JSON data. In this example, the output processors first extract the content from the model response using JSONPath. Then they parse and extract the JSON object from the text response: ```json POST _plugins/_ml/models/{model_id}/_predict @@ -485,12 +582,10 @@ POST _plugins/_ml/models/{model_id}/_predict } } ``` +{% include copy-curl.html %} -In this example, the output processors: -1. Extract the content from the model response using JSONPath. -2. Parse and extract the JSON object from the text response. 
+Without output processors, the raw response contains the full model output with extensive metadata and nested structure: -Without output processors, the raw response would be the following: ```json { "inference_results": [ @@ -542,7 +637,8 @@ Without output processors, the raw response would be the following: } ``` -With output processors, the response becomes the following: +With output processors, the response is simplified to contain only the extracted and parsed JSON data: + ```json { "inference_results": [ From 5fb00b61a3bb88044db50cc6ed9f8a80cdf56279 Mon Sep 17 00:00:00 2001 From: Nathan Bower Date: Mon, 13 Oct 2025 14:25:48 -0400 Subject: [PATCH 12/12] Update _ml-commons-plugin/processor-chain.md Signed-off-by: Nathan Bower --- _ml-commons-plugin/processor-chain.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ml-commons-plugin/processor-chain.md b/_ml-commons-plugin/processor-chain.md index 59257f6c156..fc4a542ce82 100644 --- a/_ml-commons-plugin/processor-chain.md +++ b/_ml-commons-plugin/processor-chain.md @@ -584,7 +584,7 @@ POST _plugins/_ml/models/{model_id}/_predict ``` {% include copy-curl.html %} -Without output processors, the raw response contains the full model output with extensive metadata and nested structure: +Without output processors, the raw response contains the full model output with extensive metadata and a nested structure: ```json {