
Commit 3c2f3e1

[Rag] Add new embedding parameters (openvinotoolkit#2932)
## Description

Add new embedding pipeline parameters to the documentation.

Ticket: 174069

## Checklist:

- [x] I have made corresponding changes to the documentation

---------

Co-authored-by: Artur Paniukov <chgk1101@gmail.com>
1 parent 4013b9d commit 3c2f3e1

site/docs/use-cases/text-embedding/_sections/_usage_options/index.mdx

Lines changed: 33 additions & 0 deletions
```diff
@@ -10,13 +10,38 @@ Text embedding models support different pooling strategies to aggregate token em
 
 - `CLS`: Use the first token embedding (default for many models)
 - `MEAN`: Average all token embeddings
+- `LAST_TOKEN`: Use the last token embedding
 
 You can set the pooling strategy via the `pooling_type` parameter.
 
 ### L2 Normalization
 
 L2 normalization can be applied to the output embeddings for improved retrieval performance. Enable it with the `normalize` parameter.
 
+### Input Size and Padding
+
+You can control how input texts are tokenized and padded:
+
+- `max_length`: Maximum length of tokens passed to the embedding model. Longer texts will be truncated.
+- `pad_to_max_length`: If `true`, model input tensors are padded to the maximum length.
+- `padding_side`: Side to use for padding (`"left"` or `"right"`).
+
+### Batch Size Configuration
+
+The `batch_size` parameter is useful for optimizing performance during database population:
+
+- When set, the pipeline fixes the model shape for inference optimization.
+- The number of documents passed to the pipeline must equal `batch_size`.
+- For query embeddings, set `batch_size=1` or leave it unset.
+
+### Fixed Shape Optimization
+
+Setting `batch_size`, `max_length`, and `pad_to_max_length=true` together will fix the model shape for optimal inference performance.
+
+:::info
+Fixed shapes are required for NPU device inference.
+:::
+
 ### Query and Embed Instructions
 
 Some models support special instructions for queries and documents. Use `query_instruction` and `embed_instruction` to provide these if needed.
```
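To make the pooling and normalization options above concrete, here is a minimal pure-Python sketch of what the three strategies and `normalize=True` compute. This is an illustration of the math only, not OpenVINO GenAI's actual implementation; the toy `token_embeddings` values are invented for the example.

```python
import math

# Toy per-token embeddings for one input text: 3 tokens, 4 dimensions
# (invented values, purely for illustration).
token_embeddings = [
    [1.0, 2.0, 3.0, 4.0],  # first token -> used by CLS pooling
    [2.0, 0.0, 2.0, 0.0],
    [3.0, 1.0, 1.0, 1.0],  # last token -> used by LAST_TOKEN pooling
]

def pool(tokens, strategy):
    """Aggregate per-token embeddings into one text embedding."""
    if strategy == "CLS":
        return tokens[0]
    if strategy == "LAST_TOKEN":
        return tokens[-1]
    if strategy == "MEAN":
        dims = len(tokens[0])
        return [sum(t[d] for t in tokens) / len(tokens) for d in range(dims)]
    raise ValueError(f"unknown pooling strategy: {strategy}")

def l2_normalize(vec):
    """Scale a vector to unit length -- the effect of normalize=True."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

mean_embedding = pool(token_embeddings, "MEAN")
unit_embedding = l2_normalize(mean_embedding)  # has L2 norm 1.0
```

With unit-length embeddings, cosine similarity reduces to a plain dot product, which is why L2 normalization tends to help retrieval backends.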
```diff
@@ -32,6 +57,10 @@ Some models support special instructions for queries and documents. Use `query_i
     "CPU",
     pooling_type=ov_genai.TextEmbeddingPipeline.PoolingType.MEAN,
     normalize=True,
+    max_length=512,
+    pad_to_max_length=True,
+    padding_side="left",
+    batch_size=4,
     query_instruction="Represent this sentence for searching relevant passages: ",
     embed_instruction="Represent this passage for retrieval: "
 )
```
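The truncation and padding behaviour that `max_length`, `pad_to_max_length`, and `padding_side` configure can be sketched in plain Python. This is an assumption-level illustration of the semantics described above, not the library's tokenizer; `PAD_ID` and `prepare` are hypothetical names for the example.

```python
PAD_ID = 0  # hypothetical padding token id, for illustration only

def prepare(token_ids, max_length, pad_to_max_length, padding_side="right"):
    """Truncate to max_length; optionally pad to a fixed length on one side."""
    ids = token_ids[:max_length]  # longer texts are truncated
    if pad_to_max_length and len(ids) < max_length:
        pad = [PAD_ID] * (max_length - len(ids))
        ids = pad + ids if padding_side == "left" else ids + pad
    return ids

print(prepare([5, 6, 7], max_length=6, pad_to_max_length=True, padding_side="left"))
# [0, 0, 0, 5, 6, 7]
```

Left padding keeps the real tokens at the end of the tensor, which pairs naturally with `LAST_TOKEN` pooling; right padding keeps them at the start.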
```diff
@@ -45,6 +74,10 @@ Some models support special instructions for queries and documents. Use `query_i
     "CPU",
     ov::genai::pooling_type(ov::genai::TextEmbeddingPipeline::PoolingType::MEAN),
     ov::genai::normalize(true),
+    ov::genai::max_length(512),
+    ov::genai::pad_to_max_length(true),
+    ov::genai::padding_side("left"),
+    ov::genai::batch_size(4),
     ov::genai::query_instruction("Represent this sentence for searching relevant passages: "),
     ov::genai::embed_instruction("Represent this passage for retrieval: ")
 );
```
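Because a fixed `batch_size` means each pipeline call must receive exactly that many documents, a caller populating a database typically groups the corpus into fixed-size batches. The helper below is a hypothetical caller-side sketch (not part of the OpenVINO GenAI API), padding the final short batch with a filler document so every call sees the same shape.

```python
def fixed_batches(documents, batch_size, filler=""):
    """Yield lists of exactly `batch_size` documents; the last batch is
    padded with `filler` so each pipeline call has a fixed input shape.
    Hypothetical caller-side helper, for illustration only."""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        batch += [filler] * (batch_size - len(batch))
        yield batch

batches = list(fixed_batches(["a", "b", "c", "d", "e"], batch_size=4))
# two batches of exactly 4 documents each; embeddings computed for the
# filler entries would simply be discarded by the caller
```

For single queries, the documentation's advice applies instead: set `batch_size=1` or leave it unset.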
