
Commit 3c2f3e1

[Rag] Add new embedding parameters (openvinotoolkit#2932)
## Description

Add new embedding pipeline parameters to the documentation.

Ticket: 174069

## Checklist:

- [x] I have made corresponding changes to the documentation

---------

Co-authored-by: Artur Paniukov <chgk1101@gmail.com>
1 parent 4013b9d commit 3c2f3e1

site/docs/use-cases/text-embedding/_sections/_usage_options/index.mdx

Lines changed: 33 additions & 0 deletions
```diff
@@ -10,13 +10,38 @@ Text embedding models support different pooling strategies to aggregate token em
 
 - `CLS`: Use the first token embedding (default for many models)
 - `MEAN`: Average all token embeddings
+- `LAST_TOKEN`: Use the last token embedding
 
 You can set the pooling strategy via the `pooling_type` parameter.
 
 ### L2 Normalization
 
 L2 normalization can be applied to the output embeddings for improved retrieval performance. Enable it with the `normalize` parameter.
 
+### Input Size and Padding
+
+You can control how input texts are tokenized and padded:
+
+- `max_length`: Maximum length of tokens passed to the embedding model. Longer texts will be truncated.
+- `pad_to_max_length`: If `true`, model input tensors are padded to the maximum length.
+- `padding_side`: Side to use for padding (`"left"` or `"right"`).
+
+### Batch Size Configuration
+
+The `batch_size` parameter is useful for optimizing performance during database population:
+
+- When set, the pipeline fixes the model shape for inference optimization.
+- The number of documents passed to the pipeline must equal `batch_size`.
+- For query embeddings, set `batch_size=1` or leave it unset.
+
+### Fixed Shape Optimization
+
+Setting `batch_size`, `max_length`, and `pad_to_max_length=true` together will fix the model shape for optimal inference performance.
+
+:::info
+Fixed shapes are required for NPU device inference.
+:::
+
 ### Query and Embed Instructions
 
 Some models support special instructions for queries and documents. Use `query_instruction` and `embed_instruction` to provide these if needed.
```
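To make the pooling and normalization options above concrete, here is a minimal pure-Python sketch of what the three strategies and `normalize=True` compute. This is an illustration of the math only, not OpenVINO GenAI's actual implementation; the toy `token_embeddings` values are invented for the example.

```python
import math

# Toy per-token embeddings for one input text: 3 tokens, 4 dimensions
# (invented values, purely for illustration).
token_embeddings = [
    [1.0, 2.0, 3.0, 4.0],  # first token -> used by CLS pooling
    [2.0, 0.0, 2.0, 0.0],
    [3.0, 1.0, 1.0, 1.0],  # last token -> used by LAST_TOKEN pooling
]

def pool(tokens, strategy):
    """Aggregate per-token embeddings into one text embedding."""
    if strategy == "CLS":
        return tokens[0]
    if strategy == "LAST_TOKEN":
        return tokens[-1]
    if strategy == "MEAN":
        dims = len(tokens[0])
        return [sum(t[d] for t in tokens) / len(tokens) for d in range(dims)]
    raise ValueError(f"unknown pooling strategy: {strategy}")

def l2_normalize(vec):
    """Scale a vector to unit length -- the effect of normalize=True."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

mean_embedding = pool(token_embeddings, "MEAN")
unit_embedding = l2_normalize(mean_embedding)  # has L2 norm 1.0
```

With unit-length embeddings, cosine similarity reduces to a plain dot product, which is why L2 normalization tends to help retrieval backends.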
```diff
@@ -32,6 +57,10 @@ Some models support special instructions for queries and documents. Use `query_i
     "CPU",
     pooling_type=ov_genai.TextEmbeddingPipeline.PoolingType.MEAN,
     normalize=True,
+    max_length=512,
+    pad_to_max_length=True,
+    padding_side="left",
+    batch_size=4,
     query_instruction="Represent this sentence for searching relevant passages: ",
     embed_instruction="Represent this passage for retrieval: "
 )
```
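The truncation and padding behaviour that `max_length`, `pad_to_max_length`, and `padding_side` configure can be sketched in plain Python. This is an assumption-level illustration of the semantics described above, not the library's tokenizer; `PAD_ID` and `prepare` are hypothetical names for the example.

```python
PAD_ID = 0  # hypothetical padding token id, for illustration only

def prepare(token_ids, max_length, pad_to_max_length, padding_side="right"):
    """Truncate to max_length; optionally pad to a fixed length on one side."""
    ids = token_ids[:max_length]  # longer texts are truncated
    if pad_to_max_length and len(ids) < max_length:
        pad = [PAD_ID] * (max_length - len(ids))
        ids = pad + ids if padding_side == "left" else ids + pad
    return ids

print(prepare([5, 6, 7], max_length=6, pad_to_max_length=True, padding_side="left"))
# [0, 0, 0, 5, 6, 7]
```

Left padding keeps the real tokens at the end of the tensor, which pairs naturally with `LAST_TOKEN` pooling; right padding keeps them at the start.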
```diff
@@ -45,6 +74,10 @@ Some models support special instructions for queries and documents. Use `query_i
     "CPU",
     ov::genai::pooling_type(ov::genai::TextEmbeddingPipeline::PoolingType::MEAN),
     ov::genai::normalize(true),
+    ov::genai::max_length(512),
+    ov::genai::pad_to_max_length(true),
+    ov::genai::padding_side("left"),
+    ov::genai::batch_size(4),
     ov::genai::query_instruction("Represent this sentence for searching relevant passages: "),
     ov::genai::embed_instruction("Represent this passage for retrieval: ")
 );
```
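Because a fixed `batch_size` means each pipeline call must receive exactly that many documents, a caller populating a database typically groups the corpus into fixed-size batches. The helper below is a hypothetical caller-side sketch (not part of the OpenVINO GenAI API), padding the final short batch with a filler document so every call sees the same shape.

```python
def fixed_batches(documents, batch_size, filler=""):
    """Yield lists of exactly `batch_size` documents; the last batch is
    padded with `filler` so each pipeline call has a fixed input shape.
    Hypothetical caller-side helper, for illustration only."""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        batch += [filler] * (batch_size - len(batch))
        yield batch

batches = list(fixed_batches(["a", "b", "c", "d", "e"], batch_size=4))
# two batches of exactly 4 documents each; embeddings computed for the
# filler entries would simply be discarded by the caller
```

For single queries, the documentation's advice applies instead: set `batch_size=1` or leave it unset.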
