Commit a7d2f1e

natke and aciddelgado authored
Update docs/genai/howto/migrate.md
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
1 parent 0b706ff commit a7d2f1e

1 file changed: +1 -1 lines changed

docs/genai/howto/migrate.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ In summary, the new API adds an `AppendTokens` function to the generator, which
 
 Calling `AppendTokens` outside of the conversation loop can be used to implement system prompt caching.
 
-Note: chat mode and system prompt caching is only supported when running on CPU, NVIDIA GPUs with the CUDA EP, and all GPUs with the Web GPU native EP. It is not supported on NPU or GPUs running with the DirecML EP. For Q&A mode, the migrations described below *are* required.
+Note: chat mode and system prompt caching are only supported for batch size 1. Furthermore, they are currently supported on CPU, NVIDIA GPUs with the CUDA EP, and all GPUs with the Web GPU native EP. They are not supported on NPU or GPUs running with the DirecML EP. For Q&A mode, the migrations described below *are* required.
 
 ## Python
 

0 commit comments