
Commit 0b706ff

natke and aciddelgado authored
Update docs/genai/howto/migrate.md
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
1 parent a7fd0cb commit 0b706ff

1 file changed (+1, −1 lines)

docs/genai/howto/migrate.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Version 0.6.0 adds support for "chat mode", also known as _continuation_, _conti
 
 In summary, the new API adds an `AppendTokens` function to the generator, which allows for multi-turn conversations. Previously, input was set in `GeneratorParams` prior to the creation of the generator.
 
-Calling `AddTokens` outside of the loop also adds support for system prompt caching.
+Calling `AppendTokens` outside of the conversation loop can be used to implement system prompt caching.
 
 Note: chat mode and system prompt caching are only supported when running on CPU, NVIDIA GPUs with the CUDA EP, and all GPUs with the WebGPU native EP. They are not supported on NPU or GPUs running with the DirectML EP. For Q&A mode, the migrations described below *are* required.
 
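To make the updated guidance concrete, here is a minimal sketch of the chat-mode pattern described in the hunk above, written against the onnxruntime-genai Python binding, where `AppendTokens` is spelled `append_tokens`. The model path, prompts, and search options are placeholders, and a real chat application would also apply the model's chat template before encoding each turn.

```python
import onnxruntime_genai as og

# Placeholder model folder; point this at an exported GenAI model.
model = og.Model("path/to/model")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)

# With the new API the generator is created first and tokens are appended
# to it afterwards, rather than being set on GeneratorParams up front.
generator = og.Generator(model, params)

# Appending the system prompt once, outside the conversation loop, lets its
# KV cache be reused across turns (system prompt caching).
system_prompt = "You are a helpful assistant."  # placeholder prompt
generator.append_tokens(tokenizer.encode(system_prompt))

# Multi-turn conversation: each user turn is appended to the same generator.
for user_turn in ["Hello!", "Summarize what I just said."]:  # placeholder turns
    generator.append_tokens(tokenizer.encode(user_turn))
    while not generator.is_done():
        generator.generate_next_token()
        print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
    print()
```

Under the pre-0.6.0 API, the same input tokens would instead have been assigned to `GeneratorParams` before the generator was constructed, which is the migration the surrounding section of migrate.md describes.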
0 commit comments
