Commit 770ada5 (parent: 07b3f34)

Apply suggestions from code review

Co-authored-by: Maroon Ayoub <Maroonay@gmail.com>
Signed-off-by: Burak Sekili <32663655+buraksekili@users.noreply.github.com>

fix duplicate package

Signed-off-by: Burak Sekili <32663655+buraksekili@users.noreply.github.com>

1 file changed: 4 additions, 6 deletions
docs/deployment/setup.md

```diff
@@ -5,15 +5,13 @@ This guide provides a complete walkthrough for setting up and testing the exampl
 By following this guide, you will:
 
 1. **Deploy the Infrastructure**: Use Helm to set up:
-   - vLLM servers (default: 4 replicas) serving Llama 3.1 8B Instruct model
+   - vLLM nodes with LMCache CPU offloading (4 replicas) serving Llama 3.1 8B Instruct model
    - Redis server
-   - LMCache
 2. **Test with Example Application**: Run a Go application that:
    - Connects to your deployed vLLM and Redis infrastructure,
    - Demonstrates KV cache indexing by processing a sample prompt
-   - Shows how to retrieve pod scores
 
-The end result is a working distributed LLM system with intelligent KV cache management that can route requests to pods with relevant cached computations.
+The demonstrated KV-cache indexer is utilized for AI-aware routing to accelerate inference across the system through minimizing redundant computation.
 
 ## vLLM Deployment
```

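The "pod scores" and KV-cache-aware routing referenced in the hunk above can be illustrated with a minimal, self-contained sketch. This is a conceptual illustration only: the function names, the FNV-based chained block hashing, and the scoring rule here are assumptions for demonstration, not the actual `kvcache.Indexer` API from the repository.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// blockHash chains the previous block's hash with the current token block,
// so a block's key depends on the entire prompt prefix before it.
// (Illustrative scheme, not the library's actual hashing.)
func blockHash(prev uint64, tokens []int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%v", prev, tokens)
	return h.Sum64()
}

// scorePods walks the prompt's token blocks in order and counts, per pod,
// how many blocks of the longest indexed prefix each pod has cached.
func scorePods(prompt []int, blockSize int, index map[uint64][]string) map[string]int {
	scores := map[string]int{}
	prev := uint64(0)
	for i := 0; i+blockSize <= len(prompt); i += blockSize {
		prev = blockHash(prev, prompt[i:i+blockSize])
		pods, ok := index[prev]
		if !ok {
			break // longest-prefix match ends at the first uncached block
		}
		for _, p := range pods {
			scores[p]++
		}
	}
	return scores
}

func main() {
	prompt := []int{1, 2, 3, 4, 5, 6, 7, 8}
	blockSize := 4

	// Simulated index: pod-a has both prefix blocks cached, pod-b only the first.
	h1 := blockHash(0, prompt[0:4])
	h2 := blockHash(h1, prompt[4:8])
	index := map[uint64][]string{
		h1: {"pod-a", "pod-b"},
		h2: {"pod-a"},
	}

	fmt.Println(scorePods(prompt, blockSize, index)) // → map[pod-a:2 pod-b:1]
}
```

A router using such scores would prefer pod-a for this prompt, since more of the prompt's KV cache is already resident there, minimizing redundant prefill computation.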
````diff
@@ -95,13 +93,13 @@ Ensure you have a running deployment with vLLM and Redis as described above.
 
 The vLLM node can be tested with the prompt found in `examples/kv-cache-index/main.go`.
 
-First, ensure that the tokenizer engine is available locally for our example:
+First, download the tokenizer bindings required by the `kvcache.Indexer` for prompt tokenization:
 
 ```bash
 make download-tokenizer
 ```
 
-This will download the tokenizer engine.
+Then, set the required environment variables and run example:
 
 ```bash
 export HF_TOKEN=<token>
````

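The hunk above is truncated at the `HF_TOKEN` export. For orientation, the full run sequence might look like the following sketch; the `REDIS_ADDR` variable name is an assumption for illustration (the actual environment variable names are not shown in this diff), while the make target and example path are taken from the doc itself.

```shell
# Download the tokenizer bindings used by the kvcache.Indexer (target from the diff above).
make download-tokenizer

# Hugging Face token for pulling the Llama 3.1 8B Instruct tokenizer.
export HF_TOKEN=<token>

# Assumed variable: point the example at the deployed Redis instance.
export REDIS_ADDR=localhost:6379

# Run the example application (path referenced in the doc).
go run examples/kv-cache-index/main.go
```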