- vLLM nodes with LMCache CPU offloading (4 replicas) serving the Llama 3.1 8B Instruct model
- Redis server
2. **Test with Example Application**: Run a Go application that:
- Connects to your deployed vLLM and Redis infrastructure
- Demonstrates KV cache indexing by processing a sample prompt

The demonstrated KV-cache indexer enables AI-aware routing, accelerating inference across the system by minimizing redundant computation.
## vLLM Deployment
@@ -95,13 +93,13 @@ Ensure you have a running deployment with vLLM and Redis as described above.
The vLLM node can be tested with the prompt found in `examples/kv-cache-index/main.go`.

First, download the tokenizer bindings required by the `kvcache.Indexer` for prompt tokenization:

```bash
make download-tokenizer
```

Then, set the required environment variables and run the example:
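For instance, a run might look like the following. The variable names here are assumptions for illustration only; use the names documented by the project for your deployment. The example path comes from this README.

```bash
# Illustrative only: the variable names are assumptions, not the project's
# documented ones -- check the project docs for the exact names.
export HF_TOKEN=<your-huggingface-token>   # model/tokenizer access, if required
export REDIS_ADDR=<redis-host>:6379        # address of the deployed Redis server

go run examples/kv-cache-index/main.go
```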