Commit 58d6716

Add unique random seed to worker (#340)
With the random data generator, the prefix cache hit rate is expected to be close to 0. During a p/d benchmark run, however, the vLLM debug log showed a prefix cache hit rate that stayed above 90%:

```
(APIServer pid=1) DEBUG 02-02 23:32:28 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
(APIServer pid=1) DEBUG 02-02 23:32:38 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
(APIServer pid=1) DEBUG 02-02 23:32:48 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
(APIServer pid=1) DEBUG 02-02 23:32:58 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
```

A prefix hit rate graph for the benchmark run:

<img width="1056" height="702" alt="image" src="https://github.com/user-attachments/assets/428872d1-5baa-4872-a1b6-9f5e9308a11a" />

The fix is to create a unique random seed per worker and to call `np.random.seed` with it before the worker starts its loop.

Validated the fix locally with multiple benchmark runs.
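To illustrate why a per-worker seed matters, here is a minimal sketch, not the project's code. It assumes the high hit rate came from workers starting with the same NumPy global RNG state, which the commit message implies but does not state outright. The helpers `worker_tokens` and `derive_seed` and the vocabulary size of 32000 are hypothetical names and values chosen for the example; only the seed formula is taken from the commit.

```python
import numpy as np


def worker_tokens(seed: int, n: int = 8) -> list[int]:
    """Hypothetical stand-in for a worker generating a random prompt."""
    np.random.seed(seed)  # np.random.seed accepts integers in [0, 2**32 - 1]
    return np.random.randint(0, 32000, size=n).tolist()


def derive_seed(now_ms: int, worker_id: int) -> int:
    """Seed formula from the commit: wall-clock milliseconds plus worker id."""
    return (now_ms + worker_id) % 2**32


# Four workers that share one seed produce the same "random" prompt,
# which a serving engine would see as repeated prefixes.
shared = [worker_tokens(1234) for _ in range(4)]

# With a fixed timestamp (for reproducibility of this example) and distinct
# worker ids, each worker gets its own seed and its own token sequence.
now_ms = 1_700_000_000_000
seeds = [derive_seed(now_ms, worker_id) for worker_id in range(4)]
unique = [worker_tokens(seed) for seed in seeds]

print("shared seed, all identical:", all(t == shared[0] for t in shared))
print("distinct seeds:", len(set(seeds)) == len(seeds))
```

The modulo keeps the value inside the range that `np.random.seed` accepts, since a millisecond timestamp is far larger than `2**32`.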
1 parent e6ba4c7 commit 58d6716

1 file changed

+5
-0
lines changed
inference_perf/loadgen/load_generator.py

Lines changed: 5 additions & 0 deletions
@@ -195,6 +195,11 @@ async def schedule_client(
         logger.debug(f"[Worker {self.id}] stopped")
 
     def run(self) -> None:
+        # Seed with current time + worker id to ensure unique random sequences per worker
+        seed = (int(time.time() * 1000) + self.id) % 2**32
+        np.random.seed(seed)
+        logger.debug(f"[Worker {self.id}] seeded numpy with {seed}")
+
         # Ignore SIGINT in workers to prevent multiple calls to SIGINT handler
         signal.signal(signal.SIGINT, signal.SIG_IGN)
         set_event_loop_policy(uvloop.EventLoopPolicy())
