Add unique random seed to worker (#340)

yangligt2 · web-flow · commit 58d6716b0013 · 2026-02-06T10:09:18.000-08:00
When using the random data generator, the prefix cache hit rate is expected to be close to 0. However, during a p/d benchmark run, the vLLM debug log showed a prefix cache hit rate that was consistently above 90%:

```
(APIServer pid=1) DEBUG 02-02 23:32:28 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
(APIServer pid=1) DEBUG 02-02 23:32:38 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
(APIServer pid=1) DEBUG 02-02 23:32:48 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
(APIServer pid=1) DEBUG 02-02 23:32:58 [v1/metrics/loggers.py:248] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 92.9%, External prefix cache hit rate: 0.0%
```

A prefix hit rate graph for the benchmark run:

<img width="1056" height="702" alt="image" src="https://github.com/user-attachments/assets/428872d1-5baa-4872-a1b6-9f5e9308a11a" />

The fix is to create a unique random seed per worker and to call `np.random.seed` with it before starting the worker loop. Validated the fix locally with multiple benchmark runs.
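The effect of per-worker seeding can be illustrated with a minimal standalone sketch. It assumes the high hit rate came from every worker starting with the same NumPy global random state, which this description implies but does not state outright. `worker_seed` and `sample_tokens` are hypothetical helpers written for this illustration, not functions from `inference_perf`; only the seed expression mirrors the change.

```python
import time
from typing import List, Optional

import numpy as np


def worker_seed(worker_id: int, now_ms: Optional[int] = None) -> int:
    """Hypothetical helper: millisecond timestamp plus worker id, wrapped to 32 bits."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # np.random.seed accepts integers in [0, 2**32 - 1], hence the modulo.
    return (now_ms + worker_id) % 2**32


def sample_tokens(seed: int, n: int = 8) -> List[int]:
    """Hypothetical stand-in for a random prompt generator using NumPy's global RNG."""
    np.random.seed(seed)
    return np.random.randint(0, 50_000, size=n).tolist()


now_ms = 1_770_000_000_000  # fixed timestamp so the sketch is reproducible

# Three "workers" sharing one RNG state all produce the same token sequence.
shared = [sample_tokens(12345) for _ in range(3)]

# Three "workers" with id-dependent seeds produce different token sequences.
unique = [sample_tokens(worker_seed(i, now_ms)) for i in range(3)]

print("shared state, identical prompts:", shared[0] == shared[1] == shared[2])
print("distinct prompts with per-worker seeds:", len({tuple(u) for u in unique}))
```

One caveat of a timestamp-plus-id scheme: two workers can still receive the same seed if their start times differ by exactly the difference of their ids (for example, worker 0 starting one millisecond after worker 1). If that matters, spawning child seeds from a single `np.random.SeedSequence` is one alternative worth considering.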
diff --git a/inference_perf/loadgen/load_generator.py b/inference_perf/loadgen/load_generator.py
@@ -195,6 +195,11 @@ async def schedule_client(
         logger.debug(f"[Worker {self.id}] stopped")
 
     def run(self) -> None:
+        # Seed with current time + worker id to ensure unique random sequences per worker
+        seed = (int(time.time() * 1000) + self.id) % 2**32
+        np.random.seed(seed)
+        logger.debug(f"[Worker {self.id}] seeded numpy with {seed}")
+
         # Ignore SIGINT in workers to prevent multiple calls to SIGINT handler
         signal.signal(signal.SIGINT, signal.SIG_IGN)
         set_event_loop_policy(uvloop.EventLoopPolicy())