Commit c85e5a4
Fix: requests get duplicated using shared_prefix datagen when multi-turn chat disabled (kubernetes-sigs#293)
### Background
A bug caused requests to be duplicated when using the shared_prefix
datagen with multi-turn chat disabled. The load generator created a
standalone request queue for each worker and then broadcast each
incoming request to all worker queues, so every worker received its own
copy of every request.
### Fix
The fix disables the standalone per-worker queues when multi-turn chat
is not active, which prevents requests from being duplicated.
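
The difference between the two dispatch modes can be illustrated with a small single-threaded sketch. It is not the inference-perf implementation: `count_deliveries` and its `standalone_queues` parameter are hypothetical names chosen for this example, and the sketch does not model how the per-worker queues are used when multi-turn chat is enabled, since the commit message does not describe that.

```python
from collections import Counter
from queue import Queue


def count_deliveries(requests, num_workers, standalone_queues):
    """Return how many times each request ends up being processed.

    Hypothetical model: `standalone_queues=True` mimics the buggy path
    (one queue per worker, every request broadcast to all of them);
    `standalone_queues=False` mimics a single queue shared by all workers.
    """
    if standalone_queues:
        # One queue per worker; each request is put on every queue.
        queues = [Queue() for _ in range(num_workers)]
        for req in requests:
            for q in queues:
                q.put(req)
    else:
        # One shared queue; each request is enqueued exactly once.
        shared = Queue()
        for req in requests:
            shared.put(req)
        queues = [shared]

    # Drain every queue and count how often each request is seen.
    processed = Counter()
    for q in queues:
        while not q.empty():
            processed[q.get()] += 1
    return processed


if __name__ == "__main__":
    reqs = ["r0", "r1"]
    print(count_deliveries(reqs, num_workers=3, standalone_queues=True))
    print(count_deliveries(reqs, num_workers=3, standalone_queues=False))
```

With three workers, the broadcast mode processes each request three times, while the shared-queue mode processes each request once.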
### Credit
Credit for reporting this bug goes to @diamondburned.

1 parent 6eb0a5e commit c85e5a4
1 file changed under inference_perf/datagen: 1 addition and 1 deletion, both at line 63.