
Commit c85e5a4

Fix: requests get duplicated using shared_prefix datagen when multi-turn chat disabled (kubernetes-sigs#293)
### Background

A bug caused requests to be duplicated when using the shared_prefix datagen with multi-turn chat disabled. This happened because the load generator created a standalone request queue for each worker and then broadcast each incoming request to all worker queues, so every request was issued once per worker instead of once in total.

### Fix

The fix makes `is_prefered_worker_requested()` in the shared_prefix datagen return `True` only when multi-turn chat is enabled. The standalone per-worker queue feature is therefore disabled when multi-turn chat is not active, which prevents the duplication of requests.

### Credit

Credit for reporting this bug goes to @diamondburned.
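To make the duplication concrete, here is a minimal sketch of the two dispatch patterns described in the Background. It is not the inference-perf load generator: `broadcast_dispatch` and `shared_dispatch` are hypothetical helpers that only model the queue fan-out, using the standard library `queue.Queue`.

```python
from queue import Queue
from typing import List


def broadcast_dispatch(requests: List[str], num_workers: int) -> List[Queue]:
    """Hypothetical model of the buggy path: one standalone queue per
    worker, and every request is copied into every worker's queue."""
    queues = [Queue() for _ in range(num_workers)]
    for req in requests:
        for q in queues:
            q.put(req)  # the same request lands in each worker's queue
    return queues


def shared_dispatch(requests: List[str]) -> Queue:
    """Hypothetical model of the non-multi-turn path: a single shared
    queue that all workers pull from, so each request is enqueued once."""
    shared = Queue()
    for req in requests:
        shared.put(req)
    return shared


requests = [f"prompt-{i}" for i in range(4)]
per_worker = broadcast_dispatch(requests, num_workers=3)
print(sum(q.qsize() for q in per_worker))  # 12: each request enqueued 3 times
print(shared_dispatch(requests).qsize())   # 4: each request enqueued once
```

With N workers and M requests, broadcasting enqueues N × M items, while a shared queue enqueues exactly M.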
1 parent 6eb0a5e commit c85e5a4


1 file changed (+1, -1 lines)


inference_perf/datagen/shared_prefix_datagen.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -60,7 +60,7 @@ def is_shared_prefix_supported(self) -> bool:
         return True
 
     def is_prefered_worker_requested(self) -> bool:
-        return True
+        return True if self.enable_multi_turn_chat else False
 
     def load_lazy_data(self, data: LazyLoadInferenceAPIData) -> InferenceAPIData:
         i = data.data_index % len(self.prompts)
```
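The changed method reduces to a two-row truth table. The sketch below uses a hypothetical stand-in class, `SharedPrefixDatagenSketch`, that keeps only the `enable_multi_turn_chat` attribute and the patched method; the real class in `inference_perf/datagen/shared_prefix_datagen.py` has a different constructor and more state.

```python
class SharedPrefixDatagenSketch:
    """Hypothetical stand-in for the shared_prefix datagen, reduced to
    the attribute and method touched by this commit."""

    def __init__(self, enable_multi_turn_chat: bool) -> None:
        self.enable_multi_turn_chat = enable_multi_turn_chat

    def is_prefered_worker_requested(self) -> bool:
        # Request per-worker queues only in multi-turn mode.
        # bool(x) is equivalent to the commit's `True if x else False`.
        return bool(self.enable_multi_turn_chat)


for flag in (True, False):
    datagen = SharedPrefixDatagenSketch(enable_multi_turn_chat=flag)
    print(flag, datagen.is_prefered_worker_requested())
# True True
# False False
```

`bool(...)` is used here only as a more compact spelling; it returns the same values as the expression in the commit for any truthy or falsy flag.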
