Use multiple logical devices to handle ClientGenerateBatchProcess #1403


Draft: wants to merge 21 commits into base: main

Conversation

@dezhiAmd dezhiAmd commented May 7, 2025

  • Why
    The decode process does not use the full capacity of the GPU. The goal of this change is to add multiple logical devices so that multiple requests (ClientGenerateBatchProcess instances) can be handled by the logical devices in round-robin fashion.

  • How

  1. Set the environment variable SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE to the number of workers specified in the input arguments (for example, --workers 2), so the system creates one logical device per worker.

  2. Add a generate_count field to ClientGenerateBatchProcess to cache the total number of requests and use it to select which device handles each request.
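The two steps above can be sketched roughly as follows. The class name and the generate_count counter mirror the PR description; everything else (the device list, pick_device, the string device names) is an illustrative placeholder, not the actual shortfin API:

```python
import itertools
import os

# Step 1 (from the PR description): the env var below is the one named in
# the description; num_workers stands in for the --workers argument.
num_workers = 2
os.environ["SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE"] = str(num_workers)


class ClientGenerateBatchProcess:
    # Step 2: a class-level counter caching the total number of requests
    # seen so far across all instances.
    generate_count = itertools.count()

    def __init__(self, devices):
        # `devices` is a placeholder for the system's logical device handles.
        self.devices = devices

    def pick_device(self):
        # Round-robin: the N-th request goes to device N mod len(devices).
        n = next(type(self).generate_count)
        return self.devices[n % len(self.devices)]


# With two logical devices, consecutive requests alternate between them.
proc = ClientGenerateBatchProcess(devices=["gpu0", "gpu1"])
print([proc.pick_device() for _ in range(4)])  # → ['gpu0', 'gpu1', 'gpu0', 'gpu1']
```

A class-level itertools.count is used here so the counter tracks requests across all process instances, matching the "total requests" wording in step 2.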

dezhliao and others added 15 commits April 24, 2025 09:35
Signed-off-by: dezhliao <[email protected]>
@dezhiAmd dezhiAmd force-pushed the ping_pong branch 2 times, most recently from 9cfade7 to 9b096a3 on May 7, 2025 00:30
Contributor

@vinayakdsci vinayakdsci left a comment


cc @stellaraccident @daveliddell.

@dezhiAmd there is an assumption in the code that we should always queue kernel invocations on both streams, effectively replicating the work. I don't think that is what we want to do. Round-robin assignment has the potential to be expensive, especially when the server starts receiving many requests at the same time.

There are many ways multiple streams can be used on the same physical device that make execution much faster than the conventional replicate-and-invoke-identical-work approach.
IMO, the underlying idea behind the patch is correct, but I would rethink the implementation. Ideally it should not make assumptions, and we could build a framework that allows smarter queuing onto the streams.

We also need to have a safety mechanism in place that ensures that we do not cross address boundaries in case a user runs with multiple physical devices visible to the System.
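One way to read the safety concern above is to verify, before queuing work, that every logical device in play maps back to the same physical device. A minimal generic sketch, assuming a plain dict mapping (the mapping structure and function name are illustrative, not shortfin's API):

```python
def validate_logical_devices(logical_to_physical):
    """Raise if the given logical devices span more than one physical device.

    `logical_to_physical` maps a logical device id to the physical device
    it was created on (illustrative data structure, not shortfin's).
    """
    physical = set(logical_to_physical.values())
    if len(physical) > 1:
        raise RuntimeError(
            f"logical devices span multiple physical devices: {sorted(physical)}"
        )


# A valid configuration: two logical devices, both on physical device 0.
validate_logical_devices({"logical0": 0, "logical1": 0})
```

Such a guard would run once at system startup, so it adds no per-request cost.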

Signed-off-by: dezhliao <[email protected]>