Skip to content

[shortfin][LLM] Support multiple HIP streams #1391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: chai_demo
Choose a base branch
from

Conversation

vinayakdsci
Copy link
Contributor

Adds support for running multiple instances of an LLM service on a distinct HIP stream.

@vinayakdsci vinayakdsci force-pushed the multiple-hip-streams branch from cff2152 to 8a86e90 Compare May 5, 2025 19:28
self.decode_fiber = self.sysman.ls.create_fiber(self.main_worker)
self.main_fiber = self.fiber_pool.fibers[0]
self.prefill_fiber = self.fiber_pool.fibers[0]
self.decode_fiber = self.fiber_pool.fibers[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assigns the same fiber to main, prefill, and decode where previously they were on different fibers. I had a discussion with Stephen and there is a high likelihood that me want to revert the fiber pool work as its current implementation actually slows down processing.

fibers = []
assigned_hal_device = self.sysman.ls.devices[self.service_idx]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to grab a different device instead of a different stream. I assume this is related to the environment variables above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my understanding is that there is no stream object modeled in shortfin directly. Instead, you use multiple streams by creating multiple logical logical devices for a physical device, which you do by turning on that environment variable. Each logical device creates its own stream.

This looks like another place we're assuming we have only one physical device. I hope that's ok. In this case, you'll only see device 0 if logical devices are off, as we're assuming that there can't be a second physical device.

@@ -49,6 +50,10 @@ class ShortfinLlmLifecycleManager:

def __init__(self, args):
# Load server configuration with priority: command line > config file > defaults
if args.enable_multiple_hip_streams:
os.environ["SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE"] = str(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include a link to where this variable is used? Having multiple logical devices adds some complexity. We should also note whether this approach fully replicates allocated buffers or if we are able to access resources across streams.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change come from PR authored by me. This link here can answer your question about where this variable is used.

@@ -68,6 +69,12 @@ def parse_args(argv):
parser.add_argument(
"--instances", type=int, default=1, help="Number of shortfin instances to run"
)
parser.add_argument(
"--enable_multiple_hip_streams",
action=BooleanOptionalAction,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use argparse.BooleanOptionalAction rather than including the additional import.

Copy link
Contributor

@daveliddell daveliddell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Comment on lines +81 to +86
if (
args.enable_multiple_hip_streams
and len(sysman.ls.devices) != args.instances
):
raise RuntimeError("Failed to start multiple HIP streams for execution.")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this check and error message is intended for internal developers, not users, is that right?

It looks like it's assuming that ROCR_VISIBLE_DEVICES has been filtered to a single physical device. Do we check that anywhere? If you set the number of physical devices > 1, then the error message may be misleading. If you set the number of physical devices to the same as the number of instances and multiple HIP streams fails, you fail to detect that multiple HIP streams failed. Regardless, if a user ever sees this error message, they won't know what to do about it. Maybe we don't care at this point.

fibers = []
assigned_hal_device = self.sysman.ls.devices[self.service_idx]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my understanding is that there is no stream object modeled in shortfin directly. Instead, you use multiple streams by creating multiple logical logical devices for a physical device, which you do by turning on that environment variable. Each logical device creates its own stream.

This looks like another place we're assuming we have only one physical device. I hope that's ok. In this case, you'll only see device 0 if logical devices are off, as we're assuming that there can't be a second physical device.

@@ -49,6 +50,10 @@ class ShortfinLlmLifecycleManager:

def __init__(self, args):
# Load server configuration with priority: command line > config file > defaults
if args.enable_multiple_hip_streams:
Copy link

@dezhiAmd dezhiAmd May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this if check is not needed.
When you set "SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE" more than 1, you automatically created multiple hip streams.

This is what I see:
logical_devices_per_physical_device is set using an env variable or a config file. Refer here

logical_devices_per_physical_device_ is used to create device, refer here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants