-
Notifications
You must be signed in to change notification settings - Fork 49
[shortfin][LLM] Support multiple HIP streams #1391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: chai_demo
Are you sure you want to change the base?
Conversation
cff2152
to
8a86e90
Compare
self.decode_fiber = self.sysman.ls.create_fiber(self.main_worker) | ||
self.main_fiber = self.fiber_pool.fibers[0] | ||
self.prefill_fiber = self.fiber_pool.fibers[0] | ||
self.decode_fiber = self.fiber_pool.fibers[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assigns the same fiber to main
, prefill
, and decode
where previously they were on different fibers. I had a discussion with Stephen and there is a high likelihood that me want to revert the fiber pool work as its current implementation actually slows down processing.
fibers = [] | ||
assigned_hal_device = self.sysman.ls.devices[self.service_idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to grab a different device instead of a different stream. I assume this is related to the environment variables above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my understanding is that there is no stream object modeled in shortfin directly. Instead, you use multiple streams by creating multiple logical logical devices for a physical device, which you do by turning on that environment variable. Each logical device creates its own stream.
This looks like another place we're assuming we have only one physical device. I hope that's ok. In this case, you'll only see device 0 if logical devices are off, as we're assuming that there can't be a second physical device.
@@ -49,6 +50,10 @@ class ShortfinLlmLifecycleManager: | |||
|
|||
def __init__(self, args): | |||
# Load server configuration with priority: command line > config file > defaults | |||
if args.enable_multiple_hip_streams: | |||
os.environ["SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE"] = str( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include a link to where this variable is used? Having multiple logical devices adds some complexity. We should also note whether this approach fully replicates allocated buffers or if we are able to access resources across streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -68,6 +69,12 @@ def parse_args(argv): | |||
parser.add_argument( | |||
"--instances", type=int, default=1, help="Number of shortfin instances to run" | |||
) | |||
parser.add_argument( | |||
"--enable_multiple_hip_streams", | |||
action=BooleanOptionalAction, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use argparse.BooleanOptionalAction
rather than including the additional import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
if ( | ||
args.enable_multiple_hip_streams | ||
and len(sysman.ls.devices) != args.instances | ||
): | ||
raise RuntimeError("Failed to start multiple HIP streams for execution.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this check and error message is intended for internal developers, not users, is that right?
It looks like it's assuming that ROCR_VISIBLE_DEVICES
has been filtered to a single physical device. Do we check that anywhere? If you set the number of physical devices > 1, then the error message may be misleading. If you set the number of physical devices to the same as the number of instances and multiple HIP streams fails, you fail to detect that multiple HIP streams failed. Regardless, if a user ever sees this error message, they won't know what to do about it. Maybe we don't care at this point.
fibers = [] | ||
assigned_hal_device = self.sysman.ls.devices[self.service_idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my understanding is that there is no stream object modeled in shortfin directly. Instead, you use multiple streams by creating multiple logical logical devices for a physical device, which you do by turning on that environment variable. Each logical device creates its own stream.
This looks like another place we're assuming we have only one physical device. I hope that's ok. In this case, you'll only see device 0 if logical devices are off, as we're assuming that there can't be a second physical device.
@@ -49,6 +50,10 @@ class ShortfinLlmLifecycleManager: | |||
|
|||
def __init__(self, args): | |||
# Load server configuration with priority: command line > config file > defaults | |||
if args.enable_multiple_hip_streams: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, this if check is not needed.
When you set "SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE" more than 1, you automatically created multiple hip streams.
This is what I see:
logical_devices_per_physical_device is set using an env variable or a config file. Refer here
logical_devices_per_physical_device_ is used to create device, refer here.
Adds support for running multiple instances of an LLM service on a distinct HIP stream.