Refactor providers into separate libraries #1190
Details: Add a DML DeviceInterface and a DML DeviceBuffer handler. Remove the #if blocks that copy memory between device and CPU memory, and use the DeviceSpan interface instead.
Remove as many #if USE_CUDA/USE_DML blocks as possible.
Can you please enumerate the places where any #ifdefs remain, and why they need to be there?
Also, what impact will the rough edges have, and can they be smoothed out before you merge this PR?
First pass through the code
There are some #if USE_CUDA blocks left in our tests; this shouldn't be a problem. The rough edges are just the expected simple bugs that I can't find in advance but that we'll find and fix easily.
Before the device interface was introduced in #1190, the DML objects were tied to the model. The device interface abstraction decoupled the device-specific objects from the `OgaModel`. For DML, this meant that the DML objects now live in global scope (they were previously owned by the `OgaModel` and hence had model scope). On instantiation, these DML objects create background threads that retain hardware resources and prevent the driver threads from terminating. Because the objects are now global, the background threads outlive the model, and driver threads may be unable to terminate correctly, which causes problems in application layers. Another pull request, #1378, made device allocators cached and tied to a global ORT session. As a result, the device allocator is also linked to the DML objects, which makes their lifetime hard to control. This pull request special-cases the DML device type so that all linked globally scoped variables are destroyed when the model is destroyed and re-created when a new model is initialized. This way, the DML threads terminate when the model is destroyed and release the driver threads so that they can shut down on their own.
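The lifetime rule described above (global device objects torn down with the model and rebuilt for the next one) can be sketched roughly as follows. This is not the code from the pull request: `FakeDeviceObjects`, `ModelSketch`, and the `g_*` globals are hypothetical names invented for illustration, the background thread merely stands in for whatever the real DML objects start, and counting live models so that teardown happens when the last one goes away is an assumption on my part.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical counter so the sketch can observe how many worker threads exist.
std::atomic<int> g_live_workers{0};

// Hypothetical stand-in for device objects that start a background thread.
class FakeDeviceObjects {
 public:
  FakeDeviceObjects() : worker_([this] { Run(); }) { ++g_live_workers; }

  ~FakeDeviceObjects() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();  // The thread is gone once the destructor returns.
    --g_live_workers;
  }

 private:
  void Run() {
    // Idle until the owner asks the thread to stop.
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return stop_; });
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread worker_;  // Declared last so the members above exist before it starts.
};

// Globally scoped state, as in the situation the description outlines.
std::mutex g_device_mutex;
std::unique_ptr<FakeDeviceObjects> g_device_objects;
int g_model_count = 0;  // Assumption: teardown happens when the last model dies.

class ModelSketch {
 public:
  ModelSketch() {
    std::lock_guard<std::mutex> lock(g_device_mutex);
    if (g_model_count == 0) {
      g_device_objects = std::make_unique<FakeDeviceObjects>();  // (Re-)create.
    }
    ++g_model_count;
  }

  ~ModelSketch() {
    std::lock_guard<std::mutex> lock(g_device_mutex);
    if (--g_model_count == 0) {
      g_device_objects.reset();  // Joins the worker and frees the globals.
    }
  }

  ModelSketch(const ModelSketch&) = delete;
  ModelSketch& operator=(const ModelSketch&) = delete;
};
```

With this shape, creating a `ModelSketch` brings the worker thread up, destroying the last one joins it, and a later `ModelSketch` starts a fresh worker. On some toolchains the sketch needs thread support enabled at build time (for example `-pthread`).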
This removes most of the #if USE_CUDA and #if USE_DML blocks from the model handling code. Device memory management is now handled through the DeviceSpan structure, and all data copying is done in a device-independent manner.
It's a huge change, and there will be some rough edges when it is submitted. The goal is to unblock other people who need the changes, and then to make larger improvements in future PRs.
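To illustrate what device-independent copying through a span-like wrapper might look like, here is a minimal sketch. It is not the project's actual `DeviceSpan`: `DeviceBackend`, `FakeBackend`, `DeviceSpanSketch`, and `RoundTrip` are hypothetical names, and "device" memory is simulated with an ordinary host vector so the example runs without a GPU. The point is only that calling code never branches on `USE_CUDA` or `USE_DML`; it asks the span to copy, and the backend decides how.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical per-device interface: one implementation per device type.
struct DeviceBackend {
  virtual ~DeviceBackend() = default;
  virtual void CopyToDevice(void* device_dst, const void* cpu_src, size_t bytes) = 0;
  virtual void CopyToCpu(void* cpu_dst, const void* device_src, size_t bytes) = 0;
};

// Fake backend: a real one would issue a device copy instead of memcpy.
struct FakeBackend : DeviceBackend {
  void CopyToDevice(void* device_dst, const void* cpu_src, size_t bytes) override {
    if (bytes != 0) std::memcpy(device_dst, cpu_src, bytes);
    ++copies;
  }
  void CopyToCpu(void* cpu_dst, const void* device_src, size_t bytes) override {
    if (bytes != 0) std::memcpy(cpu_dst, device_src, bytes);
    ++copies;
  }
  int copies = 0;  // Counts copies so the sketch can be checked.
};

// Hypothetical span-like wrapper that pairs a CPU staging copy with device memory.
template <typename T>
class DeviceSpanSketch {
  static_assert(std::is_trivially_copyable<T>::value, "raw byte copies require trivially copyable T");

 public:
  DeviceSpanSketch(DeviceBackend& backend, size_t count)
      : backend_(backend), cpu_(count), device_(count) {}

  std::vector<T>& Cpu() { return cpu_; }  // Host-side view of the data.

  void CopyCpuToDevice() { backend_.CopyToDevice(device_.data(), cpu_.data(), cpu_.size() * sizeof(T)); }
  void CopyDeviceToCpu() { backend_.CopyToCpu(cpu_.data(), device_.data(), device_.size() * sizeof(T)); }

 private:
  DeviceBackend& backend_;
  std::vector<T> cpu_;
  std::vector<T> device_;  // Simulated device memory.
};

// Caller code: no #if blocks, only calls on the span.
inline std::vector<int> RoundTrip(DeviceBackend& backend, const std::vector<int>& input) {
  DeviceSpanSketch<int> span(backend, input.size());
  span.Cpu() = input;
  span.CopyCpuToDevice();
  span.Cpu().assign(input.size(), 0);  // Clobber the CPU side to prove the copy back works.
  span.CopyDeviceToCpu();
  return span.Cpu();
}
```

Swapping `FakeBackend` for a backend that talks to a real device would leave `RoundTrip` unchanged, which is the property the description attributes to the refactor.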