-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: refactor onedal interaction with backend and policies #2168
base: main
Are you sure you want to change the base?
chore: refactor onedal interaction with backend and policies #2168
Conversation
3e22a7e
to
6e6b6ce
Compare
i guess this contrasts the approach i wanted to take in #2158 . what i had not liked about _get_backend was the indirection it was inducing in accessing the pybind11 generated package, and hoped to bring native syntax back. maybe worth further discussion? |
/intelci: run |
a3e1c8e
to
0aa1669
Compare
/intelci: run |
4c87853
to
f0d2da7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial quick comments, will look more in depth later
onedal/spmd/ensemble/__init__.py
Outdated
@@ -14,6 +14,6 @@ | |||
# limitations under the License. | |||
# ============================================================================== | |||
|
|||
from .forest import RandomForestClassifier, RandomForestRegressor | |||
from onedal.ensemble import RandomForestClassifier, RandomForestRegressor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from onedal.ensemble import RandomForestClassifier, RandomForestRegressor | |
from .ensemble import RandomForestClassifier, RandomForestRegressor |
In general we have been using relative paths in cases like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My branch deletes forest.py
and we're using the batch version directly. After removing EstimatorBaseSPMD
there is no need to create child classes from the batch version. This change gets rid of a lot of code that doesn't really do anything and only makes it harder to understand what's happening.
https://github.com/intel/scikit-learn-intelex/blob/main/onedal/spmd/ensemble/forest.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevertheless I have added relative imports consistently (...ensemble
in this case)
f0d2da7
to
79a3108
Compare
79a3108
to
a820e94
Compare
/intelci: run |
|
More checks required for SPMD changes, converting back to draft. But the non-SPMD contributions are still ready for review |
/intelci: run |
1 similar comment
/intelci: run |
60cbc71
to
861ce67
Compare
/intelci: run |
/intelci: run |
/intelci: run |
/intelci: run |
/intelci: run |
/intelci: run |
/intelci: run |
/intelci: run |
Pre-commit only failed in a single spot due to bad luck with RNG. Not gonna re-run. |
…nedal-backend-and-policies
/intelci: run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't dug into the nitty-gritty details yet, this was my first overview look, will provide more reviews in the next days.
/intelci: run |
/intelci: run |
Refined description
Remove
policy
from algorithm implementations.policy
. But it puts it into a single place: The newly addedBackendFunction
class.The device to run on is now controlled via the
queue
keyword. There is now a differentiation between user-facing functions (the API; e.g.model.fit()
,model.predict()
, ...), and internal methods (e.g.self._fit()
,self.compute()
, ...).queue=None
keyword argument, combined with a new decorator@supports_queue
.@supports_queue
decorator implements the logic that will determine the correct queue. If a queue is provided in the keyword argument, it will be used. Otherwise the queue will be determined from theconfig_context
and the provided data, as configured viatarget_offload
. This behavior is unchanged from the previous implementation. It was simply refactored.SyclQueueManager
namespace. The queue can be accessed from any internal method viaSyclQueueManager.get_global_queue()
. And in case local overloading of the queue is necessary, an execution contextSyclQueueManager.manage_global_config(queue_or_None, data_or_None)
is provided.queue
keyword argument. They should rather collect the global queue fromSyclQueueManager
(see above).Rather than searching for module and function names with strings via
self._get_backend(module_name, submodule_name, function_name, *args, **kwargs)
, we explicitly bind methods from the various backends using@bind_default_backend(module_name)
: Use the available default backend (host or dpc).@bind_spmd_backend(module_name)
: Use SPMD backend.The decorators are used to decorate the existing function. For example
The module and submodule name is simply provided as
"kmeans.clustering"
, as it would be in an import. This is not mandatory though and in other places a more genericdef foo(self, *args, **kwargs)
was used.It is useful but not mandatory to provide the correct function prototype with all arguments, as this will be picked up by linters and code completion tools.
In case of a name conflict, the decorator takes a keyword argument. For example
The cleaner solution would obviously be to rename the symbols in the backend to something that doesn't conflict, like
_compute
. This can be done in a future PR.A comment on the global queue: It is an imperfect attempt to reduce coupling between classes. Before refactoring we carefully crafted policies and queues in some classes (for example in the
sklearnex
module), so that the computation wouldn't crash after dispatching toonedal
or evenonedal.spmd
. There was a strong coupling between the modulessklearnex->onedal->onedal.spmd
. The situation improved in the sense that we are now able to perform some of the "temporary overloading of policies/queues" much closer to the part of the code that requires it. That is, rather than crafting a policy insklearnex
, we now just have a modification locally inonedal.spmd
. This is much easier to understand and maintain.In future, we should nevertheless try to eliminate the global queue variable, as it also can produce hard to debug issues, especially when the queue ends up in some kind of messed up state. The existing tests helped a lot to iron out many issues, but there's still room for ugly things to happen.
Original description
@abstractmethod
and then populated using a decorator loaded from_backend.py
.self.infer(...)
rather than the convolutedself._get_backend("X", "Y", "infer" , ...)
.compute()
,infer()
, ...) on backend are unified in_backend.py
.Backend
wrapper class around the backend export inonedal/__init__.py
to reduce the use of global variables.BackendManager
,PolicyManager
,BackendFunction
that simplify interfacing with pybind/C++ backend functionality.I removed unused imports because there were so many in the files I touched. I didn't stop at the files I modified and did the entire repo, which messed up the diff a little. Sorry. I am constantly rebasing so that all functionality is in a single commit and you can look at the diff of this single commit instead if you want to review.Test update (WIP): Enables tests for iris and diabetes SVM algos on GPU. Updated dispatching seems to support GPU just fine. Feedback requested! @samir-nasibliPR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance