DOC: Add visible warning about not using LogisticRegression in parallel threads#2860
DOC: Add visible warning about not using LogisticRegression in parallel threads#2860david-cortes-intel wants to merge 4 commits intouxlfoundation:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
CI failure is a rate limiting issue with wikipedia links. |
|
/azp run |
|
Azure Pipelines successfully started running 3 pipeline(s). |
If you need to trigger only one pipeline, it'd be 'azp run Docs'. This PR only changes documentation BTW. |
|
Should also be extended to LogisticRegressionCV after #2871 is merged. |
| - Sparse data is not supported. | ||
| - Solver ``'newton-cg'`` is **only** available in :doc:`preview mode <preview>`. | ||
| - Solver ``'newton-cg'`` is **only** available in :doc:`preview mode <preview>`. **Important:** this estimator should not be used | ||
| in parallel Python threads - for concurrent fits (e.g. from :obj:`sklearn.model_selection.GridSearchCV`), |
There was a problem hiding this comment.
Can we use other estimators inside GridSearchCV? Should this warning be common for estimators?
There was a problem hiding this comment.
Yes, other estimators can be used in GridSearchCV with any parallelism backend. We have a whole section about it:
https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/doc/sources/parallelism.rst
| :obj:`sklearn.linear_model.LogisticRegressionCV` in particular are not | ||
| suitable for parallel calls in Python threads regardless of the ``n_jobs`` parameter. | ||
| Attempting to fit multiple logistic regression estimator objects in parallel | ||
| might result in crashes and incorrect estimations. |
There was a problem hiding this comment.
I think more likely it will result in bad performance
There was a problem hiding this comment.
LogisticRegression overwrites a global variable during its execution:
Other estimators don't do that.
Description
Logistic regression in sklearnex is implemented differently from the rest. Turns out that it does on-the-fly replacements of sklearn module attributes that have effects beyond calls to sklearnex and are not thread-safe:
scikit-learn-intelex/daal4py/sklearn/linear_model/logistic_path.py
Line 348 in 7925e88
This PR adds a visible warning not to use logistic regression in parallel python threads.
Checklist:
Completeness and readability
Testing