8 changes: 8 additions & 0 deletions doc/sources/algorithms.rst
@@ -473,6 +473,10 @@ Classification
- ``criterion`` != `'gini'`
- ``oob_score`` = `True`
- ``sample_weight`` != `None`

**Additional parameters:**

- ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. ``n_estimators`` applies per rank, with isolated learning on each processor before the trees are merged into a single model. This mode is experimental but scales better than the default mode. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
Contributor:

I'd say this is not very descriptive.

  • Does it mean that the result has n_estimators*n_ranks trees?
  • Does the data get moved across ranks, or does each rank use the data that it owns?
  • Maybe we could also refer to them as 'ranks/nodes', as otherwise it might not be immediately clear what a 'rank' refers to here.

Contributor Author:

Ideally we could point to the oneDAL docs, where this functionality was implemented. @Alexandr-Solovev, can we get this documented in oneDAL?

Contributor:

I will create a JIRA task to update the docs, because for now we mention this parameter in oneDAL in only one place:
https://github.com/uxlfoundation/oneDAL/blob/151df1d4b1e9c41b51997bc20a0544ada2bd51ec/cpp/oneapi/dal/algo/decision_forest/common.hpp#L519
But if it helps, I can also clarify David's questions:
Does it mean that the result has n_estimators*n_ranks trees? - No, oneDAL splits the n_trees across the ranks internally.
Does the data get moved across ranks, or does each rank use the data that it owns? - Each rank uses only its local data (the data that it owns).
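For concreteness, here is a minimal usage sketch of the parameter. The import path `sklearnex.spmd.ensemble.RandomForestClassifier`, the use of dpnp device arrays, and the `mpirun` launch are assumptions for illustration and are not part of this PR:

```python
# Hypothetical sketch of local_trees_mode with the SPMD random forest.
# Assumptions (not confirmed by this PR): sklearnex.spmd.ensemble import path,
# dpnp device arrays, and an mpirun-based launch.
import numpy as np
import dpnp
from mpi4py import MPI
from sklearnex.spmd.ensemble import RandomForestClassifier

rank = MPI.COMM_WORLD.Get_rank()

# Each rank loads (here: generates) only its own shard of the training data;
# with local_trees_mode=True no data is exchanged between ranks during training.
rng = np.random.default_rng(seed=rank)
X_local = rng.random((1000, 8))
y_local = rng.integers(0, 2, size=1000)

# SPMD estimators operate on device data, so move the local shard to the GPU.
X_dp = dpnp.asarray(X_local, device="gpu")
y_dp = dpnp.asarray(y_local, device="gpu")

# Each rank trains its trees in isolation on its local shard; the trained
# trees are then merged into a single model available on every rank.
clf = RandomForestClassifier(n_estimators=100, local_trees_mode=True)
clf.fit(X_dp, y_dp)

predictions = clf.predict(X_dp)
```

Launched with something like `mpirun -n 4 python rf_local_trees.py`.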

- Multi-output and sparse data are not supported
* - :obj:`sklearn.ensemble.ExtraTreesClassifier`
- All parameters are supported except:
@@ -525,6 +529,10 @@ Regression
- ``criterion`` != `'mse'`
- ``oob_score`` = `True`
- ``sample_weight`` != `None`

**Additional parameters:**

- ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. ``n_estimators`` applies per rank, with isolated learning on each processor before the trees are merged into a single model. This mode is experimental but scales better than the default mode. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
- Multi-output and sparse data are not supported
* - :obj:`sklearn.ensemble.ExtraTreesRegressor`
- All parameters are supported except: