doc: local trees parameter documentation#2636
ethanglaser wants to merge 5 commits into uxlfoundation:main
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@ethanglaser The only section that I'm aware of where extra parameters are documented is here: The title of that doc section doesn't match its contents at all, but perhaps you could put it there for now, next to the other extra parameters of decision trees, and we can revisit the structure of the docs later.
increased with the ``max_bins`` parameter.

Another parameter that can improve performance at large scale for Random Forest,
specifically the ``sklearnex.spmd.ensemble`` ``RandomForestClassifier`` and
Could use links to the sklearn docs of the classes here, as done elsewhere - e.g. :obj:`sklearn.ensemble.RandomForestClassifier`
**Additional parameters:**

- ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. ``n_estimators`` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
I'd say this is not very descriptive.
- Does it mean that the result has `n_estimators*n_ranks` trees?
- Does the data get moved across ranks, or does each rank use the data that it owns?
- Maybe could also refer to them as 'rank/nodes', as otherwise it might not be immediately clear what a 'rank' here refers to.
Ideally we could point to oneDAL docs, where this functionality was implemented. @Alexandr-Solovev can we get this documented in oneDAL?
I will create a JIRA task to update the docs, because for now we mention this parameter in oneDAL in only one place:
https://github.com/uxlfoundation/oneDAL/blob/151df1d4b1e9c41b51997bc20a0544ada2bd51ec/cpp/oneapi/dal/algo/decision_forest/common.hpp#L519
But if it helps, I can also answer David's questions:
- Does it mean that the result has `n_estimators*n_ranks` trees? No, oneDAL will split the `n_trees` across the ranks internally.
- Does the data get moved across ranks, or does each rank use the data that it owns? Each rank uses just its local data (the data that it owns).
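To make the two answers above concrete, here is a small pure-Python sketch of the bookkeeping they describe. It is not oneDAL or sklearnex code: `split_tree_counts` and `build_local_forest` are hypothetical names, and the even split with the remainder going to the lowest ranks is an assumption, since the thread only says the trees are split across ranks internally. No real trees are trained; each "tree" is just a record of which rank built it and how many local rows it saw.

```python
from typing import Dict, List, Sequence


def split_tree_counts(n_estimators: int, n_ranks: int) -> List[int]:
    """Divide n_estimators across ranks as evenly as possible.

    Assumption: the exact scheme oneDAL uses is not stated in the thread;
    this even split is only for illustration.
    """
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    base, extra = divmod(n_estimators, n_ranks)
    # The first `extra` ranks take one more tree so the counts sum to n_estimators.
    return [base + (1 if rank < extra else 0) for rank in range(n_ranks)]


def build_local_forest(
    shards: Sequence[Sequence[float]], n_estimators: int
) -> List[Dict[str, int]]:
    """Return one record per tree: the rank that built it and the local rows it saw."""
    counts = split_tree_counts(n_estimators, len(shards))
    forest: List[Dict[str, int]] = []
    for rank, (shard, n_local) in enumerate(zip(shards, counts)):
        # Each rank reads only its own shard; no rows are exchanged between ranks.
        forest.extend({"rank": rank, "rows_seen": len(shard)} for _ in range(n_local))
    return forest


# Toy data owned by 3 ranks (shard sizes 3, 2 and 4).
shards = [[0.1, 0.4, 0.9], [0.2, 0.7], [0.3, 0.5, 0.6, 0.8]]
forest = build_local_forest(shards, n_estimators=10)

print(split_tree_counts(10, 3))  # [4, 3, 3]
print(len(forest))               # 10, not 10 * 3
```

Under these assumptions the merged model holds `n_estimators` trees in total rather than `n_estimators*n_ranks`, and every tree has only seen the rows of the rank that built it.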
Description
Follow-up to #2615 (and uxlfoundation/oneDAL#3139). Adds documentation of an additional parameter to the SPMD forest estimators. Open to discussion on the best way to do this, since I don't believe we have any prior references for this.
Checklist to comply with before moving PR from draft:
PR completeness and readability