[docs] add versionadded notes for v4.0.0 features #5948

Merged (8 commits) on Jul 6, 2023
Changes from 3 commits
2 changes: 2 additions & 0 deletions docs/Parallel-Learning-Guide.rst
@@ -233,6 +233,8 @@ You could edit your firewall rules to allow communication between any of the wor
Using Custom Objective Functions with Dask
******************************************

.. versionadded:: 4.0.0

It is possible to customize the boosting process by providing a custom objective function written in Python.
See the Dask API's documentation for details on how to implement such functions.
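[Editor's sketch, not part of this diff: with the scikit-learn-style Dask estimators, a custom objective is a callable returning per-sample gradients and hessians. The squared-error objective and the ``dask_X``/``dask_y`` collections below are illustrative.]

    import numpy as np
    import lightgbm as lgb

    def squared_error_objective(y_true, y_pred):
        # custom objectives return (gradient, hessian), one value per sample
        grad = y_pred - y_true
        hess = np.ones_like(y_true)
        return grad, hess

    # dask_X and dask_y are assumed Dask collections on a running cluster
    model = lgb.DaskLGBMRegressor(objective=squared_error_objective)
    # model.fit(dask_X, dask_y)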

12 changes: 12 additions & 0 deletions docs/Parameters.rst
@@ -145,6 +145,8 @@ Core Parameters

- ``goss``, Gradient-based One-Side Sampling

- *New in 4.0.0*

- ``data`` :raw-html:`<a id="data" title="Permalink to this parameter" href="#data">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``train``, ``train_data``, ``train_data_file``, ``data_filename``

- path of training data, LightGBM will train from this data
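[Editor's sketch for the ``data_sample_strategy`` entry above, illustrative and not from the diff: selecting GOSS via the new parameter rather than the older ``goss`` boosting type.]

    params = {
        "objective": "binary",
        "boosting": "gbdt",
        "data_sample_strategy": "goss",  # new in 4.0.0
    }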
@@ -670,6 +672,8 @@ Learning Control Parameters

- **Note**: can be used only with ``device_type = cpu``

- *New in 4.0.0*

- ``num_grad_quant_bins`` :raw-html:`<a id="num_grad_quant_bins" title="Permalink to this parameter" href="#num_grad_quant_bins">&#x1F517;&#xFE0E;</a>`, default = ``4``, type = int

- number of bins to quantize gradients and hessians
@@ -678,6 +682,8 @@ Learning Control Parameters

- **Note**: can be used only with ``device_type = cpu``

- *New in 4.0.0*

- ``quant_train_renew_leaf`` :raw-html:`<a id="quant_train_renew_leaf" title="Permalink to this parameter" href="#quant_train_renew_leaf">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

- whether to renew the leaf values with original gradients when using quantized training
@@ -686,10 +692,14 @@ Learning Control Parameters

- **Note**: can be used only with ``device_type = cpu``

- *New in 4.0.0*

- ``stochastic_rounding`` :raw-html:`<a id="stochastic_rounding" title="Permalink to this parameter" href="#stochastic_rounding">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool

- whether to use stochastic rounding in gradient quantization

- *New in 4.0.0*
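[Editor's sketch combining the four quantized-training parameters above; the toy dataset is illustrative, the parameter names and defaults come from the diff.]

    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(500, 10)
    y = np.random.rand(500)

    params = {
        "objective": "regression",
        "device_type": "cpu",          # quantized training is CPU-only, per the notes above
        "use_quantized_grad": True,
        "num_grad_quant_bins": 4,      # more bins -> closer to full-precision training
        "quant_train_renew_leaf": False,
        "stochastic_rounding": True,
    }
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)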

IO Parameters
-------------

@@ -908,6 +918,8 @@ Dataset Parameters

- **Note**: ``lightgbm-transform`` is not maintained by LightGBM's maintainers. Bug reports or feature requests should go to `issues page <https://github.com/microsoft/lightgbm-transform/issues>`__

- *New in 4.0.0*
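[Editor's sketch: wiring ``parser_config_file`` into the params dict; the path is hypothetical.]

    params = {"parser_config_file": "./parser_config.json"}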

Predict Parameters
~~~~~~~~~~~~~~~~~~

6 changes: 6 additions & 0 deletions include/LightGBM/config.h
@@ -166,6 +166,7 @@ struct Config {
// desc = ``bagging``, Randomly Bagging Sampling
// descl2 = **Note**: ``bagging`` is only effective when ``bagging_freq > 0`` and ``bagging_fraction < 1.0``
// desc = ``goss``, Gradient-based One-Side Sampling
// desc = *New in 4.0.0*
std::string data_sample_strategy = "bagging";

// alias = train, train_data, train_data_file, data_filename
@@ -598,22 +599,26 @@ struct Config {
// desc = with quantized training, most arithmetic in the training process will use integer operations
// desc = gradient quantization can accelerate training, with little accuracy drop in most cases
// desc = **Note**: can be used only with ``device_type = cpu``
// desc = *New in 4.0.0*
bool use_quantized_grad = false;

// [no-save]
// desc = number of bins to quantize gradients and hessians
// desc = with more bins, the quantized training will be closer to full precision training
// desc = **Note**: can be used only with ``device_type = cpu``
// desc = *New in 4.0.0*
int num_grad_quant_bins = 4;

// [no-save]
// desc = whether to renew the leaf values with original gradients when using quantized training
// desc = renewing is very helpful for good quantized training accuracy for ranking objectives
// desc = **Note**: can be used only with ``device_type = cpu``
// desc = *New in 4.0.0*
bool quant_train_renew_leaf = false;

// [no-save]
// desc = whether to use stochastic rounding in gradient quantization
// desc = *New in 4.0.0*
bool stochastic_rounding = true;

#ifndef __NVCC__
@@ -777,6 +782,7 @@ struct Config {
// desc = path to a ``.json`` file that specifies customized parser initialized configuration
// desc = see `lightgbm-transform <https://github.com/microsoft/lightgbm-transform>`__ for usage examples
// desc = **Note**: ``lightgbm-transform`` is not maintained by LightGBM's maintainers. Bug reports or feature requests should go to `issues page <https://github.com/microsoft/lightgbm-transform/issues>`__
// desc = *New in 4.0.0*
std::string parser_config_file = "";

#ifndef __NVCC__
33 changes: 33 additions & 0 deletions python-package/lightgbm/basic.py
@@ -931,6 +931,8 @@ def predict(
If True, ensure that the features used to predict match the ones used to train.
Used only if data is pandas DataFrame.

.. versionadded:: 4.0.0

Returns
-------
result : numpy array, scipy.sparse or list of scipy.sparse
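[Editor's sketch of the new ``validate_features`` flag; ``booster`` and the pandas DataFrame ``df`` are assumed to exist.]

    preds = booster.predict(df, validate_features=True)  # errors if df's features differ from training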
@@ -2840,6 +2842,8 @@ def num_feature(self) -> int:
def feature_num_bin(self, feature: Union[int, str]) -> int:
"""Get the number of bins for a feature.

.. versionadded:: 4.0.0

Parameters
----------
feature : int or str
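[Editor's sketch of the new method; the feature name is illustrative.]

    n_bins = booster.feature_num_bin(0)            # by feature index
    n_bins = booster.feature_num_bin("feature_0")  # or by feature name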
@@ -4149,19 +4153,34 @@ def refit(
will use ``leaf_output = decay_rate * old_leaf_output + (1.0 - decay_rate) * new_leaf_output`` to refit trees.
reference : Dataset or None, optional (default=None)
Reference for ``data``.

.. versionadded:: 4.0.0

weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
Weight for each ``data`` instance. Weights should be non-negative.

.. versionadded:: 4.0.0

group : list, numpy 1-D array, pandas Series or None, optional (default=None)
Group/query size for ``data``.
Only used in the learning-to-rank task.
sum(group) = n_samples.
For example, if you have a 100-document dataset with ``group = [10, 20, 40, 10, 10, 10]``, that means that you have 6 groups,
where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.

.. versionadded:: 4.0.0

init_score : list, list of lists (for multi-class task), numpy array, pandas Series, pandas DataFrame (for multi-class task), or None, optional (default=None)
Init score for ``data``.

.. versionadded:: 4.0.0

feature_name : list of str, or 'auto', optional (default="auto")
Feature names for ``data``.
If 'auto' and data is pandas DataFrame, data columns names are used.

.. versionadded:: 4.0.0

categorical_feature : list of str or int, or 'auto', optional (default="auto")
Categorical features for ``data``.
If list of int, interpreted as indices.
@@ -4172,13 +4191,25 @@ def refit(
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.

.. versionadded:: 4.0.0

dataset_params : dict or None, optional (default=None)
Other parameters for Dataset ``data``.

.. versionadded:: 4.0.0

free_raw_data : bool, optional (default=True)
If True, raw data is freed after constructing inner Dataset for ``data``.

.. versionadded:: 4.0.0

validate_features : bool, optional (default=False)
If True, ensure that the features used to refit the model match the original ones.
Used only if data is pandas DataFrame.

.. versionadded:: 4.0.0

**kwargs
Other parameters for refit.
These parameters will be passed to ``predict`` method.
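[Editor's sketch exercising several ``refit`` arguments documented above as new in 4.0.0; ``booster``, ``X_new``, and ``y_new`` are illustrative.]

    import numpy as np

    new_booster = booster.refit(
        data=X_new,
        label=y_new,
        weight=np.ones(len(y_new)),       # new in 4.0.0
        dataset_params={"max_bin": 255},  # new in 4.0.0
        free_raw_data=False,              # new in 4.0.0
        validate_features=False,          # new in 4.0.0
    )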
@@ -4270,6 +4301,8 @@ def set_leaf_output(
) -> 'Booster':
"""Set the output of a leaf.

.. versionadded:: 4.0.0

Parameters
----------
tree_id : int
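[Editor's sketch of the new method; the tree/leaf ids and value are arbitrary.]

    booster.set_leaf_output(tree_id=0, leaf_id=3, value=0.5)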
2 changes: 2 additions & 0 deletions python-package/lightgbm/callback.py
@@ -407,6 +407,8 @@ def early_stopping(stopping_rounds: int, first_metric_only: bool = False, verbos
If float, this single value is used for all metrics.
If list, its length should match the total number of metrics.

.. versionadded:: 4.0.0

Returns
-------
callback : _EarlyStoppingCallback
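[Editor's sketch: the float-or-list parameter described above is ``min_delta`` in the 4.0.0 release; ``params``, ``train_set``, and ``valid_set`` are illustrative.]

    stopper = lgb.early_stopping(stopping_rounds=50, min_delta=1e-4)
    booster = lgb.train(params, train_set, valid_sets=[valid_set], callbacks=[stopper])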
5 changes: 5 additions & 0 deletions python-package/lightgbm/plotting.py
@@ -656,6 +656,9 @@ def create_tree_digraph(
example_case : numpy 2-D array, pandas DataFrame or None, optional (default=None)
Single row with the same structure as the training data.
If not None, the plot will highlight the path that sample takes through the tree.

.. versionadded:: 4.0.0

max_category_values : int, optional (default=10)
The maximum number of category values to display in tree nodes, if the number of thresholds is greater than this value, thresholds will be collapsed and displayed on the label tooltip instead.

@@ -672,6 +675,8 @@ def create_tree_digraph(
graph = lgb.create_tree_digraph(clf, max_category_values=5)
HTML(graph._repr_image_svg_xml())

.. versionadded:: 4.0.0

**kwargs
Other parameters passed to ``Digraph`` constructor.
Check https://graphviz.readthedocs.io/en/stable/api.html#digraph for the full list of supported parameters.
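[Editor's sketch of the new ``example_case`` highlighting; ``clf`` is a fitted model and ``X`` its training DataFrame, both illustrative.]

    graph = lgb.create_tree_digraph(clf, tree_index=0, example_case=X[:1])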
8 changes: 8 additions & 0 deletions python-package/lightgbm/sklearn.py
@@ -484,6 +484,9 @@ def __init__(
threads configured for OpenMP in the system. A value of ``None`` (the default) corresponds
to using the number of physical cores in the system (its correct detection requires
either the ``joblib`` or the ``psutil`` util libraries to be installed).

.. versionchanged:: 4.0.0

importance_type : str, optional (default='split')
The type of feature importance to be filled into ``feature_importances_``.
If 'split', result contains numbers of times the feature is used in a model.
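[Editor's sketch for the ``n_jobs`` change noted above.]

    clf = lgb.LGBMClassifier(n_jobs=None)  # None -> physical core count (needs joblib or psutil)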
@@ -968,6 +971,8 @@ def n_estimators_(self) -> int:

This might be less than parameter ``n_estimators`` if early stopping was enabled or
if boosting stopped early due to limits on complexity like ``min_gain_to_split``.

.. versionadded:: 4.0.0
"""
if not self.__sklearn_is_fitted__():
raise LGBMNotFittedError('No n_estimators found. Need to call fit beforehand.')
@@ -979,6 +984,9 @@ def n_iter_(self) -> int:

This might be less than parameter ``n_estimators`` if early stopping was enabled or
if boosting stopped early due to limits on complexity like ``min_gain_to_split``.

# https://github.com/microsoft/LightGBM/pull/4753
.. versionadded:: 4.0.0
"""
if not self.__sklearn_is_fitted__():
raise LGBMNotFittedError('No n_iter found. Need to call fit beforehand.')
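[Editor's sketch of the two fitted attributes added in 4.0.0; ``X`` and ``y`` are assumed, and accessing either attribute before ``fit`` raises ``LGBMNotFittedError`` as shown above.]

    clf = lgb.LGBMClassifier(n_estimators=100).fit(X, y)
    print(clf.n_estimators_, clf.n_iter_)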