diff --git a/ci/environment-docs.yaml b/ci/environment-docs.yaml
index 4efabafa6..18d9e75e5 100644
--- a/ci/environment-docs.yaml
+++ b/ci/environment-docs.yaml
@@ -44,7 +44,6 @@ dependencies:
   # to allow CI to pass
   - dask !=2021.3.0
   - dask-glm
-  - dask-xgboost
   - pip:
     - dask_sphinx_theme >=1.1.0
     - graphviz
diff --git a/dask_ml/xgboost.py b/dask_ml/xgboost.py
deleted file mode 100644
index 86e841db3..000000000
--- a/dask_ml/xgboost.py
+++ /dev/null
@@ -1,7 +0,0 @@
-"""Train an XGBoost model on dask arrays or dataframes.
-
-This may be used for training an XGBoost model on a cluster. XGBoost
-will be setup in distributed mode alongside your existing
-``dask.distributed`` cluster.
-"""
-from dask_xgboost import *  # noqa
diff --git a/docs/source/history.rst b/docs/source/history.rst
index e7666c179..17e2be4c2 100644
--- a/docs/source/history.rst
+++ b/docs/source/history.rst
@@ -6,7 +6,6 @@ focused around particular sub-domains of machine learning.
 
 - dask-searchcv_: Scalable model selection
 - dask-glm_: Generalized Linear Model solvers
-- dask-xgboost_: Connection to the XGBoost library
 - dask-tensorflow_: Connection to the Tensorflow library
 
 While these special-purpose libraries were convenient for development, they
@@ -20,5 +19,4 @@ future development.
 
 .. _dask-searchcv: https://github.com/dask/dask-searchcv
 .. _dask-glm: https://github.com/dask/dask-glm
-.. _dask-xgboost: https://github.com/dask/dask-xgboost
 .. _dask-tensorflow: https://github.com/dask/dask-tensorflow
diff --git a/docs/source/index.rst b/docs/source/index.rst
index d653902ef..05d586b88 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -79,13 +79,6 @@ re-implement these systems. Instead, Dask-ML makes it easy to use normal Dask
 workflows to prepare and set up data, then it deploys XGBoost *alongside* Dask,
 and hands the data over.
 
-.. code-block:: python
-
-   from dask_ml.xgboost import XGBRegressor
-
-   est = XGBRegressor(...)
-   est.fit(train, train_labels)
-
 See :doc:`Dask-ML + XGBoost <xgboost>` for more information.
@@ -132,4 +125,4 @@ See :doc:`Dask-ML + XGBoost <xgboost>` for more information.
 
 .. _Dask: https://dask.org/
 .. _Scikit-Learn: http://scikit-learn.org/
-.. _XGBoost: https://ml.dask.org/xgboost.html
\ No newline at end of file
+.. _XGBoost: https://ml.dask.org/xgboost.html
diff --git a/docs/source/modules/api.rst b/docs/source/modules/api.rst
index 7a2d4d06c..a535b8396 100644
--- a/docs/source/modules/api.rst
+++ b/docs/source/modules/api.rst
@@ -263,26 +263,6 @@ Classification Metrics
 
    metrics.log_loss
 
-:mod:`dask_ml.xgboost`: XGBoost
-===============================
-
-.. automodule:: dask_ml.xgboost
-
-.. currentmodule:: dask_ml.xgboost
-
-.. autosummary::
-   :toctree: generated/
-   :template: class.rst
-
-   XGBClassifier
-   XGBRegressor
-
-.. autosummary::
-   :toctree: generated/
-
-   train
-   predict
-
 :mod:`dask_ml.datasets`: Datasets
 ======================================================
diff --git a/docs/source/xgboost.rst b/docs/source/xgboost.rst
index 6c8263293..90d19e1ce 100644
--- a/docs/source/xgboost.rst
+++ b/docs/source/xgboost.rst
@@ -1,76 +1,21 @@
 XGBoost & LightGBM
 ==================
 
-.. currentmodule:: dask_ml.xgboost
-
 XGBoost_ is a powerful and popular library for gradient boosted trees. For
 larger datasets or faster training, XGBoost also provides a distributed
 computing solution. LightGBM_ is another library similar to XGBoost; it also
 natively supplies distributed training for decision trees.
 
-Dask-ML can set up distributed XGBoost or LightGBM for you and hand off data
-from distributed dask.dataframes. This automates much of the hassle of
-preprocessing and setup while still letting XGBoost/LightGBM do what they do
-well.
-
-Below, we'll refer to an example with XGBoost. Here are the relevant XGBoost
-classes/functions:
+Both XGBoost and LightGBM provide Dask implementations for distributed
+training. These can take Dask objects such as Arrays and DataFrames as
+input, so you can do any initial loading and processing of data with Dask
+before handing it over to XGBoost/LightGBM to do what they do well, as in
+the sketch below.
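+
+Training with XGBoost's native Dask interface might look roughly like the
+following. This is a minimal sketch rather than a dask-ml API: it assumes
+xgboost >= 1.0 (which ships the ``xgboost.dask`` module), an already-running
+``dask.distributed`` cluster, and placeholder scheduler address and data
+paths.
+
+.. code-block:: python
+
+   from dask.distributed import Client
+   import dask.dataframe as dd
+   import xgboost as xgb
+
+   client = Client('scheduler-address:8786')  # connect to the running cluster
+
+   # Load and prepare data with ordinary Dask workflows
+   df = dd.read_parquet('s3://...')
+   train, test = df.random_split([0.8, 0.2])
+
+   # Separate labels from data
+   train_labels = train.x > 0
+   test_labels = test.x > 0
+   del train['x']  # remove informative column from data
+   del test['x']   # remove informative column from data
+
+   # xgboost's scikit-learn-style Dask estimator; training runs on the cluster
+   est = xgb.dask.DaskXGBRegressor()
+   est.client = client
+   est.fit(train, train_labels)
+
+   prediction = est.predict(test)
+
+LightGBM's Dask interface follows a similar estimator-style pattern; see the
+documentation linked below.
+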
-
-.. autosummary::
-   train
-   predict
-   XGBClassifier
-   XGBRegressor
+
+The XGBoost implementation can be found at https://github.com/dmlc/xgboost,
+with documentation for its Dask integration at
+https://xgboost.readthedocs.io/en/latest/tutorials/dask.html. The LightGBM
+implementation can be found at https://github.com/microsoft/LightGBM, with
+documentation at
+https://lightgbm.readthedocs.io/en/latest/Parallel-Learning-Guide.html#dask.
 
-Example
--------
-
-.. code-block:: python
-
-    from dask.distributed import Client
-    client = Client('scheduler-address:8786')
-
-    import dask.dataframe as dd
-    df = dd.read_parquet('s3://...')
-
-    # Split into training and testing data
-    train, test = df.random_split([0.8, 0.2])
-
-    # Separate labels from data
-    train_labels = train.x > 0
-    test_labels = test.x > 0
-
-    del train['x']  # remove informative column from data
-    del test['x']  # remove informative column from data
-
-    # from xgboost import XGBRegressor  # change import
-    from dask_ml.xgboost import XGBRegressor
-
-    est = XGBRegressor(...)
-    est.fit(train, train_labels)
-
-    prediction = est.predict(test)
-
-How this works
---------------
-
-Dask sets up XGBoost's master process on the Dask scheduler and XGBoost's worker
-processes on Dask's worker processes. Then it moves all of the Dask
-dataframes' constituent Pandas dataframes to XGBoost and lets XGBoost train.
-Fortunately, because XGBoost has an excellent Python interface, all of this can
-happen in the same process without any data transfer. The two distributed
-services can operate together on the same data.
-
-When XGBoost is finished training Dask cleans up the XGBoost infrastructure and
-continues on as normal.
-
-This work was a collaboration with XGBoost and SKLearn maintainers. See
-relevant GitHub issue here: `dmlc/xgboost #2032 <https://github.com/dmlc/xgboost/issues/2032>`_
-
-See the ":doc:`Dask-ML examples `" for an example usage.
-
 .. _XGBoost: https://xgboost.readthedocs.io/
 .. _LightGBM: https://lightgbm.readthedocs.io/
diff --git a/setup.py b/setup.py
index 8f1163561..72304c0d9 100644
--- a/setup.py
+++ b/setup.py
@@ -35,7 +35,7 @@
     "pytest-mock",
 ]
 dev_requires = doc_requires + test_requires
-xgboost_requires = ["dask-xgboost", "xgboost"]
+xgboost_requires = ["xgboost"]
 complete_requires = xgboost_requires
 extras_require = {