Quickly test the performance of different models on your structured data for classification or regression.
- Try different models
- Cross-validate, tune hyperparameters and evaluate performance on validation and test sets
- Different metrics/plots (mix of sklearn and custom metrics/plots) specific to each task (binary-classification, multi-classification, regression)
- Easily add new models, metrics or plots
The project uses Python 3.9.11. You can use pyenv to manage your Python versions.
Install the data-science packages (example commands for macOS; similar on Linux):
brew install lightgbm
brew install cmake libomp
Install Poetry:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
Install the dependencies:
poetry install
Mac M1 warning: if you are using a Mac M1, you currently cannot install the shap library. Remove it from pyproject.toml and remove its imports from the project.
Examples for binary-classification, multi-classification and regression are provided in the examples folder.
Run the binary-classification, multi-classification or regression example, respectively, with the following commands:
PYTHONPATH=. poetry run python examples/binary-classification/run_pipeline.py
PYTHONPATH=. poetry run python examples/multi-classification/run_pipeline.py
PYTHONPATH=. poetry run python examples/regression/run_pipeline.py
The results are stored in examples/{task name}/experiments.
There are two ways to create the training configuration file:
- With the CLI, by running:
poetry run python generate_config_file.py
- Manually, by filling in the file src/config/training_config.yml directly
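Whichever way you create it, checking that training_config.yml parses before launching a run can save a failed experiment. The sketch below assumes PyYAML is installed in the Poetry environment and that the file's top level is a mapping; load_training_config is a hypothetical helper, not part of the project.

```python
from pathlib import Path

import yaml  # PyYAML; assumed to be available in the project environment


def load_training_config(path: str = "src/config/training_config.yml") -> dict:
    """Parse the training config (hypothetical helper) and fail early on bad structure."""
    with Path(path).open(encoding="utf-8") as f:
        config = yaml.safe_load(f)
    # Assumption: the config is a mapping of settings at the top level.
    if not isinstance(config, dict):
        raise ValueError(
            f"{path} should contain a YAML mapping at top level, got {type(config).__name__}"
        )
    return config


if __name__ == "__main__":
    print(load_training_config())
```

Run it from the repository root so the default relative path resolves.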
Create a script that loads your data into a pandas DataFrame and passes it to run_training_pipeline,
located in src.training.training_pipeline.run_training_pipeline.py. By default, the pipeline uses the YAML file created
in the previous step. See the examples folder.
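A minimal sketch of such a script follows. It assumes run_training_pipeline is importable as shown and takes the DataFrame as its first argument; the import path, the load_data helper and the CSV path are illustrative assumptions, so check the examples folder for the actual call.

```python
import pandas as pd


def load_data(csv_path: str) -> pd.DataFrame:
    """Load tabular data from a CSV file into a pandas DataFrame (hypothetical helper)."""
    return pd.read_csv(csv_path)


if __name__ == "__main__":
    # Assumption: import path and call signature; mirror the scripts in examples/.
    from src.training.training_pipeline import run_training_pipeline

    data = load_data("data/my_dataset.csv")  # hypothetical path to your dataset
    run_training_pipeline(data)
```

Like the examples, such a script would be run from the repository root, e.g. PYTHONPATH=. poetry run python my_script.py (my_script.py being your own file).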
To add models, metrics or plots that already exist in sklearn, import them from sklearn and add them to MODELS_METRICS_PLOTS_PER_TASK, located in src.config.models_metrics_plots_per_task.py.
To add a custom metric or plot:
- Create the metric or plot logic in src.custom_metrics_and_plots, in custom_metrics.py or custom_plots.py respectively.
- Add it to MODELS_METRICS_PLOTS_PER_TASK, located in src.config.models_metrics_plots_per_task.py
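For instance, a custom metric for binary classification could be written as a plain function. This sketch assumes custom metrics follow the sklearn convention of taking (y_true, y_pred) and returning a float; specificity_score is a hypothetical example, so check the existing functions in custom_metrics.py for the signature the pipeline actually expects.

```python
import numpy as np
from numpy.typing import ArrayLike


def specificity_score(y_true: ArrayLike, y_pred: ArrayLike) -> float:
    """True negative rate for binary labels in {0, 1}: TN / (TN + FP)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    negatives = y_true == 0
    n_negatives = int(negatives.sum())  # TN + FP = number of actual negatives
    if n_negatives == 0:
        # Specificity is undefined without negative samples; avoid dividing by zero.
        return float("nan")
    true_negatives = int(np.sum(negatives & (y_pred == 0)))
    return true_negatives / n_negatives
```

For example, specificity_score([0, 0, 1, 1], [0, 1, 1, 1]) returns 0.5. If the pipeline expects sklearn scorer objects rather than plain functions, sklearn.metrics.make_scorer(specificity_score) wraps it.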