
Fix model inefficiency #54

Merged
nikromen merged 4 commits into fedora-copr:main from nikromen:fix-model-ineffeciency
Mar 4, 2026

Conversation

@nikromen
Member

@nikromen nikromen commented Mar 3, 2026

Fix #45

On each request, all of the thousands of category strings are processed. XGBoost
does not keep this encoding in its memory, so prediction is very inefficient;
LightGBM can deal with that natively. However, this should rather be handled on
the tool's side.
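
A minimal sketch of the fix's core idea, with hypothetical names (the real implementation lives in `rpmeta/dataset.py` and `rpmeta/predictor.py`): pre-create the pandas categorical dtypes once at predictor load time and reuse them for every request, instead of re-processing thousands of category strings per prediction.

```python
import pandas as pd

# Hypothetical category map: feature name -> known category strings.
# In the real project this would come from the trained model's metadata.
category_maps = {
    "package_name": ["vim", "kernel", "python3"],
    "chroot": ["fedora-41-x86_64", "fedora-41-aarch64"],
}

# Done ONCE at predictor load time, not per request.
category_dtypes = {
    feat: pd.CategoricalDtype(categories=cats)
    for feat, cats in category_maps.items()
}

def to_data_frame(record: dict, category_dtypes: dict) -> pd.DataFrame:
    """Build a single-row frame, reusing the pre-created dtypes."""
    df = pd.DataFrame([record])
    for feat, dtype in category_dtypes.items():
        df[feat] = df[feat].astype(dtype)
    return df

df = to_data_frame(
    {"package_name": "vim", "chroot": "fedora-41-x86_64"},
    category_dtypes,
)
```

Since the `CategoricalDtype` objects are built once, each request only pays for a cheap `astype` on a single-row frame rather than a scan over the full category vocabulary.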
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses model inefficiency by overhauling the underlying model architecture and prediction mechanisms. It introduces a more streamlined approach for managing machine learning models, enhancing both training and inference processes. The changes aim to provide better control over model complexity, improve prediction speed, and offer new tools for developer interaction, ultimately leading to more robust and efficient model deployment.

Highlights

  • Model Refactoring and Optimization: The core model handling and prediction pipeline has been significantly refactored, moving away from sklearn.compose.TransformedTargetRegressor to a more direct and optimized approach for native model saving, loading, and prediction, particularly for XGBoost and LightGBM.
  • Model Size Penalization: New configuration options (size_penalty_enabled, size_penalty_lambda) have been introduced for both XGBoost and LightGBM models, allowing for penalization of model complexity during training to prevent overfitting and improve efficiency.
  • Interactive Shell Command: A new shell command has been added to the CLI, enabling developers to start an interactive Python session with the loaded model and configuration for easier debugging and exploration.
  • XGBoost Parameter Update: XGBoost configuration now uses max_bin and max_leaves instead of max_depth, providing finer control over tree structure and potentially improving model performance and efficiency.
  • Prediction Performance Improvements: Optimized prediction methods have been implemented for XGBoost, bypassing expensive Python-side categorical string serialization by encoding features to integer codes and constructing DMatrix directly.
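
A dependency-free sketch of the optimized encoding path described above (names like `cat_encoders` are hypothetical): categorical strings are mapped to integer codes in a dense float32 array, which can then be handed to `xgboost.DMatrix` directly, bypassing Python-side string serialization.

```python
import numpy as np

# Hypothetical encoders built once from the model's known categories:
# feature name -> {category string: integer code}.
cat_encoders = {"chroot": {"fedora-41-x86_64": 0, "fedora-41-aarch64": 1}}
feature_names = ["chroot", "package_size_mb"]

def encode_rows(rows: list[dict]) -> np.ndarray:
    """Encode request rows into a dense float32 matrix."""
    data = np.empty((len(rows), len(feature_names)), dtype=np.float32)
    for j, feat in enumerate(feature_names):
        if feat in cat_encoders:
            enc = cat_encoders[feat]
            data[:, j] = [enc[row[feat]] for row in rows]
        else:
            data[:, j] = [row[feat] for row in rows]
    return data

# The resulting array would be wrapped as
#   xgb.DMatrix(data, feature_names=feature_names,
#               feature_types=["c", "float"], enable_categorical=True)
# (omitted here to keep the sketch free of the xgboost dependency).
matrix = encode_rows([{"chroot": "fedora-41-aarch64", "package_size_mb": 12.5}])
```

The key point is that the per-request work is a dict lookup per categorical cell instead of rebuilding pandas categorical columns from string data.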
Changelog
  • files/config.toml.example
    • Replaced max_depth with max_bin and max_leaves for XGBoost configuration.
    • Added size_penalty_enabled and size_penalty_lambda parameters for both XGBoost and LightGBM models.
    • Removed explicit objective and tree_method from model parameters.
  • rpmeta/cli/run.py
    • Added code and sys imports.
    • Introduced a new shell command for interactive Python sessions with the loaded model.
  • rpmeta/config.py
    • Updated XGBoostParams to replace max_depth with max_bin and max_leaves fields.
    • Added size_penalty_enabled and size_penalty_lambda fields to XGBoostParams and LightGBMParams.
  • rpmeta/dataset.py
    • Modified InputRecord.to_data_frame to accept an optional category_dtypes argument for pre-created categorical dtypes.
  • rpmeta/model.py
    • Removed joblib and sklearn.compose.TransformedTargetRegressor imports.
    • Added TARGET_FUNC and INVERSE_FUNC as class attributes to the Model abstract base class.
    • Replaced the abstract method _make_regressor with make_regressor.
    • Introduced a new abstract method compute_size_penalty to the Model class.
    • Simplified save_regressor and load_regressor methods to handle only the native model.
    • Added prepare_for_prediction and predict methods to the base Model class for a unified prediction interface.
    • Updated XGBoostModel.make_regressor to set max_depth=0 and grow_policy='lossguide'.
    • Implemented compute_size_penalty and an optimized predict method for XGBoostModel.
    • Implemented compute_size_penalty for LightGBMModel.
  • rpmeta/predictor.py
    • Updated Predictor's __init__ to accept a Model instance and store pre-created _category_dtypes.
    • Modified Predictor.load to call model.prepare_for_prediction after loading.
    • Updated Predictor.predict to pass category_dtypes to input_data.to_data_frame.
    • Changed model_name access in health_check to predictor.model.name.
  • rpmeta/rpmeta.code-workspace
    • Added a new VS Code workspace configuration file.
  • rpmeta/server/api.py
    • Updated the health check endpoint to retrieve the model name from predictor.model.name.
  • rpmeta/trainer/base.py
    • Added sys, collections.abc.Callable, and numpy imports.
    • Introduced a new _TransformedPredictor class for visualizer compatibility.
    • Added a _log_model_size method to log the size of the trained model.
    • Modified the run_study objective to apply TARGET_FUNC to y_train, use INVERSE_FUNC for predictions, and incorporate compute_size_penalty into the optimization objective.
    • Updated BestModelResult to store _TransformedPredictor instances.
    • Modified the run method to apply TARGET_FUNC to y and call _log_model_size.
  • rpmeta/trainer/models.py
    • Updated XGBoostTrainer.param_space and default_params to use max_leaves and max_bin instead of max_depth.
  • test/conftest.py
    • Updated the base_model_subclass fixture to implement make_regressor and compute_size_penalty methods.
  • test/unit/test_model_persistence.py
    • Added a _category_maps_from_df helper function.
    • Updated LightGBM and XGBoost persistence tests to align with the new Model interface, including make_regressor, TARGET_FUNC, INVERSE_FUNC, load_regressor, prepare_for_prediction, and predict.
    • Replaced test_transformed_target_regressor_preserved with test_target_transform_functions to verify target transformations.
  • test/unit/test_predictor.py
    • Updated test_predictor_load to mock XGBoostModel methods and verify calls to load_regressor and prepare_for_prediction.
    • Adjusted Predictor instantiation in tests to remove the model_name argument.
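
To illustrate how the new size_penalty_enabled / size_penalty_lambda options could interact with tuning (a sketch with hypothetical signatures; the real logic is in rpmeta/trainer/base.py): the study minimizes validation error plus a lambda-weighted complexity term from compute_size_penalty.

```python
# Sketch: folding a model-size penalty into the tuning objective.
# `compute_size_penalty` mirrors the new abstract Model method; its
# complexity proxy here (total leaf count) is an assumption.

def compute_size_penalty(n_trees: int, n_leaves_per_tree: int) -> float:
    return float(n_trees * n_leaves_per_tree)

def penalized_objective(
    validation_error: float,
    n_trees: int,
    n_leaves_per_tree: int,
    size_penalty_enabled: bool = True,
    size_penalty_lambda: float = 1e-4,
) -> float:
    penalty = (
        compute_size_penalty(n_trees, n_leaves_per_tree)
        if size_penalty_enabled
        else 0.0
    )
    return validation_error + size_penalty_lambda * penalty

# A big model with slightly lower error can lose to a small one:
big = penalized_objective(0.24, n_trees=800, n_leaves_per_tree=64)
small = penalized_objective(0.25, n_trees=100, n_leaves_per_tree=32)
```

With the penalty enabled, the tuner is pushed toward models that are nearly as accurate but far cheaper to store and evaluate.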
Activity
  • The pull request was opened by nikromen to address issue #45, "predicting on xgboost is too inefficient".
  • Significant refactoring was performed across multiple files, indicating a substantial change in the model's architecture and operational flow.
  • New configuration parameters were introduced, suggesting an expansion of tunable model properties.
  • New developer tooling, such as an interactive shell, was added to improve the development and debugging experience.
  • Extensive updates to unit tests were made to ensure the correctness and integrity of the refactored model persistence and prediction logic.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant and valuable refactoring to improve model efficiency and reduce dependencies. Key changes include removing the scikit-learn TransformedTargetRegressor wrapper in favor of direct handling of target transformations, and implementing an optimized prediction path for XGBoost that bypasses pandas DataFrame overhead. The introduction of model size penalization in hyperparameter tuning is also a great addition for creating more efficient models. I've identified a critical issue with a user-specific file being added to the repository and a couple of medium-severity issues related to model size calculation and minor code cleanup. Overall, this is a high-quality contribution that improves the performance and maintainability of the model handling code.

Comment thread rpmeta/rpmeta.code-workspace Outdated
Comment on lines +1 to +17
{
  "folders": [
    {
      "path": ".."
    },
    {
      "path": "../../copr"
    },
    {
      "path": "../../../../pagure/ansible"
    },
    {
      "path": "../../resalloc"
    }
  ],
  "settings": {}
}

critical

This VSCode workspace file appears to be user-specific and should not be part of the repository. It contains relative paths like ../../copr which are unlikely to be valid for other developers. Please remove this file from the pull request and add *.code-workspace to the .gitignore file to prevent this from happening in the future.
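
The cleanup the reviewer asks for can be sketched as follows (paths are hypothetical, and the steps are demonstrated in a throwaway repository so the sketch is self-contained):

```shell
set -e
# Throwaway repo standing in for the project checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p rpmeta
echo '{}' > rpmeta/rpmeta.code-workspace
git add rpmeta/rpmeta.code-workspace
git -c user.email=dev@example.com -c user.name=dev commit -qm "add workspace file"

# The actual fix: stop tracking the file and ignore the pattern.
git rm -q --cached rpmeta/rpmeta.code-workspace
echo '*.code-workspace' >> .gitignore
git check-ignore rpmeta/rpmeta.code-workspace
```

`git rm --cached` removes the file from the index without deleting the local copy, and `git check-ignore` confirms the new `.gitignore` pattern matches it.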

Comment thread rpmeta/trainer/base.py Outdated
Comment on lines +73 to +75
def _log_model_size(self, regressor: Any) -> None:
    size_bytes = sys.getsizeof(regressor)
    logger.info("Model size: %.2f MB", size_bytes / (1024 * 1024))

medium

The use of sys.getsizeof(regressor) to determine the model size is likely inaccurate. For models from libraries like XGBoost or LightGBM, this function often returns only the size of the Python wrapper object, not the memory occupied by the actual model in the underlying C++ implementation. This can be misleading. A more accurate approach would be to measure the size of the model file on disk, which is saved just before this method is called.

Suggested change
Before:
    def _log_model_size(self, regressor: Any) -> None:
        size_bytes = sys.getsizeof(regressor)
        logger.info("Model size: %.2f MB", size_bytes / (1024 * 1024))
After:
    def _log_model_size(self, regressor: Any) -> None:
        model_file = self._model_directory / self.native_model_filename
        size_bytes = model_file.stat().st_size
        logger.info("Model size on disk: %.2f MB", size_bytes / (1024 * 1024))

Comment thread rpmeta/model.py Outdated
Comment on lines +265 to +274
data[:, i] = col.map(
self._cat_encoders[feat],
).to_numpy(dtype=np.float32)
else:
data[:, i] = col.to_numpy(dtype=np.float32)

dmatrix = self.xgb.DMatrix(
data,
feature_names=list(ALL_FEATURES),
feature_types=self._feature_types,

medium

There are a couple of minor code style issues in the predict method of XGBoostModel that could be cleaned up for better clarity and to remove minor inefficiencies:

  1. The trailing comma in col.map(self._cat_encoders[feat],) is unnecessary.
  2. The list() call in feature_names=list(ALL_FEATURES) is redundant, as ALL_FEATURES is already a list.
Suggested change
Before:
                data[:, i] = col.map(
                    self._cat_encoders[feat],
                ).to_numpy(dtype=np.float32)
            else:
                data[:, i] = col.to_numpy(dtype=np.float32)

        dmatrix = self.xgb.DMatrix(
            data,
            feature_names=list(ALL_FEATURES),
            feature_types=self._feature_types,
After:
                data[:, i] = col.map(
                    self._cat_encoders[feat]
                ).to_numpy(dtype=np.float32)
            else:
                data[:, i] = col.to_numpy(dtype=np.float32)

        dmatrix = self.xgb.DMatrix(
            data,
            feature_names=ALL_FEATURES,
            feature_types=self._feature_types,

@nikromen
Member Author

nikromen commented Mar 3, 2026

Whoa, this is still a proof of concept, but the service went from 8 GB of RAM down to 1 GB, and from 2 requests per second to 164 requests per second! It is now really lightweight.

And all of that from a couple of lines of code :) (the rest is basically removing features that are no longer needed, plus adding a nice debugging feature)

@nikromen
Member Author

nikromen commented Mar 3, 2026

# Overhead  Command          Shared Object                                      Symbol
# ........  ...............  .................................................  ..................................................
#
    16.97%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyEval_EvalFrameDefault
     7.74%  AnyIO worker th  libxgboost.so                                      [.] xgboost::RegTree::MaxDepth(int) const
     4.25%  AnyIO worker th  libxgboost.so                                      [.] xgboost::common::Decision(xgboost::common::Span<unsigned int const, 18446744073709551615ul>, float)
     3.69%  nginx            libcrypto.so.3.5.4                                 [.] ossl_rsaz_amm52x20_x2_ifma256
     2.95%  AnyIO worker th  libxgboost.so                                      [.] float xgboost::predictor::scalar::PredValueByOneTree<true>(xgboost::RegTree::FVec const&, xgboost::RegTree const&, xgboost::RegTree::CategoricalSplitMatrix const&, int)
     2.89%  rpmeta           libpython3.14.so.1.0                               [.] _PyEval_EvalFrameDefault
     2.15%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyObject_GenericGetAttrWithDict
     1.81%  AnyIO worker th  libpython3.14.so.1.0                               [.] list_contains.lto_priv.0
     1.72%  AnyIO worker th  ld-linux-x86-64.so.2                               [.] _dl_tlsdesc_return
     1.56%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyUnicode_RichCompare
     1.41%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyEval_FrameClearAndPop
     1.38%  AnyIO worker th  libpython3.14.so.1.0                               [.] _Py_dict_lookup
     1.37%  AnyIO worker th  libxgboost.so                                      [.] xgboost::RegTree::MaxDepth(int) const [clone .constprop.0]
     1.04%  AnyIO worker th  [kernel.kallsyms]                                  [k] asm_sysvec_apic_timer_interrupt
     0.87%  AnyIO worker th  libpython3.14.so.1.0                               [.] initialize_locals.lto_priv.0
     0.77%  nginx            [kernel.kallsyms]                                  [k] __irqentry_text_start
     0.72%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyType_AllocNoTrack
     0.64%  AnyIO worker th  libpython3.14.so.1.0                               [.] tuple_dealloc.lto_priv.0
     0.64%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyTraceMalloc_Untrack
     0.63%  AnyIO worker th  [kernel.kallsyms]                                  [k] __irqentry_text_start
     0.62%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyObject_MakeTpCall
     0.53%  AnyIO worker th  libpython3.14.so.1.0                               [.] getset_get.lto_priv.0
     0.50%  AnyIO worker th  libpython3.14.so.1.0                               [.] _Py_Dealloc
     0.44%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyTraceMalloc_Track
     0.43%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyFunction_NewWithQualName
     0.37%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyObject_Vectorcall
     0.37%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyObject_GetAttr
     0.37%  nginx            libc.so.6                                          [.] _int_malloc
     0.36%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyObject_GC_Del
     0.34%  AnyIO worker th  libpython3.14.so.1.0                               [.] list_dealloc.lto_priv.0
     0.32%  AnyIO worker th  libpython3.14.so.1.0                               [.] dict_dealloc.lto_priv.0
     0.32%  AnyIO worker th  libpython3.14.so.1.0                               [.] object_isinstance.lto_priv.0
     0.32%  nginx            libcrypto.so.3.5.4                                 [.] scalar_inverse_ntt
     0.31%  AnyIO worker th  libpython3.14.so.1.0                               [.] subtype_dealloc.lto_priv.0
     0.30%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyTuple_FromStackRefStealOnSuccess
     0.30%  nginx            libcrypto.so.3.5.4                                 [.] sha512_block_data_order_avx2
     0.30%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyDict_GetItemRef
     0.27%  AnyIO worker th  libc.so.6                                          [.] pthread_mutex_lock@@GLIBC_2.2.5
     0.27%  AnyIO worker th  libc.so.6                                          [.] __memcmp_evex_movbe
     0.27%  AnyIO worker th  libpython3.14.so.1.0                               [.] unicode_dealloc.lto_priv.0
     0.27%  AnyIO worker th  libpython3.14.so.1.0                               [.] PyType_IsSubtype
     0.26%  AnyIO worker th  libpython3.14.so.1.0                               [.] unicode_from_format.lto_priv.0
     0.24%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyFunction_Vectorcall
     0.21%  nginx            libcrypto.so.3.5.4                                 [.] ossl_extract_multiplier_2x20_win5
     0.21%  rpmeta           libpython3.14.so.1.0                               [.] subtype_dealloc.lto_priv.0
     0.21%  rpmeta           libpython3.14.so.1.0                               [.] _PyEval_FrameClearAndPop
     0.21%  AnyIO worker th  libpython3.14.so.1.0                               [.] get_exception_handler.lto_priv.0
     0.21%  nginx            libcrypto.so.3.5.4                                 [.] cmov
     0.21%  AnyIO worker th  libpython3.14.so.1.0                               [.] _Py_type_getattro
     0.21%  AnyIO worker th  libpython3.14.so.1.0                               [.] _PyUnicode_JoinArray
     0.21%  AnyIO worker th  libpython3.14.so.1.0                               [.] wrapperdescr_call.lto_priv.0

From 95% memcpy overhead down to basically zero, so the problem is completely gone. There is still some 16% Python overhead, but that is expected. The rest is mainly nginx or model-related work, which is exactly what we want.

@nikromen nikromen force-pushed the fix-model-ineffeciency branch from 86fb31d to 056f74a on March 4, 2026, 12:01
@nikromen
Member Author

nikromen commented Mar 4, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant and valuable refactoring of the model handling, training, and prediction logic to improve efficiency. Key improvements include removing the dependency on scikit-learn's TransformedTargetRegressor, optimizing the prediction path (especially for XGBoost), and introducing a model size penalty in hyperparameter tuning. The code quality is high, and the changes are consistent across the project. I've identified one high-severity issue in the training logic when early stopping is used, which could lead to suboptimal models. Please see the detailed comment.

Comment thread rpmeta/trainer/base.py
Comment on lines +136 to +137
best_regressor = self.make_regressor(study.best_trial.params)
best_regressor.fit(X_train, y_train_t)

high

When refitting the best model after an Optuna study with early stopping enabled, the _...EarlyStopping wrapper class is used. This class's fit method splits the provided training data into a new training and validation set. Consequently, the final model is trained on only a subset of the X_train data, which is suboptimal. The final model should be trained on the entire training dataset (X_train, y_train_t) to maximize its performance.

A common practice is to:

  1. During the Optuna trial, capture the optimal number of boosting rounds (best_iteration) from the regressor when early stopping is triggered. This can be stored as a user attribute on the trial.
  2. For the final refit, create a new instance of the base regressor (without the early stopping wrapper).
  3. Set its n_estimators parameter to the captured best_iteration.
  4. Fit this new regressor on the complete X_train and y_train_t datasets.

This would require some refactoring to separate the creation of the 'tuning' regressor from the 'final' regressor.

@nikromen nikromen merged commit 2d28644 into fedora-copr:main Mar 4, 2026
6 of 8 checks passed

Development

Successfully merging this pull request may close these issues.

predicting on xgboost is too inefficient
