Lightweight save/load #673
-
Hi! I'm trying to save (pickle) a set of about 10,000 small surrogate models. Is there a way of lightweighting a surrogate model object (e.g., getting rid of the training data) to reduce its size on disk? Training this batch of models takes about 5 hours, so the surrogate is not very useful unless there is a way of saving/loading the models in a shareable form. - Ed Alvarez
-
Hi! Indeed the training data are pickled but not used in prediction, so you can do something like:

import pickle
from smt.surrogate_models import KRG

sm = KRG()
sm.set_training_values(X_train, y_train)
sm.train()
sm.training_points = {}  # hack: drop the stored training data, which are not used in prediction

with open("krg.pickle", "wb") as handle:
    pickle.dump(sm, handle)

Let me know if it works for you and how much it decreases the size on disk.
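For completeness, here is a minimal sketch of loading the pickled model back and predicting; X_new is just a placeholder for new input points with the same number of dimensions as X_train:

import pickle

with open("krg.pickle", "rb") as handle:
    sm_loaded = pickle.load(handle)

y_new = sm_loaded.predict_values(X_new)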
-
Actually I should have done the maths: kriging memory cost scales as N^2, where N is the number of training points. In your case that explains the ~1% decrease, since you have only removed ~N (= 100) values.
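As a rough back-of-the-envelope sketch (the attribute breakdown is an assumption, not SMT's exact internal storage), the N x N correlation-related arrays dominate the N x d training arrays even for modest N:

N, d = 100, 1                        # assumed number of training points and input dimensions
bytes_training = N * (d + 1) * 8     # X_train and y_train stored as float64
bytes_correlation = N * N * 8        # one N x N matrix, e.g. a Cholesky factor
print(bytes_training / (bytes_training + bytes_correlation))  # ~0.02, so a few percent at best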
-
You can also try GPX, which has its own save/load. Once you have installed it:

from smt.surrogate_models import GPX

sm = GPX()
sm.set_training_values(xt, yt)  # xt, yt: training inputs and outputs
sm.train()
sm.save("sm.bin")

sm2 = GPX.load("sm.bin")
ynew = sm2.predict_values(xnew)
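To check how much either approach actually saves on disk, you can compare the saved files directly (a minimal sketch; the file names are the ones used in the snippets above):

import os

for path in ("krg.pickle", "sm.bin"):
    print(path, os.path.getsize(path), "bytes")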
-
I was also having this problem. I saw this discussion topic and, now that I have managed to solve it, I will share the solution below for the record. I analysed the model attributes after training and managed to reduce the size on disk from 1.1 GB to 1.2 MB by clearing the following variables (sorry it is not written in the most optimized way):

import pickle
import numpy as np
from smt.surrogate_models import KRG

sm = KRG()
sm.set_training_values(X_train, y_train)
sm.train()

# clear the large arrays kept from training
sm.D = np.array([])
sm.theta0 = np.array([])
sm.optimal_par["sigma2"] = np.array([])
sm.F = np.array([])
sm.optimal_par["C"] = np.array([])
sm.optimal_par["Q"] = np.array([])
sm.optimal_par["Ft"] = np.array([])
sm.training_points = {}
sm.ij = np.array([])
sm._thetaMemory = np.array([])

# hack: overwrite attributes and bound methods that are not needed for prediction
sm._correlation_class = {}
sm._train = {}
sm._new_train = {}
sm.corr._abc_impl = {}
sm._abc_impl = {}
sm._check_F = {}
sm._check_param = {}
sm._compute_sigma2 = {}
sm._correct_distances_cat_decreed = {}
sm._final_initialize = {}
sm._initialize = {}
sm._initialize_theta = {}
sm._internal_predict_variance = {}
sm._matrix_data_corr = {}
sm._optimize_hyperparam = {}
sm._post_predict = {}
sm._pre_predict = {}
sm._predict_derivatives = {}
sm._predict_output_derivatives = {}
sm._predict_variance_derivatives = {}
sm._predict_variance_gradient = {}
sm._predict_variances = {}
sm._reduced_likelihood_function = {}
sm._reduced_likelihood_gradient = {}
sm._reduced_likelihood_hessian = {}
sm.design_space._configs_to_x = {}
sm.design_space._correct_get_acting = {}
sm.design_space._cs_denormalize_x = {}
sm.design_space._cs_denormalize_x_ordered = {}
sm.design_space._design_variables = np.array([])
sm.design_space._get_correct_config = {}
sm.design_space._get_design_variables = {}
sm.design_space._get_n_dim_unfolded = {}
sm.design_space._get_param = {}
sm.design_space._get_param2 = {}
sm.design_space._impute_non_acting = {}
sm.design_space._is_conditionally_acting = {}
sm.design_space._normalize_x = {}
sm.design_space._normalize_x_no_integer = {}
sm.design_space._round_equally_distributed = {}
sm.design_space._sample_valid_x = {}
sm.design_space._to_seed = {}

with open("krg.pickle", "wb") as handle:
    pickle.dump(sm, handle)

Maybe that would be enough for you @EdoAlvarezR
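Since the original question was about a batch of ~10,000 models, one way to apply this kind of stripping systematically is to wrap it in a small helper and pickle the whole list of trained models at once. This is only a sketch: strip_for_pickle and models are hypothetical names, and clearing quantities such as optimal_par["C"] or optimal_par["sigma2"] will most likely break predict_variances, so keep them if you need variance estimates.

import pickle
import numpy as np

def strip_for_pickle(sm):
    # hypothetical helper: clear the heaviest attributes listed above before pickling
    sm.training_points = {}
    sm.D = np.array([])
    sm.optimal_par["C"] = np.array([])   # drops variance prediction; omit this line if variances are needed
    sm.optimal_par["Ft"] = np.array([])
    sm.optimal_par["Q"] = np.array([])
    return sm

# models is assumed to be a list of already-trained KRG instances
stripped = [strip_for_pickle(sm) for sm in models]
with open("krg_batch.pickle", "wb") as handle:
    pickle.dump(stripped, handle)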