Lightweight save/load #673
-
Hi! I'm trying to save (pickle) a set of about 10,000 small surrogate models. Is there a way of lightweighting a surrogate model object (e.g., getting rid of the training data) to reduce its size on disk? Training this batch of models takes about 5 hours, so the surrogate is not very useful unless there is a way of saving/loading the models in a shareable form. - Ed Alvarez
-
Hi! Indeed the training data are pickled but not used in prediction, so you can do something like:

import pickle
from smt.surrogate_models import KRG

sm = KRG()
sm.set_training_values(X_train, y_train)
sm.train()
sm.training_points = {}  # hack: drop the stored training data, which are not used in prediction

with open("krg.pickle", "wb") as handle:
    pickle.dump(sm, handle)

Let me know if it works for you and how much it decreases the size on disk.
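For completeness, here is a minimal sketch of loading the pickled model back and predicting; X_new is just a placeholder for new input points with the same number of dimensions as X_train:

import pickle

with open("krg.pickle", "rb") as handle:
    sm_loaded = pickle.load(handle)

y_new = sm_loaded.predict_values(X_new)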
-
Actually I should have done the maths: kriging memory cost scales as N^2, where N is the number of training points. In your case that explains the ~1% decrease, since you have only removed ~N (= 100) values.
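As a rough back-of-the-envelope sketch (the attribute breakdown is an assumption, not SMT's exact internal storage), the N x N correlation-related arrays dominate the N x d training arrays even for modest N:

N, d = 100, 1                        # assumed number of training points and input dimensions
bytes_training = N * (d + 1) * 8     # X_train and y_train stored as float64
bytes_correlation = N * N * 8        # one N x N matrix, e.g. a Cholesky factor
print(bytes_training / (bytes_training + bytes_correlation))  # ~0.02, so a few percent at best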
-
You can also try GPX, which has its own save/load. Once you have installed it:

from smt.surrogate_models import GPX

sm = GPX()
sm.set_training_values(xt, yt)  # xt, yt: training inputs and outputs
sm.train()
sm.save("sm.bin")

sm2 = GPX.load("sm.bin")
ynew = sm2.predict_values(xnew)
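To check how much either approach actually saves on disk, you can compare the saved files directly (a minimal sketch; the file names are the ones used in the snippets above):

import os

for path in ("krg.pickle", "sm.bin"):
    print(path, os.path.getsize(path), "bytes")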
-
I was also having this problem. I saw this discussion topic and, now that I have managed to solve it, I will share the solution below for the record. I analysed the model attributes after training and managed to reduce the size on disk from 1.1 GB to 1.2 MB by clearing the following variables (sorry it is not written in the most optimized way):

import pickle
import numpy as np
from smt.surrogate_models import KRG

sm = KRG()
sm.set_training_values(X_train, y_train)
sm.train()

# clear the large arrays kept from training
sm.D = np.array([])
sm.theta0 = np.array([])
sm.optimal_par["sigma2"] = np.array([])
sm.F = np.array([])
sm.optimal_par["C"] = np.array([])
sm.optimal_par["Q"] = np.array([])
sm.optimal_par["Ft"] = np.array([])
sm.training_points = {}
sm.ij = np.array([])
sm._thetaMemory = np.array([])

# hack: overwrite attributes and bound methods that are not needed for prediction
sm._correlation_class = {}
sm._train = {}
sm._new_train = {}
sm.corr._abc_impl = {}
sm._abc_impl = {}
sm._check_F = {}
sm._check_param = {}
sm._compute_sigma2 = {}
sm._correct_distances_cat_decreed = {}
sm._final_initialize = {}
sm._initialize = {}
sm._initialize_theta = {}
sm._internal_predict_variance = {}
sm._matrix_data_corr = {}
sm._optimize_hyperparam = {}
sm._post_predict = {}
sm._pre_predict = {}
sm._predict_derivatives = {}
sm._predict_output_derivatives = {}
sm._predict_variance_derivatives = {}
sm._predict_variance_gradient = {}
sm._predict_variances = {}
sm._reduced_likelihood_function = {}
sm._reduced_likelihood_gradient = {}
sm._reduced_likelihood_hessian = {}
sm.design_space._configs_to_x = {}
sm.design_space._correct_get_acting = {}
sm.design_space._cs_denormalize_x = {}
sm.design_space._cs_denormalize_x_ordered = {}
sm.design_space._design_variables = np.array([])
sm.design_space._get_correct_config = {}
sm.design_space._get_design_variables = {}
sm.design_space._get_n_dim_unfolded = {}
sm.design_space._get_param = {}
sm.design_space._get_param2 = {}
sm.design_space._impute_non_acting = {}
sm.design_space._is_conditionally_acting = {}
sm.design_space._normalize_x = {}
sm.design_space._normalize_x_no_integer = {}
sm.design_space._round_equally_distributed = {}
sm.design_space._sample_valid_x = {}
sm.design_space._to_seed = {}

with open("krg.pickle", "wb") as handle:
    pickle.dump(sm, handle)

Maybe that would be enough for you @EdoAlvarezR
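Since the original question was about a batch of ~10,000 models, one way to apply this kind of stripping systematically is to wrap it in a small helper and pickle the whole list of trained models at once. This is only a sketch: strip_for_pickle and models are hypothetical names, and clearing quantities such as optimal_par["C"] or optimal_par["sigma2"] will most likely break predict_variances, so keep them if you need variance estimates.

import pickle
import numpy as np

def strip_for_pickle(sm):
    # hypothetical helper: clear the heaviest attributes listed above before pickling
    sm.training_points = {}
    sm.D = np.array([])
    sm.optimal_par["C"] = np.array([])   # drops variance prediction; omit this line if variances are needed
    sm.optimal_par["Ft"] = np.array([])
    sm.optimal_par["Q"] = np.array([])
    return sm

# models is assumed to be a list of already-trained KRG instances
stripped = [strip_for_pickle(sm) for sm in models]
with open("krg_batch.pickle", "wb") as handle:
    pickle.dump(stripped, handle)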