ColdSnap: Freeze ML models and their training/testing data

The ColdSnap framework lets you "freeze", i.e. serialize to disk, both machine learning models and their training/testing data.

Machine learning projects need careful tracking and storage not only of model architectures and parameters but also of the datasets the models were trained on. A robust way to store models together with snapshots of their data is essential for reproducibility, version control, and long-term evaluation of model performance. ColdSnap addresses these needs with a unified framework in which machine learning models and their corresponding datasets are stored, serialized, and evaluated together. Because the model and its data are preserved as a single unit, ColdSnap enables consistent evaluation across iterations, simplifies model comparisons, and keeps every aspect of a model’s creation (data transformations, training splits, and performance metrics) easy to retrieve.

Installation

How to use ColdSnap

The code below can be found in ./docs/example. In a nutshell, you create a Data object that contains your training and testing data, add that data to a Model together with the classifier to use, and serialize the model to disk. The model, along with its data, can then be loaded again from another script or notebook. The create_overview function summarizes a list of models.
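The Data object holds both training and testing data, and Data.from_df receives a random_state. This suggests it performs a seeded train/test split. ColdSnap's own splitting logic is not shown here, so the sketch below only illustrates the idea. It assumes an unstratified split that holds out ceil(0.25 * n) rows, which is the rounding scikit-learn's train_test_split applies to its default test_size=0.25. The accuracies in the overview further down (e.g. 0.973684 = 37/38) are consistent with a 38-row test set drawn from Iris's 150 rows. The helper name seeded_split is hypothetical.

```python
import math
import random


def seeded_split(rows, test_size=0.25, random_state=1910):
    """Hypothetical helper: reproducibly split rows into (train, test).

    Assumes an unstratified split; the test set gets ceil(test_size * n) rows.
    """
    indices = list(range(len(rows)))
    # A dedicated Random instance keeps the shuffle independent of global state
    rng = random.Random(random_state)
    rng.shuffle(indices)
    n_test = math.ceil(test_size * len(rows))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


if __name__ == "__main__":
    rows = list(range(150))  # stand-in for the 150 Iris samples
    train, test = seeded_split(rows)
    print(len(train), len(test))  # 112 38
    # The same seed always yields the same split
    assert seeded_split(rows) == (train, test)
```

Using a dedicated random.Random instance means the split neither depends on nor disturbs the global random state. Storing the seed alongside the data is what makes the split reproducible later.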

Creating Snapshots of Data and Models

The code below shows how to create and store Data and Models.

from coldsnap import Data, Model

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import os

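# Load Iris as a DataFrame: four feature columns plus the "target" label column
iris = datasets.load_iris(as_frame=True)
iris_df = pd.merge(
    iris.data, iris.target, how="inner", left_index=True, right_index=True
)

if __name__ == "__main__":
    # Create the output folder if it does not exist yet
    os.makedirs("./tmp/", exist_ok=True)

    # Wrap the DataFrame in a ColdSnap Data object; "target" is the label column
    # and random_state makes the train/test split reproducible
    cs_data = Data.from_df(
        iris_df, "target", random_state=1910, description="Iris Dataset"
    )

    # Freeze the data snapshot on its own
    cs_data.to_pickle("./tmp/iris_data.pkl.gz")

    # Create random forest classifier
    clf = RandomForestClassifier(random_state=1910)

    # Bundle the data and the classifier into a Model, then train it
    cs_model = Model(
        data=cs_data,
        clf=clf,
        description="RandomForestClassifier, default params on Iris dataset",
    )
    cs_model.fit()

    # Freeze the trained model together with its data
    cs_model.to_pickle("./tmp/iris_model.pkl.gz")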
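The .pkl.gz extension suggests that to_pickle writes a gzip-compressed pickle. Assuming that is the case, the standard-library sketch below shows the general freeze/thaw round trip for any picklable object. The names freeze and thaw are hypothetical and not part of ColdSnap's API. Because unpickling can execute arbitrary code, only load snapshots from sources you trust.

```python
import gzip
import os
import pickle
import tempfile


def freeze(obj, path):
    """Hypothetical helper: serialize obj to a gzip-compressed pickle."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def thaw(path):
    """Hypothetical helper: load an object written by freeze().

    Only call this on files you trust: unpickling can run arbitrary code.
    """
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    snapshot = {"description": "Iris Dataset", "num_features": 4, "classes": [0, 1, 2]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snapshot.pkl.gz")
        freeze(snapshot, path)
        restored = thaw(path)
    print(restored == snapshot)  # True
```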

Loading a Model

Once a model has been stored, it can be loaded with Model.from_pickle(path). After loading, details on the model and its performance can be retrieved with .summary().

import sys

from coldsnap import Model

if __name__ == "__main__":
    try:
        cs_model = Model.from_pickle("./tmp/iris_model.pkl.gz")
    except OSError:
        # FileNotFoundError is a subclass of OSError
        print("Model not found, run the script to create models first!")
        sys.exit(1)

    print(cs_model.summary())

Creating an Overview of Your Models

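To quickly compare several models, use the create_overview function as shown below. Besides the random forest created above, the example assumes that an SVC model and a decision tree model have been stored in ./tmp/ in the same way.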

from coldsnap.utils import create_overview

if __name__ == "__main__":
    paths = [
        "./tmp/iris_model.pkl.gz",
        "./tmp/iris_model_svc.pkl.gz",
        "./tmp/iris_model_dt.pkl.gz",
    ]

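    # Build a DataFrame with one row per model
    overview_df = create_overview(paths)

    # DataFrame.to_markdown() requires the optional "tabulate" package
    print(overview_df.to_markdown())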

The table below shows the output: for each model in the input list, you get its summary and evaluation metrics.

|   | path | model_code | model_description | model_hash | data_code | data_description | data_hash | num_features | features | num_classes | classes | accuracy | precision | recall | f1 | roc_auc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ./tmp/iris_model.pkl.gz | RF01 | RandomForestClassifier, default params on Iris dataset | b3f8665bce0ee979b51c9729019ae76d7ed3b83522024b9fb3375e1b96a3dc11 | IrD | Iris Dataset | 975cdbb5f836a810ad019751a998b18683437093f372f4545fd00be5335d5e4b | 4 | sepal length (cm), sepal width (cm), petal length (cm), petal width (cm) | 3 | 0, 1, 2 | 0.973684 | 0.975564 | 0.973684 | 0.973545 | 0.997973 |
| 1 | ./tmp/iris_model_svc.pkl.gz | SVC01 | SVC (with probabilities) on Iris dataset | 280f5c4ca76b77144bbe7e9768bfc663b45fdafe61be3bbdc793458597f75e07 | IrD | Iris Dataset | 975cdbb5f836a810ad019751a998b18683437093f372f4545fd00be5335d5e4b | 4 | sepal length (cm), sepal width (cm), petal length (cm), petal width (cm) | 3 | 0, 1, 2 | 0.973684 | 0.975564 | 0.973684 | 0.973545 | 0.997973 |
| 2 | ./tmp/iris_model_dt.pkl.gz | DT01 | DecisionTreeClassifier (max_depth=2) on Iris dataset | 3814de3d290288de03f1b2388897c964b967c7f8ffa44303c61c41575da5d856 | IrD | Iris Dataset | 975cdbb5f836a810ad019751a998b18683437093f372f4545fd00be5335d5e4b | 4 | sepal length (cm), sepal width (cm), petal length (cm), petal width (cm) | 3 | 0, 1, 2 | 0.947368 | 0.947368 | 0.947368 | 0.947368 | 0.975673 |
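In the overview, all three rows share the same data_hash, as expected for models trained on the same Data snapshot, while each model has its own model_hash. How ColdSnap computes these hashes is not documented here. The 64-character hexadecimal strings are consistent with SHA-256, so the sketch below assumes SHA-256 over pickled bytes to show how such a content fingerprint behaves. The helper name fingerprint is hypothetical.

```python
import hashlib
import pickle


def fingerprint(obj):
    """Hypothetical helper: SHA-256 hex digest of an object's pickled bytes.

    Pickle output is stable for identical simple objects within one Python
    version, but it is not guaranteed to be stable across versions or for
    arbitrary objects, so treat this as an illustration only.
    """
    payload = pickle.dumps(obj, protocol=4)
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    data_a = {"description": "Iris Dataset", "target": "target", "random_state": 1910}
    data_b = dict(data_a)  # an identical copy of the same snapshot
    data_c = dict(data_a, random_state=42)  # same data, different split seed

    print(len(fingerprint(data_a)))  # 64
    print(fingerprint(data_a) == fingerprint(data_b))  # True
    print(fingerprint(data_a) == fingerprint(data_c))  # False
```

Identical snapshots hash identically. Any change, even just the split seed, produces a different digest, which is what makes a shared data_hash a quick check that models were trained on the same data.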

Evaluating Model performance

ColdSnap has a few common evaluation metrics and plots built in. The example below assumes that a model has already been loaded into cs_model, for instance with Model.from_pickle.

import matplotlib.pyplot as plt

# Built-in evaluation metrics
print(cs_model.evaluate())

# Confusion matrix
print(cs_model.confusion_matrix())

fig, ax = plt.subplots()
disp = cs_model.display_confusion_matrix(ax=ax, cmap="Blues")
plt.show()

# ROC curve
fig, ax = plt.subplots()
roc_disp = cs_model.display_roc_curve(ax=ax)
plt.show()

# SHAP beeswarm plot of per-feature contributions
cs_model.display_shap_beeswarm()
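The overview reports accuracy, precision, recall and F1 for a three-class problem, but the averaging method is not stated. In every row, recall equals accuracy. That is what support-weighted averaging produces, because weighted recall reduces to the number of correct predictions divided by the number of samples. Assuming weighted averaging, the pure-Python sketch below computes a confusion matrix and these metrics from label lists. It follows the definitions behind scikit-learn's average="weighted" option but is not ColdSnap's code. The function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(y_true)
    result = {
        "accuracy": sum(cm[i][i] for i in range(len(labels))) / n,
        "precision": 0.0,
        "recall": 0.0,
        "f1": 0.0,
    }
    for i in range(len(labels)):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: predicted as this label
        support = sum(cm[i])  # row sum: true instances of this label
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Each per-label score is weighted by its share of the true labels
        weight = support / n
        result["precision"] += weight * precision
        result["recall"] += weight * recall
        result["f1"] += weight * f1
    return result


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print(confusion_matrix(y_true, y_pred, [0, 1, 2]))  # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
    metrics = weighted_metrics(y_true, y_pred)
    print({k: round(v, 4) for k, v in metrics.items()})
    # {'accuracy': 0.8333, 'precision': 0.8889, 'recall': 0.8333, 'f1': 0.8222}
```

In the example, weighted recall and accuracy both come out at 5/6, mirroring the equal accuracy and recall columns in the overview. Weighted precision and F1 differ because they also depend on how predictions are spread over the columns.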

Contributing

Any contributions you make are greatly appreciated.

Found a bug or have some suggestions? Open an issue.
Pull requests are welcome, but please open an issue first to discuss the features or changes you want to implement.

Contact

ColdSnap was developed by Sebastian Proost at the RaesLab. ColdSnap is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

For commercial access inquiries, please contact Jeroen Raes.
